{"id":1199,"date":"2016-01-13T08:26:46","date_gmt":"2016-01-13T14:26:46","guid":{"rendered":"http:\/\/www.designandexecute.com\/designs\/?p=1199"},"modified":"2021-03-27T23:20:44","modified_gmt":"2021-03-28T05:20:44","slug":"what-is-big-data","status":"publish","type":"post","link":"https:\/\/www.designandexecute.com\/designs\/what-is-big-data\/","title":{"rendered":"What Is Big Data and How Can Your Organization Make Use of It?"},"content":{"rendered":"<p><span data-preserver-spaces=\"true\">The term big data is so common at the time of this posting in 2015 that it deserves some mention in the data warehouse space. But where did it come from, and where should it fit in your technology infrastructure?<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">The personal computer and the birth of networking via Local Area Networks (LANs) allowed corporate enterprises to sprout up, and their transaction data dominated the early market. Enterprises and their transactional ERP systems or custom relational applications gave rise to the relational data marts and data warehouses that dominated the 1990s and 2000s.<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">The mid-2000s brought the era of Web 2.0. This is significant because, before it, the static web shared information on a one-way basis, pushing it toward the user. Users typically could not share their responses or information back on websites except via forms. Web 2.0 gave birth to the technology behind social media and to companies like Facebook and MySpace.\u00a0<\/span><a class=\"editor-rtfLink\" href=\"http:\/\/www.designandexecute.com\/designs\/the-value-of-content\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-preserver-spaces=\"true\">\u00a0User information<\/span><\/a><span data-preserver-spaces=\"true\">\u00a0was now rampant and growing exponentially on the web for everyone to digest. 
This data looked very similar to enterprise data, but there were some differences.<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">The first is the\u00a0<\/span><strong><span data-preserver-spaces=\"true\">variety<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0of the data, which was far greater: structured and unstructured text, video, and photos. The second is the\u00a0<\/span><strong><span data-preserver-spaces=\"true\">volume<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0of data, which was much larger since millions of users registered on these systems compared to thousands in an enterprise, and usage was much higher. The third is the\u00a0<\/span><strong><span data-preserver-spaces=\"true\">velocity<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0at which the data was being captured, which could no longer be handled by traditional relational databases.<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Let&#8217;s see why these large volumes of data overpowered the traditional ways of processing them. A computer works on data using disk storage, memory (RAM), and a CPU. As the data got bigger, the first approach was to scale the hardware vertically by adding more RAM and CPU power. This solution quickly hit a wall because there are physical limits on how powerful a single computer can be.<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Google\u2019s papers on the Google File System (GFS) and on a concept called MapReduce were an attempt to tackle this growing problem within those limitations. The idea was to use commodity hardware and scale outward instead of upward, and to take the computation to where the data lives instead of pushing all the data to a common CPU and RAM. 
The MapReduce process would then distribute the work across the machines and aggregate the results.<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">This sparked a revolution in how to\u00a0<\/span><strong><em><span data-preserver-spaces=\"true\">capture, store, and process these volumes of data<\/span><\/em><\/strong><span data-preserver-spaces=\"true\">. The three main factors of variety, volume, and velocity, together with this new way of thinking, culminated in BIG DATA. This phenomenon demanded a plethora of new technologies to support the new approach and its needs.<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">This was the driver that gave birth to Hadoop (built on the Hadoop Distributed File System, aka HDFS, with processing that maps, sorts and shuffles, then reduces), developed largely at Yahoo, and, in turn, to a variety of NoSQL databases. These databases fall into four main categories depending on how the data is stored:<\/span><\/p>\n<ol>\n<li><span data-preserver-spaces=\"true\">Graph<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Document<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Key-Value Pairs<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Column Store<\/span><\/li>\n<\/ol>\n<h3>BIG DATA TOOLS<\/h3>\n<p><strong><span data-preserver-spaces=\"true\">Here is some of the ecosystem of new technology built to manage these big data demands.<\/span><\/strong><\/p>\n<p><strong><span data-preserver-spaces=\"true\">STORE AND READ<\/span><\/strong><\/p>\n<p><strong><span data-preserver-spaces=\"true\">Hadoop<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0&#8211; an open-source implementation of this approach, using HDFS for storage and MapReduce for processing.<\/span><\/p>\n<p><strong><span data-preserver-spaces=\"true\">Hive<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0&#8211; a SQL-like query layer for Hadoop, which has no native SQL interface of its own. 
This allows us to capitalize on the SQL skills already in the workforce.<\/span><\/p>\n<p><strong><span data-preserver-spaces=\"true\">Apache Drill<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0is modeled on Google&#8217;s Dremel and its query language (DrQL), which powers big data analysis at Google. It serves as the front end to query, plan, execute, and store data. Its strength is querying nested documents directly, which is quite powerful.<\/span><\/p>\n<p><strong><span data-preserver-spaces=\"true\">Spark\u00a0<\/span><\/strong><span data-preserver-spaces=\"true\">resembles the Hadoop architecture of a cluster manager with worker nodes, but with the added advantage of in-memory processing. Spark can be viewed as the next generation of big data tools because it has exceeded Hadoop&#8217;s benchmarks while using fewer machines and less CPU power.<\/span><\/p>\n<p><strong><span data-preserver-spaces=\"true\">Shark<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0is to Spark what Hive is to Hadoop. The fundamental difference is that it uses the Spark execution engine instead of MapReduce to work with data in HDFS.<\/span><\/p>\n<p><strong><span data-preserver-spaces=\"true\">Presto<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0from Facebook is similar to the Hadoop architecture and sits on HDFS. These query engines all allow real-time querying of BIG DATA.<\/span><\/p>\n<p><strong><span data-preserver-spaces=\"true\">PROCESS using streams:\u00a0<\/span><\/strong><span data-preserver-spaces=\"true\">you may want to look at my thoughts on\u00a0<\/span><a class=\"editor-rtfLink\" href=\"http:\/\/www.designandexecute.com\/designs\/java-8-the-power-of-streams\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-preserver-spaces=\"true\">stream processing<\/span><\/a><span data-preserver-spaces=\"true\">, i.e., processing data continuously, as soon as it enters the system. 
This is very different from the polling techniques traditionally used by many application architectures.<\/span><\/p>\n<p><strong><span data-preserver-spaces=\"true\">Apache Storm, open-sourced by Twitter<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0&#8211; Nimbus, ZooKeeper, and Supervisor form a similar three-level architecture for managing worker nodes. The key concepts are\u00a0<\/span><strong><span data-preserver-spaces=\"true\">Tuples<\/span><\/strong><span data-preserver-spaces=\"true\">, ordered lists of elements;\u00a0<\/span><strong><span data-preserver-spaces=\"true\">Streams<\/span><\/strong><span data-preserver-spaces=\"true\">, unbounded sequences of tuples; and\u00a0<\/span><strong><span data-preserver-spaces=\"true\">Spouts<\/span><\/strong><span data-preserver-spaces=\"true\">, the sources of streams in a computation.\u00a0<\/span><strong><span data-preserver-spaces=\"true\">Bolts<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0process input streams and produce output streams. They can run functions, filter, aggregate or join data, or talk to databases.\u00a0<\/span><strong><span data-preserver-spaces=\"true\">Topologies<\/span><\/strong><span data-preserver-spaces=\"true\">\u00a0are the overall computation, represented visually as a network of spouts and bolts.<\/span><\/p>\n<h3><strong><span data-preserver-spaces=\"true\">How can you use BIG DATA in your organization?<\/span><\/strong><\/h3>\n<p><span data-preserver-spaces=\"true\">The data owner must merge user interactions with demographic, geographic, psychographic buying-history, and behavioral profile information to produce powerful insights and customer segments. The utopian dream of the business intelligence world is to reach a segment of one, but in the meantime, we continue to make smaller and smaller segments so that offerings can be more customized. Sentiment analysis is then added over these vast and growing merged data sets. 
These transformations make the augmented data very powerful, but there is also the challenge of navigating the data and finding the patterns. Clearly, the main benefit of this organized data is\u00a0<\/span><a class=\"editor-rtfLink\" href=\"http:\/\/www.designandexecute.com\/designs\/why-invest-in-predictive-models\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-preserver-spaces=\"true\">TARGETING<\/span><\/a><span data-preserver-spaces=\"true\">. We can quickly target the ideal customer and craft a compelling strategy that will be hard to ignore. Check out my article on &#8220;<\/span><a class=\"editor-rtfLink\" href=\"http:\/\/www.designandexecute.com\/designs\/why-invest-in-predictive-models\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-preserver-spaces=\"true\">why invest in predictive models<\/span><\/a><span data-preserver-spaces=\"true\">&#8221; for more on this. The main challenge has been putting the unstructured data and the structured ERP data together, especially in real time. 
It remains the challenge of the 2010s and beyond.<\/span><\/p>\n<ol>\n<li><span data-preserver-spaces=\"true\">Determine how much data you have, how many different data sources there are, and the grain and content that can be merged.<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Identify what new insight patterns can be gained by putting this data together.<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Decide whether there are predictive models that you would like to calculate over this big data.<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Pick a NoSQL database that matches your needs.<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Set up the infrastructure.<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Merge all your data sets.<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Develop case studies of how you want to segment your customer profiles.<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Pull the matching qualifying rows.<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Create actionable strategies on these newly targeted data points.<\/span><\/li>\n<\/ol>\n<h3><span data-preserver-spaces=\"true\">Case Study: Here is an example of how Abercrombie (A&amp;F) makes me a loyal buyer<\/span><\/h3>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li class=\"ql-indent-1\"><strong><span data-preserver-spaces=\"true\">User interactions: <\/span><\/strong><span data-preserver-spaces=\"true\">A&amp;F sends daily email campaigns; the emails I read and the links I click let them know my interests.<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">Visits to the A&amp;F website: time spent on site, entry pages, navigation path, exit page, and sales funnel drop-off. They study my navigation patterns, and browsing warms me to new products and lets me know what a bargain price 
is.<\/span><\/li>\n<\/ul>\n<\/li>\n<li><strong><span data-preserver-spaces=\"true\">Demographic: they have my demographic info<\/span><\/strong>\n<ul>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">my age<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">gender<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">marital status<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">possible occupation<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">so they can derive my buying power<\/span><\/li>\n<\/ul>\n<\/li>\n<li><strong><span data-preserver-spaces=\"true\">Geographic: they know where I buy<\/span><\/strong>\n<ul>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">region: Northeast, state<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">ZIP code: city dweller<\/span><\/li>\n<\/ul>\n<\/li>\n<li><strong><span data-preserver-spaces=\"true\">Psychographic: they infer my tastes from my buying history<\/span><\/strong>\n<ul>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">based on past SKUs I bought: edgy, youthful<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">they have an idea of my style: trendy<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">based on scraping social network sites: high social profile<\/span><\/li>\n<\/ul>\n<\/li>\n<li><strong><span data-preserver-spaces=\"true\">Behavioral: they know how I buy<\/span><\/strong>\n<ul>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">brand loyalty with A&amp;F: buys this brand consistently<\/span><\/li>\n<li class=\"ql-indent-1\"><span data-preserver-spaces=\"true\">price point: buys at heavy discounts or on clearance<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span data-preserver-spaces=\"true\">Strategies Prescribed<\/span><\/h3>\n<ul>\n<li><span data-preserver-spaces=\"true\">Raise awareness of new products in my wheelhouse, same 
great brand but new fit and style<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Adopt edgy, youthful styles in my price range<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Buy TV time in the Northeast cities<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Drive it on social media<\/span><\/li>\n<li><span data-preserver-spaces=\"true\">Drive direct contact via email<\/span><\/li>\n<\/ul>\n<p>\/\/thoughts to develop<\/p>\n<p>Flume to listen &gt; Amazon with Storm topology for processing &gt; stored on Amazon S3<\/p>\n<p>Esper &#8211; query like<\/p>\n<p>Spark Streaming<\/p>\n<p>Apache S4<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The term big data is so common now at the time of this posting in 2015 that it deserves some mention in the data warehouse space, but where did it come from, and where should it fit in your technology infrastructure? The personal computer and birth of networking via Local Area Networks (LAN) have allowed [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2314,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[31,19],"tags":[],"class_list":["post-1199","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bi-data-warehouse","category-digital-online-marketing"],"jetpack_featured_media_url":"https:\/\/www.designandexecute.com\/designs\/wp-content\/uploads\/2016\/01\/big-data.jpg","_links":{"self":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/1199","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/
designs\/wp-json\/wp\/v2\/comments?post=1199"}],"version-history":[{"count":5,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/1199\/revisions"}],"predecessor-version":[{"id":11314,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/1199\/revisions\/11314"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/media\/2314"}],"wp:attachment":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/media?parent=1199"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/categories?post=1199"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/tags?post=1199"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}