{"id":15690,"date":"2022-09-14T09:34:48","date_gmt":"2022-09-14T15:34:48","guid":{"rendered":"http:\/\/www.designandexecute.com\/designs\/?p=15690"},"modified":"2022-09-14T16:06:52","modified_gmt":"2022-09-14T22:06:52","slug":"five-most-significant-issues-with-data-pipelines","status":"publish","type":"post","link":"https:\/\/www.designandexecute.com\/designs\/five-most-significant-issues-with-data-pipelines\/","title":{"rendered":"Five Most Significant Issues With Data Pipelines"},"content":{"rendered":"\n<ol class=\"wp-block-list\"><li><strong>Not testing your pipelines<\/strong>. Aim for <a rel=\"noreferrer noopener\" aria-label=\"test cases for at least 90% coverage (opens in a new tab)\" href=\"https:\/\/www.designandexecute.com\/designs\/testing-data-validation\/\" target=\"_blank\">test cases for at least 90% coverage<\/a>.<\/li><li><strong>Optimizing for the wrong metric<\/strong>. Are you optimizing for cost or performance? If you optimize for performance, your primary measure cannot be &#8220;cost.&#8221; Four standard metrics apply to any data pipeline: <ol><li><em>Data quality metrics<\/em>, to reduce data loss and increase accuracy and usability<\/li><li><em>Speed<\/em> of the pipeline<\/li><li><em>Data recovery time and pipeline health<\/em>: the overhead of maintaining the pipelines<\/li><li><em>Cost<\/em> to process each pipeline batch<\/li><\/ol><\/li><li><strong>Not having the correct controls<\/strong>, e.g., to prevent mistakenly dropping the entire lake and to manage failures in a run.<\/li><li><strong>Not incrementally processing data<\/strong>. Re-running historical batches is one example: a reattribution pipeline fixes some past data for this measurement period's run. Use parallel processing pipelines, after which you can swap the reattribution fixes into the production data once validations pass.<\/li><li><strong>Not planning for the future<\/strong> and its growth.  
Reading useless records can hurt performance over time.<\/li><\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p> The hardest thing about performance is knowing what you need to measure, that must be tied back to your mission statement <\/p><cite> \u2014 Peter Drucker  <\/cite><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Not testing your pipelines, get test cases for at least 90% coverage Optimize for the wrong metric. Are you optimizing cost or performance? If you optimize for performance your primary measure cannot be &#8220;cost.&#8221; Four standard metrics for any Data pipeline: Data quality metrics to reduce data loss, increase accuracy and usability Speed of pipeline [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":15692,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[31],"tags":[],"class_list":["post-15690","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bi-data-warehouse"],"jetpack_featured_media_url":"https:\/\/www.designandexecute.com\/designs\/wp-content\/uploads\/2022\/09\/data_pipelinei.jpg","_links":{"self":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/15690","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/comments?post=15690"}],"version-history":[{"count":4,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/15690\/revisions"}],"predecessor-version":[{"id":15706,"href":"https:\/\/www.designandexecute.com\/d
esigns\/wp-json\/wp\/v2\/posts\/15690\/revisions\/15706"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/media\/15692"}],"wp:attachment":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/media?parent=15690"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/categories?post=15690"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/tags?post=15690"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}