{"id":2176,"date":"2017-02-07T20:50:44","date_gmt":"2017-02-08T02:50:44","guid":{"rendered":"http:\/\/www.designandexecute.com\/designs\/?p=2176"},"modified":"2020-12-04T15:29:20","modified_gmt":"2020-12-04T21:29:20","slug":"sql-performance-tuning-and-rules-of-thumb-secrets-of-the-masters-revealed","status":"publish","type":"post","link":"https:\/\/www.designandexecute.com\/designs\/sql-performance-tuning-and-rules-of-thumb-secrets-of-the-masters-revealed\/","title":{"rendered":"SQL Performance Tuning and Rules of Thumb, Secrets of the Masters Revealed"},"content":{"rendered":"<p><span data-preserver-spaces=\"true\">I have interviewed hundreds of developers over the years, and it is incredible how topical is the knowledge that developers know about this essential critical skill called <\/span><strong><span data-preserver-spaces=\"true\">SQL.\u00a0<\/span><\/strong><span data-preserver-spaces=\"true\">It is the one skill that every full stack developer must learn and understand.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Transactional systems are not going away even though there is a lot of talk about big data and noSQL databases. Relational Database Management Systems (RDMS) will remain at the heart of transactions. RDMS fit the CAP theorem where you only get 2 of the three and, in this case, Consistency across reading and Performance.<\/span><\/p>\n<ul>\n<li><a class=\"editor-rtfLink\" href=\"https:\/\/en.wikipedia.org\/wiki\/Consistency_model\" target=\"_blank\" rel=\"noopener noreferrer\"><em><span data-preserver-spaces=\"true\">Consistency<\/span><\/em><\/a><span data-preserver-spaces=\"true\">: Every read receives the most recent write or an error<\/span><\/li>\n<li><a class=\"editor-rtfLink\" href=\"https:\/\/en.wikipedia.org\/wiki\/Availability\" target=\"_blank\" rel=\"noopener noreferrer\"><em><span data-preserver-spaces=\"true\">Availability<\/span><\/em><\/a><span data-preserver-spaces=\"true\">: Every request receives a (non-error) response, without the guarantee that it contains the most recent write<\/span><\/li>\n<li><a class=\"editor-rtfLink\" href=\"https:\/\/en.wikipedia.org\/wiki\/Network_partitioning\" target=\"_blank\" rel=\"noopener noreferrer\"><em><span data-preserver-spaces=\"true\">Partition tolerance<\/span><\/em><\/a><span data-preserver-spaces=\"true\">: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes<\/span><\/li>\n<\/ul>\n<p><span data-preserver-spaces=\"true\">RDMS drives many mission-critical ACID (Atomicity, Consistency, Isolation, Durability) database transactions. When you master SQL, you can dive into analytics and data warehousing, where Performance dealing with millions of rows is expected. Schema databases and SQL still drives analytics at the core. The objective is to squeeze a few more transactions per minute, to get data back to the user ASAP in this era of \u201cI want it now.\u201d\u00a0<\/span><\/p>\n<p>I am IBM DB2 certified and first developed many of my tricks geared toward the DB2 platform. Many of the points below hold on to many databases. I have translated the same theories to Sybase, Oracle, and MySQL. With that said, let us start exploring.<\/p>\n<h2>SELECT Clause<\/h2>\n<ul>\n<li><strong>Select only the columns needed by your program<\/strong>. Do not use SELECT *. \u00a0There are 2 main reasons, you are making the database force more network traffic to your application and you now have to use more CPU to move each column into your program.<\/li>\n<li><strong>AVOID USING SELECT DISTINCT\u00a0<\/strong>without a supporting index unless the number of qualifying rows is less than 5000. \u00a0The reason being you will be sorting all the qualifying rows and this can be expensive as the data set grows.<\/li>\n<li><strong>Use the AS feature<\/strong> when selecting derived columns. The reason is documentation.<\/li>\n<li><strong>Use the COALESECE feature<\/strong> to avoid null indicators variables for columns allowing nulls and results in sets that can be null. \u00a0SUM (COL1) is null if null rows qualify. \u00a0Here are facts from\u00a0the one IBM Gold Consultant and my mentor Richard Yevich (RIP). \u00a0It reduces the code to maintain and improves access paths if used in predicates.<\/li>\n<li><strong>Use CASE over DECODE <\/strong>because it is more meaningful and more feature rich.\u00a0 CASE complies with ANSI SQL. DECODE is proprietary to Oracle.\u00a0 CASE is a statement where as DECODE is a function and CASE is used in relational operators where as DECODE is used in equality operators.\u00a0 CASE can be used in a where clause but you cant use DECODE in the where clause.\u00a0 DECODE works with expressions that are scalar values only. CASE can work with predicates and subqueries in searchable form.\u00a0 <b>CASE expects datatype consistency, DECODE does not and CASE manages nulls in a special way.<\/b><\/li>\n<\/ul>\n<p><span class=\"pln\">SQL<\/span><span class=\"pun\">&gt;<\/span> <span class=\"kwd\">select<\/span> <span class=\"kwd\">case<\/span><\/p>\n<p><span class=\"lit\">2<\/span> <span class=\"kwd\">when<\/span> <span class=\"kwd\">null<\/span> <span class=\"kwd\">is<\/span> <span class=\"kwd\">null<\/span><\/p>\n<p><span class=\"lit\">3<\/span> <span class=\"kwd\">then<\/span> <span class=\"str\">&#8216;NULL&#8217;<\/span><\/p>\n<p><span class=\"lit\">4<\/span> <span class=\"kwd\">else<\/span> <span class=\"str\">&#8216;NOT NULL&#8217;<\/span><\/p>\n<p><span class=\"lit\">5<\/span> <span class=\"kwd\">end<\/span><span class=\"pln\"> null_test<\/span><\/p>\n<p><span class=\"lit\">6<\/span><span class=\"pun\">*<\/span> <span class=\"kwd\">from<\/span><span class=\"pln\"> dual<\/span><\/p>\n<p><span class=\"pln\">SQL<\/span><span class=\"pun\">&gt;<\/span> <span class=\"pun\">\/<\/span><\/p>\n<h2>FROM Clause<\/h2>\n<ul>\n<li><strong>USE TABLE EXPRESSION feature over VIEWS<\/strong>. The reason is for documentation and performance analysis.<\/li>\n<\/ul>\n<h2>WHERE Clause<\/h2>\n<ul>\n<li><strong>Use WHERE Clause to select the rows needed. \u00a0<\/strong>There are 2 main reasons, you are making the database force more network traffic to your application and you now have to use several times more CPU to process the result set.<\/li>\n<li><strong>Use &lt;&gt; rather than NOT EQUAL.<\/strong> The reason is &lt;&gt; is universally accepted.<\/li>\n<li><strong>Only use Literal in a SQL statement when it never changes. \u00a0<\/strong>If it changes then use a host variable to the move the literal to the host variable in the host language prior to the SQL statement. The reason is the program will be easier to maintain when the variable changes.<\/li>\n<li><strong>Avoid arithmetic expression (including concatenation) in predicates where there are no other stage 1 predicates.<\/strong> \u00a0Instead, store expression in host variable and use the host variable in the WHERE statement. \u00a0The Reason is to make the predicate stage 1. \u00a0Complex predicates are Stage 2. \u00a0If they only further filter then they are great however if they are the only predicate index\/table scans are the only access paths possible.<\/li>\n<li><strong>Avoid negative logic if possible. \u00a0<\/strong>The reason is that positive logic is much easier to comprehend.<\/li>\n<li><strong>If more than 25% of the rows will be returned, induce a tablespace scan<\/strong> by using the +0 or CONCAT &#8216; &#8216; techniques. The reason is that due to the prefetching, the database can often scan the entrie table faster than reading 25% of the index pages + 25% of the data pages.<\/li>\n<\/ul>\n<h2>JOINS<\/h2>\n<ul>\n<li><strong>Ensure the use of a join predicate to avoid a Cartesian product<\/strong> of rows returned. \u00a0Reason: Missing Join predicates cost CPU and it might not not give the expected results.<\/li>\n<li><strong>Use Joins instead of subselects. \u00a0<\/strong>The reason is that the database join operation algorithms are always faster.<\/li>\n<li>Do not code RIGHT OUTER JOINS. \u00a0Switch the order of the tables in the from clause and code LEFT OUTER JOINS. \u00a0This will reduce the confusion with EXPLAIN output (only displays Left and Full joins)<\/li>\n<li><strong>Do not code INNER JOINTS with the ON clauses. \u00a0<\/strong>Older version of many databases process the where clauses after the join is complete unless table expressions are used. \u00a0Stay with the &#8220;,&#8221; and no ON clause so that normal join algorithms are used.<\/li>\n<\/ul>\n<h2>ORDER BY\/GROUP BY<\/h2>\n<ul>\n<li><strong>Use the column name in the ORDER BY and GROUP BY clauses.\u00a0<\/strong>Do not use the column relative number from the SELECT clause. \u00a0Reason is documentation and ease of reading code.<\/li>\n<li><strong>Do not ORDER BY a column that has only one value in the return rows. \u00a0<\/strong>There is no need to use CPU and I\/O to sort something that does not need sorting.<\/li>\n<\/ul>\n<h2>UNION<\/h2>\n<ul>\n<li><strong>use UNION ALL when duplicates do not exist<\/strong> or when the number of duplicates is small and can be eliminated in the program of if duplicates are needed. \u00a0Reason is to eliminate the sort.<\/li>\n<\/ul>\n<h2>SUBQUERIES<\/h2>\n<ul>\n<li><strong>Use Select * for correlated subqueries when using EXISTS or NOT EXISTS.<\/strong> The reason is that no rows are returned, only a flag set to <em>true<\/em> or <em>false<\/em><\/li>\n<li><strong>Avoid using IN phrase. \u00a0Use JOIN or EXISTS in a correlated subquery. \u00a0<\/strong>The reason is to avoid creating and sorting a work file.<\/li>\n<\/ul>\n<h3>EXISTANCE CHECKING TECHNIQUES<\/h3>\n<p><strong>Do not use SELECT COUNT(*) to check for existance.<\/strong> \u00a0Reason is it cost CPU and I\/O to count all the rows when the actual count is not needed. \u00a0While COUNT(*) scans the entire table, counting up all entries matching your condition, EXISTS() will exit as soon as it sees the result it needs.<\/p>\n<blockquote><p>SELECT columns needed<\/p>\n<p>FROM tables needed<\/p>\n<p>WHERE EXISTS<\/p>\n<p>( SELECT * FROM table WHERE condition)<\/p><\/blockquote>\n<p>correlate the subquery if the checking should be done for each row. \u00a0Leave it non correlated if the checking only needs to be done once.<\/p>\n<h2>CURSOR PROCESSING<\/h2>\n<ul>\n<li><strong>Close cursor when the last row is processed rather than at the end of the program unless<\/strong> it is the last action of the program. \u00a0If no more processing happens after the cursor then do not close the cursor but let the program close the cursor to release the locks sooner and improve concurrency.<\/li>\n<li><strong>Retain cursor positioning after Commit processing<\/strong> to reduce program code by eliminating the need for browsing logic.<\/li>\n<li><strong>If cursors are use to drive updates use FOR UPDATE or WHERE CURRENT OF CURSOR to create row level processing. \u00a0<\/strong>If updates have to be ordered switch to WHERE KEY =:HVKEY. The reason FOR UPDATE OF and ORDER BY are mutually exclusive.<\/li>\n<li><strong>Always check the row counter or timestamp in the UPDATE statement to warn users if the data has changed.<\/strong> \u00a0The checking the counter or time stamp in a SELECT statement allows data integrity problems to slip in.<\/li>\n<\/ul>\n<h2>GENERAL TECHNIQUES<\/h2>\n<ul>\n<li><strong>Build small tables,<\/strong> such as those used for code definition, in memory or application tables if they are used repetitively. \u00a0The reason is that is save CPU and I\/O by having the database read it only once.<\/li>\n<li><strong>Ensure restart logic is coded for applications requiring high availability.<\/strong><\/li>\n<li><strong>For SQL used in applications requiring high concurrency, the SQL should be retried a few times<\/strong> before sending an error message. \u00a05 times is a good start.<\/li>\n<li><strong>Do not start coding SQL with out knowing all the indexes available on the table. \u00a0<\/strong>Reason is that all possible index-able local and join predicates should be coded to ensure index usage.<\/li>\n<li>Run an EXPLAIN on all SQL statements built prior to the release from development. Granted that production and development can have different plans depending on the data. \u00a0Analyze results. \u00a0The reason is that you can steer the optimizer in the right direction using the two tuning techniques of (OPTIMIZE FOR n ROWS and +0, CONCAT &#8216; &#8216;)<\/li>\n<\/ul>\n<h2>ADHOC QUERIES<\/h2>\n<p><strong>Views can be used for these requests. \u00a0<\/strong>The reasons is that in insulates the user from tables changes over time. \u00a0It can hide SQL complexity, enforce security or pre-join tables.<\/p>\n<p>Stephen Choo Quan<\/p>\n<p>Stephen is a double threat holding both SAP business objects certified architect as well as being a certified IBM DB2 Database Developer.<a href=\"http:\/\/www.designandexecute.com\/designs\/about-the-author-stephen-choo-quan\/\"> read More&#8230;<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have interviewed hundreds of developers over the years, and it is incredible how topical is the knowledge that developers know about this essential critical skill called SQL.\u00a0It is the one skill that every full stack developer must learn and understand.\u00a0 Transactional systems are not going away even though there is a lot of talk [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2222,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32,31],"tags":[],"class_list":["post-2176","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bi-dashboards-analytics","category-bi-data-warehouse"],"jetpack_featured_media_url":"https:\/\/www.designandexecute.com\/designs\/wp-content\/uploads\/2017\/02\/sql.png","_links":{"self":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/2176","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/comments?post=2176"}],"version-history":[{"count":5,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/2176\/revisions"}],"predecessor-version":[{"id":10080,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/posts\/2176\/revisions\/10080"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/media\/2222"}],"wp:attachment":[{"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/media?parent=2176"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/categories?post=2176"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.designandexecute.com\/designs\/wp-json\/wp\/v2\/tags?post=2176"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}