Strangenut

Dissapointed with the quality of technical journalism

Ok, so this weekend I was reading up on some ways to write high performance T-SQL, as I am always on the hunt for new ideas.  I came across this link where the author, Tim Chapman, has written a series of articles on SQL Server programming, administration, performance tuning and other related topics. At first I was intrigued, surely this person must be knowledgeable on the subject if they have written this many articles for a large rag like Tech Republic.

Not so much. 

While I won't flat out say he's wrong on most of what he has written about, I will however say of what I've read so far conceptually he is on the right track, but most of his techniques are... misguided and aren't really backed up by a lot of detail just "this is the best way to do it, trust me", and doesn't even reference any other widely regarded sources (Books Online, other bloggers/speakers/experts such as Kimberly Tripp and other), published articles written by anyone other than himself, and so on.

In one such article Assess a SQL Server database developer applicant's skills with this TSQL test Tim gives a script to run to create a small set of test tables, then a short series of questions to pose to a potential SQL Server developer and have this individual write a query to satisfy the question.  Now, Tim DOES preface this with the statement

Most, if not all, of the questions in this TSQL test can be answered in different ways.

The answers I provide are the way in which I like to write queries, and the way I prefer to see them answered.

However, since the questions may be answered in various ways, it is important that the test is graded by someone who is very well versed in SQL programming so that any differences can be investigated and graded accordingly.

This is very true, and I don't want this post to seem to just rag on him and his articles, but then he follows it up with this statement at the end:

I've given a test similar to this one to many SQL Server database developer applicants. From what I can recall, only two applicants have been able to answer all of the questions correctly.

The overall average score is around 50 - 60%. If an applicant performs above that average, I would qualify him or her as a good TSQL programmer; if the applicant scores over 90%, that individual is an exceptional programmer.

This seems... contradictory. I guess this number is highly dependant on his interpretation of what "correct" is. 

The Question

This is question #2: Return the CustomerID, First Name, and Last Name of those individuals in the Customer table who have made no Sales purchases. Which his answer is this (in order to really understand/write/see these queries read his article, and one of the items is a script to create and populate the test tables)

 

SELECT c.CustomerID, c.FirstName, c.LastName
  FROM Sales s
 RIGHT OUTER JOIN Customers c ON s.CustomerID = c.CustomerID
 WHERE s.CustomerID IS NULL  



Ok, here is where my issue begins. Below is the execution plan as is for this query:

ssms screenshot 1
 
Ok, now, not all of you may know what an execution plan is, or if you do know, you may not understand how to read an execution plan. I'll be as succint and clear as I can without explaining in great detail what they are and how to read them.  In a nutshell, the total percentages by each icon, will always add up to 100% regardless of what you do, so all of your cost is going to be somewhere in the query, you can make some numbers go up when others go down and vice versa, but that doesn't mean that the amount of work done for the individual item changes.  The number displayed is always relative to the overall cost of the batch.

So what we see here is on the far right, two items labeled "Clustered Index Scan" these are basically items representing the tables Sales and Customers (technically the clustered indexes on them, but for the purposes of discussion, same thing). Then to their left, we see the query engine processing the right outer join. Notice the thicker line arrow coming from the lower right hand icon for "Sales", what the thickness of this line represents is the number of rows being returned from that item to be processed in the join.

Basically what this query is doing is taking all of the sales, and all of the customers, and joining them into one big list, then checking to see if the column returned from the joined entities representing the customerid in Sales is null. I'm mystified as to why he would want to see a query written like this honestly. If a dba wrote this query for me, I would not regard much of what that dba said, or be on very high alert from that point on.

Another Answer

Just so we're all on the same page here, its common sense that if you can eliminate as many rows as possible from the source objects (tables/views/etc) before the join occurs, that the join will consume far less resources and be far more performant.  To put my money where my mouth is, here is my revised version:

select customerid, firstname, lastname

  from customers

 where customerid not in (select customerid from sales)

There are a few fundamental differences here from Tim's version and mine.  One that pops out at me right away is that this way of doing it is much clearer to another developer (who didnt write it) to understand what it is I am trying to accomplish here.  You are getting the name and customerid of the customers who have not yet purchased anything, bada-bing, bada-boom.  Now, let's take a look at the query execution plan:

ssms screenshot 2

Notice now that the join process that used to account for 25% of the overall execution cost now accounts for a mere 1%, and the rows of data returned from Sales now is on par with what is returned from Customers.  The next step I would take to tune this query is to create nonclustered indexes on the customerid columns, anytime you see an index scan on an index, it means that the index is not being used  in the query processing.
 
 


Posted Feb 24 2008, 10:49 PM by dacrowlah
Powered by Community Server (Non-Commercial Edition), by Telligent Systems