In the world of RDBMS' you may hear the term "table scan" thrown around, this is a term used to describe an event that occurs when you search for data in a table, and your query either doesn't take advantage of an existing index, or there is no index on the table. Generally, a table scan is not a good thing to have happening on a frequent basis in your database. There are some instances where it's not a bad thing, and I will try now to explain what it is, why it's happening, and when its not that bad of an occurence.
What Happens During a Table Scan
Suppose you aren't much of an organized person, and your cubicle at work isn't really well organized. You've got papers laying all around, some filed, some stacked, some in the filing cabinet, some in folders. Generally speaking, it's hard to find data like that; then suppose your boss asks you for all of your notes related to a specific project that you are working on. Unless you work for a very large company with ample developers and enough resources to have you dedicate yourself to one project at a time, you probably have multiple projects going on, and notes on all of them are mixed around the cubicle.
So how do you find everything? Answer: you look through every stack, folder, drawer, and box you've got; and then hope against hope you have it all. Well, this is essentially a table scan, your cubicle is a table, and you my friend, are the Query Processing engine.
When a query is submitted to SQL Server, it attempts to determine the best way to execute a query, generating what is called a "Query Execution Plan", that describes how it will go about fulfilling the request to find every note related to the project, and return them to the boss. The reason these are typically considered a Very Bad Thing ® is because when SQL Server has to search through all of the data in a table to satisfy a query, it consumes more resources than it really needs to.
How to Avoid Table Scans
If you are an organized person, you keep all of your project notes neatly organized, filed away, and close to each other. SQL Server has its way of doing this as well, it's called an index; an index is a way of keeping track of what data you have,and where it resides in order to enable a fast retrieval of it when it is requested.
Now, building an index in SQL Server is easy, however knowing when and where to build them is a bit more of an art. SQL Server performance tuning is absolutely dependant on indexes, and being knowledgeable enough about how to write queries that take advantage of them is essential. Generally speaking, if a column in a table is referenced frequently in queries, then it should have an index on it, the more read intensive your database is, the more this holds true. It is not usually a good idea to have indexes on every column in a table for multiple reasons, chief among these is space considerations; under the covers, indexes are copies of the data from specific columns in a table extracted and organized for fast searching, so to build an index will take up space, roughly equivalent to the volume of data being indexed.
Another consideration to take in account when building indexes is whether or not your data is added to or updated very frequently; the more often data in a table changes, the greater the impact indexes will have in performance. Every time indexed data changes, SQL Server has to ensure that the index is kept up to date, which adds to the processing overhead which often times is overlooked in database design and performance considerations.
When a Table Scan Isn't Such a Bad Thing ®
As I mentioned previously, sometimes its not so bad to have table scans. If you are retrieving every row from a table, and plan on using it, it doesn't really matter that you are doing a table scan because there is really no other way to do it. Another time that it doesn't matter so much is when there is very little data in a table, like a type table or a small lookup table, because SQL Server can intelligently cache data when it needs to, and when there is very little data it can actually degrade performance some when it does have an index because of the overhead of maintaining the index. Striking a good balance of indexing vs. non-indexing is a skill that will come with time and experience.
It's a good idea to learn how to read an execution plan and know what your server is doing internally when it is running your queries; an upcoming post will cover this.
Feb 24 2008, 07:18 PM