Index Tuning to Improve Database Performance

Many modern systems rely on databases, and databases rely on performance tuning to ensure that those systems work properly. This article looks at some basic “rules of thumb” for tuning databases stored on Microsoft SQL Server. Two tools that ship with SQL Server can be used to tune performance: the SQL Server Profiler and the SQL Query Analyzer. The Profiler lets you monitor a SQL Server for the queries that the server handles and collect performance metrics on particular queries. Longer-running queries will require more CPU, read, and write time, and will generally have a longer than average duration. The Profiler also shows you the specific syntax executed against the server.
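
If you only need CPU and I/O figures for a single query rather than a full trace, the statistics settings in a query window give a rough sketch of the same information. The example below assumes the document table described in the footnotes at the end of this article; the filter value is purely illustrative.

    -- Report CPU time, elapsed time, and logical/physical reads for the statements that follow
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    SELECT pid, did, sdid
    FROM document
    WHERE did = 42;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;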

The SQL Query Analyzer allows you to examine individual queries. The Analyzer lets you inspect the execution plan that the server generates for the query you are examining. The execution plan is the sequence of steps SQL Server will carry out in order to return results for a particular query: the server must take data from individual tables or indexes, combine those objects based on matching fields, and then collate the final results for the requester.
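
One way to see the plan, as a sketch using the same hypothetical document query, is to ask the server to return the estimated plan as text rather than running the query; SHOWPLAN_TEXT must be set in its own batch, hence the GO separators.

    -- Return the estimated execution plan as text instead of executing the query
    SET SHOWPLAN_TEXT ON;
    GO

    SELECT pid, did, sdid
    FROM document
    WHERE did = 42;
    GO

    SET SHOWPLAN_TEXT OFF;
    GO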

The execution plan is read from right to left. At the far right are the starting indexes or tables that SQL Server will read in order to return results. Some general rules of thumb:

1. Examine first the object or step that consumes the largest percentage of the query’s total processing time.

2. If a starting object is a table, analyze whether an index can be declared that will “cover the query”[1] and will also be feasible[2] to create.

With regard to feasibility, there are two basic considerations. First, will the fields that you plan to add to the index actually improve performance when the index is used instead of the underlying table? Second, if the table is particularly large, are there already enough indexes in place that the space consumed by a new index would outweigh the performance improvement likely to be gained from creating it?

As to the first issue, text and nvarchar fields, along with any other large-byte field in a table, are generally poor choices to include in an index. The index stores a copy of these values separately from the table, ideally in a structure that is smaller, in total bytes and pages of data, than the underlying table itself. If one field takes up a large portion of the total bytes per row and you declare an index containing that same field, scanning the resulting index will cost roughly as much as scanning the underlying table, and the query you are executing will see minimal improvement.

As to the second issue, the storage space of a database is an important consideration. The more indexes a table already has, and the larger the table being indexed, the more space a new index will consume. The performance benefit of the new index may then be outweighed by the additional storage space, backup time, and other costs associated with it. For example, if a particularly large table needs a new index only for a query that runs once per month, the incremental cost of storing that index may not be worth a marginal improvement to an infrequently executed query.
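
As a sketch of rule 2, and assuming the document table and query from the footnotes, a narrow covering index might be declared as follows; the index name is hypothetical.

    -- A covering index for a query that selects pid, did, and sdid from the document table.
    -- All three columns are small fields, so the index stays far narrower than the table
    -- and scanning it is considerably cheaper than scanning the table itself.
    CREATE NONCLUSTERED INDEX IX_document_pid_did_sdid
        ON document (pid, did, sdid);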

3. Examine the fields that are being used to join the tables together in the query. Generally, the primary keys that join tables together are part of a clustered index on the table; however, the foreign key on the other side of the join may have no index of its own, and query performance may be degraded as a result.
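
For example, assuming a hypothetical document_line table that joins back to document on a did foreign key, an index on that foreign key keeps the join from scanning the whole child table.

    -- Index the foreign key on the hypothetical child table
    CREATE NONCLUSTERED INDEX IX_document_line_did
        ON document_line (did);

    -- The join below can now seek into IX_document_line_did rather than scan document_line
    SELECT d.pid, d.did, d.sdid
    FROM document d
        INNER JOIN document_line dl ON dl.did = d.did;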

4. Examine the fields being selected in the query.  Are there fields that can be taken out of the select statement?  Are there any indexes that can cover the fields being selected?
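
A small sketch of this point, again with the hypothetical document table: trimming the select list to only the fields the caller needs makes it far more likely that a narrow existing index can cover the query.

    -- Before: every column in the row must come from the table itself
    SELECT *
    FROM document
    WHERE did = 42;

    -- After: only the needed fields are returned, so the covering index from rule 2 suffices
    SELECT pid, did, sdid
    FROM document
    WHERE did = 42;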

5. When tables are being put together to produce the results of the query, is the join operator in the estimated execution plan a “merge join,” a “hash match,” or a “nested loop”? Merge joins run the quickest when the tables or indexes being joined are already sorted on the same field that is used to complete the join. Hash match is the most common joining algorithm; it is a two-step process in which a hash table is built from the smaller input and the rows of the larger input are then probed against it to find matching records. Nested loops are usually the least efficient joins, because one table acts as the outer table and is scanned row by row to see whether each row matches any rows in the inner table. If SQL Server is using a nested loop to join two large tables, your query is in trouble. Sometimes you can create an index on the joining tables that helps SQL Server do a better job executing the query, especially if one of these joins is a performance bottleneck.
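
As a rough sketch with the hypothetical tables above: once the foreign-key index from rule 3 exists, both inputs can be read in did order, which gives the optimizer the option of a merge join; a query hint can also be used to see how the same query behaves under a different algorithm (the qty column is invented for illustration).

    -- With IX_document_line_did in place, the optimizer may choose a merge join
    SELECT d.pid, dl.qty
    FROM document d
        INNER JOIN document_line dl ON dl.did = d.did;

    -- For comparison, force a hash match and watch the plan and duration change
    SELECT d.pid, dl.qty
    FROM document d
        INNER JOIN document_line dl ON dl.did = d.did
    OPTION (HASH JOIN);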

6. If an index is being used but is taking up substantial processing time, you should evaluate whether the index has been properly maintained.  Indexes that are fragmented will have poor performance over time and require regular maintenance in order to perform optimally.
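
On the versions of SQL Server these tools ship with, fragmentation can be checked and corrected along these lines; the table name is the hypothetical one used above.

    -- Report fragmentation statistics for the table's indexes
    DBCC SHOWCONTIG ('document');

    -- Rebuild all indexes on the table (the empty string means every index) with a 90% fill factor
    DBCC DBREINDEX ('document', '', 90);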

7. Can a view be declared to cover the query that is being used? A view will sometimes marginally improve the performance of a query that relies on it, rather than one that runs against the underlying tables directly.
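
As a sketch, and only where the edition of SQL Server in use supports indexed views, the covering query could be wrapped in a schema-bound view and materialized with its own clustered index. The names are hypothetical, and the unique index assumes that the combination of columns is in fact unique.

    -- Schema-bound view over just the fields the query needs
    CREATE VIEW dbo.v_document_summary
    WITH SCHEMABINDING
    AS
    SELECT pid, did, sdid
    FROM dbo.document;
    GO

    -- A unique clustered index materializes the view, turning it into an indexed view
    CREATE UNIQUE CLUSTERED INDEX IX_v_document_summary
        ON dbo.v_document_summary (pid, did, sdid);
    GO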

When conducting index tuning, it is important to have a larger picture of the indexes that already exist in the database, along with a working understanding of the common joins between the tables in the database.
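
A quick way to get that picture before adding anything new is to list the indexes and space usage that already exist; for example, with the hypothetical table used above:

    -- List the indexes already defined on the table, with their key columns
    EXEC sp_helpindex 'document';

    -- Report how much space the table and its indexes currently consume
    EXEC sp_spaceused 'document';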


[1] Covering the query means creating an index that contains the fields being selected by the query under test. For example, if the query asks for pid, did, and sdid from the document table, an index containing pid, did, and sdid will cover the query.

[2] Feasible indexes are usually based on integer or other small fields, do not substantially overlap an existing index, and will not consume a large amount of space within the database itself. Each new index can increase the total space required by the database by 10-20%; this may limit the number of indexes that can be created cost-effectively for a particular table or database. In addition, indexes that contain large fields, such as varchar or long nvarchar fields, are unlikely to perform better than the underlying table.

