Saturday, 17 February 2018

Quickly Checking for Stale Statistics in SQL Server

Monitoring the state of database statistics is crucial because they significantly influence overall performance. When statistics become outdated, performance degradation is inevitable, often leading to slow and inefficient systems—something nobody wants. This raises several important questions:
  • How often have you faced unforeseen performance issues, even after completing your regular database maintenance tasks?
  • Have you wondered why tasks like index rebuilding and statistics updates are sometimes insufficient?
  • If everything was running smoothly before, why has performance suddenly declined?
  • Why are indexes not being utilised as expected? Should more indexes be created?
The answer to many of these questions often lies in stale statistics. It's essential to remember that statistics contain vital information that the SQL Server Query Optimiser uses to generate optimal execution plans for queries. If these statistics are outdated, the impact on performance can be severe: simply having the right indexes isn't enough if the statistics behind them are stale.

In general, it's advisable to update statistics when at least 20% of the data has changed, provided the table contains more than 500 rows. But how can you determine how much data has changed, and whether the index statistics need to be updated? Thankfully, in SQL Server this is easy to check. Below is a script that identifies which index statistics are outdated, allowing you to focus your efforts where they're most needed to maintain optimal performance. It uses the STATS_DATE function, which returns the date of the most recent update of the statistics on a table or indexed view:

select db_name() as databasename,
schema_name(t.[schema_id]) as schemaname,
t.name as tablename, i.name as indexname,
stats_date(i.id, i.indid) as stats_last_update, --using stats_date function
i.rowcnt as row_count, i.rowmodctr as rows_changed,
cast((cast(i.rowmodctr as decimal(20, 8)) / cast(i.rowcnt as decimal(20, 2)) * 100.0) as decimal(20, 2)) as percent_rows_changed
from sys.sysindexes as i
inner join sys.tables as t on t.[object_id] = i.[id]
where i.indid > 0 --excluding heaps, i.e. rows that do not correspond to an index
and i.rowcnt >= 500 --only indexes covering at least 500 rows
order by percent_rows_changed desc;
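Note that sys.sysindexes is a legacy compatibility view and its rowmodctr column is only an approximation on newer versions of SQL Server. On SQL Server 2012 SP1 and later, a sketch of the same check can be built on the sys.dm_db_stats_properties DMF instead, which exposes an accurate modification counter and also covers column statistics:

```sql
-- Alternative for SQL Server 2012 SP1+ using sys.dm_db_stats_properties.
-- Lists every statistics object with its last update date and the
-- percentage of rows modified since that update.
select schema_name(t.[schema_id]) as schemaname,
       t.name as tablename,
       s.name as statsname,
       sp.last_updated,
       sp.[rows] as row_count,
       sp.modification_counter as rows_changed,
       cast(sp.modification_counter * 100.0 / sp.[rows] as decimal(20, 2)) as percent_rows_changed
from sys.stats as s
inner join sys.tables as t on t.[object_id] = s.[object_id]
cross apply sys.dm_db_stats_properties(s.[object_id], s.stats_id) as sp
where sp.[rows] >= 500 --only statistics covering at least 500 rows
order by percent_rows_changed desc;
```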
Note that there are two types of statistics: Column Statistics and Index Statistics. Column statistics are automatically created for columns based on queries, provided the AUTO_CREATE_STATISTICS option is enabled in the database. On the other hand, index statistics are created alongside indexes and are updated with a full scan when indexes are rebuilt. Therefore, there's usually no need to manually update index statistics after an index rebuild. In fact, doing so might be counterproductive, as using the UPDATE STATISTICS command without specifying parameters defaults to sampling, which is often not the best option in this case. 
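To illustrate the point above, here is a minimal sketch (the table and index names are hypothetical) of why running a bare UPDATE STATISTICS right after a rebuild can be counterproductive:

```sql
-- Rebuilding an index already refreshes its statistics with a full scan:
alter index ix_orders_customerid on dbo.orders rebuild;

-- Running this afterwards would overwrite those full-scan statistics
-- with default-sampled ones, so avoid it after a rebuild:
update statistics dbo.orders ix_orders_customerid;
```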

For column statistics, then, the only manual step needed after rebuilding indexes is to update the column statistics themselves. In most cases the default sampling is sufficient, though a full scan is recommended if the maintenance window allows it.
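UPDATE STATISTICS accepts a COLUMNS option that restricts the update to column statistics only, leaving the freshly rebuilt index statistics untouched. A sketch, again with a hypothetical table name:

```sql
-- After an index rebuild, refresh only the column statistics
-- (index statistics were already updated by the rebuild):
update statistics dbo.orders with columns; -- default sampling

-- Or, if the maintenance window allows it, with a full scan:
update statistics dbo.orders with fullscan, columns;
```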

A common question is how often statistics should be updated. The frequency depends on how often your data changes. For highly transactional environments, updating statistics once a day or once a week might be necessary, while for others, once a month may suffice.
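For a simple scheduled job, one option is the built-in sp_updatestats procedure, which updates only the statistics that have had modifications since their last update:

```sql
-- Run against the current database, e.g. from a nightly SQL Agent job.
-- Statistics with no modifications since the last update are skipped.
exec sp_updatestats;
```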

I hope you find this script helpful. Feel free to share any comments or feedback. Stay tuned for more updates!

2 comments:

  1. How do I manually update statistics, and why do I have to do it at all? Shouldn't the DBMS do it without my interaction?

     Reply: Thanks for your comment. I have answered your question in this post: https://www.percyreyes.com/2024/05/boosting-sql-server-efficiency-why-you.html


HELLO, I'M PERCY REYES! I've been working as a senior SQL Server Database Engineer for over 20 years; I'm a three-time Microsoft Data Platform MVP. I'm a cryptographer conducting research on cryptographic Boolean functions and their applications.