Percy Reyes: Statistics

Showing posts with label Statistics. Show all posts

Monday, 13 May 2024

Boosting SQL Server Efficiency: Why You Should Update Statistics Manually

While SQL Server provides automatic statistics management, manually updating statistics allows for greater control over database performance. Regularly monitoring the state of your statistics and updating them as necessary can lead to more efficient query execution, especially in environments with dynamic data and high transaction volumes. By implementing a proactive strategy for managing statistics—whether through scheduled jobs, maintenance plans, or manual updates—you can ensure that your SQL Server instance continues to perform optimally over time.

Understanding the SQL Server Query Optimiser: A Comprehensive Overview

The SQL Server Query Optimiser is a crucial component of Microsoft SQL Server, responsible for determining the most efficient way to execute a given query. The optimiser takes into consideration various factors, including the structure of the query, available indexes, statistics about the data, and the overall database schema. Here is a detailed breakdown of its functionalities, processes, and related concepts.

1. Purpose of the Query Optimiser

The primary goals of the Query Optimiser include:

Generating Execution Plans: The optimiser creates a roadmap that describes how SQL Server will access the data required by the query.
Maximising Efficiency: By evaluating different potential execution paths, the optimiser selects the one that minimises resource utilisation, such as CPU, memory, and disk I/O.
Improving Response Times: A well-optimised query results in faster execution, contributing to better overall system performance.

Quickly Checking for Stale Statistics in SQL Server

Monitoring the state of database statistics is crucial because they significantly influence overall performance. When statistics become outdated, performance degradation is inevitable, often leading to slow and inefficient systems—something nobody wants. This raises several important questions:

How often have you faced unforeseen performance issues, even after completing your regular database maintenance tasks?
Have you wondered why tasks like index rebuilding and statistics updates are sometimes insufficient?
If everything was running smoothly before, why has performance suddenly declined?
Why are indexes not being utilised as expected? Should more indexes be created?

The answer to many of these questions often lies in stale statistics. It's essential to remember that statistics contain vital information that the SQL Optimiser uses to generate optimal execution plans for queries. If these statistics are outdated, the impact on performance can be severe. Simply having the right indexes isn’t enough if the statistics are stale.

Dropping columns statistics after migrating to SQL Server 2014/2016/2017

While migrating databases to new versions of SQL Server, there are some tasks that need to be carried out in order to warrant strong consistent data and physical integrity of databases in tandem with their performance. For instance, it is common to run DBCC CHECKDB, rebuild all indexes, and update columns and index statistics. In some scenarios, we might consider dropping columns statistics and let SQL Server create them again according to new algorithms especially when the new cardinality estimator (CE) is going to be used after upgrading to SQL Server 2014/2016/2017. I personally do that, after restoring databases on the new version of SQL Server I proceed to drop all columns statistics (always having AUTO_CREATE_STATISTICS option enabled) and then SQL Server will definitely create them again based on the nature of database queries and the logic of THE new CE. In a long-term perspective this technique is the most recommendable from my point of view as not only will we have new versions of statistics but also purge old and unused statistics.
Here I will share a script to delete columns statistics. Be caution and only drop them if you have AUTO_CREATE_STATISTICS option enabled, otherwise no statistics will be created and the database performance will be affected tremendously. That is all for now. Let me know any remarks you may have.

SET NOCOUNT ON     
  IF db_name() NOT IN ('model','master','distribution','msdb','tempdb')
    BEGIN 
        DECLARE @schema_name varchar(max)      
        DECLARE @table_name varchar(max)     
        DECLARE @stat_name varchar(max)        
        DECLARE @update_stat_cmd varchar(max)        
        DECLARE @update_stat_msg_header   varchar(max)      
        DECLARE update_stat_cursor CURSOR FOR    
              select  schema_name(o.[schema_id]),  object_name(s1.[object_id]) , s1.name   
              from (  
                select s.[object_id], s.name from sys.stats s  
                left join sys.indexes i on s.name=i.name  
                where i.name is null) s1  
              inner join sys.objects o on o.[object_id]=s1.[object_id]  
              where  o.type='U'  
              order by schema_name(o.[schema_id]),  object_name(s1.[object_id]) , s1.name   
             
         OPEN update_stat_cursor        
         FETCH NEXT FROM update_stat_cursor INTO  @schema_name, @table_name,  @stat_name         
         WHILE (@@fetch_status = 0)        
         BEGIN    
           DECLARE @ini DATETIME, @fin DATETIME     
           SET @update_stat_msg_header =  '->Dropping ['+ RTRIM(@schema_name)  +'].[' + RTRIM(@table_name) + '].[' +@stat_name + ']'      
           PRINT @update_stat_msg_header    
           SET @update_stat_cmd ='DROP STATISTICS ['+ RTRIM(@schema_name)  +'].[' + RTRIM(@table_name) + '].[' +@stat_name + ']'    
           SET @ini=GETDATE()    
           EXEC (@update_stat_cmd)      
           SET @fin=GETDATE()    
          
           FETCH NEXT FROM update_stat_cursor INTO @schema_name, @table_name,  @stat_name            
         END        
     
         PRINT ' '       
         PRINT '----------------------------------------------------------------------------- '        
         SET @update_stat_msg_header = '*************  THERE ARE NO MORE STATISTICS TO BE UPDATED **************'         
         PRINT @update_stat_msg_header        
         PRINT ' '        
         PRINT 'All statistics not linked to any index were rebuilt!'        
          
        CLOSE update_stat_cursor        
        DEALLOCATE update_stat_cursor    
    
   END  
SET NOCOUNT OFF

Tuesday, 1 August 2017

Updating so-called 'column statistics' for the whole SQL Server database

Definitely, statistics in SQL Server are vital to ensure stable database performance as a whole. Every DBA must know the importance of keeping statistics up to date, however, not everyone who is in charge of a database knows much about it, especially, those who do not have this role or are DBA by accident. So, I will just mention that statistics are objects that contain data distribution, density, selectivity, cardinality, etc. for all columns that were at least once queried or linked to indexes. This information helps the SQL optimizer make the right decision of what indexes and resources should use so as to execute and process the queries as fast as possible. Put differently, statistics allow SQL optimizer to create the best possible execution plan to run queries efficiently, otherwise, when statistics are not up to date bad execution plans may be created which will make inefficient use of resources bringing down the performance of the whole database and, even worse the whole server. So, it is a basic task to routinely update statistics at least once a month as part of your database maintenance tasks.
It is worth noting that statistics that linked to indexes (so-called 'index statistics') are full updated automatically when indexes are rebuilt. So, those statistics do not need explicit update. Nevertheless, statistics linked to columns (so-called 'column statistics') still need of an explicit update and this is what we need to do after rebuilding all the indexes. Thinking about it, today I am going to share a script to only run the update of columns statistics:

SET NOCOUNT ON     
DECLARE @schema_name varchar(max)      
DECLARE @table_name varchar(max)     
DECLARE @stat_name varchar(max)        
DECLARE @update_stat_cmd varchar(max)        
DECLARE @update_stat_msg_header   varchar(max)      
DECLARE update_stat_cursor CURSOR FOR    
          select  schema_name(o.[schema_id]),  object_name(s1.[object_id]) , s1.name   
          from (  
            select s.[object_id], s.name from sys.stats s  
            left join sys.indexes i on s.name=i.name  
            where i.name is null) s1  
          inner join sys.objects o on o.[object_id]=s1.[object_id]  
          where  o.type='U'  
          order by schema_name(o.[schema_id]),  object_name(s1.[object_id]) , s1.name   
  
       
OPEN update_stat_cursor        
FETCH NEXT FROM update_stat_cursor INTO  @schema_name, @table_name,  @stat_name         
WHILE (@@fetch_status = 0)        
  BEGIN    
       DECLARE @ini DATETIME, @fin DATETIME     
       SET @update_stat_msg_header =  '->Updating ['+ RTRIM(@schema_name)  +'].[' + RTRIM(@table_name) + ']([' +@stat_name + '])'      
       PRINT @update_stat_msg_header    
       SET @update_stat_cmd ='UPDATE STATISTICS ['+ RTRIM(@schema_name)  +'].[' + RTRIM(@table_name) + ']([' +@stat_name + '])'    
       SET @ini=GETDATE()    
       EXEC (@update_stat_cmd)      
       SET @fin=GETDATE()    
      
       FETCH NEXT FROM update_stat_cursor INTO @schema_name, @table_name,  @stat_name            
  END        
 
PRINT ' '       
PRINT '----------------------------------------------------------------------------- '        
SET @update_stat_msg_header = '*************  THERE ARE NO MORE STATISTICS TO BE UPDATED **************'         
PRINT @update_stat_msg_header        
PRINT ' '        
PRINT 'All statistics not linked to any index were rebuilt!'        
  
CLOSE update_stat_cursor        
DEALLOCATE update_stat_cursor    
SET NOCOUNT OFF

That is all for now. Thanks for reading. Let me know any remarks you may have. Stay tuned.

Thursday, 20 July 2017

Detecting excessive compilation and recompilation issues

Undoubtedly, recompilation is a big topic to reckon with, more importantly, in database environments processing data that is changing rapidly over time and compounded by ad-hoc workloads which may cause CPU bottleneck, so it is of paramount importance to detect excessive compilation and recompilation issues and address them to warrant stable performance for the queries, and in this sense, there are some tools to be used to detect these issues such as Performance Monitor, Extended Events, SQL Server Profiler Trace, DMVs, etc. When it comes to using Performance Monitor we should concentrate the efforts on analising the performance counters SQL Server: SQL Statistics: Batch Requests/sec, SQL Server: SQL Statistics: SQL Compilations/sec, and SQL Server: SQL Statistics: SQL Recompilations/sec. What's more, you can save a trace file capturing the events SP:Recompile, SQL:StmtRecompile, and CursorRecompile, then you can use the following query to see all the recompilation events:

select spid, StartTime, Textdata, EventSubclass, ObjectID, DatabaseID, SQLHandle 
from fn_trace_gettable('C:\RecompilationTrace_01.trc', 1)
where EventClass in(37,75,166) -- 37 = Sp:Recompile, 75 = CursorRecompile, 166 = SQL:StmtRecompile

Furthermore, we can also capture the showplan XML for query compile, but doing that has significant performance overhead because it is captured for each compilation or recompilation. So, just do it for a very short time as long as you see a high value for the SQL Compilations/sec counter in Performance Monitor. Once you know where the problem is you can use Engine Tuning Advisor to see whether any indexing changes improve the compile time and the execution time of the query.
Talking of DMVs to diagnose recompilation issues, looking into 'sys.dm_exec_query_optimizer_info' is very helpful, in particular, look at the Elapsed Time, which is the time elapsed due to optimizations, and also Final Cost. If you see that Time Elapsed is very close to the CPU time, you might reach the conclusion that the compilation and recompilation time is attributable to that high CPU use. Another DMV to use is 'sys.dm_exec_query_stats' whose most important columns to look at are sql_handle, total worker time, plan generation number (the number of times the query has recompiled), and statement Start Offset. Here is an example to check the top 20 most procedures that have been recompiled.

select top 20 SQLText.text, sql_handle, plan_generation_num, execution_count, dbid, objectid 
from sys.dm_exec_query_stats
   cross apply sys.dm_exec_sql_text(sql_handle) as SQLText
where plan_generation_num >1
order by plan_generation_num desc

There is a plethora of recommendations to deal with recompilation and keep the performance in optimum conditions, but for now you can take into consideration the following options:

Check to see whether the stored procedure was created with the WITH RECOMPILE option or whether the RECOMPILE query hint was used. If a procedure was created with the WITH RECOMPILE option, since SQL Server 2005, you may be able to take advantage of a statement-level RECOMPILE hint if a particular statement within that procedure needs to be recompiled. Using this hint at the statement level avoids the need of recompiling the whole procedure each time it executes, while at the same time allowing the individual statement to be compiled.
Recompilations can occur due to changes in statistics, and you can use the KEEPFIXED PLAN query hint to make recompilations occur only when there is the need to ensure correctness and not to respond to changes in statistics. For instance, in this context recompilation can only occur if the underlying table structure or its schema that is referenced by a statement changes, or if a table is marked with the sp_recompile stored procedure, all resulting in the fact that the plan no longer applies and consequently triggering the recompilation event.
Using the KEEP PLAN query hint is useful to set the recompilation threshold of temporary tables to be the same as permanent tables. Take a look at the EventSubclass column which displays 'Statistics Changed' for an operation on a temporary table.
Turning off the automatic updates of statistics for indexes and statistics that are defined on a table or indexed view prevents recompilations that are due to statistics changes on that object. It is worth noting that turning off the auto-stats option is not always a good idea. This is because the query optimizer is no longer sensitive to data changes in those objects resulting in suboptimal query plans. To be honest, I never turned off this option because I always preferred trusting in SQL Server criteria and instead opting to work on optimising queries.
Keep in mind that recompilation thresholds for temporary tables are lower than for normal tables, so if the recompilations on a temporary table are due to statistics changes, you can change the temporary tables to table variables. A change in the cardinality of a table variable does not cause a recompilation. The side effect of this approach is that the query optimizer does not keep track of a table variable's cardinality because statistics are not created or maintained on table variables. This can result in less optimal query plans, however, you can test the different options and choose the best one. Generally, temporary tables provide much better performance than tables variables when lots of data is involved.
Recompilation might also occur as a result from SET option changes, so one can diagnose it by using SQL Server Profiler to determine which SET option changed. It is highly advisable to avoid changing SET options within stored procedures and it is much better to set them at the connection level, and for the overwhelmingly majority of cases the default SET options work well. It is also very important to ensure that SET options are not changed during the lifetime of the connection.
To avoid recompilations that are due to deferred compilations, do not combine DML with DDL and do not create the DDL as a result from conditional IF statements.
To avoid recompilation and also to avoid ambiguity between objects, batches should have qualified object names, for example, dbo.Table1, User1.MySP, etc.

That is all for now, let me know any remarks you may have.

Percy Reyes

Pages