Percy Reyes: Performance Tuning

Showing posts with label Performance Tuning. Show all posts

Monday, 13 May 2024

Boosting SQL Server Efficiency: Why You Should Update Statistics Manually

While SQL Server provides automatic statistics management, manually updating statistics allows for greater control over database performance. Regularly monitoring the state of your statistics and updating them as necessary can lead to more efficient query execution, especially in environments with dynamic data and high transaction volumes. By implementing a proactive strategy for managing statistics—whether through scheduled jobs, maintenance plans, or manual updates—you can ensure that your SQL Server instance continues to perform optimally over time.

Detailed Explanation of SQL Server Query Hints

SQL Server query hints are directives that can be included in your SQL queries to guide the SQL Server Query Optimiser in executing them. Typically, SQL Server automatically selects the most efficient execution plan based on the query and underlying data. However, query hints enable you to override the optimiser’s default choices, giving you greater control over specific aspects of query execution.

While these hints can significantly enhance performance in particular scenarios, they must be used cautiously and strategically. Incorrect application of hints can lead to degraded performance or unpredictable outcomes. Therefore, it is essential to thoroughly test your queries both with and without hints to evaluate their impact. Ensure that hints are employed appropriately and suited to your specific workloads to achieve optimal results.

Below is a detailed look at common SQL Server query hints, examples of their usage, and the potential risks associated with each.

Understanding the SQL Server Query Optimiser: A Comprehensive Overview

The SQL Server Query Optimiser is a crucial component of Microsoft SQL Server, responsible for determining the most efficient way to execute a given query. The optimiser takes into consideration various factors, including the structure of the query, available indexes, statistics about the data, and the overall database schema. Here is a detailed breakdown of its functionalities, processes, and related concepts.

1. Purpose of the Query Optimiser

The primary goals of the Query Optimiser include:

Generating Execution Plans: The optimiser creates a roadmap that describes how SQL Server will access the data required by the query.
Maximising Efficiency: By evaluating different potential execution paths, the optimiser selects the one that minimises resource utilisation, such as CPU, memory, and disk I/O.
Improving Response Times: A well-optimised query results in faster execution, contributing to better overall system performance.

Detecting Poor Cursor Usage in SQL Server

Undoubtedly, poor cursor usage is a critical issue to review in order to mitigate the risk of CPU bottlenecks and to assess whether cursors are the most appropriate means for processing or if set-based operations would be more effective. It is well known that set-based operations are generally more efficient; however, if you choose to use cursors, you must ensure that they do not adversely affect the performance of the database.

New Features in SQL Server 2022 for DBAs

SQL Server 2022 brings a wealth of new features that empower DBAs to enhance database performance, security, and usability. By leveraging these capabilities, organizations can optimise their data management strategies, ensuring better compliance, improved efficiency, and advanced analytics capabilities. The introduction of intelligent query processing, enhanced security features, and support for serverless architectures positions SQL Server 2022 as a robust choice for modern data solutions.

Here’s a summary of the key features and their benefits:

1. Intelligent Query Processing Enhancements

Overview: SQL Server 2022 expands on the Intelligent Query Processing (IQP) capabilities introduced in SQL Server 2019. New features include:

Parameter Sensitive Plan Optimization: This feature allows SQL Server to create multiple execution plans for a single query based on the specific parameter values used during execution, leading to better performance.

Case Scenario: In a scenario where a stored procedure is called with highly variable input parameters (e.g., a sales report for different regions), the parameter-sensitive optimisation ensures that SQL Server selects the best execution plan tailored to the input, reducing execution time and resource consumption.

2. SQL Server Ledger

Overview: The SQL Server Ledger feature provides blockchain-like capabilities, ensuring data integrity and immutability through ledger tables. This feature uses cryptographic methods to verify data integrity, making it ideal for applications requiring audit trails.

Case Scenario: For financial institutions that need to maintain accurate records of transactions, SQL Server Ledger enables the creation of a ledger table where each transaction is logged. Any modifications are tracked, ensuring an immutable record that can be audited for compliance purposes.

3. Enhanced Security Features

Overview: SQL Server 2022 includes several security enhancements:

Always Encrypted with Secure Enclaves: This allows more operations to be performed on encrypted data without exposing it in plaintext, enhancing security for sensitive information.
Dynamic Data Masking Enhancements: Provides more flexible masking options, allowing DBAs to control how sensitive data is displayed to different users.

Case Scenario: In a healthcare application, patient data can be stored securely using Always Encrypted. The application can perform queries and calculations on encrypted data without revealing sensitive information to unauthorised users.

4. Query Store Enhancements

Overview: The Query Store feature has been enhanced to provide deeper insights into query performance over time. It now includes capabilities for identifying and managing query performance regressions.

Case Scenario: DBAs can utilise the enhanced Query Store to track performance changes after application deployments. If a new release introduces slow-running queries, the Query Store can help identify regressions, allowing quick remediation by reverting to previously optimised execution plans.

5. Built-in Machine Learning Services

Overview: SQL Server 2022 integrates built-in support for machine learning services, allowing DBAs to run R and Python scripts directly in the database engine. This feature enables data scientists and analysts to perform advanced analytics without moving data outside the SQL Server environment.

Case Scenario: A retail company can analyse customer purchasing patterns using machine learning algorithms stored in SQL Server. DBAs can schedule these analyses to run during off-peak hours, optimizing resource usage while delivering timely insights.

6. Serverless SQL Database

Overview: SQL Server 2022 introduces serverless capabilities for databases, allowing automatic scaling and cost-effective resource usage based on demand. This feature is ideal for workloads that experience variable usage patterns.

Case Scenario: A startup that experiences fluctuating traffic on its web application can benefit from serverless SQL databases, which automatically scale up during peak traffic and scale down during low usage periods, ensuring cost-efficiency.

7. Enhanced Availability Groups

Overview: New features for Always On Availability Groups improve failover performance and management. This includes automatic failover for databases in availability groups without the need for manual intervention.

Case Scenario: In a mission-critical application, the DBA can configure availability groups with automatic failover capabilities, ensuring minimal downtime during server maintenance or unexpected failures, thus enhancing application reliability.

8. Improved Performance Insights and Monitoring Tools

Overview: SQL Server 2022 provides enhanced monitoring tools with better insights into system performance. Features like the Database Health Monitor offer real-time health checks and performance tuning recommendations.

Case Scenario: DBAs can utilize the Database Health Monitor to receive alerts and recommendations for performance tuning based on workload analysis, enabling proactive management and optimization of database performance.

9. Support for JSON and XML Enhancements

Overview: SQL Server 2022 introduces improvements to support for JSON and XML data types, including better indexing and querying capabilities.

Case Scenario: A content management system that relies on JSON documents can leverage the new indexing features to improve query performance on JSON fields, allowing faster data retrieval and reporting.

That's all for now.

Thursday, 4 March 2021

Don't Place the Blame on SQL Server

I have never worked for Microsoft, but SQL Server has given me a lot in terms of learning, community and opportunities, all these together have helped me do a great job as a Database Administrator (DBA) for many years. Since I started working with SQL Server nearly 15 years ago, I have heard a lot of complaints about SQL Server being nowhere near as good as Oracle. Much as I would have liked to ignore these fruitless discussions, I couldn't see the point of comparing products in such a compulsive way. Is it not true that our skills are more crucial than the technology itself? — Or perhaps some people just try to find something to blame. Whatever the case, I am convinced that we, as database professionals, are compelled to make the most out of any specific database technology.

No matter what technology we are working with, we are at the wheel — technology is just a tool — so it is not the best to blame technology on the ground of one's inefficiency.

Quickly Checking for Stale Statistics in SQL Server

Monitoring the state of database statistics is crucial because they significantly influence overall performance. When statistics become outdated, performance degradation is inevitable, often leading to slow and inefficient systems—something nobody wants. This raises several important questions:

How often have you faced unforeseen performance issues, even after completing your regular database maintenance tasks?
Have you wondered why tasks like index rebuilding and statistics updates are sometimes insufficient?
If everything was running smoothly before, why has performance suddenly declined?
Why are indexes not being utilised as expected? Should more indexes be created?

The answer to many of these questions often lies in stale statistics. It's essential to remember that statistics contain vital information that the SQL Optimiser uses to generate optimal execution plans for queries. If these statistics are outdated, the impact on performance can be severe. Simply having the right indexes isn’t enough if the statistics are stale.

Avoid changing default ANSI database options

Not having another way of fixing some specific errors, at times some people may consider turning off any ANSI database options as a final solution. To be perfectly honest, I do recommend getting to the bottom of each problem and then fixing it at that level instead of changing default ANSI settings (unless it is just a quick fix or is truly necessary because you verified the benefits are significant). For instance, it is by no means uncommon to turn ANSI_WARRINGS off to fix the following error:

Msg 8152, Level 16, State 14, Line 5
String or binary data would be truncated.

SQLServer Buffer Manager Buffer cache hit ratio, Page Life,Lazy Write/sec, Checkpoint Pages/sec

Without any shadow of a doubt, one of the procedures to follow in order to monitor and diagnose performance issues of a database server is to analyse meaningful performance counters related to SQL Server. As a Database Administrator, we need to get used to dealing with this sorts of problems so it truly helpful to know how to analyse and interpret them by contrasting with other windows performance counters.

To begin with, we can analyse SQLServer Buffer cache hit ratio performance counter which gives us the percentage of cache memory usage. This percentage should mostly be above 99% for OLTP database servers. Therefore, nowadays 4GB or 8GB of RAM is not good enough for the vast majority of them. The knock-on effect of this lack of RAM is that there will be a huge increase on the I/O disk activity as a result of using much more paging (data pages are moving in and out from memory very frequently because there is not enough space to locate new ones). This will also cause a huge disk queue which will keep disk subsystem busy, consequently, it will impact directly on the database performance detrimentally. In this context, we also need to look into other SQL Server performance counters like SQLServer: Buffer Manager: Page Life Expectancy, SQLServer: Buffer Manager: Lazy Write/sec, and SQLServer: Buffer Manager: Checkpoint Pages/sec.

As I mentioned before, there are other useful performance counters to diagnose and tune cache memory issues like Buffer Manager: Page Life Expectancy which indicates the time in seconds that a page has been inside of memory. This time should mostly be above 300 seconds, that is, 5 minutes. So, values below it should be considered as an alert, which means that SQL Server is under high memory pressure because bad-written queries may be using a lot. Clearly, the final solution is not to add more memory but identify and optimise those problematic queries. Only after that may you opt to add memory.

The other performance counter to check is SQLServer: Buffer Manager: Lazy Write/sec which gives the quantity of pages moved out per second from memory. Not only could it mean a lack of memory, but also there are copious amounts of Checkpoints which are not good at all because it will cause recompilation of stored procedures at the same time.The value of this counter should mostly be below 20. So, if you notice this value above 20, you may need to check SQLServer: Buffer Manager: Checkpoint Pages/sec counter as well. Checkpoints move out all pages from memory to disk.

Finally, keep in mind that it really important to do diagnostic tasks regularly which allow to take preventive actions so as to improve the whole performance over time. I do hope you find this post helpful. Let me know any remarks you may have. Thanks for reading. Stay tuned.

Thursday, 23 November 2017

Dropping columns statistics after migrating to SQL Server 2014/2016/2017

While migrating databases to new versions of SQL Server, there are some tasks that need to be carried out in order to warrant strong consistent data and physical integrity of databases in tandem with their performance. For instance, it is common to run DBCC CHECKDB, rebuild all indexes, and update columns and index statistics. In some scenarios, we might consider dropping columns statistics and let SQL Server create them again according to new algorithms especially when the new cardinality estimator (CE) is going to be used after upgrading to SQL Server 2014/2016/2017. I personally do that, after restoring databases on the new version of SQL Server I proceed to drop all columns statistics (always having AUTO_CREATE_STATISTICS option enabled) and then SQL Server will definitely create them again based on the nature of database queries and the logic of THE new CE. In a long-term perspective this technique is the most recommendable from my point of view as not only will we have new versions of statistics but also purge old and unused statistics.
Here I will share a script to delete columns statistics. Be caution and only drop them if you have AUTO_CREATE_STATISTICS option enabled, otherwise no statistics will be created and the database performance will be affected tremendously. That is all for now. Let me know any remarks you may have.

SET NOCOUNT ON     
  IF db_name() NOT IN ('model','master','distribution','msdb','tempdb')
    BEGIN 
        DECLARE @schema_name varchar(max)      
        DECLARE @table_name varchar(max)     
        DECLARE @stat_name varchar(max)        
        DECLARE @update_stat_cmd varchar(max)        
        DECLARE @update_stat_msg_header   varchar(max)      
        DECLARE update_stat_cursor CURSOR FOR    
              select  schema_name(o.[schema_id]),  object_name(s1.[object_id]) , s1.name   
              from (  
                select s.[object_id], s.name from sys.stats s  
                left join sys.indexes i on s.name=i.name  
                where i.name is null) s1  
              inner join sys.objects o on o.[object_id]=s1.[object_id]  
              where  o.type='U'  
              order by schema_name(o.[schema_id]),  object_name(s1.[object_id]) , s1.name   
             
         OPEN update_stat_cursor        
         FETCH NEXT FROM update_stat_cursor INTO  @schema_name, @table_name,  @stat_name         
         WHILE (@@fetch_status = 0)        
         BEGIN    
           DECLARE @ini DATETIME, @fin DATETIME     
           SET @update_stat_msg_header =  '->Dropping ['+ RTRIM(@schema_name)  +'].[' + RTRIM(@table_name) + '].[' +@stat_name + ']'      
           PRINT @update_stat_msg_header    
           SET @update_stat_cmd ='DROP STATISTICS ['+ RTRIM(@schema_name)  +'].[' + RTRIM(@table_name) + '].[' +@stat_name + ']'    
           SET @ini=GETDATE()    
           EXEC (@update_stat_cmd)      
           SET @fin=GETDATE()    
          
           FETCH NEXT FROM update_stat_cursor INTO @schema_name, @table_name,  @stat_name            
         END        
     
         PRINT ' '       
         PRINT '----------------------------------------------------------------------------- '        
         SET @update_stat_msg_header = '*************  THERE ARE NO MORE STATISTICS TO BE UPDATED **************'         
         PRINT @update_stat_msg_header        
         PRINT ' '        
         PRINT 'All statistics not linked to any index were rebuilt!'        
          
        CLOSE update_stat_cursor        
        DEALLOCATE update_stat_cursor    
    
   END  
SET NOCOUNT OFF

Friday, 3 November 2017

AUTO_CLOSE database option and its impact on the performance

The AUTO_CLOSE database option is only one of the many options related to performance and availability of the database in SQL Server. When AUTO_CLOSE is set to ON for a database, SQL Server closes all its files and releases the resources used for it shortly after the last connection is closed. This action will reduce the usage of memory, nevertheless, it is barely insignificant (12KB or 20KB). Furthermore, having this option turned on, the first connection will have to open the database again, as a result, it will experience a delay. I highly recommend having disabled this option all the time. Here is the code to turn this option off:

       ALTER DATABASE [UserDBInProduction] SET AUTO_CLOSE OFF WITH NO_WAIT

This option is turned off by default, but I have found many databases with this option turned on which also impacts on it performance. The other disadvantage of having enabled it is that when the last established connection to the database is closed, its files are accessible to be manipulated directly via Windows by some user, which means that someone is completely able to delete them while the database engine is running. So, we need to work with lots of cautiousness when it comes to changing not only this database option but also others.

That is all for now, let me know any remarks you may have. Thanks for reading again. Stay tuned.

Friday, 4 August 2017

Troubleshooting timeout expired errors

Having .NET Applications working with SQL Server, some unforeseen timeout errors can be raised for two possible causes: bad-written SQL code (especially Ad-Hoc queries) and application issues. Under this circumstance, developers will generally blame SQL Server at first glance because they may think it is a misconfiguration related to timeout in SQL Server. In my experience troubleshooting these issues, I would say that lots of them are not mostly related to SQL Server configuration itself. Additionally, when it comes to application as the root cause, reading the SQL Server error log, you will not find any error about timeout or failed login events. What does it mean? it simply means that the application never tried to connect to SQL Server because it was still working at application level when the error was raised. Thus, the cause of timeout error is inside the application at Net SqlClient Data Provider level (see the CommandTimeout property value configured for the application connection). This 'CommandTimeout' property specifies the number of seconds that a application provider should wait for result sets before sending a timeout signal. It is well known that the default is 30 seconds, therefore, it may not be good enough due to application performance problems.

Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

In order to get it fixed, firstly, developers should check .NET code within applications. In some cases, the developers I worked with found infinite loops or slow codes which spent much time before sending SQL queries to the database engine, as a result of this, the timeout threshold was reached at application level and, obviously, the error was raised. By the way, the following error could also be related to timeout:

Unable to connect to SQL Server session database.

Having checked the application, you can increase the 'CommandTimeout' property value to 60 seconds (if it is not enough then change to 120 or 180, and also considerate using 0 (unlimited) which should be a workaround while the problem is being traced and fixed). That is all for now, let me know any remarks you may have. Thanks for reading. Stay tuned.

Thursday, 20 July 2017

Detecting excessive compilation and recompilation issues

Undoubtedly, recompilation is a big topic to reckon with, more importantly, in database environments processing data that is changing rapidly over time and compounded by ad-hoc workloads which may cause CPU bottleneck, so it is of paramount importance to detect excessive compilation and recompilation issues and address them to warrant stable performance for the queries, and in this sense, there are some tools to be used to detect these issues such as Performance Monitor, Extended Events, SQL Server Profiler Trace, DMVs, etc. When it comes to using Performance Monitor we should concentrate the efforts on analising the performance counters SQL Server: SQL Statistics: Batch Requests/sec, SQL Server: SQL Statistics: SQL Compilations/sec, and SQL Server: SQL Statistics: SQL Recompilations/sec. What's more, you can save a trace file capturing the events SP:Recompile, SQL:StmtRecompile, and CursorRecompile, then you can use the following query to see all the recompilation events:

select spid, StartTime, Textdata, EventSubclass, ObjectID, DatabaseID, SQLHandle 
from fn_trace_gettable('C:\RecompilationTrace_01.trc', 1)
where EventClass in(37,75,166) -- 37 = Sp:Recompile, 75 = CursorRecompile, 166 = SQL:StmtRecompile

Furthermore, we can also capture the showplan XML for query compile, but doing that has significant performance overhead because it is captured for each compilation or recompilation. So, just do it for a very short time as long as you see a high value for the SQL Compilations/sec counter in Performance Monitor. Once you know where the problem is you can use Engine Tuning Advisor to see whether any indexing changes improve the compile time and the execution time of the query.
Talking of DMVs to diagnose recompilation issues, looking into 'sys.dm_exec_query_optimizer_info' is very helpful, in particular, look at the Elapsed Time, which is the time elapsed due to optimizations, and also Final Cost. If you see that Time Elapsed is very close to the CPU time, you might reach the conclusion that the compilation and recompilation time is attributable to that high CPU use. Another DMV to use is 'sys.dm_exec_query_stats' whose most important columns to look at are sql_handle, total worker time, plan generation number (the number of times the query has recompiled), and statement Start Offset. Here is an example to check the top 20 most procedures that have been recompiled.

select top 20 SQLText.text, sql_handle, plan_generation_num, execution_count, dbid, objectid 
from sys.dm_exec_query_stats
   cross apply sys.dm_exec_sql_text(sql_handle) as SQLText
where plan_generation_num >1
order by plan_generation_num desc

There is a plethora of recommendations to deal with recompilation and keep the performance in optimum conditions, but for now you can take into consideration the following options:

Check to see whether the stored procedure was created with the WITH RECOMPILE option or whether the RECOMPILE query hint was used. If a procedure was created with the WITH RECOMPILE option, since SQL Server 2005, you may be able to take advantage of a statement-level RECOMPILE hint if a particular statement within that procedure needs to be recompiled. Using this hint at the statement level avoids the need of recompiling the whole procedure each time it executes, while at the same time allowing the individual statement to be compiled.
Recompilations can occur due to changes in statistics, and you can use the KEEPFIXED PLAN query hint to make recompilations occur only when there is the need to ensure correctness and not to respond to changes in statistics. For instance, in this context recompilation can only occur if the underlying table structure or its schema that is referenced by a statement changes, or if a table is marked with the sp_recompile stored procedure, all resulting in the fact that the plan no longer applies and consequently triggering the recompilation event.
Using the KEEP PLAN query hint is useful to set the recompilation threshold of temporary tables to be the same as permanent tables. Take a look at the EventSubclass column which displays 'Statistics Changed' for an operation on a temporary table.
Turning off the automatic updates of statistics for indexes and statistics that are defined on a table or indexed view prevents recompilations that are due to statistics changes on that object. It is worth noting that turning off the auto-stats option is not always a good idea. This is because the query optimizer is no longer sensitive to data changes in those objects resulting in suboptimal query plans. To be honest, I never turned off this option because I always preferred trusting in SQL Server criteria and instead opting to work on optimising queries.
Keep in mind that recompilation thresholds for temporary tables are lower than for normal tables, so if the recompilations on a temporary table are due to statistics changes, you can change the temporary tables to table variables. A change in the cardinality of a table variable does not cause a recompilation. The side effect of this approach is that the query optimizer does not keep track of a table variable's cardinality because statistics are not created or maintained on table variables. This can result in less optimal query plans, however, you can test the different options and choose the best one. Generally, temporary tables provide much better performance than tables variables when lots of data is involved.
Recompilation might also occur as a result from SET option changes, so one can diagnose it by using SQL Server Profiler to determine which SET option changed. It is highly advisable to avoid changing SET options within stored procedures and it is much better to set them at the connection level, and for the overwhelmingly majority of cases the default SET options work well. It is also very important to ensure that SET options are not changed during the lifetime of the connection.
To avoid recompilations that are due to deferred compilations, do not combine DML with DDL and do not create the DDL as a result from conditional IF statements.
To avoid recompilation and also to avoid ambiguity between objects, batches should have qualified object names, for example, dbo.Table1, User1.MySP, etc.

That is all for now, let me know any remarks you may have.

Friday, 26 May 2017

How to know which non-clustered indexes are not being used any longer

It is said that indexes are extremely useful to boost the performance of databases, especially indexes on huge tables. I am in agreement with that and I also support the idea of having the right indexes to achieve the best performance if they are created and used properly. Consequently, we have to vigilant of it, indexes may have been created to solve a particular problem in a moment, however, it does not mean that they are going to be useful forever but only for some days or months while data does not change too much. So, I should say as well that indexes are not the be-all and end-all for every business case, and we must always monitor the index usage and should consider the choose of dropping them if we verify they are not being using any longer. Keep in mind, not only could indexes be big and occupy a lot of space, but also impact on write operations negatively. Therefore, we should avoid wasting space, and having done that we can also optimise preventive indexes maintenance time.

This post will show a code to know exactly which non-clustered indexes are being used since the last restart of SQL Server engine. This code also filters out the indexes which are not related to primary keys. As you have seen in the code, we figure out the indexes by checking the value of columns user_seeks, user_scans, and user_lookups, if they are equal to zero then is says to us they were not used until this moment. Please be cautious, before using this code to get the indexes to be deleted, you should make sure that the database engine is not been restarted recently, otherwise, you will delete the wrong indexes and will impact on the performance tremendously. I strongly suggest using this code only if the latest restart of SQL Server has been done three months ago, more and less.

SELECT  i.type_desc,last_user_seek, last_user_scan, last_user_lookup,
        OBJECT_NAME(s.[object_id]) AS [Table Name] ,
        i.name AS [Index Name] ,
        i.index_id ,
        user_updates AS [Total Writes] ,
        user_seeks + user_scans + user_lookups AS [Total Reads] ,
        user_updates - ( user_seeks + user_scans + user_lookups ) AS [Difference]
, 'DROP INDEX ['+ schema_name(o.[schema_id]) +'].['+ OBJECT_NAME(s.[object_id]) +'].[' + i.name  + '];' as Statement_Drop_Index
,i.is_unique, i.is_unique_constraint, i.is_primary_key
FROM    sys.dm_db_index_usage_stats AS s WITH ( NOLOCK )
INNER JOIN sys.indexes AS i WITH ( NOLOCK )
INNER JOIN sys.objects o on o.object_id = i.object_id
    ON s.[object_id] = i.[object_id]  AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
      AND s.database_id = DB_ID()     
      AND i.index_id > 1 and i.is_primary_key=0 and i.is_unique=0 and i.is_unique_constraint=0
      AND isnull(user_seeks, 0) + isnull(user_scans, 0) + isnull(user_lookups,0) =0 
ORDER BY  OBJECT_NAME(s.[object_id]), [Difference] DESC, [Total Writes] DESC,[Total Reads] ASC ;

When you execute it, you will also get a column containing the T-SQL code to delete the indexes which have not been used since the latest restart of SQL Server, and then you can execute it to delete them. To sum up, do not get me wrong, I am not against using indexes, on the contrary, I am in favour of using them properly, and if there were indexes that are not being using since a long time ago then they should be deleted. That's all for now. Let me know any remarks you may have. Thanks for reading.

Monday, 26 December 2016

How to delete large quantity of data with no growth of the Transaction Log and no blocking issue

Without a shadow of a doubt, at times while working on optimising not only database performance but also hard disk usage, we have to face with challenging tasks which need to be completed rapidly without impacting on production environment. It is well-known that deleting data from large tables could be a truly hard task to complete as the much bigger they are the much more time they will take. Consequently, this task will also block the tables much time causing performance issues and stopping systems from working. Today's post is going to show one technique to delete historic data so as to only keep recent data and boost the performance without a hitch. (This technique applies for SQL Server 2005 Enterprise Edition up to SQL Server 2014 Enterprise. SQL Server 2016 supports TRUNCATE table with partitions.)

To begin with, the technique consists in using SQL Server partition switching which allows to access and transfer subsets of data quickly and efficiently. In this way, we may need to alter the large table and make it partitioned but in this example we are going to create a new one. Now let's say there is a need to keep only data from the latest three months. Here is the structure of the table 'LogTracker' which will be partitioned shortly after. It is also included a specific column named 'MonthNumber' to save the number of the month which at the same time will be the partition number.

CREATE TABLE [dbo].[LogTracker](
    [LogID] [uniqueidentifier] NOT NULL,
    [CreateDate] [datetime]  NULL,
    [ErrorDesc] [nvarchar](MAX)  NULL,
    [User] [nvarchar](40) NULL,
    [MonthNumber]  AS DATEPART(MONTH, [CreateDate]) PERSISTED NOT NULL
 CONSTRAINT [PK_LogTracker] PRIMARY KEY CLUSTERED 
  (
    [LogID] ASC,
    [MonthNumber] ASC
  )
)

There must also be a non-partitioned table (with the same structure) to move the data from the partitioned one. This table will hold the data during a very short time and then it will be truncated. Because it does not have any foreign keys or constraints the truncate operation will be completed successfully.

CREATE TABLE [dbo].[LogTracker_Temp](
    [LogID] [uniqueidentifier] NOT NULL,
    [CreateDate] [datetime]  NULL,
    [ErrorDesc] [nvarchar](MAX)  NULL,
    [User] [nvarchar](40) NULL,
    [MonthNumber]  AS DATEPART(MONTH, [CreateDate]) PERSISTED NOT NULL
 CONSTRAINT [PK_LogTracker_Temp] PRIMARY KEY CLUSTERED 
  (
    [LogID] ASC,
    [MonthNumber] ASC
  )
)

Now it is time to create the Function and the Scheme partition we will use to partition the table 'LogTracker'. For this particular example, the data will be classified by month, which means that each partition will contain data from one month.

CREATE PARTITION FUNCTION [upf_LogTracker](INT) AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
GO
CREATE PARTITION SCHEME [ups_LogTracker] AS PARTITION [upf_LogTracker] ALL TO ([PRIMARY])
GO

Finally, we will partition the table.

ALTER TABLE dbo.LogTracker DROP CONSTRAINT PK_LogTracker
WITH (MOVE TO ups_LogTracker([MonthNumber]))
GO
ALTER TABLE dbo.LogTracker ADD CONSTRAINT PK_LogTracker PRIMARY KEY ([LogID],[MonthNumber])
GO

Having successfully completed everything, we are able to insert some data for the purpose of this illustration. Obviously, we can insert millions of rows and the performance of deletion by using this technique will be the same, it will practically be instantaneous. There is no difference between deleting some rows or millions as they are using switching partition which is essentially a logical transfer of data between two tables.

INSERT INTO LogTracker SELECT NEWID(), '2016-06-23 13:13:56', 'Error A Desc', 'preyes'
INSERT INTO LogTracker SELECT NEWID(), '2016-07-24 04:18:12', 'Error C Desc', 'preyes'
INSERT INTO LogTracker SELECT NEWID(), '2016-08-20 01:34:57', 'Error B Desc', 'preyes'
INSERT INTO LogTracker SELECT NEWID(), '2016-09-21 08:09:16', 'Error D Desc', 'preyes'
INSERT INTO LogTracker SELECT NEWID(), '2016-11-22 05:26:45', 'Error A Desc', 'preyes'
INSERT INTO LogTracker SELECT NEWID(), '2016-11-23 15:43:56', 'Error A Desc', 'preyes'
INSERT INTO LogTracker SELECT NEWID(), '2016-12-24 23:28:12', 'Error C Desc', 'preyes'

As you can see, the data is classified by month.

SELECT $PARTITION.[upf_LogTracker](MonthNumber),* FROM LogTracker

With this code we are going to delete the historic data and keep only the most recent three months. It moves out the data of each month from 'LogTracker' to the stage table 'LogTracker_Temp' which in no time is truncated. It starts moving data from the month number one until 'the current month - 3'.

DECLARE @M INT = CASE WHEN DATEPART(MONTH, GETDATE()) - 3 >0 THEN DATEPART(MONTH, GETDATE()) - 3  ELSE 12 + DATEPART(MONTH, GETDATE()) - 3 END
DECLARE @PartitionNumberToArchive INT=1
WHILE (@PartitionNumberToArchive<=@M)
BEGIN
    ALTER TABLE dbo.LogTracker SWITCH PARTITION @PartitionNumberToArchive TO dbo.LogTracker_temp
    TRUNCATE TABLE dbo.LogTracker_temp
    UPDATE STATISTICS dbo.LogTracker WITH FULLSCAN
    SET @PartitionNumberToArchive=@PartitionNumberToArchive+1
END

The previous code will finish in one o some seconds, instantaneously.

Naturally, the code needs to be scheduled and executed via a SQL Job at the end of each month. To be perfectly honest, not only will this deletion finish in one or two seconds, but also it will not cause any impact on the performance and the Transaction Log will not grow as it does by using traditionally techniques. I hope this post is truly useful for you and you make the most out of it. Let me know any remarks you may have. That is all for now. Thanks for reading.

Wednesday, 7 December 2016

New per-operator level performance stats included in SQL2016 SP1 and SQL2014 SP2

Traditionally, we were used to getting only basic runtime performance stats per operator regarding to CPU, I/O and Memory, which were not good enough to allow us to automate recollecting tasks of them. I strongly believe that DBAs are always overwhelmingly interested in diagnosing some issues proactively so as to prevent bad-written queries from using hardware resources inefficiently. Luckily, this convoluted situation has been changed radically in SQL Server 2016 and SQL Server 2014 SP2 since Microsoft has included very useful information per operator (and also per thread in the scope of each operator) which can be seen looking into the Showplan XML. For instance, we are going to see that information for the following query (in this case it has run under a serial plan, that is one thread).

This is its actual execution plan.

It is highlighted all what is new in SQL2016, SQL2016 SP1 and SQL2014 SP2. Therefore, we can appreciate CPU and I/O runtime performance stats for the Clustered Index Scan operator on 'PurchaseOrderDetail' table such as I/O reads, CPU time and Elapsed time. In a parallel plan you will be able to see information per thread in each operator.

And now we will see not only CPU and I/O performance stats for the Hash operator but also Memory performance stats.

All this information can also be seen in the Properties window in the context of each operator inside the Actual Execution Plan by using the latest version of SSMS. If you display the detail of each statistic, the detail per thread will be seen when working with parallel execution plan.

Having seen this, not only are we now able to look into useful performance stats by using Showplan XML and SSMS, but also we can create some scripts for administration and optimization purposes in order to pick up queries with poor performance and then optimise them. That is all for now. Let me know any remark you may have. Thanks for reading.

Friday, 4 March 2016

How to split the tempdb database into more files

Naturally, as DBAs we do know is mandatory to modify data and log properties of the tempdb database. Unless we do a customized configuration of it, SQL Server will create only one data file and log file by default. The reality is that we will mostly need to create more files on production environments. There are many recommendations not only about how to create them but also the quantity of files on OLTP environments. To be perfectly honest, I do not believe that the number of data files only depends on the number of cores, but it also depends on concurrency, tempdb contention issues, the workloads on your server and, clearly, the performance of your queries. So, there is no a rule for it. Moreover, depending on the SQL Server version we are working on, we may NOT need to split the tempdb into many data files. There are some situations where it will work splendidly with only one data file since each database environment is unique and, therefore, we need to determine the best for it.

Today we are not going to discuss more details about it. I just would like to suggest splitting your tempdb database into four data files (if you have four or eight core) or eight data files (for 16, 32, 64, or more cores) and ONLY one log file. Ideally, we also need to locate them in different drives RAID1, RAID5 or RAID10. Now having very clear the situation, I will show you one small script to split your default tempdb database into 8 data files and remaining the only one log file.

USE [master]
GO
ALTER DATABASE [tempdb] MODIFY FILE (NAME=N'templog', NEWNAME= N'tempdev_Log')
GO
ALTER DATABASE [tempdb] MODIFY FILE (NAME=N'tempdev', NEWNAME=N'tempdev_Data01')
GO
ALTER DATABASE [tempdb] MODIFY FILE (NAME=N'tempdev_Log',  FILENAME= N'D:SQLTempDBtempdev_Log.ldf', SIZE = 2048MB , FILEGROWTH = 2048MB)
GO
ALTER DATABASE [tempdb] MODIFY FILE (NAME=N'tempdev_Data01',  FILENAME= N'D:SQLTempDBtempdev_Data01.mdf', SIZE = 512MB , FILEGROWTH = 1024MB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_Data02', FILENAME = N'D:SQLTempDBtempdev_Data02.ndf' , SIZE = 512MB , FILEGROWTH = 1024MB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_Data03', FILENAME = N'D:SQLTempDBtempdev_Data03.ndf' , SIZE = 512MB , FILEGROWTH = 1024MB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_Data04', FILENAME = N'D:SQLTempDBtempdev_Data04.ndf' , SIZE = 512MB , FILEGROWTH = 1024MB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_Data05', FILENAME = N'D:SQLTempDBtempdev_Data05.ndf' , SIZE = 512MB , FILEGROWTH = 1024MB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_Data06', FILENAME = N'D:SQLTempDBtempdev_Data06.ndf' , SIZE = 512MB , FILEGROWTH = 1024MB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_Data07', FILENAME = N'D:SQLTempDBtempdev_Data07.ndf' , SIZE = 512MB , FILEGROWTH = 1024MB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_Data08', FILENAME = N'D:SQLTempDBtempdev_Data08.ndf' , SIZE = 512MB , FILEGROWTH = 1024MB )

After having successfully executed, we must restart the database engine in order to get the new files created and all changes done. I hope you find this post interesting. Let me know any remark if you may have. Thanks for reading!

Saturday, 13 February 2016

Looking into some concepts about the well-known tempdb database

Today's post is going to look into some concepts about the well-known tempdb database. As DBAs, we have to keep an eye on it not only at the beginning of the installation, but also as part of our monitoring tasks on a daily basis. To do so, we really need to understand how it works and what enhancements have come from SQL Server 2005 to forward. To begin with, the tempdb is the shared database per instance in SQL Server which is used to stored and manage temporary objects. It has a number of changes since SQL Server 2005, that is, there are new tempdb usages and internal optimizations enhancements. Nevertheless, the tempdb architecture is mostly unchanged since SQL Server 2000.
In a user database, the transactions have the ACID attributes: atomicity, concurrency, isolation, and durability whereas in the tempdb database the transactions lose the durability attribute which basically means that they do not persist after a SQL Server shut down event. Most of these internal operations on tempdb do not generate log records as there is no need to roll back. So, these operations are faster. Furthermore, some of the database options cannot be modified for tempdb and others are limited or restricted. Here they are:

Auto Shrink is not allowed for tempdb.
Database Shrink and File Shrink capabilities are also limited.
The database CHECKSUM option cannot be enabled.
A database snapshot cannot be created on tempdb.
DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.
Only offline checking for DBCC CHECKTABLE is performed.

Tempdb Space Usage: The following types of objects can occupy tempdb space: Internal Objects, Version Stores, and User Objects.

Internal Objects: Internal Objects metadata is stored in memory (it means that the metadata is hidden in the tempdb and it does not appear in catalog views such as 'sys.all_objects') and each of them occupies at least nine pages (one IAM page and eight data pages) in tempdb. Page Allocations and Update operations on it does not generate log records. The Internal Objects are used

to store intermediate runs for sort
to store intermediate results for hash joins and hash aggregates
to store XML variables or other large object (LOB) data type variables (text, image, ntext, varchar(max), varbinary(max), and all others)
by queries that need a spool to store intermediate results
by keyset cursors to store the keys
by static cursors to store a query result
by Service Broker (Query Notification and Event Notification) to store messages in transit
by INSTEAD OF triggers to store data for internal processing
by DBCC CHECKDB (it internally uses a query that may need to spool intermediate results)

Version Stores: They do not appear in catalog views such as 'sys.all_objects'. Here are more concepts:

Version stores are used to store row versions generated by transactions for features such as snapshot isolation, triggers, MARS (multiple active result sets), and online index build.
The online index build version store is for row versions from tables that have online index build operations on them.
The common version store is for row versions from all other tables in all databases.
The version store consists of append-only store units which are highly optimized for sequential inserts and random look up. Inserts into the version store do not generate log records.
Each unit can store many row versions. If there are versions to be stored, a new store unit is created about every minute.

In the following cases the versions of rows are generated for

SNAPSHOT isolation and read committed snapshot isolation (the versions are generated by DML operations in the database when the respective database options are enabled)
AFTER triggers (the versions are generated for all the update operations by the transaction that fired the trigger during the INSERT, DELETE, or UPDATE statement in any database, independent of database options. INSTEAD OF triggers do not generate versions)
MARS (the versions are generated by the UPDATE or DELETE statement when there is a pending SELECT on the same transaction and the same connection)
building an ONLINE index

User Objects: They do appear in catalog views such as 'sys.all_objects. Now some details about it:

The 'sp_spaceused' system stored procedure can show the size occupied by these objects.
User Objects include both user-defined tables and indexes, and system catalog tables and indexes.
Operations on User Objects in tempdb are mostly logged. Bulk copy program (BCP), bulk insert, SELECT INTO, and index rebuild operations are bulk logged.
User-defined tables include the global temporary tables such as ##t, and local temporary tables such as #t.
Local temporary tables also include table variables such as @t and the mapping index for online clustered index build with the SORT_IN_TEMPDB option.

Performance Enhancements in SQL Server 2005 or later

The tempdb logging optimization avoids logging the “after value” in certain log records in tempdb.
Instant data file initialization works by not zeroing out the NTFS file when the file is created or when the size of the file is increased.
There is less use of the UP type page latch when allocating pages and extents in tempdb. Proportional fill has been optimized to reduce UP latch contention.
Proportional fill has been optimized to reduce UP latch contention.
There is now deferred drop in tempdb.
Worktable caching is improved.
SQL Server 2005 or later caches the temporary table that is created by using a CREATE TABLE or SELECT INTO statement.

Recommendations for managing

The tempdb files must be configured with initial size and auto-growth based on your workloads. Do not let with the default sizes.
The tempdb files must be located on RAID0 (for better performance) or RAID1 (if you need have more writes than reads) or RAID5 (if you have more reads than writes). RAID10 is the best option but no all companies can justify this.
The tempdb files must be located on separated disks to avoid contention issues and improves the performance.
Tempdb database must be created with one data file per physical processor if the quantity of CPUs is eight or less. If there are more then eight CPUs then you can start off with eight data files and after increase the number of files by four in case there was PAGELATCH contentions on one of the allocation bitmap pages (including PFS pages, GAM pages and SGAM pages) until the issue is solved, if not, add four files more, and so on.
Do not forget, tempdb data files or log file should not grow so much in a short time, if this happens then something is working wrong, so you need to analyze the workloads performance and detect the query that you have to optimize.

That is all for now. Let me know any remarks you may have about tempdb database. Thanks for reading.

Thursday, 6 November 2014

Microsoft SQL Server 2014 In-Memory OLTP: How fast is it?

My latest tip was published today at MSSQLTips.com, you can read it at http://www.mssqltips.com/sqlservertip/3392/microsoft-sql-server-2014-inmemory-oltp-how-fast-is-it/ . Thanks for reading again!

Friday, 12 November 2010

SQL11 (Denali): OFFSET and FETCH (New options for ORDER BY)

It is known that ORDER BY clause has been improved in 'Denali' (code name of the upcoming version SQL Server 2012) adding more options as a result from looking for new solutions to work on specific business cases such as fetching top N rows per window of results from the result set which we used to do by combining TOP, OVER, and ORDER BY. Nevertheless, the performance of this approach was poor and not even close to being as good as OFFSET and FETCH in 'Denali'.

OFFSET keyword is used for excluding first N rows (default value is zero) whereas FETCH is used for reading/fetching M rows per each window of results starting from position (N+1). Therefore, OFFSET and FETCH allow to get only rows from position (N+1) to (N+M). I do recommend creating one Index (preferably Index Clustered) on ORDER BY columns and using these columns on WHERE clause as much as possible in order to avoid costly operations such as Index Scans or Table Scans. I have performed many tests on 10 millions of rows, and to be to honest, it worked very well. To illustrate, the following query will start fetching from position 4+1=5 (OFFSET position=4 for SQL Server) to the end of the table.

SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address   
FROM dbo.Customers    
ORDER BY CustomerID 
   OFFSET (4) ROWS

Now with DESC the results will be different (excluding last 4 rows)

SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address   
FROM dbo.Customers    
ORDER BY CustomerID  DESC
   OFFSET (4) ROWS

Basically, each execution of the query is done independently, which means that not only are the results sent back to the client side, but also immediately the resources are released. Here is another example how to get only the top 3 rows excluding first 4 rows (OFFSET=4), which means that it only reads the fifth, sixth, and seventh rows.

SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address --/..*.../  
FROM dbo.Customers    
ORDER BY CustomerID  
   OFFSET (4) ROWS 
   FETCH NEXT 3 ROWS ONLY --  FETCH FIRST 3 ROWS ONLY
GO

Having a need to read the next three rows (eighth, ninth, and tenth rows), we have to increase OFFSET value in three.

SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address --/..*.../  
FROM dbo.Customers    
ORDER BY CustomerID  
   OFFSET (7) ROWS ---ó OFFSET (7) ROW 
   FETCH NEXT 3 ROWS ONLY -- ó FETCH FIRST 3 ROWS ONLY
GO

Here the script to read dynamically each result by using variables (and optionally OPTIMIZE FOR hint to optimise this Ad-Hoc query).

DECLARE @Start INT, @Next INT
SET @Start=0 

SET @Next=3
SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address --/..*.../  
FROM dbo.Customers    
ORDER BY CustomerID 
     OFFSET (@Start) ROWS    
     FETCH NEXT @Next ROWS ONLY 

 OPTION(OPTIMIZE FOR (@Start=4,@Next=5))
GO

Now the same code inside an stored procedure.

CREATE PROCEDURE dbo.PageFetch(@PageNumber INT, @PageSize INT)
AS
BEGIN
SELECT ROW_NUMBER() OVER (ORDER BY CustomerID) AS 'Nro', 
CustomerID, CompanyName, ContactName, ContactTitle, Address --/..*.../  
FROM dbo.Customers    
ORDER BY CustomerID 
     OFFSET (@PageSize * (@PageNumber -1) ) ROWS
     FETCH NEXT @PageSize ROWS ONLY 
END

Querying the results for second, third and fifth windows of results with OFFSET.

EXEC dbo.PageFetch 2,3;
GO
EXEC dbo.PageFetch 3,3;
GO
EXEC dbo.PageFetch 5,3;

Luckily, OFFSET and FETCH are supported by subqueries, functions, derived tables, but not by indexed views and using directly with TOP, OVER, INSERT, UPDATE, MERGE, o DELETE (except inside of independent queries at lower level such as subqueries). Here an illustration.

SELECT  TOP(2) ROW_NUMBER() OVER(ORDER BY CustomerID) AS 'NRO', CompanyName, ContactName, ContactTitle  FROM (
        SELECT  CustomerID, CompanyName, ContactName, ContactTitle, Address --/..*.../  
        FROM dbo.Customers    
        ORDER BY CustomerID  
           OFFSET (4) ROWS ---ó OFFSET (4) ROW 
           FETCH NEXT 3 ROWS ONLY -- ó FETCH FIRST 3 ROWS ONLY
) AS T1

Finally, using it with Views.

CREATE VIEW dbo.v1
as
SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address --/..*.../  
FROM dbo.Customers  
ORDER BY CustomerID 
   OFFSET (0) ROWS ---ó OFFSET (4) ROW 
GO
SELECT CustomerID, CompanyName, ContactName FROM dbo.v1
WHERE  CustomerID LIKE 'AN%'

To sum up, OFFSET and FETCH are excellent improvements which allow to implement solutions to fetch only specific pages/windows of results from the result set with ease. Let me know any remarks you may have. Thanks for reading!

Pages