Tuesday, 20 February 2018

Looking deeper into the physical & logical architecture - Transaction Log File

Beyond all doubt, it is essential to have a good understanding of the Transaction Log (T-Log) so that we can diagnose unforeseen performance issues related to it and I am sure that almost everyone had at least one. The T-Log is basically a record of all transactions happening to the database. All these transactions are actually first written to the physical T-Log file, and then after a CHECKPOINT, is written to the Data File via the Lazy Writer process. Some of the uses of T-Log are: as a point in time recovery (full recovery model), to record the Start and End of each transaction, every data modification (insert, update, delete) including system SP's, DDL statements to any table including system tables, every extent and page allocation and de-allocation operation, and creation or drop of tables and indexes.

More specifically, SQL Server uses a Write-Ahead Log Algorithm whereby the Log Buffer Manager always guarantees to write the change descriptions (log records) to the T-Log on disk before it writes the changed data pages to the physical Data Files. Furthermore, Write Ahead Logging also allows SQL Server to ensure some of the ACID properties of database transactions, most significantly durability. It is of paramount importance to keep in mind that there is no performance benefit from having multiple physical log files for the T-Log because it is written sequentially and in a round-robin fashion. Sometimes, we can have many T-Log files for a database as long as there is no enough space on the partition disk, so we can add one more T-Log file on another partition disk so that SQL Server does not stop working until the final solution is provided.

Instead of having various T-Log files, it is highly advisable to work on preventive actions to optimise the T-Log file usage in terms of performance and availability, for instance, make sure only one Log File is used, use a dedicated drive/array for the T-Log file (use RAID 10 for log drives, if possible. RAID 1 is also acceptable), avoid Log Fragmentation (avoid frequent small auto-grow events, set 2GB, 4GB or 8GB auto-grows for example), Fragmented Log File can slow down the performance of operations that read the T-Log file such as Backup Tasks, Crash Recovery Process, Database start up, Transactional Replication, Database Mirroring, creation of a Database Snapshot, DBBC CHECKDB, DBCC LOGINFO, Change Data Capture, etc.

So, how can we control the size of the T-Log File? it depends on the database recovery model, in other words, with SIMPLE recovery model: SQL Server auto-truncates and reuse the space after committing the transaction whereas with either FULL or BULK_LOGGED recovery model, a Log Backup is the only action that can truncate the T-Log, nevertheless, you might have seen that after taking Log Backup, the Log space was not truncated yet, this is because some internal T-Log records may still be required by some other database operations like Open transaction, AlwaysOn AG, Replication, CDC, Mirroring, Backup, etc. It is well worth noting that we must try to keep VLF's as less as possible and each VLF of an acceptable size because the more you have the worse the performance will be. I recommend having VLFs of no more of 512MB per one, but it depends entirely on the use of the T-Log file and its growth settings.

Internally, SQL Server divides a Transaction Log file into a number of sections called Virtual Log Files (VLFs). By default, a T-Log will have 4 VLF's when created. After that, the next VLF's will get individually created with each auto grow or manual grow. Here is the base on what SQL Server starts creating the next VLF's:  < 64MB = 4VLF's //  >= 64MB < 1GB = 8 VLF's // > 1GB = 16VLF's. In SQL2014, 8MB <= 1VLF.

Finally, if we need to figure out how many VLF's are, we can use the command DBCC LOGINFO which is an undocumented command but is safe to run because it just reads the T-Log, and the most important columns to look for are the following: FileSize (VLF size in bytes), Status (0 Not in use, 2 In use), CreateLSN - it can be used to determine how many VLF’s were created in a log file growth operation, that may be 4, 8 or 16, and this also indicates the Log Sequence Number.
That is all for now, taking this insight into consideration will stand you in good stead for when you have to fix some internal T-Log issues. Thanks for reading. Let me know any remarks you may have. Stay tuned.

Saturday, 17 February 2018

Checking SQL Server stale statistics

Clearly, it is of paramount importance to monitor the state of statistics because they always play a significant role on the performance of the whole database. When statistics are not up to date the performance will indeed decrease tremendously over time and I am sure nobody wants to have a sluggish system. How many times did you find yourself in a big problem related to unforeseen performance issues even some days after having carried out dutifully the respective database maintenance tasks? did you wonder why rebuilding index and statistics update tasks are not being good enough? if everything was working smoothly, why did the performance start going down unexpectedly? Why are the indexes not being used as usual? do I need to create more indexes? the answer might be found by digging deeper into the stale statistics. It is worth noting that Statistics contain all the information that SQL Optimizer needs to generate the best possible execution plans for your queries. If they are not up to date then the impact might be disastrous. So, having the right indexes are not good enough when the statistics went stale.

Generally, it is advisable to update statistics if it is verified that at least 20 per cent of the data changed (as long as they have more than five hundred rows). But can we know how many rows changed for a table so that we can just update its Index Statistics? In SQL Server almost everything can be checked and this is not an exception. Today I an coming with a script to check which Index Statistics have gone stale so that we can focus our work on them and keep expected performance going as well. The script is built on the base of STATS_DATE function which retuns the date of the most recent update for statistics on a table or indexed view.

SELECT DB_NAME() AS DatabaseName,
SCHEMA_NAME(t.[schema_id]) AS SchemaName,
t.name AS TableName,
ix.name AS IndexName,
STATS_DATE(ix.id,ix.indid) AS 'StatsLastUpdate', -- using STATS_DATE function
ix.rowcnt AS 'RowCount',
ix.rowmodctr AS '#RowsChanged',
CAST((CAST(ix.rowmodctr AS DECIMAL(20,8))/CAST(ix.rowcnt AS DECIMAL(20,2)) * 100.0) AS DECIMAL(20,2)) AS '%RowsChanged'
FROM sys.sysindexes ix
INNER JOIN sys.tables t ON t.[object_id] = ix.[id]
WHERE ix.id > 100 -- excluding system object statistics
AND ix.indid > 0 -- excluding heaps or tables that do not any indexes
AND ix.rowcnt >= 500 -- only indexes with more than 500 rows
ORDER BY  [%RowsChanged] DESC
Bearing in mind there are two types of statistics: Column Statistics and Index Statistics. The first type are statistics linked to columns and created automatically as result of querying the columns, so SQL Server normally creates them unless you have AUTO_CREATE_STATISTICS database option disable, whereas the second one are statistics linked to indexes and these statistics are created at the same time when the indexes are. Likewise, Index Statistics are always updated with FULLSCAN while indexes are being rebuilt. So, there is no need to update Index Statistics explicitly after rebuilding the indexes, if so, it might be harmful because UPDATE STATISTICS command without pamareters is based on default sampling and it is not the best option for the vast majority of cases.
Talking of a little more of Column Statistics, after rebuilding indexes we just have to update Column Statistics and for this case it might be good enough to use default sampling, however, it is also advisable to use FULLSCAN if your environment is worthy of it.
Many of us may be wondering how many times the statistics need to be updated per month? the answer depends entirely on how frequently your data is being changed (updated, inserted, deleted). Updating statistics once a week might be enough for highly transactional environments whereas for others once a month would be more than good. That is all for now, I hope you find the script helpful. Le me know any remarks you may have. Stay tuned.

Tuesday, 13 February 2018

Avoid changing default ANSI database options

Not having another way of fixing some specific errors, at times some people may consider turning off any ANSI database options as a final solution. To be perfectly honest, I do recommend getting to the bottom of each problem and then fixing it at that level instead of changing default ANSI settings (unless it is just a quick fix or is truly necessary because you verified the benefits are significant). For instance, it is by no means uncommon to turn ANSI_WARRINGS off to fix the following error:

Msg 8152, Level 16, State 14, Line 5
String or binary data would be truncated.

Surprisingly, it may be misunderstood that there is no more to do. First of all, this error occurs because the new data to be inserted or updated cannnot be placed into the column due to the data size is beyond the column size. This situation may only be a consequence of a poor database design, so should it be altered the size of the column? if so, are there indexes for this column? what happens with the statistics linked to the column? many things to take into consideration if we wanted to alter the column to give a final solution to this issue, and obviously, that is why the impact on the performance is of paramount importance. By and large, this is not highly advisable turning off ANSI options as it may cause RECOMPILATION for each stored procedure where it was turned off, therefore, it may impact on the performance detrimentally and the knock-on effect might be regrettable. So, we should avoid turning off not only ANSI_WARNINGS but also the following ones which will cause the same effect.
Furthermore, recompilation may sometimes be used as a way to 'optimise' specific stored procedures or Ad-Hoc queries because of parameter sniffing issues or bad-written code, but this should be done explicitly and concientiously by using WITH RECOMPILE option,  RECOMPILE query hint, sp_recompile system stored procedure or another optimisation strategy. That is all for now, let me know any remarks you may have. Thanks for reading. Stay tuned.

Friday, 2 February 2018

Some common reasons why the transaction log cannot be reused automatically

Managing SQL Server databases may sometimes be challenging especially when it comes to dealing with transaction log file internal space usage. More specifically, today I am going to talk about SQL Server wait types that may prevent SQL Server from automatically reusing transaction log internal space and therefore resulting in running out of space and affecting detrimentally on the availability of the database. For instance, if the recovery model of the database is either Full or Bulk-Logged then you may see a LOG_BACKUP wait type while the next backup log is waited to be run. Consequently, if those databases do not have Log Backups tasks, their transaction log files will be growing without control and when they take all the disk space available the databases will stop working until the internal space of the transaction logs is truncated by executing log backups manually.
Furthermore, even using SIMPLE recovery model the transaction log file might not be truncated automatically because of open transactions, that is why it is essential to execute a COMMIT explicitly and not to use IMPLICIT transactions. Being cognisant of this fact, I am sold on the idea of having total control of the scope of transactions so as to avoiding these issues. Another reason why the transaction log file internal space may not be reused is due to CHECKPOINT wait type when the database has AUTOMATIC CHECKPOINT disable. In this case it is needed to execute a CHECKPOINT command manually. I heartedly recommend not disabling AUTOMATIC CHECKPOINT for any database. 
Additionally, with the arrival of SQL Server AlwaysOn, the HADR_SYNC_COMMIT wait type arrived as well. This wait type will be found while an AlwaysOn Availability Group secondary replica is trying to apply or is applying transaction log records of this database to a corresponding secondary databases. However, this wait type might cause a major problem if it is unusually long because the AlwaysOn Availability Group is not working properly as a result of either network issues or at last one secondary replica is slow in log hardening. Thus, it is of paramount importance to ensure Always On Availability Group as a whole is working smoothly and as expected, otherwise there will be no way to truncate the transaction log file and it will indeed risk the availability of the database. Using this code, we can check the wait types for all databases and particularly the HADR_SYNC_COMMIT wait type:

select session_id, status,command,blocking_session_id, wait_type, wait_time, last_wait_type 
from sys.dm_exec_requests where session_id>=50
select wait_type, waiting_tasks_count, wait_time_ms
from sys.dm_os_wait_stats 
and wait_type = 'HADR_SYNC_COMMIT'

Here I share with you the complete list of wait types available in many modern SQL Server engines:
0 = Nothing
1 = Checkpoint (When a database uses a recovery model and has a memory-optimized data filegroup, you should expect to see the log_reuse_wait column indicate checkpoint or xtp_checkpoint.) Applies to SQL Server 2008 through SQL Server 2017
2 = Log Backup. Applies to SQL Server 2008 through SQL Server 2017
3 = Active backup or restore. Applies to SQL Server 2008 through SQL Server 2017
4 = Active transaction Applies to SQL Server 2008 through SQL Server 2017
5 = Database mirroring. Applies to SQL Server 2008 through SQL Server 2017
6 = Replication. Applies to SQL Server 2008 through SQL Server 2017
7 = Database snapshot creation. Applies to SQL Server 2008 through SQL Server 2017
8 = Log scan Applies to
9 = An Always On Availability Groups secondary replica is applying transaction log records of this database to a corresponding secondary database. Applies to SQL Server 2012 through SQL Server 2017. In earlier versions of SQL Server, 9 = Other (Transient).
10 = For internal use only Applies to SQL Server 2012 through SQL Server 2017
11 = For internal use only Applies to SQL Server 2012 through SQL Server 2017
12 = For internal use only Applies to SQL Server 2012 through SQL Server 2017
13 = Oldest page Applies to SQL Server 2012 through SQL Server 2017
14 = Other Applies to SQL Server 2012 through SQL Server 2017
16 = XTP_CHECKPOINT (When a database uses a recovery model and has a memory-optimized data filegroup, you should expect to see the log_reuse_wait column indicate checkpoint or xtp_checkpoint.) Applies to SQL Server 2014 through SQL Server 2017
More info: https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-databases-transact-sql

That is all for now, let me know any remarks you may have. Stay tuned.