Percy Reyes

Tuesday 20 March 2018

Configuring Read-Only Routing and load-balancing across Read-Only replicas

With the arrival of AlwaysOn Availability Group in SQL Server 2012, implementing HA+DR solutions have been an easier and not expensive task in comparison to legacy architectures such as Database Mirroring for HA and Log Shipping for DR, and FCI for HA and Database Mirroring for DR. Nevertheless, at the beginning not everyone has been fully aware of all the power of this technology so that some might not have made the most out of it. Naturally, this technology has been improved over the years, for instance, load-balancing across readable secondary replicas was added, and today in this post, I am coming with a script to configure it.

Looking deeper into the physical & logical architecture - Transaction Log File

Beyond all doubt, it is essential to have a good understanding of the Transaction Log (T-Log) so that we can diagnose unforeseen performance issues related to it and I am sure that almost everyone had at least one. The T-Log is basically a record of all transactions happening to the database. All these transactions are actually first written to the physical T-Log file, and then after a CHECKPOINT, is written to the Data File via the Lazy Writer process. Some of the uses of T-Log are: as a point in time recovery (full recovery model), to record the Start and End of each transaction, every data modification (insert, update, delete) including system SP's, DDL statements to any table including system tables, every extent and page allocation and de-allocation operation, and creation or drop of tables and indexes.

Checking SQL Server stale statistics

Clearly, it is of paramount importance to monitor the state of statistics because they always play a significant role in the performance of the whole database. When statistics are not up to date the performance will indeed decrease tremendously over time and I am sure nobody wants to have a sluggish system. So we possibly have the following questions:

How many times did you find yourself in a big problem related to unforeseen performance issues even some days after having carried out dutifully the respective database maintenance tasks?
Did you wonder why rebuilding index and statistics update tasks are not good enough?
If everything was working smoothly, why did the performance start going down unexpectedly?
Why are the indexes not being used as usual? do I need to create more indexes?

The answer to all these questions might be found by digging deeper into the stale statistics. It is worth noting that Statistics contain all the information that SQL Optimizer needs to generate the best possible execution plans for your queries. If they are not up to date then the impact might be disastrous. So, having the right indexes is not good enough when the statistics went stale.

Avoid changing default ANSI database options

Not having another way of fixing some specific errors, at times some people may consider turning off any ANSI database options as a final solution. To be perfectly honest, I do recommend getting to the bottom of each problem and then fixing it at that level instead of changing default ANSI settings (unless it is just a quick fix or is truly necessary because you verified the benefits are significant). For instance, it is by no means uncommon to turn ANSI_WARRINGS off to fix the following error:

Msg 8152, Level 16, State 14, Line 5
String or binary data would be truncated.

Some common reasons why the transaction log cannot be reused automatically

Managing SQL Server databases may sometimes be challenging especially when it comes to dealing with transaction log file internal space usage. More specifically, today I am going to talk about SQL Server wait types that may prevent SQL Server from automatically reusing transaction log internal space and therefore resulting in running out of space and affecting detrimentally on the availability of the database. For instance, if the recovery model of the database is either Full or Bulk-Logged then you may see a LOG_BACKUP wait type while the next backup log is waited to be run. Consequently, if those databases do not have Log Backups tasks, their transaction log files will be growing without control and when they take all the disk space available the databases will stop working until the internal space of the transaction logs is truncated by executing log backups manually.
Furthermore, even using SIMPLE recovery model the transaction log file might not be truncated automatically because of open transactions, that is why it is essential to execute a COMMIT explicitly and not to use IMPLICIT transactions. Being cognisant of this fact, I am sold on the idea of having total control of the scope of transactions so as to avoiding these issues. Another reason why the transaction log file internal space may not be reused is due to CHECKPOINT wait type when the database has AUTOMATIC CHECKPOINT disable. In this case it is needed to execute a CHECKPOINT command manually. I heartedly recommend not disabling AUTOMATIC CHECKPOINT for any database.
Additionally, with the arrival of SQL Server AlwaysOn, the HADR_SYNC_COMMIT wait type arrived as well. This wait type will be found while an AlwaysOn Availability Group secondary replica is trying to apply or is applying transaction log records of this database to a corresponding secondary databases. However, this wait type might cause a major problem if it is unusually long because the AlwaysOn Availability Group is not working properly as a result of either network issues or at last one secondary replica is slow in log hardening. Thus, it is of paramount importance to ensure Always On Availability Group as a whole is working smoothly and as expected, otherwise there will be no way to truncate the transaction log file and it will indeed risk the availability of the database. Using this code, we can check the wait types for all databases and particularly the HADR_SYNC_COMMIT wait type:

select session_id, status,command,blocking_session_id, wait_type, wait_time, last_wait_type 
from sys.dm_exec_requests where session_id>=50
Go
select wait_type, waiting_tasks_count, wait_time_ms
from sys.dm_os_wait_stats 
and wait_type = 'HADR_SYNC_COMMIT'

Here I share with you the complete list of wait types available in many modern SQL Server engines:
0 = Nothing
1 = Checkpoint (When a database uses a recovery model and has a memory-optimized data filegroup, you should expect to see the log_reuse_wait column indicate checkpoint or xtp_checkpoint.) Applies to SQL Server 2008 through SQL Server 2017
2 = Log Backup. Applies to SQL Server 2008 through SQL Server 2017
3 = Active backup or restore. Applies to SQL Server 2008 through SQL Server 2017
4 = Active transaction Applies to SQL Server 2008 through SQL Server 2017
5 = Database mirroring. Applies to SQL Server 2008 through SQL Server 2017
6 = Replication. Applies to SQL Server 2008 through SQL Server 2017
7 = Database snapshot creation. Applies to SQL Server 2008 through SQL Server 2017
8 = Log scan Applies to
9 = An Always On Availability Groups secondary replica is applying transaction log records of this database to a corresponding secondary database. Applies to SQL Server 2012 through SQL Server 2017. In earlier versions of SQL Server, 9 = Other (Transient).
10 = For internal use only Applies to SQL Server 2012 through SQL Server 2017
11 = For internal use only Applies to SQL Server 2012 through SQL Server 2017
12 = For internal use only Applies to SQL Server 2012 through SQL Server 2017
13 = Oldest page Applies to SQL Server 2012 through SQL Server 2017
14 = Other Applies to SQL Server 2012 through SQL Server 2017
16 = XTP_CHECKPOINT (When a database uses a recovery model and has a memory-optimized data filegroup, you should expect to see the log_reuse_wait column indicate checkpoint or xtp_checkpoint.) Applies to SQL Server 2014 through SQL Server 2017
More info: https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-databases-transact-sql

That is all for now, let me know any remarks you may have. Stay tuned.

Sunday 21 January 2018

How to make uniform all the collations of table columns for all databases

Clearly, it is of paramount importance to standardise the collations for all databases in a SQL Server instance in order to avoid dealing with unforeseen conflicts of page code compatibility. We may find ourselves in complicated situations because databases with different collations were migrated from other environments and the new consolidated environment was not prepared to host those new databases, but in one way or another we may need to consolidate them in only one server. It may be compounded by the fact that there may be many character columns of a database using different collations and another collation at database level. What’s more, the tempdb database may be using another different collation. So, it may turn out to be not only a complex issue but also time-consuming.
To give you just an example, we can start finding out what character columns are using different collations from SQL_Latin1_General_CP1_CI_AS that we need to change in order to make everything uniform. I am going to display a useful script to do it. In this example I am assuming that we want to use SQL_Latin1_General_CP1_CI_AS for all objects in the database server.

EXEC sp_MSforeachdb '
USE [?]
select db_name(),c.name
from sys.columns c
inner join sys.types t on t.user_type_id= c.user_type_id
inner join sys.tables tb on  c.object_id=tb.object_id
where c.collation_name is not null 
and t.is_user_defined=0 and tb.is_ms_shipped=0 and tb.name<>''sysdiagrams''
and c.collation_name<>''SQL_Latin1_General_CP1_CI_AS''
order by tb.name, c.column_id'

After that,we can make the decision of changing all character columns for all databases and use SQL_Latin1_General_CP1_CI_AS, and then make that change at database level. Be cautious and make sure as well you have tempdb’s collation set SQL_Latin1_General_CP1_CI_AS.

EXEC sp_MSforeachdb '
USE [?]
if db_name() not in (''master'',''tempdb'',''msdb'',''model'')
begin
    select replace( REPLACE( ''ALTER TABLE '' + QUOTENAME(SCHEMA_NAME(tb.schema_id)) + ''.'' 
    + QUOTENAME(tb.name) + '' ALTER COLUMN '' + QUOTENAME(c.name) +  '' '' 
    + QUOTENAME(t.name) + ''('' + CAST( case when T.NAME=''NVARCHAR'' THEN  c.max_length/2 
      WHEN  T.NAME=''NCHAR'' THEN  c.max_length/2 ELSE c.max_length  END  AS VARCHAR(10)) +'')''  
    + '' COLLATE SQL_Latin1_General_CP1_CI_AS'' + CASE WHEN c.is_nullable =1 THEN '' NULL '' 
     else '' NOT NULL ;'' END, ''-1'', ''MAX'' ), ''[text](16)'', ''[varchar](max)'') as cmd
    INTO #TblTMP
    from sys.columns c
    inner join sys.types t on t.user_type_id= c.user_type_id
    inner join sys.tables tb on  c.object_id=tb.object_id
    where c.collation_name is not null 
    and t.is_user_defined=0 and tb.is_ms_shipped=0 and tb.name<>''sysdiagrams''
    and c.collation_name<>''SQL_Latin1_General_CP1_CI_AS''
    order by tb.name, c.column_id
    
  declare @cmd varchar(max)
  declare c_cmd cursor for 
     select cmd from  #TblTMP
  open c_cmd
  fetch next from c_cmd into @cmd
  while (@@fetch_status=0)
  begin    
    exec( @cmd)
    fetch next from c_cmd into @cmd
  end
  close c_cmd
  deallocate c_cmd
  drop table #TblTMP
end'

It is worth noting that while running the script above some errors may arise because of some indexes might be using one of the columns we are trying to alter. So, in this likely event, it is recommendable to drop those indexes and then run the script again. That is all for the time being. Let me know any remarks you may have.

Friday 12 January 2018

MSSQL_ENG003165: An error was encountered while replication was being restored/removed. The database has been left offline

While restoring a replicated database without KEEP_REPLICATION option, SQL Server will remove replication settings by executing sp_restoredbreplication at the end of the process. The 'sp_restoredbreplication' system stored procedure will delete all replication metadata, that is, deletion of 'tr_MStran_alterschemaonly', 'tr_MStran_altertable', 'tr_MStran_altertrigger' and 'tr_MStran_alterview' tiggers (which were created to validate alterations on the replication of tables, triggers, views), disable user tables for replication, and deletion of subscription/publications/articles. Nevertheless, there might be some cases where 'sp_restoredbreplication' cannot be executed successfully and ends up leaving the database OFFLINE. I personally experienced that case and the error was something like this:

Msg 3165, Level 16, State 1, Line 1
Database ‘MyDB’ was restored, however an error was encountered while replication was being restored/removed. The database has been left offline. See the topic MSSQL_ENG003165 in SQL Server Books Online.
Msg 3167, Level 16, State 1, Line 1
RESTORE could not start database ‘MyDB’.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

Looking into this case, I could see that the cause was a DDL database trigger which existed inside the database. Let me expand on what I am saying. The database had that trigger to audit some schema changes which were supposed to save into an auditing table. Unfortunately, that auditing table did not exist in the server where the database was being restored, and the deletion of objects of replication settings were not completed, which means that 'sp_restoredbreplication' was not executed correctly. Consequently, the restoration was stopped and SQL Server decided to leave the database OFFLINE.

In order to restore a copy of this database, we need to disable all DDL database triggers before taking its backup. Only then will the database be restored successfully. The other method to deal with this issue is to change the status to ONLINE manually after the restoration finishes unsuccessfully and also execute 'sp_restoredbreplication'.

To sum up, we need to proceed with more cautiousness while working with databases linked to replication. That is all for now. Let me know any remarks you may have. Thanks for reading. Stay tuned.

Thursday 4 January 2018

Table-valued user-defined functions and the database collation

When it comes to altering database collations we may face some problems that tend to slow us down at the beginning. Nevertheless, digging into the message errors we might not spot the causes easily. For instance, while executing the following script to change the collation at database level an error may arise informing that some objects depends on it and it is not possible to make that change. Here is the script.

ALTER DATABASE MyDBUser SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE; 
ALTER DATABASE MyDBUser COLLATE SQL_Latin1_General_CP1_CI_AS;  
ALTER DATABASE MyDBUser SET MULTI_USER WITH ROLLBACK IMMEDIATE;

As far as we know collations are heavily linked to character columns, and it includes columns of table-valued user-defined functions as they may have character columns on its definition. When theses functions are created they inherit the database collation by design for their columns. Here the error message:

Msg 5075, Level 16, State 1, Line 1
The object 'TVFUserTable' is dependent on database collation. The database collation cannot be changed if a schema-bound object depends on it. Remove the dependencies on the database collation and then retry the operation.
Msg 5072, Level 16, State 1, Line 1
ALTER DATABASE failed. The default collation of database 'MyDBUser' cannot be set to SQL_Latin1_General_CP1_CI_AS.

What we just need to do to be able to change the collation at database level is firstly drop every schema-bound objects, then make the change and finally create the objects again. That is all for now. Let me know any remarks you may have.