Friday, 30 September 2016

Creating alerts for monitoring proactively SQL Server AlwaysOn Availability Groups

When it comes to monitoring databases we have to do it proactively by using alerts implemented natively or via T-SQL code inside SQL Jobs running recurrently (I know we do have the AlwaysOn Dashboard, but personally I do not have time to check it all the time). As a result of this situation, so rapidly do we have databases working on production environment that we may need to have alerts working as well. In this post I am going to show how to have the most important AlwaysOn Availability Group alerts implemented. To begin with, I would basically suggest creating alerts only for these error messages.

Undoubtedly, we may need more alerts. Therefore, we will need to look into ‘sys.messages’ system object and look for error messages closely related to AlwaysOn Availability Group feature. Only after identifying them will we be able to create particular alerts taking the ‘message_id’ number.  To illustrate, I will query ‘sys.messages’ by using keywords like ‘Availability’, ‘suspended’, ‘failed’, ‘disconnected’, etc.
SELECT * FROM sys.messages  
WHERE (  [text] LIKE '%Availability%Disconnected%' OR   [text] LIKE '%Availability%not%synchroni%'
OR   [text] LIKE  '%Availability%suspended%'    OR   [text] LIKE '%Availability%failed%' OR   [text] LIKE '%Availability%chang%role%'
OR   [text] LIKE '%Availability%resumed%' )  AND language_id=1033

As you have seen, they are about 63 messages (depending on the SQL Server version you are working on). I do believe that these messages are the be-all and end-all so we might not need to implement additional alerts. Put differently, they are enough. Nevertheless, we might filter out some messages we do not consider important and include others to monitor AlwaysOn Failover Cluster instead.
SELECT * FROM sys.messages  
WHERE (  [text] LIKE '%Availabiliy%replica%' OR   [text] LIKE '%Database%replica%' OR   [text] LIKE '%primary%role%'
    OR   [text] LIKE '%secondry%role%' OR   [text] LIKE '%availability%group%' OR   [text] LIKE '%WSFC%'
    OR   [text] LIKE '%primary%replica%' OR   [text] LIKE '%secondary%replica%' OR   [text] LIKE '%alwayson%')
    AND    [text] NOT LIKE '%No user action is required.%'
    AND severity<>10
    AND language_id=1033
After having got the error number, the next step is to create the alert for the error by using ‘sp_add_alert’ system stored procedure which accepts basic parameters like the name for the alert and ‘message_id’ (taken from ‘sys.messages’). The rest of parameters will be the same for each alert. So, you can create the other alerts by replacing the name and ‘message_id’, for instance, the following example is based on the error number 41418. (We also create the notification to send an email to us when the alert is raised. Obviously, not until having successfully configured SQL Agent service to use Database Mail will you be able to receive the email.)

USE [msdb]
EXEC msdb.dbo.sp_add_alert @name=N'41418 HADR Alert', 
        @notification_message=N'Contact your Database Administrator urgently.', 
EXEC msdb.dbo.sp_add_notification @alert_name=N'41418 HADR Alert', @operator_name=N'DBA', @notification_method = 1
To try out to see whether or not the alert is working, we will need to raise the error manually via RAISERROR command with ‘LOG’ parameter (to see the message logged in the SQL Server error log).

Having done that, we can see the following message in error log:

Likewise, you will get the email.
To sum up, I am convinced that these sorts of alert will always work for monitoring Availability Groups and others. You just need to spend some time on figuring out about the error messages and then go ahead and create the right alert for you. I hope you enjoy this post and let me know any remark you may have. Thanks for reading again.

Thursday, 29 September 2016

Using ‘sp_browsereplcmds’ to diagnose SQL Server Replication issues

While diagnosing SQL Server transactional replications issues, we may need to look into some pending commands inside the Distribution database. In other words, not only do we have to monitor pending commands, but also take some actions so as to keep working the replication. For instance, at times we may have to kill some specific commands because of errors which do not allow other commands to be moved to subscribers. Before doing that, we firstly need to identify which commands have to be removed from the queue by using ‘sp_browsereplcmds’ system stored procedure. This basically accepts many input parameters like article_id:


After executing it, we are going to filter only the pending commands for the Article in question. (Remember that an article in replication is directly related to a table. You can query ‘sysarticles’ system table inside the published database.)

Another parameter we can use in order to get more specific information is the transaction sequence number which is essentially the identifier for the transaction. Luckily, when reading some errors, we can see the sequence number and command ID which allow us to identify exactly the root cause we need to work on with ease.

EXEC SP_BROWSEREPLCMDS @xact_seqno_start = '0x00000027000000B50008',@xact_seqno_end = '0x00000027000000B50008' 

There are other parameters like command ID to get only the command we need to look into, and also the database ID to get all commands for that database.

EXEC SP_BROWSEREPLCMDS @xact_seqno_start = '0x00000027000000B50008',@xact_seqno_end = '0x00000027000000B50008' , @publisher_database_id = 33, @article_id = 1,@command_id= 1

Be cautious, do not execute ‘sp_browsereplcmds’ without any parameter on production database environments as they can have millions of commands inside Distribution database and as a result of this we will not get what we need rapidly and at the same time we will affect the database server performance. I hope you can find this post interesting when it comes to troubleshooting replication issues. Let me know any remark you may have. Thanks for reading.