Showing posts with label SQL Internals. Show all posts
Showing posts with label SQL Internals. Show all posts

Monday 21 August 2017

Dealing with physical database corruptions

Beyond all doubt, not every single database in the world has a DBA dedicated to monitoring it 24x7 hours. Moreover, many database environments are unwittingly implemented/installed where there was no consideration to meet the basic software and hardware minimum requirements. This situation is compounded by the fact that many companies do not put much attention on databases from the beginning, maybe because at first the database are quite small and serving not many transactions per second. Unsurprisingly, all databases are becoming bigger and bigger with the passing of time so it is not rocket science to foresee that everything will get worse in terms of performance and physical integrity. In this new context, it is much more critical to have everything in place so as to prevent databases from getting damaged. However, many times it is too late when one realises that the database got damaged because of poor implementation. The overwhelmingly majority of physical corruption issues are not sparked by SQL Server on its own, but poor hardware implementation. Thus, if you find yourself working with databases struggling this problem, I wholeheartedly recommend reinstalling the whole server from scratch taking minimum requirements on board.
After having a good implementation of a database server, it is of paramount importance to carry out maintenance tasks at least once a month, and it should include executing full DBCC CHECKDB, and if possible DBCC CHECKDB WITH PHYSICAL_ONLY option once a week. It is also understandable that DBCC CHECKDB makes intensive use of resources, especially CPU and Disk, nevertheless it is possible to use MAXDOP option with DBCC CHECKDB to limit CPU usage and ease the pain. So, nowadays there is no feasible excuse to avoid executing it regularly.

Having said that, I would like to share with you some steps to follow in order to try to fix a physical database corruption:

1. To begin with, try to back your database up.
2. Set your database to SINGLE_USER mode, and then execute DBCC CHECKDB WITH PHYSICAL_ONLY to determine whether physical corruption exists or not.

ALTER DATABASE MyEnterpriseDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
DBCC CHECKDB(MyEnterpriseDB) WITH PHYSICAL_ONLY 

If there is physical corruption, SQL Server will display some errors like these:

Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown),
page ID (1:219356) contains an incorrect page ID in its page header.
The PageId in the page header = (0:0).
Msg 8928, Level 16, State 1, Line 1
Object ID 501576825, index ID 1, partition ID 72057594040549376,
alloc unit ID 72057594044940288 (type In-row data): Page (1:1408252) could not be processed.

3. Sometimes only non-clustered indexes are damaged, and luckily in this case, you just need to recreate those indexes. If you want to figure out which indexes are damaged so that you can recreate them, you can have a look at this article I wrote many years ago.
4. However, if the clustered index or heap structure is damaged then you may need set the database to EMERGENCY mode and execute DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS option which may not always be the best option for bringing a database to a physically consistent state but when the clustered index or heap structure of a table is heavily damaged there is no other option, just do it at the cost of some data loss. It is also worth noting that if you want to find out which objects are damaged so that only execute DBCC CHECKDB repair process on them then you can also check out the same article. Here is this example, we are going to execute on the whole database.

ALTER DATABASE MyEnterpriseDB SET EMERGENCY;
GO
DBCC CHECKDB (MyEnterpriseDB, REPAIR_ALLOW_DATA_LOSS) WITH PHYSICAL_ONLY 
GO
ALTER DATABASE MyEnterpriseDB SET ONLINE WITH ROLLBACK IMMEDIATE;
GO
ALTER DATABASE MyEnterpriseDB SET MULTI_USER WITH ROLLBACK IMMEDIATE;

5. Having executed DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS option does not always ensure that the database will be repaired, however, many times it is enough. So, after getting your database repaired, try to go to bottom of the problem, and fix the problem at hardware level because it is most likely that disk storage is not working properly. Bearing in mind that database corruption issue is just the tip of the iceberg so consider DBCC CHECKDB as a quick fix, not a final solution.
6. Finally, you should take a full database backup.

That is all for now, I hope you find this post helpful. Let me know any remarks you may have. Stay tuned.

Friday 2 June 2017

sys.dm_exec_requests: total_elapsed_time column might return inconsistent data

Developing useful scripts for administration purposes, I have found a bug with respect to data recollected by 'sys.dm_exec_requests' dynamic management view on SQL Server 2008 R2 SP2 while I was analysing the total elapsed time for a particular SQL query. Let me expand on what I am saying. I did detect an error in the 17th second of the execution with 'session_id' equal to 63 (see it in the picture). Following the sequence of each result in the execution of queries, I need to draw your attention to the second query and its second column where the total elapsed time according to 'sys.dm_exec_requests' should be 17 and not 938 seconds since the previous one was 16. Now, checking the value of the third column (calculated by subtracting the 'start_time' value from GETDATE function), you will verify that this time is accurate whereas the second one is false. The value for the second and third columns should be the same but they are different.



To sum up, for this particular and real case, the column 'total_elapsed_time' returns inconsistent information about elapsed execution time of a process when it exceeds 16 seconds. Despite of the fact that I have not seen the same issue in other versions like SQL Server 2012/2014/2016/2017, it is better not to trust in DMVs so much. Therefore, I suggest working with caution. That is all for now, let me know any remark you may have. Stay tuned.

Wednesday 5 April 2017

Looking into Modern_Spanish and Latin1_General

The collations Modern_Spanish and Latin1_General are both Windows collations and support the same character set (Code Page 1252), but there are some differences between both collations related to the treatment of some characters in terms of sorting/doing comparisons. Some of the differences are noticeable especially when Accent-Insensitive is used, in Modern_Spanish n and ñ are considered different characters whereas in Latin1_General they are considered a character and an accented version of the same character.

Today I am going to show off an example to make it clear so that you can take it on board to pre-empt unforeseen results or conflict of collations. This example is based on Modern_Spanish_CI_AI and Latin1_General_CI_AI.

create table TempTable (
   Text_in_Latin1  varchar(100) collate Latin1_General_CI_AI,
   Text_in_Modern varchar(100) collate Modern_Spanish_CI_AI
)
go
insert into TempTable values ('Español', 'Español')
insert into TempTable values ('Espanol', 'Espanol')
insert into TempTable values ('Espanól', 'Espanól')
go
select Text_in_Latin1 from TempTable where Text_in_Latin1='Espanol'
go
select Text_in_Modern from TempTable where Text_in_Modern='Espanol'
go
drop table TempTable

Text_in_Latin1
--------------------------------------
Español
Espanol
Espanól

Text_in_Modern
--------------------------------------
Espanol
Espanól

As you can see, the first result demonstrates that Español, Espanol and Espanól are treated as the same word when we use Accent-Insensitive whereas in the second one Español and Espanol are treated as different words. It is worth noting that you only must change the collation if you really understand the behaviour change. For example, one might think that Latin1_General_CI_AS (Windows Collation) and SQL_Latin1_General_CP1_CI_AS (SQL Collation) are the same, but they are not the same, there is a slight difference in the short behaviour between both for specific special characters like '-'. Here another example about it.

create table TempTable (
   Text_in_Latin1  varchar(100) collate Latin1_General_CI_AS,
   Text_in_SQL_Latin1 varchar(100) collate SQL_Latin1_General_CP1_CI_AS
)
go
insert into TempTable values ('es-pe', 'es-pe')
insert into TempTable values ('espe', 'espe')
go
select Text_in_Latin1 from TempTable order by Text_in_Latin1
go
select Text_in_SQL_Latin1 from TempTable order by Text_in_SQL_Latin1
go
drop table TempTable

Text_in_Latin1
---------------------------------------
espe
es-pe

Text_in_SQL_Latin1
---------------------------------------
es-pe
espe

To be honest, I personally like having the same collation for all databases, but it has to be done carefully and, more importantly, if you really want that change. That is all for now, let me know any remarks you may have.

Monday 15 February 2016

Transactional Replication and Change Data Capture: The Log Reader Agent Conflict

Behind close doors of SQL Server, the following issue may be raised when Transactional Replication and Change Data Capture (CDC) are deployed and running together in the same database server, and because something was done incorrectly managing CDC jobs. We do know that two SQL Jobs are created for the CDC process when CDC is deployed which are 'cdc.MyDB_capture' and 'cdc.MyDB_cleanup'.
Looking into the first one, I would like to say that the 'cdc.MyDB_capture' job executes 'sys.sp_MScdc_capture_job' system stored procedure and it invokes 'sp_cdc_scan' to read internally the Transaction Log and capture the changes done in the database via the Log Reader Agent (created initially for Transactional Replication purposes). In other words, the 'cdc.MyDB_capture' job is the agent of CDC process which reads the Transaction Log by using the Log Reader Agent. Therefore, Transaction Replication and CDC running for the same database cannot use the same Log Read Agent at the same time. Otherwise, we will get this error:

The capture job cannot be used by Change Data Capture to extract changes from the log when transactional replication is also enabled on the same database. When Change Data Capture and transactional replication are both enabled on a database, use the logreader agent to extract the log changes.

The error message is really clear. Put differently, it is not possible that two Log Reader Agent instances are running on your database at the same time. When transactional replication is configured then the cdc.MyDB_capture job is (or should have been) dropped automatically and, if you uninstall Transactional Replication then cdc.MyDB_capture job is created again. To be perfectly honest, this behaviour is because Transactional Replication has the highest priority to use the Log Agent Reader. So, if you have transactional replication running for your database and cdc.MyDB_capture job is still enabled (and running) then you will have to disable or drop it manually since it will be failing and raising the error above. Thanks for reading.

Saturday 13 February 2016

Looking into some concepts about the well-known tempdb database

Today's post is going to look into some concepts about the well-known tempdb database. As DBAs, we have to keep an eye on it not only at the beginning of the installation, but also as part of our monitoring tasks on a daily basis. To do so, we really need to understand how it works and what enhancements have come from SQL Server 2005 to forward. To begin with, the tempdb is the shared database per instance in SQL Server which is used to stored and manage temporary objects. It has a number of changes since SQL Server 2005, that is, there are new tempdb usages and internal optimizations enhancements. Nevertheless, the tempdb architecture is mostly unchanged since SQL Server 2000. 
In a user database, the transactions have the ACID attributes: atomicity, concurrency, isolation, and durability whereas in the tempdb database the transactions lose the durability attribute which basically means that they do not persist after a SQL Server shut down event. Most of these internal operations on tempdb do not generate log records as there is no need to roll back. So, these operations are faster. Furthermore, some of the database options cannot be modified for tempdb and others are limited or restricted. Here they are:
  • Auto Shrink is not allowed for tempdb.
  • Database Shrink and File Shrink capabilities are also limited.
  • The database CHECKSUM option cannot be enabled. 
  • A database snapshot cannot be created on tempdb. 
  • DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported. 
  • Only offline checking for DBCC CHECKTABLE is performed.

Tempdb Space Usage: The following types of objects can occupy tempdb space: Internal Objects, Version Stores, and User Objects.

Internal Objects: Internal Objects metadata is stored in memory (it means that the metadata is hidden in the tempdb and it does not appear in catalog views such as 'sys.all_objects') and each of them occupies at least nine pages (one IAM page and eight data pages) in tempdb. Page Allocations and Update operations on it does not generate log records. The Internal Objects are used
  • to store intermediate runs for sort
  • to store intermediate results for hash joins and hash aggregates
  • to store XML variables or other large object (LOB) data type variables (text, image, ntext, varchar(max), varbinary(max), and all others)
  • by queries that need a spool to store intermediate results
  • by keyset cursors to store the keys
  • by static cursors to store a query result
  • by Service Broker (Query Notification and Event Notification) to store messages in transit
  • by INSTEAD OF triggers to store data for internal processing
  • by DBCC CHECKDB (it internally uses a query that may need to spool intermediate results)
Version Stores: They do not appear in catalog views such as 'sys.all_objects'. Here are more concepts:
  • Version stores are used to store row versions generated by transactions for features such as snapshot isolation, triggers, MARS (multiple active result sets), and online index build.
  • The online index build version store is for row versions from tables that have online index build operations on them.
  • The common version store is for row versions from all other tables in all databases.
  • The version store consists of append-only store units which are highly optimized for sequential inserts and random look up. Inserts into the version store do not generate log records.
  • Each unit can store many row versions. If there are versions to be stored, a new store unit is created about every minute.
In the following cases the versions of rows are generated for
  • SNAPSHOT isolation and read committed snapshot isolation (the versions are generated by DML operations in the database when the respective database options are enabled)
  • AFTER triggers (the versions are generated for all the update operations by the transaction that fired the trigger during the INSERT, DELETE, or UPDATE statement in any database, independent of database options. INSTEAD OF triggers do not generate versions)
  • MARS (the versions are generated by the UPDATE or DELETE statement when there is a pending SELECT on the same transaction and the same connection)
  • building an ONLINE index
User Objects: They do appear in catalog views such as 'sys.all_objects. Now some details about it:
  • The 'sp_spaceused' system stored procedure can show the size occupied by these objects.
  • User Objects include both user-defined tables and indexes, and system catalog tables and indexes.
  • Operations on User Objects in tempdb are mostly logged. Bulk copy program (BCP), bulk insert, SELECT INTO, and index rebuild operations are bulk logged.
  • User-defined tables include the global temporary tables such as ##t, and local temporary tables such as #t.
  • Local temporary tables also include table variables such as @t and the mapping index for online clustered index build with the SORT_IN_TEMPDB option.
Performance Enhancements in SQL Server 2005 or later
  1. The tempdb logging optimization avoids logging the “after value” in certain log records in tempdb.
  2. Instant data file initialization works by not zeroing out the NTFS file when the file is created or when the size of the file is increased.
  3. There is less use of the UP type page latch when allocating pages and extents in tempdb. Proportional fill has been optimized to reduce UP latch contention.
  4. Proportional fill has been optimized to reduce UP latch contention.
  5. There is now deferred drop in tempdb.
  6. Worktable caching is improved.
  7. SQL Server 2005 or later caches the temporary table that is created by using a CREATE TABLE or SELECT INTO statement.
Recommendations for managing
  1. The tempdb files must be configured with initial size and auto-growth based on your workloads. Do not let with the default sizes.
  2. The tempdb files must be located on RAID0 (for better performance) or RAID1 (if you need have more writes than reads) or RAID5 (if you have more reads than writes). RAID10 is the best option but no all companies can justify this.
  3. The tempdb files must be located on separated disks to avoid contention issues and improves the performance.
  4. Tempdb database must be created with one data file per physical processor if the quantity of CPUs is eight or less. If there are more then eight CPUs then you can start off with eight data files and after increase the number of files by four in case there was PAGELATCH contentions on one of the allocation bitmap pages (including PFS pages, GAM pages and SGAM pages) until the issue is solved, if not, add four files more, and so on.
  5. Do not forget, tempdb data files or log file should not grow so much in a short time, if this happens then something is working wrong, so you need to analyze the workloads performance and detect the query that you have to optimize.
That is all for now. Let me know any remarks you may have about tempdb database. Thanks for reading.

Thursday 21 August 2014

Error 601: Could not continue scan with NOLOCK due to SQL Server data movement

Just one hour ago one of my tips was published at MSSQLTips.com related to “Error 601: Could not continue scan with NOLOCK due to SQL Server data movement”. You can read it at http://www.mssqltips.com/sqlservertip/3289/error-601-could-not-continue-scan-with-nolock-due-to-sql-server-data-movement/ . Thanks for reading!

Saturday 16 August 2014

SQL Server Replication Error – The specified LSN for repldone log scan occurs before the current start of replication in the log

My latest tip has been published today at mssqltips.com about “SQL Server Replication Error – The specified LSN for repldone log scan occurs before the current start of replication in the log” and you can read it at http://www.mssqltips.com/sqlservertip/3288/sql-server-replication-error–the-specified-lsn-for-repldone-log-scan-occurs-before-the-current-start-of-replication-in-the-log . Thanks for reading!
HELLO, I'M PERCY REYES! — a book lover, healthy lifestyle lover... I've been working as a senior SQL Server Database Administrator (DBA) for over 20 years; I'm a three-time awarded Microsoft Data Platform MVP. I'm currently doing a PhD in Computer Science (cryptography) at Loughborough University, England — working on cryptographic Boolean functions, algorithmic cryptanalysis, number theory, and other algebraic aspects of cryptography. READ MORE