Thursday, March 24, 2016

It works better when you do the basics right...

So, I was doing some tuning as usual, and I always report my findings in writing so I can justify the hours I put in. In one of these reports, I ended up identifying, in one of the company's applications, tables using VARCHAR(200) fields as primary keys.

OK, I was a developer once, and I know that development can sometimes be a messy business. You have little time (sometimes no time), few resources, almost no documentation, and so on.

I was asked if there was actually any proof that integer, self-generating columns were a better deal for PKs.

So here is what I came up with: I created a two-table sample database, first using a VARCHAR(250) column as the PK, and created a FOREIGN KEY relationship between the tables. I measured the time it took to insert 100,000 rows, the I/O statistics and elapsed time of a simple SELECT, and the index sizes. The test was done on SQL Server 2014.

Script to create and load the tables using VARCHAR as the PK:

DROP TABLE customerdetails; 

DROP TABLE customer; 

CREATE TABLE customer 
  ( 
     id        VARCHAR(250) PRIMARY KEY, 
     NAME      VARCHAR(100), 
     birthdate DATETIME, 
     mmo       VARCHAR(max) 
  ) 

go 

CREATE TABLE customerdetails 
  ( 
     customerid VARCHAR(250) REFERENCES customer(id), 
     detaildate DATETIME, 
     mmo        VARCHAR(max), 
     auditvalue BIT 
  ) 

go 

DECLARE @dtinicio DATETIME = Getdate();  -- start time, used to measure the total load duration
DECLARE @cont INT = 0;                   -- loop counter
DECLARE @str VARCHAR(250);               -- counter converted to text, used to build the padded key

-- insert 100,000 customers, each with 3 detail rows; the key is @cont left-padded with zeros to 250 characters
WHILE ( @cont < 100000 ) 
  BEGIN 
      SET @str=Cast(@cont AS VARCHAR); 

      INSERT INTO customer 
                  (id, 
                   NAME, 
                   birthdate, 
                   mmo) 
      SELECT CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             'Customer ' + @str, 
             Getdate() + @cont, 
             CASE WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str))+@str 
             ELSE 
             @str END 
             + 'Customer ' + @str; 

      INSERT INTO customerdetails 
                  (customerid, 
                   detaildate, 
                   mmo, 
                   auditvalue) 
      SELECT CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             Getdate(), 
             CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             1; 

      INSERT INTO customerdetails 
                  (customerid, 
                   detaildate, 
                   mmo, 
                   auditvalue) 
      SELECT CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             Getdate(), 
             CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             1; 

      INSERT INTO customerdetails 
                  (customerid, 
                   detaildate, 
                   mmo, 
                   auditvalue) 
      SELECT CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             Getdate(), 
             CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             1; 

      SET @cont=@cont + 1; 
  END; 

SELECT Datediff(millisecond, @dtinicio, Getdate()) AS 
       [Time to insert 100,000 rows (ms)]; 
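For reference, this is what the generated keys look like: @cont converted to text and left-padded with zeros so every key is exactly 250 characters long. A quick way to visualize one of them (just an illustration, not part of the test):

DECLARE @str VARCHAR(250) = Cast(42 AS VARCHAR); 

SELECT Replicate('0', 250 - Len(@str)) + @str        AS paddedkey,  -- '000...042' 
       Len(Replicate('0', 250 - Len(@str)) + @str)   AS keylength;  -- always 250 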



Script to create and load the same tables using an INT IDENTITY PK:

DROP TABLE customerdetails; 

DROP TABLE customer; 

CREATE TABLE customer 
  ( 
     id        INT IDENTITY(1, 1) PRIMARY KEY, 
     NAME      VARCHAR(100), 
     birthdate DATETIME, 
     mmo       VARCHAR(max) 
  ) 

go 

CREATE TABLE customerdetails 
  ( 
     customerid INT REFERENCES customer(id), 
     detaildate DATETIME, 
     mmo        VARCHAR(max), 
     auditvalue BIT 
  ) 

go 

DECLARE @dtinicio DATETIME = Getdate(); 
DECLARE @cont INT = 0; 
DECLARE @id INT = 0; 
DECLARE @str VARCHAR(250) 

WHILE ( @cont < 100000 ) 
  BEGIN 
      SET @str=Cast(@cont AS VARCHAR); 

      INSERT INTO customer 
                  (NAME, 
                   birthdate, 
                   mmo) 
      SELECT 'Customer ' + @str, 
             Getdate() + @cont, 
             CASE WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str))+@str 
             ELSE 
             @str END 
             + 'Customer ' + @str; 

      SELECT @id = @@IDENTITY;  -- no triggers involved here, so this behaves like SCOPE_IDENTITY() 

      INSERT INTO customerdetails 
                  (customerid, 
                   detaildate, 
                   mmo, 
                   auditvalue) 
      SELECT @id, 
             Getdate(), 
             CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             1; 

      INSERT INTO customerdetails 
                  (customerid, 
                   detaildate, 
                   mmo, 
                   auditvalue) 
      SELECT @id, 
             Getdate(), 
             CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             1; 

      INSERT INTO customerdetails 
                  (customerid, 
                   detaildate, 
                   mmo, 
                   auditvalue) 
      SELECT @id, 
             Getdate(), 
             CASE 
               WHEN Len(@str) < 250 THEN Replicate('0', 250-Len(@str)) + @str 
               ELSE @str 
             END, 
             1; 

      SET @cont=@cont + 1; 
  END; 

SELECT Datediff(millisecond, @dtinicio, Getdate()) AS 
       [Time to insert 100,000 rows (ms)]; 


1. The first thing that drew my attention: the time it took to load the data.


  • Using VARCHAR: 82,996 ms
  • Using INT IDENTITY: 60,796 ms

Wow... that's a big difference if you are considering bulk loading data into your database.
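Part of that difference is simply key width: every key value is written into the clustered index and repeated in the foreign key column of every detail row, and a zero-padded VARCHAR(250) key is much wider than an INT. A quick sanity check of the two widths (an illustration, not part of the test scripts):

SELECT Datalength(Cast(1234 AS INT))              AS intkeybytes,      -- 4 bytes 
       Datalength(Replicate('0', 246) + '1234')   AS varcharkeybytes;  -- 250 bytes 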

2. The second thing I measured: I/O statistics and elapsed time for a simple SELECT on these tables. I used this query for testing:


SET statistics io ON; 

SELECT * 
FROM   customer c 
       JOIN customerdetails cd 
         ON ( c.id = cd.customerid ); 

SET statistics io OFF; 
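
The script above only turns on STATISTICS IO; the elapsed times below were taken from the client. If you also want SQL Server to report CPU and elapsed time, the same query can be wrapped with STATISTICS TIME as well (an assumption on my part, not what was originally run):

SET statistics time ON; 

-- ... same SELECT as above ... 

SET statistics time OFF; 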



  • These were the results using VARCHAR:

Ten seconds to execute the query.

Table 'Customer'. Scan count 0, logical reads 1218759, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerDetails'. Scan count 1, logical reads 23085, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

ObjectName        IndexDescription   IndexSize(MB)
Customer          CLUSTERED INDEX    54.4
CustomerDetails   HEAP               149.3

  • Using INT IDENTITY:

Five seconds to execute the query.

Table 'Customer'. Scan count 0, logical reads 918759, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerDetails'. Scan count 1, logical reads 10715, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

ObjectName        IndexDescription   IndexSize(MB)
Customer          CLUSTERED INDEX    28.9
CustomerDetails   HEAP               78.4


Numbers don't lie, folks.


Enough said from me... what is your opinion?


Check out my new book about the R language: http://www.amazon.com/dp/B00SX6WA06

One comment:

  1. This is the query I used to measure index size ...

    SELECT [DatabaseName]
    ,[ObjectId]
    ,[ObjectName]
    ,[IndexId]
    ,[IndexDescription]
    ,CONVERT(DECIMAL(16, 1), (SUM([avg_record_size_in_bytes] * [record_count]) / (1024.0 * 1024))) AS [IndexSize(MB)]
    --,[lastupdated] AS [StatisticLastUpdated]
    --,[AvgFragmentationInPercent]
    FROM (
    SELECT DISTINCT DB_Name(Database_id) AS 'DatabaseName'
    ,OBJECT_ID AS ObjectId
    ,Object_Name(Object_id) AS ObjectName
    ,Index_ID AS IndexId
    ,Index_Type_Desc AS IndexDescription
    ,avg_record_size_in_bytes
    ,record_count
    ,STATS_DATE(object_id, index_id) AS 'lastupdated'
    ,CONVERT([varchar](512), round(Avg_Fragmentation_In_Percent, 3)) AS 'AvgFragmentationInPercent'
    FROM sys.dm_db_index_physical_stats(db_id(), NULL, NULL, NULL, 'detailed')
    WHERE OBJECT_ID IS NOT NULL
    AND Avg_Fragmentation_In_Percent <> 0
    ) T
    GROUP BY [DatabaseName]
    ,[ObjectId]
    ,[ObjectName]
    ,[IndexId]
    ,[IndexDescription]
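
    If you only want the two test tables in the output, the inner query's WHERE clause can be extended, for example (a hypothetical tweak, not part of the original comment):

    WHERE OBJECT_ID IS NOT NULL
    AND Avg_Fragmentation_In_Percent <> 0
    AND Object_Name(object_id) IN ('customer', 'customerdetails')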

