terça-feira, 29 de março de 2016

It works better when you do the basics right [Part 2]

On my previous post, I wrote a comparison on Database Design using varchar and int (identity) primary key columns, showing how that especific choice could hurt performance. One thing I didnt show, was a comparison on how it would affect performance on update statements.

I made the tests using the scripts from the previous posts to drop and recreate the tables using VARCHAR and INT columns, check the previous post to understand it.

So, here are the results for an UPDATE showing IO STATS when you use VARCHAR columns:

SET statistics io ON;

IF EXISTS(SELECT 0
          FROM   customer
          WHERE  id =
'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001'
         )
  BEGIN
      UPDATE customerdetails
      SET    detaildate = detaildate,
             mmo = mmo,
             auditvalue = auditvalue
      WHERE  customerid =
'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001'
END

SET statistics io OFF;
/*
Usando VARCHAR

Table 'Customer'.      Scan count 0, logical reads 4,    physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'CustomerDetails'.  Scan count 5, logical reads 23088,  physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'.      Scan count 0, logical reads 0,    physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

*/


And here are the results, using INT IDENTITY(1,1) columns:

SET statistics io ON;

IF EXISTS(SELECT 0
          FROM   customer
          WHERE  id = 1)
  BEGIN
      UPDATE customerdetails
      SET    detaildate = detaildate,
             mmo = mmo,
             auditvalue = auditvalue
      WHERE  customerid = 1
  END

SET statistics io OFF;
/*
Usando inteiros
Table 'Customer'.      Scan count 0, logical reads 3,    physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerDetails'.  Scan count 5, logical reads 10718,  physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'.      Scan count 0, logical reads 0,    physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/
 


So we can see a huge improvement on the logical reads (on customerdetails, almost 50%). Now, let's say that we want to update 1000 records on a loop, using VARCHAR pks, how long does it take?

DECLARE @dtinicio DATETIME = Getdate();
DECLARE @cont INT = 0;
DECLARE @str VARCHAR(250)

WHILE ( @cont < 1000 )
  BEGIN
      IF EXISTS(SELECT 0
                FROM   customer
                WHERE  id = @cont)
        BEGIN
            UPDATE customerdetails
            SET    detaildate = detaildate,
                   mmo = mmo,
                   auditvalue = auditvalue
            WHERE  customerid = @cont
        END

      SET @cont=@cont + 1;
  END;

SELECT Datediff(millisecond, @dtinicio, Getdate()) AS
       [Tempo para Atualizar 100.000];


So this takes aprox a 13 seconds. But using varchar...

DECLARE @dtinicio DATETIME = Getdate();
DECLARE @cont INT = 0;
DECLARE @str VARCHAR(250)

WHILE ( @cont < 1000 )
  BEGIN
      SET @str=Cast(@cont AS VARCHAR);

      IF EXISTS(SELECT 0
                FROM   customer
                WHERE  id = Replicate('0', 250-Len(@str)) + @str)
        BEGIN
            UPDATE customerdetails
            SET    detaildate = detaildate,
                   mmo = mmo,
                   auditvalue = auditvalue
            WHERE  customerid = Replicate('0', 250-Len(@str)) + @str
        END

      SET @cont=@cont + 1;
  END;

SELECT Datediff(millisecond, @dtinicio, Getdate()) AS
       [Tempo para Atualizar 100.000];  


It's painfull to watch SQL SERVER drag throught this. It takes 4 minutes and 5 seconds to perform the same Business Logic, but using tables that were built on VARCHAR pks.

Still in doubt?

Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

Nenhum comentário:

Postar um comentário

Leave your comment here!