sexta-feira, 1 de abril de 2016

Fighting Database Circular References

So one of the problems I came across Database Administration that is hard to discover and even harder to avoid is Circular Reference. 

A Circular Reference happens when you create foreign key references between objects, creating "circle references". This can happen on one table that references its own Primary Key, which is not unusual when you need to create a hierarchy type of relation.

But it can get weird, when you have a table1 that references table2 and another fk from table2 to table1. So you already have a reference between the two tables, why you need another? This is a messy mistake that generates redundant and innacurate information, creating huge mistakes on reports and audits.

A harder level of circular reference is when you create a circular reference between three or four tables. This will lead to so many levels of redundancy and creates infinite paths to retrieve information. Finally, no one knows the right path to the actual real data. 

And don't bother asking the devs: chances are the original developer is working on another project in another company or he probably doesn't know what he did wrong or what is the right path to it.

On CodeProject.com, there is this great article on Circular Reference, writen by Pham Dinh Truong. Its full of pictures, quite detailed and simple enough so a baby can understand it.

But, let's move to the harder part: locating these circular references. I wrote a few scripts to help me out on that, but as it turns out it can be a hassle to collect this information from the system views. One script I came up is this:

SELECT Object_schema_name(fk1.parent_object_id) 
       + '.' + Object_name(fk1.parent_object_id), 
       Object_schema_name(fk2.parent_object_id) 
       + '.' + Object_name(fk2.parent_object_id) 
--OBJECT_SCHEMA_NAME(fk3.parent_object_id) 
-- + '.' + OBJECT_NAME(fk3.parent_object_id) 
FROM   sys.foreign_keys AS fk1 
       JOIN sys.foreign_keys AS fk2 
         ON fk1.parent_object_id = fk2.referenced_object_id 
            AND fk2.parent_object_id = fk1.referenced_object_id 

It's accurate and it works only for two leves (table1, table2 stuff), but you can have some duplicates rows (table1>table2 table2>table1). I tried writing more scripts, but I just found another great reference. It was writen by Amir Mirkamali on his blog, this is the original post. I'm republishing the script here, hopefully the author won't mind =)

In case you are intending on confronting developers that built c.r. masterpieces, be prepared for war. Really.

SET nocount ON 

-- WWB: Create a Temp Table Of All Relationship To Improve Overall Performance 
CREATE TABLE #tablerelationships 
  ( 
     fk_schema NVARCHAR(max), 
     fk_table  NVARCHAR(max), 
     pk_schema NVARCHAR(max), 
     pk_table  NVARCHAR(max) 
  ) 

-- WWB: Create a List Of All Tables To Check 
CREATE TABLE #tablelist 
  ( 
     [schema] NVARCHAR(max), 
     [table]  NVARCHAR(max) 
  ) 

-- WWB: Fill the Table List 
INSERT INTO #tablelist 
            ([table], 
             [schema]) 
SELECT table_name, 
       table_schema 
FROM   information_schema.tables 
WHERE  table_type = 'BASE TABLE' 

-- WWB: Fill the RelationShip Temp Table 
INSERT INTO #tablerelationships 
            (fk_schema, 
             fk_table, 
             pk_schema, 
             pk_table) 
SELECT FK.table_schema, 
       FK.table_name, 
       PK.table_schema, 
       PK.table_name 
FROM   information_schema.referential_constraints C 
       INNER JOIN information_schema.table_constraints FK 
               ON C.constraint_name = FK.constraint_name 
       INNER JOIN information_schema.table_constraints PK 
               ON C.unique_constraint_name = PK.constraint_name 
       INNER JOIN information_schema.key_column_usage CU 
               ON C.constraint_name = CU.constraint_name 
       INNER JOIN (SELECT i1.table_name, 
                          i2.column_name 
                   FROM   information_schema.table_constraints i1 
                          INNER JOIN information_schema.key_column_usage i2 
                                  ON i1.constraint_name = i2.constraint_name 
                   WHERE  i1.constraint_type = 'PRIMARY KEY') PT 
               ON PT.table_name = PK.table_name 

CREATE TABLE #stack 
  ( 
     [schema] NVARCHAR(max), 
     [table]  NVARCHAR(max) 
  ) 

go 

-- WWB: Drop SqlAzureRecursiveFind 
IF EXISTS (SELECT * 
           FROM   sys.objects 
           WHERE  object_id = Object_id(N'[dbo].[SqlAzureRecursiveFind]') 
                  AND type IN ( N'P', N'PC' )) 
  DROP PROCEDURE [dbo].[SqlAzureRecursiveFind] 

go 

-- WWB: Create a Stored Procedure that Recursively Calls Itself 
CREATE PROC Sqlazurerecursivefind @BaseSchmea NVARCHAR(max), 
                                  @BaseTable  NVARCHAR(max), 
                                  @Schmea     NVARCHAR(max), 
                                  @Table      NVARCHAR(max), 
                                  @Fail       NVARCHAR(max) output 
AS 
    SET nocount ON 

    -- WWB: Keep Track Of the Schema and Tables We Have Checked 
    -- Prevents Looping           
    INSERT INTO #stack 
                ([schema], 
                 [table]) 
    VALUES      (@Schmea, 
                 @Table) 

    DECLARE @RelatedSchema NVARCHAR(max) 
    DECLARE @RelatedTable NVARCHAR(max) 
    -- WWB: Select all tables that the input table is dependent on 
    DECLARE table_cursor CURSOR local FOR 
      SELECT pk_schema, 
             pk_table 
      FROM   #tablerelationships 
      WHERE  fk_schema = @Schmea 
             AND fk_table = @Table 

    OPEN table_cursor; 

    -- Perform the first fetch. 
    FETCH next FROM table_cursor INTO @RelatedSchema, @RelatedTable; 

    -- Check @@FETCH_STATUS to see if there are any more rows to fetch. 
    WHILE @@FETCH_STATUS = 0 
      BEGIN 
          -- WWB: If We have Recurred To Where We Start This 
          -- Is a Circular Reference 
          -- Begin failing out of the recursions 
          IF ( @BaseSchmea = @RelatedSchema 
               AND @BaseTable = @RelatedTable ) 
            BEGIN 
                SET @Fail = @RelatedSchema + '.' + @RelatedTable 

                RETURN 
            END 
          ELSE 
            BEGIN 
                DECLARE @Count INT 

                -- WWB: Check to make sure that the dependencies are not in the stack 
                -- If they are we don't need to go down this branch 
                SELECT @Count = Count(1) 
                FROM   #stack 
                WHERE  #stack.[schema] = @RelatedSchema 
                       AND #stack.[table] = @RelatedTable 

                IF ( @Count = 0 ) 
                  BEGIN 
                      -- WWB: Recurse 
                      EXECUTE Sqlazurerecursivefind 
                        @BaseSchmea, 
                        @BaseTable, 
                        @RelatedSchema, 
                        @RelatedTable, 
                        @Fail output 

                      IF ( Len(@Fail) > 0 ) 
                        BEGIN 
                            -- WWB: If the Call Fails, Build the Output Up 
                            SET @Fail = @RelatedSchema + '.' + @RelatedTable + 
                                        ' -> ' 
                                        + @Fail 

                            RETURN 
                        END 
                  END 
            END 

          -- This is executed as long as the previous fetch succeeds. 
          FETCH next FROM table_cursor INTO @RelatedSchema, @RelatedTable; 
      END 

    CLOSE table_cursor; 

    DEALLOCATE table_cursor; 

go 

SET nocount ON 

DECLARE @Schema NVARCHAR(max) 
DECLARE @Table NVARCHAR(max) 
DECLARE @Fail NVARCHAR(max) 
-- WWB: Loop Through All the Tables In the Database Checking Each One 
DECLARE list_cursor CURSOR FOR 
  SELECT [schema], 
         [table] 
  FROM   #tablelist 

OPEN list_cursor; 

-- Perform the first fetch. 
FETCH next FROM list_cursor INTO @Schema, @Table; 

-- Check @@FETCH_STATUS to see if there are any more rows to fetch. 
WHILE @@FETCH_STATUS = 0 
  BEGIN 
      -- WWB: Clear the Stack (Don't you love Global Variables) 
      DELETE #stack 

      -- WWB: Initialize the Input 
      SET @Fail = '' 

      -- WWB: Check the Table 
      EXECUTE Sqlazurerecursivefind 
        @Schema, 
        @Table, 
        @Schema, 
        @Table, 
        @Fail output 

      IF ( Len(@Fail) > 0 ) 
        BEGIN 
            -- WWB: Failed, Output 
            SET @Fail = @Schema + '.' + @Table + ' -> ' + @Fail 

            PRINT @Fail 
        END 

      -- This is executed as long as the previous fetch succeeds. 
      FETCH next FROM list_cursor INTO @Schema, @Table; 
  END 

-- WWB: Clean Up 
CLOSE list_cursor; 

DEALLOCATE list_cursor; 

DROP TABLE #tablerelationships 
DROP TABLE #stack 
DROP TABLE #tablelist 
DROP PROC sqlazurerecursivefind 


Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

Nenhum comentário:

Postar um comentário

Leave your comment here!