In Part One I set up the solution that cleans our data. In this part I will expand the solution to catalog tables and identify columns that may contain PI data and activate ones I want cleaned when a database is RESTORED to QA/DEV.
I installed the following FUNCTION on all my production databases, so the cleaning solution would already have access to it once RESTORED. Alternatively, I could just install it on the databases AFTER RESTORE each week, but that would mean another weekly task to perform.
-- ================ CREATE ON ALL DATRABASES ========================
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: David Speight
-- Create date: 20230510
-- Description: Returns the Primary Key COLUMN NAME of a table in the current database.
-- =============================================
CREATE FUNCTION dbo.TablePrimaryKeyColumnNameGet
(
@TableName varchar(250),
@SchemaName varchar(100)
)
RETURNS varchar(250)
AS
BEGIN
DECLARE @ColumnIDName varchar(250);
SELECT @ColumnIDName = c.name
FROM sys.tables t
INNER JOIN sys.schemas s on t.schema_id = s.schema_id
INNER JOIN sys.indexes ix on t.object_id = ix.object_id
INNER JOIN sys.index_columns ixc ON ix.object_id = ixc.object_id and ix.index_id = ixc.index_id
INNER JOIN sys.columns c ON ixc.object_id = c.object_id and ixc.column_id = c.column_id
WHERE ix.type > 0 and ix.is_primary_key = 1
and t.name = @TableName
and s.name = @SchemaName;
RETURN @ColumnIDName;
END
GO
I added this UNIQUE INDEX to the CleaningColumns table, just to make sure I don’t get duplicates.
-- Ensure only ONE entry for each COLUMN is allowed by setting up this UNIQUE index.
CREATE UNIQUE NONCLUSTERED INDEX [IX_Unique_ColumnCombo] ON [dbo].[CleaningColumns]
(
[DatabaseName] ASC,
[SchemaName] ASC,
[TableName] ASC,
[ColumnName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
TEMPORAL TABLES
IN Part 1, I mentioned how I have to handle TEMPORAL tables differently. I want to make sure I’ve captured a list of all temporal tables in each database. I’ve created this StoredProc to do that.
/****** Object: StoredProcedure [dbo].[TemporalListAddNew] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[TemporalListAddNew]
AS
BEGIN
-- ADD NEW
DECLARE @iCount INT = 0;
DECLARE @tSql varchar(4000);
DECLARE @DatabaseName varchar(250);
SELECT @DatabaseName = ''
DECLARE @tDatabases AS TABLE ( [ID] [int] IDENTITY(1,1) NOT NULL,
[DatabaseName] [varchar](200) NULL);
INSERT INTO @tDatabases (DatabaseName)
SELECT DISTINCT DatabaseName
FROM dbo.CleaningColumns
WHERE IsActive=1;
SELECT @iCount = MAX(ID) FROM @tDatabases;
WHILE @iCount > 0
BEGIN
SELECT @DatabaseName = DatabaseName
FROM @tDatabases
WHERE ID = @iCount;
SELECT @tSql = CONCAT(
'INSERT INTO dbo.CleaningTemporal (DatabaseName,TableSchema,TableName,HistorySchema,HistoryTableName,IsActive)
SELECT ''',@DatabaseName,''', schema_name(t.schema_id), t.name, schema_name(h.schema_id), h.name,1
FROM ',@DatabaseName,'.sys.tables t inner join ',@DatabaseName,'.sys.tables h on t.history_table_id = h.object_id
WHERE t.temporal_type = 2
and not exists (SELECT *
FROM dbo.CleaningTemporal
WHERE DatabaseName = ''',@DatabaseName,'''
AND TableSchema = schema_name(t.schema_id)
AND TableName = t.name
AND HistorySchema = schema_name(h.schema_id)
AND HistoryTableName = h.name)')
EXEC(@tSql)
SELECT @iCount = @iCount - 1, @DatabaseName = '', @tSql = '';
END;
SELECT * FROM dbo.CleaningTemporal;
END
Now I will create a table where I can store a list of objects I may wish to validate. This will basically be a catalog of all tables and columns in each database on my instance.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Validator](
[ID] [int] IDENTITY(1,1) NOT NULL,
[DatabaseName] [varchar](250) NULL,
[SchemaName] [varchar](150) NULL,
[TableName] [varchar](250) NULL,
[ColumnName] [varchar](250) NULL,
[ObjectTypeName] [varchar](50) NULL,
[Status] [varchar](50) NULL,
[DataType] [varchar](50) NULL,
[GeneratorDataType] [varchar](50) NULL,
[IsActive] [bit] NULL,
[dtCreated] datetime NULL,
[dtLastModified] datetime NULL,
CONSTRAINT [PK_Validator] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALWAYS EXCLUDE
UPDATE: The code in GIT will include my most recent addition of a VALIDATOR ALWAYS IGNORE table. This will allow me to use the PREVIEW and GENERATOR to REFRESH the column list and ALWAYS ignore objects in this table, so they won’t have to be reevaluated each time.
/****** Object: Table [dbo].[ValidatorAlwaysIgnore] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ValidatorAlwaysIgnore](
[ID] [int] IDENTITY(1,1) NOT NULL,
[DatabaseName] [varchar](250) NULL,
[SchemaName] [varchar](150) NULL,
[TableName] [varchar](250) NULL,
[ColumnName] [varchar](250) NULL,
[dtCreated] [datetime] NULL,
CONSTRAINT [PK_ValidatorAlwaysIgnore] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
Next, I will populate the VALIDATOR table with a list of databases and mark EXCLUDE for the ones that do not have PI data, or I don’t want cleaned. MY EXCLUDE LIST identifies ALL Third Party application databases as well, such as Jira, Hangfire, etc. In this sample, I’ve also marked to exclude any database used by AWS RDS, SSRS, and GDPR as well.
INSERT INTO dbo.Validator
([DatabaseName], [SchemaName], [TableName], [ColumnName], [ObjectTypeName], [Status], [DataType], [IsActive], [dtCreated], [dtLastModified])
select name, null, null, null, 'database', 'parent', null, 1, getutcdate(), getutcdate()
from sys.databases;
UPDATE dbo.Validator
SET status = 'exclude'
WHERE DatabaseName in ('master', 'tempdb', 'model','msdb', 'DB_Clean_DataGeneration','Audit','AuditDW','Hangfire','Jiradb','Maintenance','ServerAnalysis','ServerAnalysisDW','Test','ServerAnalysisDWTestData','UPIC_Mobile','SSISDB','ReportServer','ReportServerTempDB')
or DatabaseName like 'GDPR%' or DatabaseName like 'rds%';
Here I will populate the VALIDATOR with a list of TABLES from the databases I just cataloged.
DECLARE @tblLoop INT = 0;
DECLARE @tblSql nvarchar(3000);
DECLARE @dbTable as TABLE (ID INT IDENTITY(1,1), DatabaseName VARCHAR(250))
INSERT INTO @dbTable
SELECT DatabaseName
FROM dbo.Validator
WHERE ObjectTypeName = 'database' and IsActive=1 and Status != 'exclude'
SELECT @tblLoop = MAX(ID) FROM @dbTable;
WHILE @tblLoop > 0
BEGIN
SELECT @tblSql = CONCAT('INSERT INTO dbo.Validator
([DatabaseName], [SchemaName], [TableName], [ColumnName], [ObjectTypeName], [Status], [DataType], [IsActive], [dtCreated], [dtLastModified])
SELECT ''',DatabaseName,''', sc.name, tbl.name, null, ''table'',''parent'',null,1,getutcdate(),getutcdate()
FROM ',DatabaseName,'.sys.tables tbl
inner join ',DatabaseName,'.sys.schemas sc on tbl.schema_id = sc.schema_id
WHERE NOT EXISTS (
SELECT ''',DatabaseName,''', sc.name, tbl.name, null, ''table''
FROM DB_Clean_DataGeneration.dbo.Validator v
INNER JOIN ',DatabaseName,'.sys.tables tbl on v.TableName = tbl.name
INNER JOIN ',DatabaseName,'.sys.schemas sc on v.SchemaName = sc.name
WHERE v.DatabaseName = ''',DatabaseName,'''
)
FROM @dbTable
WHERE ID = @tblLoop;
BEGIN TRY
EXEC(@tblSql)
END TRY
BEGIN CATCH
PRINT @tblSql;
END CATCH;
-- reset variables
SELECT @tblLoop = @tblLoop - 1, @tblSql ='';
END;
To populate the VALIDATOR with a list of COLUMNS from the tables I wish to consider, I’ve written a Stored PROC that will handle this. My proc may also be reused in the future to identify and catalog new columns I’ve added to the databases.
/****** Object: StoredProcedure [dbo].[GeneratorUpdateBaseColumns] Script Date: 5/19/2023 8:48:37 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GeneratorUpdateBaseColumns]
AS
BEGIN
/*
Remove columns where the database is exclude
*/
DELETE FROM dbo.Validator
WHERE ObjectTypeName in ('table','column')
and DatabaseName in (
SELECT DatabaseName
FROM dbo.Validator
WHERE ObjectTypeName = 'database' and status = 'exclude');
/*
Check for new TABLES from our ACTIVE databases we may wish to consider
*/
DECLARE @tblLoop INT = 0;
DECLARE @tblSql nvarchar(3000);
DECLARE @dbTable as TABLE (ID INT IDENTITY(1,1), DatabaseName VARCHAR(250))
INSERT INTO @dbTable
SELECT DatabaseName
FROM dbo.Validator
WHERE ObjectTypeName = 'database' and IsActive=1 and Status != 'exclude'
SELECT @tblLoop = MAX(ID) FROM @dbTable;
WHILE @tblLoop > 0
BEGIN
SELECT @tblSql = CONCAT('INSERT INTO dbo.Validator
([DatabaseName], [SchemaName], [TableName], [ColumnName], [ObjectTypeName], [Status], [DataType], [IsActive], [dtCreated], [dtLastModified])
SELECT ''',DatabaseName,''', sc.name, tbl.name, null, ''table'',''parent'',null,1,getutcdate(),getutcdate()
FROM ',DatabaseName,'.sys.tables tbl
inner join ',DatabaseName,'.sys.schemas sc on tbl.schema_id = sc.schema_id
left join DB_Clean_DataGeneration.dbo.Validator v on tbl.name = v.TableName and sc.name = v.SchemaName
WHERE v.TableName is null;')
FROM @dbTable
WHERE ID = @tblLoop;
BEGIN TRY
EXEC(@tblSql)
END TRY
BEGIN CATCH
PRINT @tblSql;
END CATCH;
-- reset variables
SELECT @tblLoop = @tblLoop - 1, @tblSql ='';
END;
/*
Populate the VALIDATOR with a list of COLUMNS from the tables we wish to consider
*/
DECLARE @cLoop INT = 0;
DECLARE @cSql nvarchar(3000);
DECLARE @cdbTable as TABLE (ID INT IDENTITY(1,1), DatabaseName VARCHAR(250))
INSERT INTO @cdbTable
SELECT DatabaseName
FROM dbo.Validator
WHERE ObjectTypeName = 'database' and IsActive=1 and Status != 'exclude'
SELECT @cLoop = MAX(ID) FROM @cdbTable;
WHILE @cLoop > 0
BEGIN
SELECT @cSql =CONCAT('INSERT INTO dbo.Validator
([DatabaseName], [SchemaName], [TableName], [ColumnName], [ObjectTypeName], [Status], [DataType], [IsActive], [dtCreated], [dtLastModified])
SELECT ''',DatabaseName,''', sc.name, tbl.name, c.name, ''column'',''standard'',
(CASE WHEN t.Name in (''decimal'',''numeric'') THEN CONCAT(t.Name,''('',c.max_length,'','', c.precision,'')'')
WHEN t.Name in (''sql_variant'',''hierachyid'',''varbinary'',''varchar'',''binary'',''char'',''nvarchar'',''nchar'') THEN CONCAT(t.Name,''('',REPLACE(c.max_length,''-1'',''max''),'')'')
ELSE t.Name END),1,getutcdate(),getutcdate()
FROM ',DatabaseName,'.sys.tables tbl
INNER JOIN ',DatabaseName,'.sys.schemas sc ON Tbl.schema_id = Sc.schema_id
INNER JOIN ',DatabaseName,'.sys.columns c ON tbl.object_id = c.object_id
INNER JOIN ',DatabaseName,'.sys.types t ON c.user_type_id = t.user_type_id
LEFT JOIN DB_Clean_DataGeneration.dbo.tfnValidatorTableExcluded(''',DatabaseName,''') fn ON fn.SchemaName = sc.name and fn.TableName = tbl.name
LEFT JOIN DB_Clean_DataGeneration.dbo.Validator v ON v.SchemaName = sc.name and v.TableName = tbl.name and v.ColumnName = c.name
WHERE fn.TableName IS NULL
and v.ColumnName is null;')
FROM @cdbTable
WHERE ID = @cLoop;
BEGIN TRY
EXEC(@cSql)
END TRY
BEGIN CATCH
PRINT @cSql;
END CATCH;
-- reset variables
SELECT @cLoop = @cLoop - 1, @cSql ='';
END;
END
GO
(OPTIONAL) In my case I began this process as described in PART ONE and made manually entries for cleaning. So when this is a “build upon step”.)” I will need to identify the manual entries I already have, in the new catalog.
SELECT *
FROM dbo.CleaningColumns c
inner join dbo.Validator v on C.DatabaseName = v.DatabaseName
AND c.SchemaName = v.SchemaName
AND c.TableName = v.TableName
AND c.ColumnName = v.ColumnName
UPDATE v
SET Status = (Case when v.IsActive = 1 then 'cleaning' else 'clean-inactive' end)
FROM dbo.Validator AS v
INNER JOIN dbo.CleaningColumns AS c
ON v.DatabaseName = c.DatabaseName
AND v.SchemaName = C.SchemaName
AND v.TableName = c.TableName
AND v.ColumnName = c.ColumnName
Now I want to IDENTIFY possible COLUMN NAMES that could contain PI data. I have written a stored proc to help PREVIEW the items that might need to be added to the cleaning soilution.
CREATE OR ALTER PROCEDURE dbo.GeneratePreview
AS
BEGIN
DECLARE @LookupType as Table (ID INT NOT NULL IDENTITY(1,1), DataTypeName VARCHAR(50), Keyword VARCHAR(50), IsInt bit);
INSERT INTO @LookupType
select c.DataTypeName, s.value, c.IsInt
FROM dbo.CleaningDataType c
CROSS APPLY string_split(c.keywordList,',') s
WHERE c.IsActive=1 and LEN(S.value) > 1;
SELECT *
FROM
(
SELECT
v.ID, v.DatabaseName,v.SchemaName,v.TableName,v.ColumnName,v.ObjectTypeName,v.Status,
v.DataType, v.GeneratorDataType, l.DataTypeName, l.Keyword, l.Isint,
ROW_NUMBER() OVER(PARTITION BY v.DatabaseName,v.SchemaName,v.TableName,v.ColumnName ORDER BY v.DatabaseName,v.SchemaName,v.TableName,v.ColumnName) As ROWID
FROM dbo.Validator v
INNER JOIN @LookupType l
ON (v.ColumnName LIKE l.Keyword+'%')
and (l.isint = 1 or LEFT(v.datatype,7) NOT IN (
'bigint','binary','bit','decimal','float','geograp','geometr','hierarc','image','int','money','numeric','real','smallin','smallmo','sql_var','tinyint'
))
WHERE v.IsActive=1 and v.ObjectTypeName='column' and v.Status = 'standard'
) q
WHERE q.Rowid = 1
ORDER BY DatabaseName,SchemaName,TableName,ColumnName;
END
GO
I then REVIEW and EXCLUDE ANY COLUMNS NOT NEEDED.
EXEC dbo.GeneratePreview;
From the results, I identify the items I DO NOT want cleaned and then I add these to my catalog and mark them, using a stored proc, where I can enter a string of ID numbers found from the results.
CREATE OR ALTER PROCEDURE dbo.ColumnsToIgnore
@IdList varchar(8000)
AS
BEGIN
UPDATE dbo.Validator
SET Status = 'ignore'
WHERE ID IN (
SELECT value
FROM string_split(@IdList,','));
END
GO
EXEC dbo.ColumnsToIgnore (a,b,c,d,....); -- Where a,b,c are ID numbers.
ACTIVATE PREVIEW
I rerun EXEC dbo.GeneratePreview and re-verify the list is correct. Then, I run this stored procedure that executes similar code, but actually transfers the object to the REAL cleaning table.
CREATE OR ALTER PROCEDURE dbo.GenerateSuggestions
AS
BEGIN
DECLARE @LookupType as Table (ID INT NOT NULL IDENTITY(1,1), DataTypeName VARCHAR(50), Keyword VARCHAR(50), IsInt bit);
DECLARE @StartID INT;
SELECT @StartID = MAX(ID)
FROM dbo.CleaningColumns;
INSERT INTO @LookupType
select c.DataTypeName, s.value, c.IsInt
FROM dbo.CleaningDataType c
CROSS APPLY string_split(c.keywordList,',') s
WHERE c.IsActive=1 and LEN(S.value) > 1;
INSERT INTO dbo.CleaningColumns
([DatabaseName], [SchemaName], [TableName], [ColumnName], [GenderRefColumnName], [IDColumnName], [DataType], [IsActive], [UsesGenerationDb])
SELECT q.DatabaseName,q.SchemaName,q.TableName,q.ColumnName,NULL,NULL,DataTypeName,1,1
FROM
(
SELECT
v.ID, v.DatabaseName,v.SchemaName,v.TableName,v.ColumnName,v.ObjectTypeName,v.Status,
v.DataType, v.GeneratorDataType, l.DataTypeName, l.Keyword, l.Isint,
ROW_NUMBER() OVER(PARTITION BY v.DatabaseName,v.SchemaName,v.TableName,v.ColumnName ORDER BY v.DatabaseName,v.SchemaName,v.TableName,v.ColumnName) As ROWID
FROM dbo.Validator v
INNER JOIN @LookupType l
ON (v.ColumnName LIKE l.Keyword+'%')
and (l.isint = 1 or LEFT(v.datatype,7) NOT IN (
'bigint','binary','bit','decimal','float','geograp','geometr','hierarc','image','int','money','numeric','real','smallin','smallmo','sql_var','tinyint'
))
WHERE v.IsActive=1 and v.ObjectTypeName='column' and v.Status = 'standard'
) q
-- filter out rows already in our table
LEFT JOIN dbo.CleaningColumns c ON
q.DatabaseName = c.DatabaseName and
q.SchemaName = c.SchemaName and
q.TableName = c.TableName and
q.ColumnName = c.ColumnName
WHERE c.ColumnName IS NULL and q.Rowid = 1
ORDER BY DatabaseName,SchemaName,TableName,ColumnName;
-- Lookup ID Column (Mark missing/multiple CREATED WITHOUT ID COLUMN REVIEW)
DECLARE @PkIdUpdateSql varchar(3000);
DECLARE @iTableLoop int;
SELECT @iTableLoop = 1 + COUNT(*)
FROM
( SELECT COUNT(*) As TableCount from dbo.CleaningColumns where IDColumnName is null GROUP BY DatabaseName, SchemaName, TableName )q
While @iTableLoop > 0
BEGIN
SELECT @PkIdUpdateSql = CONCAT(
'UPDATE c
SET IDColumnname = q.IDColumnname
FROM dbo.CleaningColumns c
INNER JOIN
(SELECT ''',DatabaseName,''' as DatabaseName, ''',TableName,''' as TableName, ''',SchemaName,''' as SchemaName,
ISNULL(',DatabaseName,'.dbo.TablePrimaryKeyColumnNameGet(''',TableName,''',''',SchemaName,'''),''ID'') as IDColumnname) q
ON c.DatabaseName = q.DatabaseName and c.TableName = q.TableName and c.SchemaName = q.SchemaName
')
FROM dbo.CleaningColumns
WHERE IDColumnName is null;
EXEC(@PkIdUpdateSql);
SELECT @iTableLoop = @iTableLoop -1, @PkIdUpdateSql='';
END;
-- Lookup Gender Column (Mark multiple CREATED BUT MULTI GENDER COLUMNS FOUND - REVIEW)
UPDATE c
SET GenderRefColumnName = z.GenderRefColumnName
FROM dbo.CleaningColumns c
INNER JOIN
(SELECT v.DatabaseName, v.SchemaName, v.TableName, v.ColumnName,
( CASE COUNT(*) WHEN 1 THEN v.ColumnName ELSE 'MULTI-FOUND' END) As GenderRefColumnName
FROM dbo.Validator v
INNER JOIN (
SELECT DatabaseName, SchemaName, TableName
FROM dbo.CleaningColumns
WHERE (ColumnName like '%name%') and (GenderRefColumnName is null or LEN(GenderRefColumnName) < 2)
GROUP BY DatabaseName, SchemaName, TableName) q
ON v.DatabaseName = q.DatabaseName and v.SchemaName = q.SchemaName and v.TableName = q.TableName
WHERE V.ColumnName LIKE '%GENDER%' or V.ColumnName LIKE '%SEX%'
GROUP BY v.DatabaseName, v.SchemaName, v.TableName, v.ColumnName) z
ON c.DatabaseName = z.DatabaseName and c.SchemaName = z.SchemaName and c.TableName = z.TableName
WHERE (c.GenderRefColumnName is null or LEN(c.GenderRefColumnName) < 2)
and (c.ColumnName like '%name%');
select * FROM dbo.CleaningColumns WHERE LEN(GenderRefColumnName) > 1;
UPDATE v
SET Status = (Case when v.IsActive = 1 then 'cleaning' else 'clean-inactive' end)
FROM dbo.Validator AS v
INNER JOIN dbo.CleaningColumns AS c
ON v.DatabaseName = c.DatabaseName
AND v.SchemaName = C.SchemaName
AND v.TableName = c.TableName
AND v.ColumnName = c.ColumnName;
SELECT *
FROM dbo.CleaningColumns
WHERE ID > @StartID
ORDER BY DatabaseName, SchemaName, TableName, ColumnName;
END
GO
POST CLEANING STEPS
I have certain tables that need to be UPDATED to conform to our QA environment. Therefore, I created a POST table that can store Tsql code to run AFTER the data cleaning. One thing I do, is EXCLUDE all AUDIT tables and then use this feature to TRUNCATE those tables.
TABLE
PROCEDURE
USE [DB_Clean_DataGeneration]
GO
/****** Object: StoredProcedure [dbo].[PostCleaning] Script Date: 5/19/2023 11:18:35 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[PostCleaning]
AS
BEGIN
DECLARE @tSql varchar(max);
DECLARE @iCount int = 0;
DECLARE @isActive bit = 0;
DECLARE @ParentAuditKey int;
DECLARE @PkgGUID uniqueidentifier;
SELECT @PkgGUID = NEWID();
SELECT @iCount = MAX(ID) FROM dbo.PostCleaningCustomCode;
INSERT INTO dbo.DimAudit ( [ParentAuditKey], [TableName], [PkgName], [PkgGUID], [ExecStartDT], [SuccessfulProcessingInd])
VALUES (1, 'DATA CLEANING: PostClean-Start','POST CLEANING: PostClean-Start', @PkgGUID, getutcdate(), 'P');
SELECT @ParentAuditKey = SCOPE_IDENTITY();
UPDATE dbo.DimAudit SET ParentAuditKey = @ParentAuditKey WHERE AuditKey = @ParentAuditKey;
WHILE @iCount > 0
BEGIN
SELECT @isActive = isActive,
@tSql = Tsql
FROM dbo.PostCleaningCustomCode
WHERE ID = @iCount;
BEGIN TRY
EXEC(@tSql);
END TRY
BEGIN CATCH
PRINT @tSql;
INSERT INTO dbo.DimAudit ( [ParentAuditKey], [TableName], [PkgName], [PkgGUID], [ExecStartDT], [SuccessfulProcessingInd])
VALUES (@ParentAuditKey, 'DATA CLEANING: PostClean-Step',CONVERT(varchar(12),@iCount), @PkgGUID, getutcdate(), 'N');
END CATCH;
SELECT @iCount= @iCount-1, @isActive=0, @tSql=''
END
-- Success / Failure Actions
IF ((SELECT COUNT(*) FROM dbo.DimAudit WHERE ParentAuditKey=@ParentAuditKey AND SuccessfulProcessingInd='N')=0)
BEGIN
UPDATE dbo.DimAudit
SET ExecStopDT = getutcdate(),SuccessfulProcessingInd = 'Y'
WHERE (AuditKey = @ParentAuditKey );
-- ELSE send email with Failed Steps table
END
ELSE
BEGIN
UPDATE dbo.DimAudit
SET ExecStopDT = getutcdate(),SuccessfulProcessingInd = 'N'
WHERE (AuditKey = @ParentAuditKey );
END
END
GO
SET UP AGNET JOB and TEST
Finally, I set up my SQL Agent job that will run AFTER the database RESTORE job completes.
I include the following steps in my Agent job.
- Restore Databases.
- EXEC dbo.DataCleaning;
- EXEC dbo.PostCleaning;
- Delete the Restore files.
GIT PROJECT
The above project can be found in my GIT repository as a Visual Studio (2019) project. You may still use the scripted objects (tables, views, sprocs) in the dbo folder even if you don’t use VS.
https://github.com/LoveTheSql/DB_Clean_DataGeneration/tree/main
End of post.