Due to the amount of data collected from dozens of servers at half a dozen companies, I typically clean out the data warehouse, keeping only the last 30 days of data. However, I like to take a snapshot of the a single day each month. I have set up a SQL Agent job to run this snapshot procedure on the first day of the month.
Here is a look at my table for this.
This is the code for the stored procedure.
USE [ServerAnalysisDW]
GO
/****** Object: StoredProcedure [Analysis].[PerfCounterSnapshot] Script Date: 9/27/2022 8:04:24 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: David Speight
-- Create date: 20220902
-- =============================================
CREATE OR ALTER PROCEDURE [Analysis].[PerfCounterSnapshot]
@DateKey varchar(24) = ''
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SELECT @DateKey = (CASE WHEN LEN(@DateKey) > 5 THEN @DateKey
ELSE REPLACE(EOMONTH(getutcdate(),-1),'-','') END) -- Last Day of previous month
INSERT INTO [Analysis].[PerfCounterStatsSnapshot]
([ID], [LocationID], [ClusterID], [MetricID], [PartName], [DateKey], [TimeKey], [iVal], [dVal], [UtcDate])
SELECT [ID], [LocationID], [ClusterID], p.[MetricID], [PartName], [DateKey], [TimeKey], [iVal], [dVal], [UtcDate]
FROM Analysis.PerfCounterMetric as m WITH(NOLOCK)
inner join Analysis.PerfCounterStats as p WITH(NOLOCK) on m.MetricID = p.MetricID
WHERE m.MetricTsql is not null
AND p.DateKey = CONVERT(INT,@DateKey)
AND (m.MetricIntervalMinutes > 1 OR SUBSTRING(p.TimeKey,3,2) = '00')
ORDER BY [LocationID], [ClusterID], p.[MetricID], [DateKey], [TimeKey];
END
TROUBLE SHOOTING and FINE TUNING
When there is an issue. Typically I check the logs first and then go from there.
1. Start with the log. I learn a lot simply by querying the log files the SSIS packages create.
SELECT TOP (1000) *
FROM ServerAnalysis.Analysis.PerfCounterLogActions ORDER BY 1 DESC;
SELECT TOP (1000) *
FROM ServerAnalysisDwStaging.Analysis.PerfFileExtractionLog ORDER BY 1 DESC
2. STEP through the process manually to find where the breakdown occurs.
Here is a list of common issues and solutions.
ISSUES | PART | RESOLUTION |
---|---|---|
Dashboard shows DOWN when server is UP. | 9 | The query is based on Metrics that may only pull every three hours. Rewrite query or check again later. |
My CPU is at 90% but dashboard says 5%. | 9 | Remember there is a lag time of 5 to 30 minutes, depending on the settings. While this is NEAR real-time, it is by no means REAL TIME. |
Dashboard has no data. | 9,6 | Verify the data source can connect. Check that SQL Agent jobs are running. Check data source connections & passwords |
Some instances are missing from the dashboard. | 9,6 | Check the entire process for those instances and make sure each step is working from capturing metrics, output of csv file, import of data. |
Dashboard runs slow. | 8 | Identify and create or update an index on one or more tables. Delete old records that are bloating the table, or Come up with a partition solution. |
CSV file is not saving locally. | 6 | Make sure the SSIS CONFIGURATION on the actual server for the SSIS project parameters is correct. |
RDS is not saving the csv to D:\ drive. | 6 | When saving a file to a folder on the D: drive of RDS, the folder must first be created. Save a small test file to the path, to create the folder where the CSV file will go |
A certain Metric is not showing up. | 5 | Check the PerfCounterLog table on the local server and see if data is being captured for it. Test that the TSQL definition in the PerfCounterMetric table runs by itself on said server, using the SELECT statement only. |
CSV file is saving locally but not copying to AWS. | 4 | Make sure the correct permissions are set on the S3 bucket, and ACLs enabled. Make sure port 443 is not blocked, or at least open for AWS CLI. Check that the Service Account running the CLI is using the right IAM credentials. Recheck the bucket policy. Make sure permission is granted on any custom KMS keys that may have changed/updated. |
SQL Agent job is failing. | Is the PROXY USER active and its CREDENTIAL using a current password? | |
SAMPLE CODE DOWNLOAD
My untested, still in progress code for this project may be found on GITHUB. Use at your own risk and always test things out first in a development area.