Friday, July 21, 2017

Read all files of SQL Server's drive in a Tree using recursive CTE

In this blog I'll use undocumented in SQL Server extended stored procedure "xp_dirtree".
That procedure is well described in Patrick Keisler's blog.

Briefly, "xp_dirtree" extended procedure returns tree of sub-directories from given folder and has three parameters:
1. Starting or root folder
2. Depth level - determines how deep SQL Server will dig into a sub-directory structure. If it set to zero or omitted whole tree will be returned.
3. If omitted or set to zero will return only folders .If set to "1" will return files as well.

The easiest examples would be like this:
EXEC master.sys.xp_dirtree 'C:\';
EXEC master.sys.xp_dirtree 'C:\',0,1; EXEC master.sys.xp_dirtree 'C:\TEST';

The Result might look like something like this and be not very useful.

So, in order to make it more useful we have to read drive content into a table and then build a directory tree from it:
CREATE TABLE #dirtree (
 Dir VARCHAR(256), 
 Depth TinyInt, 
 IsFile Bit, 
 ID INT IDENTITY(1,1) PRIMARY KEY);
GO
CREATE TABLE #fulltree (
 Dir VARCHAR(256), 
 Depth TinyInt, 
 IsFile Bit, 
 Parent INT, 
 ID INT PRIMARY KEY);
GO
INSERT INTO #dirtree(Dir, Depth, IsFile) 
EXEC master.sys.xp_dirtree 'C:\',0,1;
GO
INSERT INTO #fulltree(Dir, Depth, IsFile, Parent, ID)
SELECT Dir, Depth, IsFile, p.MID as Parent, ID
FROM #dirtree as d WITH (NOLOCK)
OUTER APPLY (SELECT MAX(ID) as MID FROM #dirtree as i WITH (NOLOCK)
 WHERE i.Depth + 1 = d.Depth and i.ID < d.ID ) as p;
GO
CREATE NONCLUSTERED INDEX #fulltree_parent ON #fulltree([Parent])
GO
;WITH RecCTE as (
 SELECT CAST('C:\' + Dir as VARCHAR(MAX)) as Dir
  , Depth, ID, IsFile, Dir as FName
 FROM #fulltree WHERE Parent is Null
 UNION ALL
 SELECT CAST(p.Dir + '\' + d.Dir as VARCHAR(MAX))
  , d.Depth, d.ID, d.IsFile, d.Dir as FName
 FROM #fulltree as d INNER JOIN RecCTE as p ON d.Parent = p.ID
) SELECT Dir, Depth, IsFile, FName FROM RecCTE ORDER BY ID
The script is using two temporary tables. First is for data extraction and second for data manipulation.
In case of very small tree and small number of files it is possible to simplify query by using only one table and avoiding intermediate update, but if you extract structure of a regular drive it might take a while to produce a result.

In my case the result looks like this. More than 200K items and most of them in "Windows" folder:

To test my script in the extreme conditions I've created test folder and bunch of underlying sub-folders and then run following script against test folder:
DROP TABLE #dirtree 
GO
CREATE TABLE #dirtree (
 Dir VARCHAR(256), 
 Depth TinyInt, 
 Parent INT, 
 ID INT IDENTITY(1,1) PRIMARY KEY);
GO
INSERT INTO #dirtree(Dir, Depth) 
EXEC master.sys.xp_dirtree 'C:\T\';
GO
UPDATE d SET Parent = p.MID
FROM #dirtree as d WITH (NOLOCK)
OUTER APPLY (SELECT MAX(ID) as MID FROM #dirtree as i WITH (NOLOCK)
 WHERE i.Depth + 1 = d.Depth and i.ID < d.ID ) as p;
GO
;WITH RecCTE as (
 SELECT CAST('C:\T\' + Dir as VARCHAR(MAX)) as Dir, Depth, ID
 FROM #dirtree WHERE Parent is Null
 UNION ALL
 SELECT CAST(p.Dir + '\' + d.Dir as VARCHAR(MAX)), d.Depth, d.ID
 FROM #dirtree as d INNER JOIN RecCTE as p ON d.Parent = p.ID
) SELECT Dir, Depth FROM RecCTE ORDER BY ID

I was not surprised when I've got following error message:
Msg 530, Level 16, State 1, Line 60
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
I've built too many sub-folders that my recursive CTE couldn't handle so many levels of depth.

To fix this problem I've had to increase maximum recursion level by adding one more statement to my last query:
OPTION (MAXRECURSION 125)
And I've got following results:


Monday, April 10, 2017

SQL Server on Linux: Specifying File Path. Possible problems.

NOTE: Please be aware that all scripts work only as of publishing date and with SQL Server on Linux CTP 1.4. In future releases some features/bugs may disappear.

As you might know Linux file system is different from Windows.
Linux does not have drive letters like "A:\","B:\","C:\","D:\","E:\" etc.
It has only root folder: "/" and all devices, services, mapped network resources are linked as underlying sub-folders.

SQL Server files are located under Linux following folder: "/var/opt/mssql/".
Knowing that, you can reference SQL server files in that "Linux way".
However, Microsoft did extremely good job to satisfy current customers, it also translated "Linux path" in "Windows way" just by adding drive "C:\" letter instead of root folder. Now "C:\" is our root!

So, in that manner, SQL Server files will be located in "c:\var\opt\mssql\" directory!
Isn't it simple?!!!
There will be no effort at all to refurbish old SQL SQL Server code for new Linux platforms!

Here I'll give you an example of Database creation using different addressing methods:
1. I use Windows notation for data file and Linux notation for log file.
2. Because Linux systems are case sensitive it is very important to know it when you work with Linux. However, Microsoft allows you to be reluctant. You can use upper or lower case in the order you want.
Use Master
GO
CREATE DATABASE [LinuxFilePathDemo]
ON ( NAME = 'LinuxFilePathDemo_dat', SIZE = 1, MAXSIZE = 1,
 FILENAME = 'c:\var\opt\mssql\USERDATA\LinuxFilePathDemo.mdf')  
LOG ON (NAME = 'LinuxFilePathDemo_log', SIZE = 1, MAXSIZE = 1,
 FILENAME = '/var/opt/mssql/USERLOGS/LinuxFilePathDemo.ldf');  
GO  
SELECT name, type_desc, physical_name
FROM sys.master_files
WHERE database_id = DB_ID('LinuxFilePathDemo');
GO

As you can see SQL Server accepted your request!

At that point Linux Geeks will start laughing at you:
Do you know now WHERE you files really are?

Lets check our folders:

What a surprise! Both folders are completely empty!!!!

Lets check somewhere else:
SQL server placed our file in folders with lower case "userdata" and "userlogs"!

Do you think SQL Server just converted upper case to lower case, but still reported to us that folder names "USERDATA" and "USERLOGS" in Upper case?

It is kind of true, but still wrong. SQL Server does something completely different.
It looks like if file system has two folders with similar names, which differentiate only by case than SQL Server chooses case insensitively the one, but there is no correlation which one will be chosen.

Error Case with "case" different folders.

In this case I'll show huge potential problem you could have:

Step 1. Create new folder in lower case "/var/opt/mssql/testfolder":

sudo -u mssql mkdir /var/opt/mssql/testfolder

Step 2. Create New database "Test":

CREATE DATABASE [Test] ON ( NAME = 'Test', 
FILENAME = '/var/opt/mssql/testfolder/Test.mdf')  
LOG ON (NAME = 'Testlog', SIZE = 1, MAXSIZE = 1, 
FILENAME = '/var/opt/mssql/testfolder/Test.ldf');  
GO

Step 3. Verify that new "Test" database is Online:

SELECT LEFT(name,22) Name, LEFT(state_desc,20) State 
FROM sys.databases WHERE name = 'Test';
GO

Step 4. Verify that database files are in the place where they supposed to be:

sudo ls -l /var/opt/mssql/testfolder

Step 5. Create new folder in upper case "/var/opt/mssql/TESTFOLDER"

sudo -u mssql mkdir /var/opt/mssql/TESTFOLDER

Step 6. Restart SQL Server service

sudo systemctl restart mssql-server

Step 7. Verify that new "Test" database is Online:

SELECT LEFT(name,22) Name, LEFT(state_desc,20) State 
FROM sys.databases WHERE name = 'Test';
GO
At that step we already have an error in the error log: "The system cannot find the file specified"

Step 8. Delete "Test" database:

DROP DATABASE [Test];
GO

Step 9. Check for missing database files where they are supposed to be:

sudo ls -l /var/opt/mssql/testfolder
And database files are still in the folder.

See screen shot of all 9 steps:

I'm assuming that after SQL Server restart it was looking for "Test" database files in new "/var/opt/mssql/TESTFOLDER" and not in "/var/opt/mssql/testfolder", where we actually created them.

Conclusion:

While working with SQL Server on Linux avoid "duplicated" folder and file names differentiated only by case.

Monday, April 3, 2017

SSMS Custom Query shortcuts Update.

Haven't touched that area for a long while.
Here is the link to my original post with all descriptions and implementation process:
http://slavasql.blogspot.com/2016/02/ssms-query-shortcuts.html

In this post I'll only list changes made since a year ago:

The newest version of SSMS Query Shortcuts is available HERE.

Ctrl+4 Changes/Improvements

- Fixed an issue when database collation is different from a server;

Ctrl+5 Changes/Improvements

- Added statistical performance dataset for all mounted drives used by SQL Server;
- Added column: "# of VLFs" for Log files;
- Added column: "% of CPU usage" by database;
- Added column: Database "Access state";
- Added column: Database "Mode";

Ctrl+5: Drives' stats + couple of new columns:

Ctrl+6 Changes/Improvements

- Fixed memory and "Buffer cache hit ratio, %" metrics;
- Added essential SQL Server counters' changes for a time frame capturing (Wx-option);
- Added I/O SQL Server counters' changes for a time frame capturing (Wx-option);
- CPU diagram is now reversed and in only two colors;
- Added currently set Trace Flags list data set into informational option  (I-option) ;
- Added list of SQL related Services into informational option  (I-option) ;

When you type would say "W10" (wait for 10 seconds) then select it by a cursor and press Ctrl+6 the script will capture Wait statistics, essential SQL metrics and I/O metrics. Then script waits for 10 seconds and compares previous values with the new ones and reports any difference.
That option can bring a lot of information telling you what SQL Server is doing right now and what potential bottleneck can be.

That is CPU usage diagram you get after simple "Ctrl+6". After it changed to reverse order it is easier to define timing. For instance on the diagram below you can see that SQL Server experienced very heavy CPU usage about 260-270 minutes ago.

Ctrl+8 Changes/Improvements

- Fixed diagram's scale by implementing logarithmic measuring;
- Top 10 diagram includes "Distinct Top 10" combination of MAX CPU, MAX I/O, & MAX # of executions. Number of top queries can potentially rise up to 30;

Now Query executions' diagram can have more than 10 queries and because it is in logarithmic scale now there are no hassle with circle sizing.
Axis-X represents amount of CPU used by a query
Axis-Y represents amount of I/O used by a query
Circle radius represents number of query executions.
If you move cursor to a particular circle you'll see Queries metrics
To see more details on the query you want to research, such as query text, execution plan, etc.
You have to copy-paste "query_hash" value to an editor window, select it and press Ctrl+8 again.