Recent work with reporting statistical data gave me an idea to use SQL Server Management Studio to build a Box-and-Whisker Plot right in SQL.
If you have no idea what Box-and-Whisker Plot is, please visit following link: http://www.wellbeingatschool.org.nz/information-sheet/understanding-and-interpreting-box-plots
At first, I will show how to do it based on AdventureWorks database in SQL Server 2014.
We will analyze amounts of Individual lines of Sales Orders within each month.
The first step is to create a Data Set to process. That Data Set will contain a Month, Single Line amount and order number of that record within a month.
Here is how I did it:
Here is TOP 10 of what I have in my Data Set:
Here is the Main script to extract the data into a Box-and-Whisker Plot:
Here is the diagram I've got:
1. We can divide the diagram on three about equal sections by Inter-quartile range: Wide, Medium and Short. All three sections are skewed top.
2. In the first section with Wide Inter-quartile range we see three outliers where data skeweddown.
3. The middle section has pretty stable distribution of first three quartiles
4. In the third, Short Inter-quartile range section, we also have three outliers with very condense data set.
5. All data sets, excluding the very last one, have oulier values at the top.
What else I can say without knowing particular business details just by looking at the data?
Obviously, Box-and-Whisker Plot alone, does not give us anything else.
That means we have to add some other metrics to better understand the business situation.
In that script I include two additional lines: Total amount of sales lines and total number of sale lines:
Here is what I have as a result:
Red line indicates Total Monthly Amount and Blue line indicates total number of items in the data set.
Those two lines proove stability of sales' data in the middle section and lead us to an explanation of Short Inter-quartile range in the third section. As you can see, at the right side of the diagram drammatically increased number of individual sales and outperformed rise of the total amount. That means that sales amount in the third section were generally very small.
From this analysis I can conclude that Box-and-Whisker Plot can help us in certain situations, but does not represent the whole picture without knowing number of records in the individual data set.
At the end, will draw one more diagram based on Celestial data (For more details see my earlier blog post: http://slavasql.blogspot.com/2015/01/skymap.html)
Here is the data set creation for celestial data.
Here I've extracted Magnitude of a star and categorized the by First letter of their spectrum.
Here is a Box-and-Whisker Plot I've built from it:
From this diagram I can conclude that most of the stars in all spectrums (with couuple of exceptions) are tended to be with about same density level and have obout the same magnitude.
Not having outliers at the bottom just an indication of small number of stars in the data set.
That would be interesting to plot trades for most traded stocks on NYSE.