Top n Window Function queries over a specific subset of data are common in analysis and reporting requirements. Luckily, in MySQL, there are Window functions we can use for this type of query. To be quite honest, you don’t necessarily need Window Functions. You can retrieve those top 3 (or whatever) types of results with a regular SQL query. But, since we have those powerful Window Functions, why not use them? My thoughts exactly! Besides, no one wants a spaghetti code mess of SQL to try and understand. Not to mention, Window functions are often better optimized for querying larger data sets. Continue reading and see example queries for more understanding…
OS and DB used:
- Linux Mint 20 “Ulyana”
- MySQL 8.0.22
Self-Promotion:
If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!
Here is the description of the target table I am working with for this blog post. If you have read any of my other posts, you may recognize this table as I tend to use it widely for example queries. The ‘walking_stats’ table stores my personal walking data, that I track while working on my fitness and to maintain a healthier weight:
+————–+————–+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+————–+————–+——+—–+———+——-+
| day_walked | date | YES | | NULL | |
| cal_burned | decimal(4,1) | YES | | NULL | |
| miles_walked | decimal(4,2) | YES | | NULL | |
| duration | time | YES | | NULL | |
| mph | decimal(2,1) | YES | | NULL | |
| shoe_id | int | YES | | NULL | |
+————–+————–+——+—–+———+——-+
6 rows in set (0.0034 sec)
There are just over 400 rows of data in the table ‘walking_stats’:
FROM walking_stats;
409
Top n Window Function query: Top 3 days
I store the amount of time – in hours, minutes, and seconds – spent walking in the ‘duration’ column. Suppose I want to find the ‘top 3’ number of rows – meaning the longest walks time-wise – over a 6-month span, broken down by each month? This type of question can be easily answered using Top n Window Function queries.
I’ll make use of a CTE instead of a subquery or derived table and generate the rows based on a WHERE
clause that restricts the results to the year 2020 within the months of January to June.
Let’s look at the overall query:
WITH top_3 AS (
SELECT day_walked, cal_burned, duration,
ROW_NUMBER() OVER(PARTITION BY EXTRACT(MONTH FROM day_walked) ORDER BY duration DESC) AS row_num
FROM walking_stats
WHERE EXTRACT(YEAR FROM day_walked) = 2020 —all walks are in the year 2020
AND EXTRACT(MONTH FROM day_walked) <= 6) —all walks are in the months from January to June
SELECT *
FROM top_3
WHERE row_num <= 3;
Below are the query results:
In the screen-shot above, I have segregated the groups of rows using red rectangles to create some distinction between each set of top 3 duration rows for each month.
So how does this query work?
Top n Window Function query: Ranking and partitioning data
To establish a ranking amongst the rows, I use the ROW_NUMBER()
Window function which assigns a continuous, incrementing integer value to rows based on the ORDER BY
clause sort criteria within the OVER()
clause.
Related: Read, ROW_NUMBER() Window Function – find duplicate values, where I cover examples of using this handy Window Function to remove duplicate rows of data from a table.
For this specific query, I order the rows in descending order (greatest to least) for the ‘duration’ column, placing the lengthier times first in the query results order. The PARTITION BY
clause sub-groups the rows based on each month as returned by the EXTRACT()
DATE
function call.
Then all that’s left is to filter the returned rows based on the ‘row_num’ column alias in the outer query. I use less than or equal to 3, for a top 3 result, but you could use top n Window Function queries for whatever your specific requirements are at this point.
Window Functions informational resources
Be sure and read any of the below blog posts for more information on Window Functions:
- Rolling sum and average – Window Functions MySQL
- RANK() and DENSE_RANK() differences
- ROWS and RANGE: Windowing clause defaults – learning by example in MySQL.
Window Functions are so powerful and fun to use. What are your favorite Window function tricks? Let me know in the comments below.
Like what you have read? See anything incorrect? Please comment below and thank you for reading!!!
A Call To Action!
Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.
Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.
To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)
Be sure and visit the “Best Of” page for a collection of my best blog posts.
Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.
Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, are performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
The post Top n Window Function queries in MySQL appeared first on Digital Owl’s Prose.