Sunday, June 15, 2025

Saturday, June 14, 2025

SQL Exercise From Brent Ozar

This is coming from:  https://www.brentozar.com/archive/2025/06/query-exercise-return-routes-in-the-right-order/

I started this exercise when Brent first posted it, but misread the directions at first and accidentally got most of the right answer, but in the wrong way. I wasn't really selecting the true next Station, I was just iterating through them all and picking up the one that should be next. So I set it aside and forgot about it.

Now that all the answers are out, I was reminded that I needed to finish. Before looking at the answers, I went back to try my hand again. I went a recursive CTE route, but I was unhappy with the Query Plan. I did get the right answer, but it looked gross to me. However, it worked, and it was fewer than 20 lines, so I wasn't too unhappy. It did take me much longer to noodle through than it should have: about 2 hours, going down a couple of wrong paths before I finally found one I was happy with.

; WITH StationList AS (
  ---- GENERATE THE LIST WITH THE NextStationName.
  SELECT s.StationID
     , s.StationName
     , s.StationPhysicalOrder
    , COALESCE( sro.StationToName, LEAD(s.StationName) OVER (ORDER BY s.StationPhysicalOrder) ) AS NextStationName
     ---- If there's an Override Station, use that. Otherwise use regular Station.
  FROM Stations s
  LEFT OUTER JOIN StationRoutingOverride sro ON sro.StationFromName = s.StationName
)
, NO2 AS (
  ---- RECURSIVE CTE TO GET NEXT STATION
  SELECT sl.StationName, sl.NextStationName, rn=1
  FROM StationList sl WHERE sl.StationName = 'A'
  ---- sl.StationName can be changed to start from any station.

  UNION ALL

  SELECT sl2.StationName, sl2.NextStationName, rn+1 AS rn
  FROM StationList sl2
  INNER JOIN NO2 ON no2.NextStationName = sl2.StationName
  ---- Last StationName will have NULL NextStationName and stop the recursion.
  WHERE rn <= 99 ---- To prevent infinite loops
)
SELECT StationName FROM NO2

When I looked at some of the other answers, I was pretty impressed. Swarts STRING_AGG() was pretty cool, and I was glad to see that a lot of the other answers went the recursion route.

I decided to see how mine performed, so I ginned up some more data into the tables to throw at these queries. I dumped a million rows into the Stations table, using

INSERT INTO Stations (StationName, StationPhysicalOrder)
SELECT CAST( NEWID() AS varchar(50) ), n
FROM numbers
WHERE n > 19

to generate some extra (hopefully unique) data. Then created about 500 Override rows. I figured this would give the queries some pretty decent exercise.

DROP TABLE IF EXISTS #tmpInsertOverrrides1
CREATE TABLE #tmpInsertOverrides1 ( rn int identity, f varchar(50) )

DROP TABLE IF EXISTS #tmpInsertOverrrides2
CREATE TABLE #tmpInsertOverrides2 ( rn int identity, t varchar(50) )

INSERT INTO #tmpInsertOverrides1 (f)
SELECT StationName
FROM stations
WHERE StationID IN ( SELECT TOP 500 n FROM numbers ORDER BY NEWID() )

INSERT INTO #tmpInsertOverrides2 (t)
SELECT StationName
FROM stations
WHERE StationID IN ( SELECT TOP 500 n FROM numbers ORDER BY NEWID() )
AND NOT EXISTS (SELECT 1 FROM #tmpInsertOverrides1 WHERE f = StationName)

SELECT * FROM #tmpInsertOverrides2

INSERT INTO StationRoutingOverride (StationFromName, StationToName)
SELECT f, t
FROM #tmpInsertOverrides1 t1
INNER JOIN #tmpInsertOverrides2 t2 ON t1.rn = t2.rn

Again, I tried to be careful to make sure that I didn't have any dupes and that I didn't create any infinite loops. I haven't really done much more than a cursory check on the data to make sure it's good.

I reran my query against the beefed up data. Mine ran in about 2:23 and returned 100 rows. But that was also with the Query Plan turned on.

I tried the string stuffing query, but it broke because my string was too long. (STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.) That was my fault with the test data. Next one I tried was the function-based approach. But after 13 minutes, I abandoned that one. I'm still not sure if it was something in my data. The query still looks pretty cool. Michael Swart has some pretty cool query ideas.

I tried the Ugly Stations one from Greg Dodd. That one exceeded max recursion, so I didn't fully test at one. Again, the query met the rules, this was my ridiculous data that I threw this at.

I next tried Ilya Makarov's CTE solution. The OUTER APPLY solution was pretty slick. I had initially tried an OUTER APPLY but couldn't get it to work like I wanted. This one ran for almost 2 minutes before it too hit Max Recursion and failed. Again, my data's fault.

Then came Michael Bogdanov's answer. It too crashed on Max Recursion, but it did it in just 7 seconds. This one was flying.

The last one I tried was the Brute Force version by Brian Boodman. It ran for over 7 minutes before I had to kill it. It angered my SSMS.


This was an interesting exercise. And all of these answers followed the rules as they were laid out. Things didn't get weird with some of them until I threw a stupid amount of data at them. That'll cause any query to break, if it wasn't intended to run against that data.

To be fair, the first time I ran my query against the bulky data, it too crashed on Max Recursion. So I took a couple of minutes to refactor part of it and put in a check to stop it if it recursed too far. I'm kind of impressed that it actually finished. And after seeing that it finished quicker than some of the others, I'm not as disappointed with it as I was at first. But again, I also had a bit of an insider's view of the new data I added. That wasn't in the original requirements, so please nobody take this as me bashing their queries. I changed the rules at the end game, and I'm impressed with most of the answers to the given problem. I have a tendency to over-think the negative possibilities which often leads me to over-engineer the solution to the problem.

I think I may have been a bit overzealous with my fake data.


Also, be sure to check out Brent's fantastic blog.

Brent Ozar