
Saturday, June 14, 2025

SQL Exercise From Brent Ozar

This is coming from:  https://www.brentozar.com/archive/2025/06/query-exercise-return-routes-in-the-right-order/

I started this exercise when Brent first posted it, but I misread the directions and accidentally got most of the right answer in the wrong way: I wasn't really selecting the true next Station, just iterating through them all and picking up the one that should be next. So I set it aside and forgot about it.

Now that all the answers are out, I was reminded that I needed to finish. Before looking at the answers, I went back to try my hand again. I went the recursive CTE route, but I was unhappy with the Query Plan. I did get the right answer, but the plan looked gross to me. Still, it worked, and it was fewer than 20 lines, so I wasn't too unhappy. It did take me much longer to noodle through than it should have: about two hours, going down a couple of wrong paths before I finally found one I was happy with.

; WITH StationList AS (
  ---- GENERATE THE LIST WITH THE NextStationName.
  SELECT s.StationID
     , s.StationName
     , s.StationPhysicalOrder
     , COALESCE( sro.StationToName, LEAD(s.StationName) OVER (ORDER BY s.StationPhysicalOrder) ) AS NextStationName
     ---- If there's an Override Station, use that. Otherwise use regular Station.
  FROM Stations s
  LEFT OUTER JOIN StationRoutingOverride sro ON sro.StationFromName = s.StationName
)
, NO2 AS (
  ---- RECURSIVE CTE TO GET NEXT STATION
  SELECT sl.StationName, sl.NextStationName, rn=1
  FROM StationList sl WHERE sl.StationName = 'A'
  ---- sl.StationName can be changed to start from any station.

  UNION ALL

  SELECT sl2.StationName, sl2.NextStationName, rn+1 AS rn
  FROM StationList sl2
  INNER JOIN NO2 ON no2.NextStationName = sl2.StationName
  ---- Last StationName will have NULL NextStationName and stop the recursion.
  WHERE rn <= 99 ---- To prevent infinite loops
)
SELECT StationName FROM NO2

When I looked at some of the other answers, I was pretty impressed. Swart's STRING_AGG() approach was slick, and I was glad to see that a lot of the other answers went the recursion route.

I decided to see how mine performed, so I ginned up some more data to throw at these queries. I dumped a million rows into the Stations table, using

INSERT INTO Stations (StationName, StationPhysicalOrder)
SELECT CAST( NEWID() AS varchar(50) ), n
FROM numbers
WHERE n > 19

to generate some extra (hopefully unique) data. Then I created about 500 Override rows. I figured this would give the queries some pretty decent exercise.

DROP TABLE IF EXISTS #tmpInsertOverrides1
CREATE TABLE #tmpInsertOverrides1 ( rn int identity, f varchar(50) )

DROP TABLE IF EXISTS #tmpInsertOverrides2
CREATE TABLE #tmpInsertOverrides2 ( rn int identity, t varchar(50) )

INSERT INTO #tmpInsertOverrides1 (f)
SELECT StationName
FROM stations
WHERE StationID IN ( SELECT TOP 500 n FROM numbers ORDER BY NEWID() )

INSERT INTO #tmpInsertOverrides2 (t)
SELECT StationName
FROM stations
WHERE StationID IN ( SELECT TOP 500 n FROM numbers ORDER BY NEWID() )
AND NOT EXISTS (SELECT 1 FROM #tmpInsertOverrides1 WHERE f = StationName)

SELECT * FROM #tmpInsertOverrides2

INSERT INTO StationRoutingOverride (StationFromName, StationToName)
SELECT f, t
FROM #tmpInsertOverrides1 t1
INNER JOIN #tmpInsertOverrides2 t2 ON t1.rn = t2.rn
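
As an aside, these scripts lean on a numbers (tally) table. If you don't already have one handy, a quick-and-dirty way to gin one up (my own sketch, not part of the original scripts) is something like:

---- Hypothetical tally table build; adjust the row count to taste.
DROP TABLE IF EXISTS numbers

SELECT TOP (1000100) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
INTO numbers
FROM sys.all_columns a
CROSS JOIN sys.all_columns b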

Again, I tried to be careful to make sure that I didn't have any dupes and that I didn't create any infinite loops. I haven't really done much more than a cursory check on the data to make sure it's good.
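
For what it's worth, a slightly-less-cursory check could look something like this (these queries are my own additions, just using the exercise's StationRoutingOverride columns):

---- Any station used as a "from" more than once?
SELECT StationFromName, COUNT(*) AS cnt
FROM StationRoutingOverride
GROUP BY StationFromName
HAVING COUNT(*) > 1

---- Any two-step loop (A -> B and B -> A)?
SELECT a.StationFromName, a.StationToName
FROM StationRoutingOverride a
INNER JOIN StationRoutingOverride b
        ON b.StationFromName = a.StationToName
       AND b.StationToName   = a.StationFromName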

I reran my query against the beefed-up data. Mine ran in about 2:23 and returned 100 rows, but that was also with the Query Plan turned on.

I tried the string-stuffing query, but it broke because my string was too long ("STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation."). That was my fault with the test data. The next one I tried was the function-based approach, but after 13 minutes I abandoned that one. I'm still not sure if it was something in my data; the query itself still looks pretty cool. Michael Swart has some great query ideas.
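
For what it's worth, the fix the error message hints at is just casting the aggregated value to a LOB type. Something with this general shape lifts the 8,000-byte cap (this is not Swart's actual query, and it ignores the overrides entirely; it just shows the cast):

---- Casting to nvarchar(max) lets STRING_AGG build a result longer than 8,000 bytes.
SELECT STRING_AGG( CAST(StationName AS nvarchar(max)), ' -> ' )
         WITHIN GROUP ( ORDER BY StationPhysicalOrder ) AS Route
FROM Stations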

I tried the Ugly Stations one from Greg Dodd. That one exceeded max recursion, so I didn't fully test it. Again, the query met the rules; it was my ridiculous data that broke it.

I next tried Ilya Makarov's CTE solution. The OUTER APPLY was pretty slick; I had initially tried an OUTER APPLY myself but couldn't get it to work like I wanted. This one ran for almost 2 minutes before it too hit Max Recursion and failed. Again, my data's fault.

Then came Michael Bogdanov's answer. It too crashed on Max Recursion, but it did it in just 7 seconds. This one was flying.

The last one I tried was the Brute Force version by Brian Boodman. It ran for over 7 minutes before I had to kill it. It angered my SSMS.


This was an interesting exercise. And all of these answers followed the rules as they were laid out. Things didn't get weird with some of them until I threw a stupid amount of data at them. That'll cause any query to break, if it wasn't intended to run against that data.

To be fair, the first time I ran my query against the bulky data, it too crashed on Max Recursion. So I took a couple of minutes to refactor part of it and put in a check to stop it if it recursed too far. I'm kind of impressed that it actually finished. And after seeing that it finished quicker than some of the others, I'm not as disappointed with it as I was at first. But again, I also had a bit of an insider's view of the new data I added. That wasn't in the original requirements, so please don't take this as me bashing anyone's queries. I changed the rules late in the game, and I'm impressed with most of the answers to the given problem. I have a tendency to over-think the negative possibilities, which often leads me to over-engineer the solution to the problem.
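
In case it helps anyone, here's a tiny self-contained sketch (not the exercise schema) of how the two limits interact: the counter in the WHERE clause bounds the walk itself, while MAXRECURSION is the engine's separate safety net, which defaults to 100.

---- A throwaway recursive CTE just to show the counter check next to the hint.
; WITH Walker AS (
    SELECT 1 AS rn
    UNION ALL
    SELECT rn + 1
    FROM Walker
    WHERE rn <= 99          ---- the counter check stops the recursion itself
)
SELECT rn
FROM Walker
OPTION (MAXRECURSION 0)     ---- 0 lifts the engine's default cap of 100; use with care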

I think I may have been a bit overzealous with my fake data.


Also, be sure to check out Brent's fantastic blog.

Brent Ozar

Wednesday, February 10, 2016

Oracle Number Formatting

This may be something you all already know, but it had me scratching my head for a minute. On the one hand, Oracle’s analytic functions like LAST_VALUE make some things kinda cool, but then Oracle smacks me with not handling number masking like pretty much every other language.

Even if you specify a decimal-place mask, To_Number returns a number, so if the value ends with a 0 after the decimal point, that zero gets dropped.

Example:
1) To_Number(44.50, '99.99') >> 44.5
2) To_Number(100/10.0, '99.99') >> 10  <<< This one even drops the decimal and returns an integer 10, which leads to a whole different philosophical discussion about the differences between 10 and 10.00.

To get the decimal-places mask we want in Oracle, we have to use To_Char(44.50, '99.99'). But then that value is a string, so you won't be able to do math on it (except in ColdFusion, where it's a number-ish string). So to actually format it, we have to re-mask it in whatever code we use to output the value: e.g., in ColdFusion, NumberFormat(44.5, '99.90') >> 44.50 and NumberFormat(100/10.0, '99.00') >> 10.00.
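
A quick way to see the difference side by side (this just runs against DUAL; the column aliases are mine):

SELECT To_Number(44.50, '99.99')  AS as_number   -- comes back as 44.5
     , To_Char(44.50, '99.99')    AS as_string   -- comes back as ' 44.50', but it's a string now
     , To_Char(100/10.0, '99.99') AS quotient    -- ' 10.00'
FROM dual;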

Add this to ColdFusion’s method of determining how to parse a value into JSON, and it makes things super fun. :-/

Moral of my story: If you need a number to be in a specific format, don't assume the database will give it back to you the way you thought you told it to.

Monday, December 21, 2015

Oracle Goodness

I've spent years in SQL Land, and I consider myself fairly well informed about data. But after working with some of the people I have, I realize that there is so much that I don't know. I do know enough to do what I need to do without causing smoke and fire and other general mayhem, but just about everything I know now I've had to discover and learn myself. I've had a few good mentors that pointed out ways to tell if I'm writing slow, inefficient queries (and I've done plenty of that), and my views have shifted towards DB performance in my queries. I've also come to realize why software developers and DBAs often seem to be at odds with each other.

Some of the worst queries I ever wrote were when I was writing in-code queries and didn't understand their impact. One of my previous databases had several related tables with 10s of millions of rows each. That's not BIG data in the modern sense, but that's A LOT of data. More than enough to let me easily see the difference between my code-based reporting query that ran in 2 hours and the DBA-optimized SQL that ran in 2 seconds. The SQL looked weird, but it was MUCH faster. Big lesson learned there.

A database isn't just a place to store and retrieve data. You can get lots more from it by knowing how to ask nicely rather than trying to brute force the data you want. It's about getting SETS of data rather than simply trying to filter data. And it's about letting the database do the job it was made to do.

I like T-SQL, and I learned quite a bit about it in my reporting. And now that I'm back in Developer Land and making the move to an Oracle system, I'm seeing a few things I don't like about PL/SQL, but that's likely because I'm just not used to it. Only time will tell.

However, there are a few things I do like quite a bit. For instance, when making a new stored procedure (my current task), I used to have to do something like this:

IF OBJECT_ID('xxxxx') IS NULL EXEC('CREATE PROCEDURE xxxxx ...')
GO
ALTER PROCEDURE xxxxx ...

Or write several lines of logic checking for the sproc's existence in a system view. It wasn't hard, but it was long and involved.
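
Something along these lines, roughly (a sketch of my own; the procedure name and stub body are just placeholders, not from any real project):

IF NOT EXISTS ( SELECT 1 FROM sys.procedures WHERE name = 'usp_DoStuff' )
BEGIN
    EXEC('CREATE PROCEDURE dbo.usp_DoStuff AS RETURN 0;')  -- create a stub so there is something to ALTER
END
GO

ALTER PROCEDURE dbo.usp_DoStuff
AS
BEGIN
    SELECT 1 AS placeholder;  -- the real body goes here
END
GO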

MySQL was a bit better, but that wasn't my primary environment:

DROP PROCEDURE IF EXISTS xxxxx ;
CREATE PROCEDURE xxxxx ...

Though dropping and recreating db objects is never nice to existing permissions. :-/

With Oracle, I get to do:

CREATE OR REPLACE PROCEDURE xxxxx ...

It's nice to have that kind of functionality already built in without having to worry about killing permissions. SO much easier.
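
For instance, a fleshed-out version might look something like this (the procedure name and body here are placeholders of my own); re-running the statement replaces the procedure in place, so existing grants stay put:

CREATE OR REPLACE PROCEDURE demo_proc ( p_result OUT NUMBER )
AS
BEGIN
    -- placeholder body; re-running the CREATE OR REPLACE swaps it out without a DROP
    SELECT 42 INTO p_result FROM dual;
END demo_proc;
/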

Now, on to build some sprocs....