I'm trying to improve the performance of my query. From the
EXPLAIN ANALYZE I understand that my query considers too many
songs records when I think it's not necessary.
There are three tables
songs(song_id, artist_id) and
My current query looks like this:
WITH artists_ranked AS (
, rank() OVER (ORDER BY score ) rnk
ORDER BY rnk ASC
not_listened_songs AS (
WHERE NOT EXISTS(
WHERE listened.song_id = songs.song_id) -- bad: I go through all songs
shuffled_songs AS (
JOIN not_listened_songs ON not_listened_songs.artist_id = artists_ranked.artist_id
ORDER BY random() --bad: I shuffle all songs
SELECT DISTINCT ON (artist_id) *
Ideally (at least in my mind), the query should follow these steps:
- Rank the
artists table by rating.
Take a batch of artists with the highest rating. Can be one or multiple artists.
Join with the table
songs, but exclude already
Now we want to pick one random song, by giving each of the artists equal chance.
ORDER BY random(),
DISTINCT BY (artist_id),
If there is such song, we stop and return it. Otherwise, take the next batch of artists (with closest lower rank) and repeat the steps.
- To stop, either a song is returned (very likely in just after few iterations) or all artists have been considered.