Home Take batches of records for each rank, then JOIN, then LIMIT 1 in postgres
Reply: 0

Take batches of records for each rank, then JOIN, then LIMIT 1 in postgres

user1871
1#
user1871 Published in April 21, 2018, 3:28 pm

I'm trying to improve the performance of my query. From the EXPLAIN ANALYZE I understand that my query considers too many songs records when I think it's not necessary.

There are three tables artists(artist_id, score), songs(song_id, artist_id) and listened(song_id).

My current query looks like this:

WITH artists_ranked AS (
    SELECT
      artist_id
      , rank() OVER (ORDER BY score ) rnk
    ORDER BY rnk ASC
),
    not_listened_songs AS (
      SELECT *
      FROM songs
      WHERE NOT EXISTS(
          SELECT 1
          FROM listened
          WHERE listened.song_id = songs.song_id) -- bad: I go through all songs
  ),
    shuffled_songs AS (
      SELECT *
      FROM artists_ranked
        JOIN not_listened_songs ON not_listened_songs.artist_id = artists_ranked.artist_id
      ORDER BY random() --bad: I shuffle all songs
  )
SELECT DISTINCT ON (artist_id) *
FROM shuffled_songs
LIMIT 1;

Ideally (at least in my mind), the query should follow these steps:

  1. Rank the artists table by rating.
  2. Take a batch of artists with the highest rating. Can be one or multiple artists.

  3. Join with the table songs, but exclude already listened songs.

  4. Now we want to pick one random song, by giving each of the artists equal chance. ORDER BY random(), DISTINCT BY (artist_id), LIMIT 1

  5. If there is such song, we stop and return it. Otherwise, take the next batch of artists (with closest lower rank) and repeat the steps.

    • To stop, either a song is returned (very likely in just after few iterations) or all artists have been considered.

Thank you.

You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.308436 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO