Home Querying all past and future round birthdays
Reply: 1

Querying all past and future round birthdays

Stephan
1#
Stephan Published in 2018-01-12 23:10:43Z

I got the birthdates of users in a table and want to display a list of round birthdays for the next n years (starting from an arbitrary date x) which looks like this:

 +----------------------------------------------------------------------------------------+
 | Name   | id | birthdate  | current_age | birthday   | year | month | day | age_at_date |
 +----------------------------------------------------------------------------------------+
 | User 1 | 1  | 1958-01-23 | 59          | 2013-01-23 | 2013 | 1     | 23  | 55          | 
 | User 2 | 2  | 1988-01-29 | 29          | 2013-01-29 | 2013 | 1     | 29  | 25          | 
 | User 3 | 3  | 1963-02-12 | 54          | 2013-02-12 | 2013 | 2     | 12  | 50          | 
 | User 1 | 1  | 1958-01-23 | 59          | 2018-01-23 | 2018 | 1     | 23  | 60          | 
 | User 2 | 2  | 1988-01-29 | 29          | 2018-01-29 | 2018 | 1     | 29  | 30          | 
 | User 3 | 3  | 1963-02-12 | 54          | 2018-02-12 | 2018 | 2     | 12  | 55          | 
 | User 1 | 1  | 1958-01-23 | 59          | 2023-01-23 | 2023 | 1     | 23  | 65          | 
 | User 2 | 2  | 1988-01-29 | 29          | 2023-01-29 | 2023 | 1     | 29  | 35          | 
 | User 3 | 3  | 1963-02-12 | 54          | 2023-02-12 | 2023 | 2     | 12  | 60          | 
 +----------------------------------------------------------------------------------------+

As you can see, I want to be "wrap around" and not only show the next upcoming round birthday, which is easy, but also historical and far future data.

The core idea of my current approach is the following: I generate via generate_series all dates from 1900 till 2100 and join them by matching day and month of the birthdate with the user. Based on that, I calculate the age at that date to select finally only that birthdays, which are round (divideable by 5) and yield to a nonnegative age.

WITH
  test_users(id, name, birthdate) AS (
    VALUES
      (1, 'User 1', '23-01-1958' :: DATE),
      (2, 'User 2', '29-01-1988'),
      (3, 'User 3', '12-02-1963')
  ),
  dates AS (
    SELECT
      s                     AS date,
      date_part('year', s)  AS year,
      date_part('month', s) AS month,
      date_part('day', s)   AS day
    FROM generate_series('01-01-1900' :: TIMESTAMP, '01-01-2100' :: TIMESTAMP, '1 days' :: INTERVAL) AS s
  ),
  birthday_data AS (
    SELECT
      id                                                                                AS member_id,
      test_users.birthdate                                                              AS birthdate,
      (date_part('year', age((test_users.birthdate)))) :: INT                           AS current_age,
      date :: DATE                                                                      AS birthday,
      date_part('year', date)                                                           AS year,
      date_part('month', date)                                                          AS month,
      date_part('day', date)                                                            AS day,
      ROUND(extract(EPOCH FROM (dates.date - birthdate)) / (60 * 60 * 24 * 365)) :: INT AS age_at_date
    FROM test_users, dates
    WHERE
      dates.day = date_part('day', birthdate) AND
      dates.month = date_part('month', birthdate) AND
      dates.year >= date_part('year', birthdate)
  )

SELECT
  test_users.name,
  bd.*
FROM test_users
LEFT JOIN birthday_data bd ON bd.member_id = test_users.id
WHERE
  bd.age_at_date % 5 = 0 AND
  bd.birthday BETWEEN NOW() - INTERVAL '5' YEAR AND NOW() + INTERVAL '10' YEAR
ORDER BY bd.birthday;

My current approach seems to be very inefficient and rather complicated: It takes >100ms. Does anybody have an idea for a more compact and performant query? I am using Postgresql 9.5.3. Thank you!

Martin
2#
Martin Reply to 2018-01-13 13:18:46Z

Maybe try to join the generate series:

create table bday(id serial, name text, dob date);
insert into bday (name, dob) values ('a', '08-21-1972'::date);
insert into bday (name, dob) values ('b', '03-20-1974'::date);

select * from bday , 
lateral( select generate_series( (1950-y)/5  , (2010-y)/5)*5 + y  as year
         from (select date_part('year',dob)::integer as y) as t2 
       ) as t1;

This will for each entry generate years between 1950 and 2010.

You can add a where clause to exclude people born after 2010 (they cant have a birthday in range) Or exclude people born before 1850 (they are unlikely...)

-- Edit (after your edit):

So your generate_series creates 360+ rows per annum. In 100 years that is over 30.000. And they get joined to each user. (3 users => 100.000 rows)

My query generates only rows for years needed. In 100 years that is 20 rows. That means 20 rows per user.

By dividing by 5, it ensures that the start date is a round birthday. (1950-y)/5) calculates how many round birthdays there were before 1950.

A person born in 1941 needs to skip 1941 and 1946, but has a round birthday in 1951. So that is the difference (9 years) divided by 5, and then actually plus 1 to account for the 0st.

If the person is born after 1950 the number is negative, and greatest(-1,...)+1 gives 0, starting at the actual birthday year.

But actually it should be

select * from bday , 
lateral( select generate_series( greatest(-1,(1950-y)/5)+1, (2010-y)/5)*5 + y  as year
         from (select date_part('year',dob)::integer as y) as t2 
       ) as t1;

(you may be doing greatest(0,...)+1 if you want to start at age 5)

You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.294274 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO