harbu.org

Group by Age Intervals and Gender in SQL

October 26, 2014

Yesterday, I was confronted with the need to count the number of users per age and gender in a table of registered users, in order to produce a nice bar chart. This can quite easily be achieved in (postgres)SQL as demonstrated below.

SELECT age
  , gender
  , COUNT(*) n
FROM members
GROUP BY age, gender
ORDER BY age, gender

However, depending on your data this may result in quite too many rows. To reduce information overload, one may want to group the ages into intervals. However doing this manually in application code can be kind of tedious.

Thankfully on StackOverflow someone had answered a similar question and I adapted the response to this particular scenario by doing the following:

SELECT
  FLOOR((age - <LOWER_LIMIT>)/<INTERVAL_SIZE>) * <INTERVAL_SIZE> + LOWER_LIMIT
    AS segment
  , gender
  , COUNT(*) n
FROM members
GROUP BY segment, gender
ORDER BY segment, gender

where <LOWER_LIMIT> should be replaced with the lowest age occuring in the data and <INTERVAL_SIZE> is the number of distinct ages to put into one interval. This produces a nice result set where each row corresponds to an age interval/gender pair. The segment column then represents the (inclusive) lower limit of each interval.

To make the query more automatic, we can pull the minimum age straight from the table leaving the parameter <INTERVAL_SIZE> as the only one that needs to be explicity specified.

SELECT
  FLOOR((age - min_age)/<INTERVAL_SIZE>) * <INTERVAL_SIZE> + min_age
    AS segment
  , gender
  , COUNT(*) n
FROM members
  , (SELECT MIN(age) min_age FROM members) temp
GROUP BY segment, gender
ORDER BY segment, gender