Group by Age Intervals and Gender in SQL
October 26, 2014
Yesterday, I was confronted with the need to count the number of users per age and gender in a table of registered users, in order to produce a nice bar chart. This can quite easily be achieved in (postgres)SQL as demonstrated below.
SELECT age , gender , COUNT(*) n FROM members GROUP BY age, gender ORDER BY age, gender
However, depending on your data this may result in quite too many rows. To reduce information overload, one may want to group the ages into intervals. However doing this manually in application code can be kind of tedious.
Thankfully on StackOverflow someone had answered a similar question and I adapted the response to this particular scenario by doing the following:
SELECT FLOOR((age - <LOWER_LIMIT>)/<INTERVAL_SIZE>) * <INTERVAL_SIZE> + LOWER_LIMIT AS segment , gender , COUNT(*) n FROM members GROUP BY segment, gender ORDER BY segment, gender
where <LOWER_LIMIT>
should be replaced with the lowest age occuring in the
data and <INTERVAL_SIZE>
is the number of distinct ages to put into one
interval. This produces a nice result set where each row corresponds to an
age interval/gender pair. The segment
column then represents the (inclusive)
lower limit of each interval.
To make the query more automatic, we can pull the minimum age straight from the
table leaving the parameter <INTERVAL_SIZE>
as the only one that needs to
be explicity specified.
SELECT FLOOR((age - min_age)/<INTERVAL_SIZE>) * <INTERVAL_SIZE> + min_age AS segment , gender , COUNT(*) n FROM members , (SELECT MIN(age) min_age FROM members) temp GROUP BY segment, gender ORDER BY segment, gender