Azee Published in 2018-01-12 13:08:28Z

To keep it simple, I have four tables (A, B, Category and Relation). The Relation table stores the Intensity of A in B, and Category stores the type of B.

A <--- Relation ---> B ---> Category

(So the relation between A and B is many-to-many, while the relation between B and Category is many-to-one.)

I need an ORM query that groups Relation records by Category and A, then calculates the Sum of Intensity for each (Category, A) pair (simple so far), and then annotates the Max of those calculated Sums within each Category.

My code is something like:


Which throws the error:

django.core.exceptions.FieldError: Cannot compute Max('AcSum'): 'AcSum' is an aggregate

The django-group-by package fails with the same error.

For further information please also see this stackoverflow question.

I am using Django 2 and PostgreSQL.

Is there a way to achieve this using the ORM? If not, what would the solution be using a raw SQL expression?


After a lot of struggling I found out that what I wrote was indeed an aggregation; what I actually want is the maximum of AcSum over the A's in each category. So I suppose I have to group the result once more after the AcSum calculation. Based on this insight I found a Stack Overflow question that asks about the same concept (it was asked 1 year, 2 months ago and has no accepted answer).

Chaining another values('id') to the queryset works neither as a group-by nor as a filter for the output attributes; it removes AcSum from the set. Adding AcSum to values() is also not an option, because it changes the grouped result set. I think what I am trying to do is re-group the grouped query based on the fields inside a column (i.e. id). Any thoughts?
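To make the intended result concrete, here is a minimal plain-Python sketch of the two-step grouping described above: sum per (Category, A), then max per Category. It is only an illustration of the desired output, not an ORM solution, and the sample rows and the names `rows`, `sums` and `max_per_category` are hypothetical.

```python
from collections import defaultdict

# Hypothetical Relation rows: (a, category of b, intensity).
rows = [
    ("a1", "c1", 2),
    ("a1", "c1", 3),
    ("a2", "c1", 4),
    ("a1", "c2", 1),
    ("a2", "c2", 6),
    ("a2", "c2", 1),
]

# Step 1: group by (category, a) and sum the intensity (this is AcSum).
sums = defaultdict(int)
for a, category, intensity in rows:
    sums[(category, a)] += intensity

# Step 2: group the sums again, by category only, and keep the maximum.
max_per_category = {}
for (category, a), ac_sum in sums.items():
    current = max_per_category.get(category, ac_sum)
    max_per_category[category] = max(current, ac_sum)
```

The second loop is the "group by once more" step: it aggregates over values that are themselves the result of an aggregation, which is why a single GROUP BY cannot express it.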

Brad Martsberger Reply to 2018-01-14 20:00:34Z

You can't nest aggregates directly, as in Max(Sum()); that's not valid in SQL, whether you're using the ORM or not. Instead, you have to join the table to itself to find the maximum, which you can do with a subquery. The code below looks right to me, but keep in mind I don't have anything to run it on, so it might not be perfect.

from django.db.models import OuterRef, Q, Subquery, Sum

annotation = {
    'AcSum': Sum('intensity')
}

# The basic query is on Relation grouped by A and Category, annotated
# with the Sum of intensity
query = Relation.objects.values('a', 'b__category').annotate(**annotation)

# The subquery is joined to the outerquery on the Category
sub_filter = Q(b__category=OuterRef('b__category'))
# The subquery is grouped by A and Category and annotated with the Sum
# of intensity, which is then ordered descending so that when a LIMIT 1
# is applied, you get the Max.
subquery = Relation.objects.filter(sub_filter).values('a', 'b__category').annotate(**annotation).order_by('-AcSum').values('AcSum')[:1]

query = query.annotate(max_intensity=Subquery(subquery))

This should generate SQL like:

SELECT a_id, category_id,
       (SELECT SUM(U0.intensity) AS AcSum
        FROM Relation U0
        JOIN B U1 on U0.b_id = U1.id
        WHERE U1.category_id = B.category_id
        GROUP BY U0.a_id, U1.category_id
        ORDER BY SUM(U0.intensity) DESC
        LIMIT 1
       ) AS max_intensity
FROM Relation
JOIN B on Relation.b_id = B.id
GROUP BY Relation.a_id, B.category_id
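
As a way to check the shape of the SQL above without a Django project, here is a small sketch that runs an equivalent correlated subquery against an in-memory SQLite database using Python's standard sqlite3 module. The table layout, the lowercase table names and the sample rows are assumptions made for illustration; the asker's real schema and PostgreSQL's behaviour may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal schema: only the columns the query needs.
conn.executescript("""
CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE relation (
    id INTEGER PRIMARY KEY,
    a_id INTEGER,
    b_id INTEGER,
    intensity INTEGER
);
INSERT INTO b (id, category_id) VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO relation (a_id, b_id, intensity) VALUES
    (1, 1, 2), (1, 2, 3), (2, 1, 4), (1, 3, 1), (2, 3, 6), (2, 3, 1);
""")

# Outer query: Sum of intensity per (a, category).
# Subquery: the same grouping restricted to the outer row's category,
# ordered descending and limited to one row, which yields the max Sum.
sql = """
SELECT r.a_id, b.category_id, SUM(r.intensity) AS ac_sum,
       (SELECT SUM(u0.intensity)
        FROM relation u0
        JOIN b u1 ON u0.b_id = u1.id
        WHERE u1.category_id = b.category_id
        GROUP BY u0.a_id, u1.category_id
        ORDER BY SUM(u0.intensity) DESC
        LIMIT 1) AS max_intensity
FROM relation r
JOIN b ON r.b_id = b.id
GROUP BY r.a_id, b.category_id
ORDER BY b.category_id, r.a_id
"""
result = conn.execute(sql).fetchall()
conn.close()
```

Each row of `result` holds (a_id, category_id, ac_sum, max_intensity), so every (A, Category) group carries both its own sum and the largest sum found in its category.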

It may be more performant to eliminate the join in the subquery by using a backend-specific feature like array_agg (Postgres) or GROUP_CONCAT (MySQL) to collect the Relation ids that are grouped together in the outer query. But I don't know what backend you're using.

Ahmad Reply to 2018-01-13 11:05:11Z

Something like this should work for you. I couldn't test it myself, so please let me know the result:

   'A', 'b_category'
   'A', MaxIntensitySumPerCategory=Max('SumInensityPerCategory')
