
# Calculate Max of Sum of an annotated field over a grouped by query in Django ORM?

Azee Published in 2018-01-12 13:08:28Z

To keep it simple, I have four tables (A, B, Category and Relation). The Relation table stores the Intensity of A in B, and Category stores the type of B.

A <--- Relation ---> B ---> Category

(So the relation between A and B is n to n, while the relation between B and Category is n to 1.)

I need an ORM query that groups Relation records by Category and A, then calculates the Sum of Intensity for each (Category, A) pair (this seems simple so far), and then annotates the Max of the calculated Sums in each Category.
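
To make the target result concrete, here is a minimal plain-Python sketch of the computation, without any ORM. The sample rows, the `max_of_sums` helper and the flattened `(a_id, category_id, intensity)` layout are assumptions made up for illustration; they are not part of the real models.

```python
from collections import defaultdict

# Hypothetical Relation rows, already joined to B so each row carries
# its category: (a_id, category_id, intensity).
relations = [
    (1, "x", 2),
    (1, "x", 3),
    (2, "x", 4),
    (1, "y", 1),
    (2, "y", 6),
    (2, "y", 1),
]


def max_of_sums(rows):
    """Return (sum per (category, a), max of those sums per category)."""
    # Step 1: group by (category, a) and sum the intensity.
    sums = defaultdict(int)
    for a_id, category_id, intensity in rows:
        sums[(category_id, a_id)] += intensity

    # Step 2: regroup the sums by category alone and keep the largest one.
    max_per_category = {}
    for (category_id, _a_id), total in sums.items():
        current = max_per_category.get(category_id)
        if current is None or total > current:
            max_per_category[category_id] = total
    return dict(sums), max_per_category


sums, max_per_category = max_of_sums(relations)
print(sums)
print(max_per_category)
```

The second step is the part the ORM query has to express: a second grouping applied to the output of the first.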

My code is something like:

    A.objects.values('B_id').annotate(AcSum=Sum('Intensity')).annotate(Max('AcSum'))


Which throws the error:

django.core.exceptions.FieldError: Cannot compute Max('AcSum'): 'AcSum' is an aggregate


The django-group-by package fails with the same error.

For further information, please also see this Stack Overflow question.

I am using Django 2 and PostgreSQL.

Is there a way to achieve this using the ORM? If not, what would be the solution using a raw SQL expression?

### Update

After a lot of struggling, I found out that what I wrote was indeed an aggregation; however, what I want is to find the maximum AcSum of each A in each category. So I suppose I have to group the result once more after the AcSum calculation. Based on this insight, I found a Stack Overflow question that asks about the same concept (it was asked 1 year and 2 months ago and has no accepted answer).

Chaining another values('id') to the queryset works neither as a group-by nor as a filter for output attributes; it removes AcSum from the set. Adding AcSum to values() is also not an option, because it changes the grouped-by result set. I think what I am trying to do is regroup the grouped-by query based on the fields inside a column (i.e. id). Any thoughts?

You can't do an aggregate of an aggregate, Max(Sum()); it's not valid in SQL, whether you're using the ORM or not. Instead, you have to join the table to itself to find the maximum. You can do this using a subquery. The code below looks right to me, but keep in mind I don't have anything to run it on, so it might not be perfect.

    from django.db.models import OuterRef, Q, Subquery, Sum

    annotation = {
        'AcSum': Sum('intensity')
    }
    # The basic query is on Relation grouped by A and Category, annotated
    # with the Sum of intensity
    query = Relation.objects.values('a', 'b__category').annotate(**annotation)

    # The subquery is joined to the outer query on the Category
    sub_filter = Q(b__category=OuterRef('b__category'))
    # The subquery is grouped by A and Category and annotated with the Sum
    # of intensity, which is then ordered descending so that when a LIMIT 1
    # is applied, you get the Max.
    subquery = (
        Relation.objects.filter(sub_filter)
        .values('a', 'b__category')
        .annotate(**annotation)
        .order_by('-AcSum')
        .values('AcSum')[:1]
    )

    query = query.annotate(max_intensity=Subquery(subquery))

This should generate SQL like:

    SELECT a_id, category_id,
           (SELECT SUM(U0.intensity) AS AcSum
            FROM RELATION U0
            JOIN B U1 on U0.b_id = U1.id
            WHERE U1.category_id = B.category_id
            GROUP BY U0.a_id, U1.category_id
            ORDER BY SUM(U0.intensity) DESC
            LIMIT 1
           ) AS max_intensity
    FROM Relation
    JOIN B on Relation.b_id = B.id
    GROUP BY Relation.a_id, B.category_id

It may be more performant to eliminate the join in the Subquery by using a backend-specific feature like array_agg (Postgres) or GroupConcat (MySQL) to collect the Relation.ids that are grouped together in the outer query. But I don't know what backend you're using.

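
Since the question also asks about raw SQL, here is a rough, self-contained sketch of the "aggregate first, then aggregate the result" idea. It runs against an in-memory SQLite database through Python's standard `sqlite3` module, purely so it can be executed without a Django project; the table layout, column names and sample rows are assumptions for illustration, and the same plain SQL should carry over to PostgreSQL with at most minor changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: only the columns the query needs.
conn.executescript(
    """
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE relation (a_id INTEGER, b_id INTEGER, intensity INTEGER);
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO relation VALUES
        (1, 10, 2), (1, 11, 3), (2, 10, 4), (1, 12, 1), (2, 12, 7);
    """
)

# The inner aggregate lives in a named subquery (CTE), so the outer
# query can aggregate over its result instead of nesting MAX(SUM(...)).
PER_A = """
    WITH per_a AS (
        SELECT r.a_id, b.category_id, SUM(r.intensity) AS ac_sum
        FROM relation r
        JOIN b ON r.b_id = b.id
        GROUP BY r.a_id, b.category_id
    )
"""

# Max of the per-(A, Category) sums, one row per category.
max_per_category = conn.execute(
    PER_A
    + """
    SELECT category_id, MAX(ac_sum) AS max_intensity
    FROM per_a
    GROUP BY category_id
    ORDER BY category_id
    """
).fetchall()

# Same maximum attached to every (A, Category) row via a correlated subquery.
rows_with_max = conn.execute(
    PER_A
    + """
    SELECT p.a_id, p.category_id, p.ac_sum,
           (SELECT MAX(q.ac_sum) FROM per_a q
            WHERE q.category_id = p.category_id) AS max_intensity
    FROM per_a p
    ORDER BY p.category_id, p.a_id
    """
).fetchall()

print(max_per_category)
print(rows_with_max)
conn.close()
```

The second query has the same result shape as the ORM version above (each grouped row carrying its category's maximum), while the first collapses the result to one row per category.
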
Something like this should work for you. I couldn't test it myself, so please let me know the result:

    from django.db.models import F, Max, Sum

    Relation.objects.annotate(
        b_category=F('B__Category')
    ).values(
        'A', 'b_category'
    ).annotate(
        SumIntensityPerCategory=Sum('Intensity')
    ).values(
        'A', MaxIntensitySumPerCategory=Max('SumIntensityPerCategory')
    )