Django Query Tips select & prefetch_related

Django Query Tips select & prefetch_related

All you need to know about prefetching in Django

The models looked (roughly) like this:

class Program(models.Model):
    name = models.CharField(max_length=20)

class Price(models.Model):
    program = models.ForeignKey(Program)
    from_date = models.DateTimeField()
    to_date = models.DateTimeField()

class Order(models.Model):
    state = models.CharField(max_length=20)
    items = models.ManyToManyField(Price)
  • Program is a session, lecture or a conference day.
  • Prices can change over time so we used a model called Price, modeled as a type 2 slowly changing dimension (SCD) that represents the price of a program at a certain time.
  • User’s can register to one or more programs. Each item in an Order is a program price at the time the order was made.

Let’s try to fetch the program names for a single order:

> o = Order.objects.filter(state=’completed’).first()
(0.002) SELECT … FROM “orders_order” WHERE “orders_order”.”state” = ‘completed’ ORDER BY “orders_order”.”id” ASC LIMIT 1;
> [ for p in o.items.all()]
(0.002) SELECT … FROM “events_price” INNER JOIN “orders_order_items” ON (“events_price”.”id” = “orders_order_items”.”price_id”) WHERE “orders_order_items”.”order_id” = 29; args=(29,)
(0.001) SELECT … FROM “events_program” WHERE “events_program”.”id” = 8; args=(8,)
[‘Day 1 Pass’]
  • To fetch completed orders we need one query.
  • To fetch the program names for each order we need two more queries.

If we need two queries for each order, the number of queries for 100 orders will be 1 + 100 * 2 = 201 queries — that’s a lot!

Let’s use Django to reduce the amount of queries:

> o.items.values_list(‘program__name’)
(0.003) SELECT “events_program”.”name” FROM “events_price” INNER JOIN “orders_order_items” ON (“events_price”.”id” = “orders_order_items”.”price_id”) INNER JOIN “events_program” ON (“events_price”.”program_id” = “events_program”.”id”) WHERE “orders_order_items”.”order_id” = 29 LIMIT 21;
[‘Day 1 Pass’]

Great! Django performed a join between Price and Program and reduced the amount of queries to just one per order.

At this point instead of 201 queries we only need 101 queries for 100 orders.

Can we do better?

Why can’t we join?

If we have a foreign key we can use select_related or use snake case like we did above to fetch the related fields in a single query.

For example, we fetched the program name for a list of prices in a single query using values_list(‘program__name’). We were able to do that because each price is related to exactly one program.

If the relation between two models is many to many we can’t do that. Every order has one or more related prices — if we join the two tables we get duplicate orders:

SELECT AS order_id, AS price_id
    orders_order o 
    JOIN orders_order_items op ON ( = op.order_id) 
    JOIN events_price p ON (op.price_id = 
order_id | price_id 
       45 |       38
       45 |       56
       70 |       38
       70 |       50
       70 |       77
       71 |       38

Orders 70 and 45 have multiple items so they come up more than once in the result — Django can’t handle that.

Enter prefetch_related

Django has a nice, built-in way, of dealing with this problem called prefetch_related:

> o = Order.objects.filter(
(0.002) SELECT ... FROM "orders_order" WHERE "orders_order"."state" = 'completed' ORDER BY "orders_order"."id" ASC LIMIT 1;
(0.001) SELECT ("orders_order_items"."order_id") AS "_prefetch_related_val_order_id", "events_price"... FROM "events_price" INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id") WHERE "orders_order_items"."order_id" IN (29);
(0.001) SELECT "events_program"."id", "events_program"."name" FROM "events_program" WHERE "events_program"."id" IN (8);

We told Django we intend to fetch items__program. from the result set. In the second and third query we can see that Django fetched the through table orders_order_items and the relevant programs from events_program. The results of the prefetch are cached on the objects.

What happens when we try to fetch program names from the result?

> [ for p in o.items.all()]
[‘Day 1 Pass’]

No additional queries — exactly what we wanted!

When using prefetch, it’s important to work on the object and not on the query. Trying to fetch the program names with a query will produce the same outcome but will result in an additional query:

> o.items.values_list(‘program__name’)
(0.002) SELECT “events_program”.”name” FROM “events_price” INNER JOIN “orders_order_items” ON (“events_price”.”id” = “orders_order_items”.”price_id”) INNER JOIN “events_program” ON (“events_price”.”program_id” = “events_program”.”id”) WHERE “orders_order_items”.”order_id” = 29 LIMIT 21;
[‘Day 1 Pass’]

At this point, fetching 100 orders requires only 3 queries.

Can we do even better?

Introducing Prefetch

At version 1.7 Django introduced a new Prefetch object that extends the capabilities of prefetch_related.

The new object allows the developer to override the query used by Django to prefetch the related objects.

In our previous example Django used two queries for the prefetch — one for the through table and one for the program table. What if we could tell Django to join these two together?

> prices_and_programs = Price.objects.select_related(‘program’)
> o = Order.objects.filter(
    Prefetch('items', queryset=prices_and_programs)
(0.001) SELECT … FROM “orders_order” WHERE “orders_order”.”state” = ‘completed’ ORDER BY “orders_order”.”id” ASC LIMIT 1;
(0.001) SELECT (“orders_order_items”.”order_id”) AS “_prefetch_related_val_order_id”, “events_price”…., “events_program”…. INNER JOIN “events_program” ON (“events_price”.”program_id” = “events_program”.”id”) WHERE “orders_order_items”.”order_id” IN (29);

We created a query that joins prices with programs. Than we told Django to use this query to prefetch the values. This is like telling Django that you intend to fetch both items and programs for each order.

Fetching program names for an order:

> [ for p in o.items.all()]
[‘Day 1 Pass’]

No additional queries — it worked!

Taking it to the next level

When we talked earlier about the models we mentioned that the prices are modeled as an SCD table. This means we might want to query only active prices at a certain date.

A price is active at a certain date if it’s between from_date and end_date:

> now =
> active_prices = Price.objects.filter(

Using the Prefetch object we can tell Django to store the prefetched objects in a new attribute of the result set:

> active_prices_and_programs = (
> o = Order.objects.filter(
(0.001) SELECT … FROM “orders_order” WHERE “orders_order”.”state” = ‘completed’ ORDER BY “orders_order”.”id” ASC LIMIT 1;
(0.001) SELECT … FROM “events_price” INNER JOIN “orders_order_items” ON (“events_price”.”id” = “orders_order_items”.”price_id”) INNER JOIN “events_program” ON (“events_price”.”program_id” = “events_program”.”id”) WHERE (“events_price”.”from_date” <= ‘2017–04–29T07:53:00.210537+00:00’::timestamptz AND “events_price”.”to_date” > ‘2017–04–29T07:53:00.210537+00:00’::timestamptz AND “orders_order_items”.”order_id” IN (29));

We can see in the log that Django performed only two queries, and the prefetch query now include the custom filter we defined.

To fetch the active prices we can use the new attribute defined in to_attr:

> [ for p in o.active_prices]
[‘Day 1 Pass’]

Prefetch is a very powerful feature of Django ORM. I strongly recommend going over the documentation — you are bound to strike a gem.