Query count pitfalls and how to avoid them

Stephen Burrows

If you notice that your Django application is running slowly, the first place I always recommend looking is at how many SQL queries you’re running. Even in simple applications, there are likely to be easy opportunities to reduce your query count that result in a significant performance improvement. In general, reducing query count can improve your performance because every query spends a certain amount of time traveling to and from the database server. In this blog post I’ll cover some common pitfalls, some best practices, and how to measure the number of queries you’re making.

Models & examples

These are the models and utilities we’ll be using in the examples in this post.

# models.py
from django.db import models


class Tag(models.Model):
    name = models.CharField(max_length=100)


class Author(models.Model):
    name = models.CharField(max_length=100)


class Blog(models.Model):
    name = models.CharField(max_length=100)
    likes = models.PositiveIntegerField(default=0)


class BlogPost(models.Model):
    title = models.CharField(max_length=100)
    blog = models.ForeignKey(Blog, related_name='posts', on_delete=models.CASCADE)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    tags = models.ManyToManyField(Tag, blank=True)

The following helper uses the Faker library to create fake data.

# examples.py
from faker import Faker

fake = Faker()

def create_blog_posts(count):
    blog = Blog.objects.create(name=' '.join(fake.words(2)))
    for x in range(count):
        BlogPost.objects.create(
            title=' '.join(fake.words(5)),
            blog=blog,
            author=Author.objects.create(name=fake.name()),
        )

I’ll also be using a rollback decorator to make sure the database is always left in a clean state after an example is run, and a count_queries() context manager which counts the queries executed within its scope and prints that count to stdout.

You can find the code for these helpers and models, as well as all the examples, on GitHub, or install it locally on your machine as follows:

$ pip install pipenv # if pipenv is not already installed on your system
$ git clone https://github.com/littleweaver/query-count-post.git
$ cd query-count-post
$ pipenv install

Examples are found inside the repository at query_count/examples.py. To run an example, you can do the following from the repository root:

$ pipenv shell
$ python manage.py shell
>>> from query_count.examples import following_foreignkeys__small
>>> following_foreignkeys__small()

Common pitfalls

There are a number of ways that it’s easy to unintentionally inflate your query count when working with Django models. I’m going to run through some of the most common scenarios and show you how to fix them.

Following ForeignKeys

One of the most common things you will do in Django is follow ForeignKey relationships. It’s super easy, but it silently adds a query every time you do it. Take the following example:

@rollback
def following_foreignkeys__small():
    create_blog_posts(1)

    with count_queries():
        blog_post = BlogPost.objects.get()
        print(blog_post.author.name)

In the example above, two queries will be executed: one to get the (only) blog post, and one in the background when the blog post’s author is accessed. We can eliminate the second query by using select_related, a QuerySet method that lets us fetch related object data without needing additional queries.

@rollback
def following_foreignkeys__small__select_related():
    create_blog_posts(1)

    with count_queries():
        blog_post = BlogPost.objects.select_related('author').get()
        print(blog_post.author.name)

This version of the code only executes one query: getting the blog post. The author data is fetched at the same time, and cached on the blog post before we try to access it.

select_related shines even more with larger data sets. For example, suppose we wanted to display the author names for 300 blog posts. Here’s an example without select_related:

@rollback
def following_foreignkeys__large():
    create_blog_posts(300)

    with count_queries():
        for blog_post in BlogPost.objects.all():
            print(blog_post.author.name)

Without select_related, we will end up executing 301 queries: one to get the posts, and then one query per post to get the related author. We can tweak this just a bit with select_related:

@rollback
def following_foreignkeys__large__select_related():
    create_blog_posts(300)

    with count_queries():
        for blog_post in BlogPost.objects.select_related('author'):
            print(blog_post.author.name)

Now suddenly we’re getting all that data with only a single query!

Following ManyToManyFields & reverse ForeignKeys

select_related is great, but it only works for following a ForeignKey from the model it’s defined on to the model it points to (for example, from blog post to author). It doesn’t work going the opposite direction, and it doesn’t work for ManyToManyFields. So take an example where you’re printing all the tags for each blog post:

@rollback
def following_m2m():
    create_blog_posts(300)

    with count_queries():
        for blog_post in BlogPost.objects.all():
            print(', '.join((tag.name for tag in blog_post.tags.all())))

This will execute 301 queries, just like the example above where author wasn’t select_related. Unfortunately we can’t eliminate all 301 queries. But we can still reduce the count with prefetch_related. prefetch_related executes one query for the primary queryset, then one additional query for each set of prefetched data, then caches the prefetched values on your model instances for later. Take the following code:

@rollback
def following_m2m__prefetch_related():
    create_blog_posts(300)

    with count_queries():
        for blog_post in BlogPost.objects.prefetch_related('tags'):
            print(', '.join((tag.name for tag in blog_post.tags.all())))

Even though we’re still using blog_post.tags.all() like before, this time we'll access the prefetched data on each object instead of running new queries, leaving us with a query count of just two!


You can also combine prefetch_related and select_related using Prefetch objects. For example, let’s say that for each author, you want to display their blog posts along with the name of each post’s blog. Here’s a basic implementation:

@rollback
def following_reverse_fk():
    create_blog_posts(300)

    with count_queries():
        for author in Author.objects.all():
            print(author.name)

            for blog_post in author.blogpost_set.all():
                print(blog_post.title, '||', blog_post.blog.name)

The setup for this example creates 300 authors, each with one blog post, and each post has a related blog. So the code above will execute 601 queries as written: one for the authors, plus, for each of the 300 authors, one query for their posts and one for each post’s blog. If we tweak things just a bit:

@rollback
def following_reverse_fk__prefetch_related__select_related():
    create_blog_posts(300)

    with count_queries():
        authors = Author.objects.prefetch_related(
            Prefetch(
                'blogpost_set',
                queryset=BlogPost.objects.select_related('blog'),
            )
        )
        for author in authors:
            print(author.name)

            for blog_post in author.blogpost_set.all():
                print(blog_post.title, '||', blog_post.blog.name)

Getting the blog posts looks exactly the same, but thanks to the prefetch_related and select_related, we’re now back at two queries.

Related object counts

From time to time, you may find yourself wanting to count related objects - for example, if you want to know how many blog posts each author has written. Here’s a simple implementation:

@rollback
def related_object_count():
    create_blog_posts(300)

    with count_queries():
        for author in Author.objects.all():
            print(author.name, '||', author.blogpost_set.count())

Here, again, we end up with 301 queries: one for the author queryset, and one for each author to get the number of posts that author has written. We can improve this by using Django’s Count annotation, which moves the count calculation work to the database and caches the returned value on the model instance.

@rollback
def related_object_count__annotation():
    create_blog_posts(300)

    with count_queries():
        for author in Author.objects.annotate(blogpost_count=Count('blogpost')):
            print(author.name, '||', author.blogpost_count)

Because the count is done on the database side, this ends up only costing a single query.

Creating lots of model instances

The basic way of making new model instances is to use Model.objects.create(). For example, if you’re creating a bunch of blog posts at once you might be doing something along these lines:

@rollback
def bulk_create():
    blog = Blog.objects.create(name=' '.join(fake.words(2)))
    author = Author.objects.create(name=fake.name())

    with count_queries():
        for x in range(300):
            BlogPost.objects.create(
                title=' '.join(fake.words(5)),
                blog=blog,
                author=author,
            )

This ends up firing off a single insert query for every object we create – so 300 queries total. We can do better than that by using Model.objects.bulk_create(). For this method, you need to pass in an iterable of objects of the right type, and they’ll all be created with a single query.

@rollback
def bulk_create__bulk_create():
    blog = Blog.objects.create(name=' '.join(fake.words(2)))
    author = Author.objects.create(name=fake.name())

    with count_queries():
        posts = [
            BlogPost(
                title=' '.join(fake.words(5)),
                blog=blog,
                author=author,
            )
            for x in range(300)
        ]
        BlogPost.objects.bulk_create(posts)

bulk_create does have one big limitation: you won’t have access to the ids of the created rows in the database (unless you’re using Postgres and Django 1.10+). Without an id, the model instances you’ve created will not know that they have been saved to the database. This means that you won’t be able to create ManyToMany connections with the instances, set ForeignKeys to point to them, or call instance.refresh_from_db(). Plus, if you use instance.save() you will end up with a duplicate row. But as long as you have that caveat in mind, this can be a great way to save some queries.

Updating field values based on field values

There will be times when you want to update a field based on the value of another field. One classic example of this would be incrementing a “like” counter on a model instance. Here’s a basic implementation, where you get an instance, update the field value, and save the instance:

@rollback
def update_field_value():
    Blog.objects.create(name=' '.join(fake.words(2)))

    with count_queries():
        blog = Blog.objects.get()
        blog.likes = blog.likes + 1
        blog.save()

This takes two queries. But we can do better by using Django’s F expressions, which let us refer to the current value of a field in the database as part of a query. This lets us do the following:

@rollback
def update_field_value__f():
    blog = Blog.objects.create(name=' '.join(fake.words(2)))

    with count_queries():
        Blog.objects.filter(
            id=blog.id,
        ).update(
            likes=F('likes') + 1,
        )

In this second example, we’re creating a filtered queryset (instead of getting a single object) and then telling the database to update all items in that queryset to increment the “likes” field by one. This does have the disadvantage of not giving you an updated instance to work with, but by delegating the update to the database we’re able to save a query and also avoid race conditions for simultaneous “likes”.

Best practices

Apart from the ways that Django can help you, there are also some best coding practices that can help keep your code clear and efficient.

Avoid side effects

There’s often a temptation with Django to execute queries in helper methods and properties on models. While this can in some cases make code DRYer, it comes at the cost of an increased likelihood that queries will be executed unintentionally, and perhaps unnecessarily. Take this example:

class Author(models.Model):
    name = models.CharField(max_length=100)

    @property
    def post_count(self):
        return self.blogpost_set.count()

If we add this shortcut property to Author, people will start using author.post_count without realizing that they’re triggering a query every time. And this is a simple case that only executes one query. It’s easy to end up with methods and properties that do a lot more work than this.

If you need access to data that can be fetched as part of a queryset, find a way to add it to the queryset directly, for example as part of generating the context for a template or as part of an API view. If you really do need that data on a regular basis, consider adding a custom manager method or a helper function that executes the query the same way every time. But in general, it’s good to wait until you’re repeating a pattern at least three times before abstracting it out; premature DRYing can make code less clear and harder to maintain.

Reuse querysets

This is probably one of the areas where you’ll save the smallest number of queries, but it’s such an easy win and such a common mistake that I can’t leave it out.

@rollback
def reuse_querysets():
    create_blog_posts(1)

    with count_queries():
        if Blog.objects.all():
            print('There are blogs!')
            for blog in Blog.objects.all():
                print(blog.name)

In the above example, Blog.objects.all() is called twice, resulting in two queries. You can eliminate one of those queries by storing the queryset in a variable and using that variable from then on.

@rollback
def reuse_querysets__reuse():
    create_blog_posts(1)

    with count_queries():
        blogs = Blog.objects.all()
        if blogs:
            print('There are blogs!')
            for blog in blogs:
                print(blog.name)

You can also do this inside Django templates by using the `{% with %}` template tag.

Measuring query count

Django Debug Toolbar is an excellent tool. If you want to profile your queries inside Django views you need to be using it. It includes a panel that breaks down the number of SQL queries you’re executing, how long each query takes, which queries are duplicates, what line of code triggered each query, and more.

If you’re trying to optimize code outside of Django views, it’s a little trickier since it’s manual, but essentially what you want to do is:

from django.db import connection
connection.queries

The connection.queries attribute is a list of the queries executed so far on that connection, including the SQL and the time each query took, which you can inspect to get whatever information you need for your optimization. Note that Django only records queries in connection.queries when DEBUG is True.

Conclusion

When you’re using Django views, you can often increase the speed of the view by reducing the number of queries executed, using the APIs and techniques described here. The important question for any optimization, though, is whether it meets the needs of your project. If you’re already fast enough, or efficient enough, maybe you don’t need to do more work right now. And sometimes the functionality you’re building won’t allow for an optimization that would otherwise be obvious. For example, if you need to find out how many blog posts each author has that have “Django” in the title, there’s no way to get that information with just annotations before Django 2.0.

If you’re interested in learning more about optimizing Django’s database access, I recommend checking out Django’s database optimization docs, which go into techniques beyond query count reduction. Good luck!

Stephen Burrows

Stephen is an experienced Python engineer and server administrator based in Seattle, WA. He has been building and deploying Python/Django-based web applications for eight years.