After 2 years at an enterprise backup software firm, I finally took the plunge and joined a startup. I love the engineering culture we have here at Squad, and rigorous code reviews are very much a part of it. Since I often found myself repeating the same mistakes again (and again, and again..), I went ahead and wrote a checklist to help me.

This is not a generic list and is very much tied to me and the mistakes that I made. The list helped me, but your mileage may vary
The TL;DR
Read through the code you wrote, and stop and ponder at each line.
The actual (noobie) list
1. No create/update inside a loop
Are you making create()
or update()
calls in a loop? Have you considered whether they could be replaced with bulk_create()
and bulk_update()
? See the django-bulk-update package.
2. No unnecessary model attribute fetches
Are you writing post.id
where you could have gotten away with post_id
? For example:
class Blog(models.Model):
title = models.CharField(max_length=200)
class Post(models.Model):
blog = models.ForeignKey(Blog)
time = models.TimeField()
author = models.CharField(max_length=100
Let’s say you want to know the id
of the blog to which a particular post belongs to. One way is to do:
post = # some how get a reference to a Post object
print post.blog.id
But the better way is to do:
print post.blog_id
What’s the difference? In the first case, we are asking Django to fetch the id
attribute from the post’s blog entry. As you would probably know, Post
and Blog
are separate tables in the database. So when you ask for post.blog.id
, Django will query the Blog
table to fetch the id that we need. That’s an extra query. However, this is not really necessary because we have the information we need in the Post
object itself. Since we have a foreign key relationship from Post
to Blog
, django must somehow keep track of which post is related to which blog. Django does this by storing a special blog_id
attribute in Post
which would store the primary key of its parent Blog
. So post.blog_id
would give us the information we need, without resulting in an extra query.
3. Docstrings and comments are important
Read through the docstrings and comments. It might seem unimportant to read the docstrings and comments when you have a feature to ship. But a wrong comment/docstring can throw the next developer who reads your code off track, and trust me you don’t want to be that guy.
4. If possible, limit your business logic to the model classes
Be mindful of where you write your business logic. Coming from jquery world, I had a tendency to put my business logic wherever I please. Avoid writing business logic in ModelAdmin
classes or ModelForm
classes (yikes!) and write them where they belong – Model
classes. This would ensure:
- Consistency in the codebase. If it is business logic, there’s only one place it could be.
- Better tests. If it’s in a
Model
class, then you know you should write unit tests for it
5. Is it covered by unit tests?
Speaking of unit tests, how do you decide when to write unit tests and when not to? If it’s business logic, it needs unit tests. No buts, no maybes, just write the damn tests. If it is something in your ModelAdmin
, then you can afford not to write unit tests for that as long as you don’t do any fancy if..else
s there. Test your business logic, not your boilerplate
6. Think how your changes affect the existing data
In some cases, for example, when you introduce a new field, you might have to write a data migration so that existing rows in the table would have a sane value for that field. I made the rookie mistake of happily coding away on my feature with nary a thought about the existing data in the database and regretted it afterward. See here for a primer on Django migrations
7. Use that cache!
Keep an eye out for things that can be cached. Find yourselves fetching ‘top 10 scorers of all time’ from the db everytime the page loads? Cache it! Though this should be fairly obvious, it’s easy to forget about the cache when you are busy writing your feature.
8. Offload non-critical heavy tasks to an async queue
Okay, this one is a little specific to our particular stack. Let’s say you have a feature where your user presses ‘generate cats report’ button and you wade through the entire database to figure out how much of your total traffic involved pictures of grumpy cats. It is probably not a good idea to make your user stare at a loading screen while we crunch gigabytes of cat data. Here’s what you could have done:
- When the user presses the button, start an async task to calculate grumpy cat traffic volume. We use celery to make this happen.
- Once you fire the async task, immediately respond to the http request with the message “Looking for grumpy cats in the system, we will let you know when we are done”. Now your user can use his/her time for something more productive
- Message the user in slack/send an email/display a button on the page when your async task is done.
This will let us offload heavier tasks to spare EC2 instances so that more critical requests/queries do not get slower because of grumpy cats

9. Be aware of popular optimizations
Know your python. Use list comprehensions over for loops, izip over zip, et cetera. This comes with time and practice, so don’t sweat it. Oh and don’t forget this:
need_refuel = None
if fuel_level < 0.2:
need_refuel = True
else
need_refuel = False
The above mess can be refactored into:
need_refuel = fuel_level < THRESHOLD or False
Whether this aids or impairs readability is a whole different debate. Things like these are subjective, and it is okay to have opinions.
10. Query only what you need
If you had a model like this:
class Banana(models.Model):
gorilla = models.CharField()
tree = models.CharField()
jungle = models.CharField()
(God save you if you actually have a class structure like that in production, but it would serve our example well) And you want to do something like this:
banana = Banana.objects.get(id=3)
What you wanted was a banana, but what you got was a gorilla holding the banana with the tree it was sitting on along with the entire fricking jungle (thanks, Joe Armstrong for the quote). Not cool.
What you can do instead is:
banana = Banana.objects.get(id=3).only('id')
Here’s the documentation for only. No more gorillas, just the banana. However, I prefer using .value_list('id', flat=True)
over only('id')
because the latter might result in extra queries if we carelessly try to use any attribute other than id
in our code. Using .values()
is very explicit and conveys to the programmer that you only need this particular attribute here. It is also faster.

11. Learn to use Django Debug Tools
Django debug tools is your friend. Lavishly using only()
and defer()
could bite you back if not careful. If you defer loading attributes that you don’t think you will need, but you end up needing them anyways, that would be an extra DB query. At least in the Django list pages, this would result in the dreaded n+1 query. Let’s say you want to tabulate bananas and gorillas:
id | Gorilla Details |
---|---|
1 | Chump, Amazon rainforest |
2 | Rick, Cambodia |
3 | Appu, Kerala |
You thought all you need is id
and Gorilla
, so you did Banana.objects.all().only('id', 'gorilla')
, so that we don’t need the tree and the jungle. But 3 months later, you thought it would be a good idea to display where the gorilla came from in your table. So you fire up a custom function in the ModelAdmin
to do this:
def gorilla_details(self, obj)
return '{0} {1}'.format(obj.gorilla, obj.jungle)
And everything worked smooth. But unbeknownst to you, Django is making DB queries in a loop. We had told Django to get only id
and gorilla
through the only()
method. We now need the jungle
as well. So whenever we access obj.jungle
, Django queries the DB to get the jungle because we specifically told it not to fetch the jungle earlier. So we end up making 10 calls to the db for 10 gorillas (or bananas, whatever). The fix is to include jungle
in the only()
clause, but more often than not we do not even know that we are making an n+1 query.
Enter django debug tools.
Among many other things, DDT will tell you how many queries were fire to load your page. So if our banana-gorilla table makes 35 queries to the database to load, we know something’s wrong. Always look at what the debug toolbar has to say before you send in that pull request for review
Sorry for the long post. Here’s a potato.
