Flask, Jinja2 and SLQAlchemy many-to-many relationship with conditions

What is the use of defining a relationship with back_populates? Start thinking tuples.

17 June 2019 Updated 31 August 2019

In SQLAlchemy

Disclaimer: This my first SQLAlchemy project so I am not experienced and may make wrong statements in this post.

While working on my first Flask / SQLAchemy project (I do not use Flask-SQLAlchemy) I hit a very high and thick wall. To explain this, consider you have a blog with posts and tags. There is a many-to-many relationship between posts and tags: a post can have many tags and a tag can have many posts.

The association table and models:

Base = declarative_base()

# many-to-many association table: blog_post - blog_tag
blog_post_mtm_blog_tag_table = Table('blog_post_mtm_blog_tag', Base.metadata,
    Column('blog_post_id', Integer, ForeignKey('blog_post.id')),
    Column('blog_tag_id', Integer, ForeignKey('blog_tag.id'))
)

class BlogPost(Base):
    __tablename__ = 'blog_post'

    id = Column(Integer, primary_key=True)
    deleted = Column(Boolean, default=False)
    title = Column(String)
    # many-to-many relationship with child
    blog_tags = relationship(
        'BlogTag', 
        secondary=blog_post_mtm_blog_tag_table,
        back_populates='blog_posts')

    def __repr__(self):
        return "%s(title=%r)" % (self.__class__.__name__, self.title) 


class BlogTag(Base):
    __tablename__ = 'blog_tag'

    id = Column(Integer, primary_key=True)
    deleted = Column(Boolean, default=False)
    name = Column(String)
    # many-to-many relationship with parent
    blog_posts = relationship('BlogPost',
        secondary=blog_post_mtm_blog_tag_table,
        back_populates='blog_tags')

    def __repr__(self):
        return "%s(name=%r)" % (self.__class__.__name__, self.name)

Easy, nice, no problems. In the view function we pass posts to the template:

def posts_list():

    blog_posts = db.query(BlogPost)

    return render_template(
        ...
        blog_posts=blog_posts)

In the Ninja template we iterate over the posts and tags:

    {% for blog_post in blog_posts %}
        <tr>
            <td>
                <a href="{{ url_for('blog.posts_post_edit', id=blog_post.id) }}">
                    <i class="fa fa-pencil-alt"></i> {{ blog_post.title }} 
                </a>
            </td>
            <td>
            {% if blog_post.blog_tags %}
                {{ blog_post.blog_tags|join(', ', attribute='name') }}
            {% endif %}
            </td>
        </tr>
    {% endfor %}

Beautiful, elegant! But realistic?

The problem: conditions

Among the SQLAlchemy questions most viewed on e.g. stackoverflow you will find the many-to-many relationship with conditions. Why? Because starting with SQLAlchemy is confusing.

You run into problems when you want to add one or more conditions. How do you do this? The defined relationship can no longer be used or can it? I believed there would be a simple solution to add e.g. a deleted-condition or a filter on the tag name. While looking for solutions I found this text, see the first link below: 'if you need a JOIN condition different from what the relationship() defines, then you can't use that relationship to do the join. You have to spell it out explcitly'.

No, no, no, this was not what I was looking for, it must be possible to do this easier. Reading more and more I got more depressed and finally had to conclude that it was not possible to do this easier. Worst even, to get both BlogPost and BlogTag from a query, you must specify both:

qry = db.query(BlogPost, BlogTag)......

This will result in tuples to be returned, not a single object.

Summarizing, if you have a relationship and most of the time the conditions are different then there is no use for back_populates. So better forget about iterating like above and start thinking tuples and other ways to accomplish what you want. I came up with two possible solutions:

- Single query with outerjoins
- Two query SELECT-only, first query selecting the posts, second query selecting tags for each post

Below I investigate both.

Solution 1: Single query with outerjoins

The query returns a list of tuples consisting of pairs: post-tag. An example of this query is:

qry = db.query(BlogPost, BlogTag) \
    .outerjoin(blog_post_mtm_blog_tag_table) \
    .outerjoin(BlogTag, and_( (BlogTag.id == blog_post_mtm_blog_tag_table.c.blog_tag_id) )) \
    .order_by(asc(BlogPost.title)) \
    .order_by(asc(BlogTag.name))

We use outerjoin because we want all BlogPost entries to appear in the result. When a post has zero tags, the tag is None, when a post has two or more tags, two or more tuples are returned:

[(BlogPost(title='Drinks on the house'), BlogTag(name='refreshment')), (BlogPost(title='Hello Mexico'), BlogTag(name='Holiday')), (BlogPost(title='Hello Mexico'), BlogTag(name='Mexico')), (BlogPost(title='Mountain shoes'), None), (BlogPost(title='Sunny beaches'), BlogTag(name='Holiday')), (BlogPost(title='Sunny beaches'), BlogTag(name='Travel'))]

This means we cannot simply iterate over the tuples in the template because a post can be in more than one tuple. We must do some post-processing before passing the data to the template. One way of doing this is to iterate over the tuples and reduce multiple identical blog_posts to a single blog_post and put all tags for a blog_post in a list:

blog_posts = []
blog_post_id2blog_tags = {}
for blog_post, blog_tag in qry.all():
    # add blog_post
    if not blog_post in blog_posts:
        blog_posts.append(blog_post)

    if blog_tag == None:
        if not blog_post.id in blog_post_id2blog_tags:
            blog_post_id2blog_tags[blog_post.id] = []
        continue
    if not blog_post.id in blog_post_id2blog_tags:
        blog_post_id2blog_tags[blog_post.id] = []
    blog_post_id2blog_tags[blog_post.id].append(blog_tag)

In the view function we pass the blog_posts and blog_tags to the template:

def posts_list():

    ... 
    
    return render_template(
        ...
        blog_posts=blog_posts,
        blog_post_id2blog_tags=blog_post_id2blog_tags)

In the template we do something like:

    {% for blog_post in blog_posts %}
        <tr>
            <td>
                <a href="{{ url_for('blog.posts_post_edit', id=blog_post.id) }}">
                    <i class="fa fa-pencil-alt"></i> {{ blog_post.title }} 
                </a>
            </td>
            <td>
            {% if blog_post.id in blog_post_id2blog_tags %}
                {{ blog_post2blog_tags[blog_post.id]|join(', ', attribute='name') }}
            {% endif %}
            </td>
        </tr>
    {% endfor %}

Solution 2: Two query SELECT-only, the first query selecting the posts, and the second query selecting tags for each post

Why two queries instead of one? Isn't this always slower? While I believe you should always try to minimize the number of queries, performance difference is difficult to predict. Some queries are much faster than others. Query caching can also increase performance dramatically. For a website, never think of single visitor performance, always consider what happens when many vistors connect, click a link, etc.

The first SELECT-only query gets all posts:

blog_posts = db.query(BlogPost) \
    .order_by(asc(BlogPost.title))

The second SELECT-only query consists of a select with all the conditions in place:

qry = db.query(BlogPost.id, BlogTag) \
    .filter( and_( \
        (BlogPost.id == blog_post_mtm_blog_tag_table.c.blog_post_id), \
        (BlogTag.id == blog_post_mtm_blog_tag_table.c.blog_tag_id) \
        )) \
    .order_by(asc(BlogPost.title)) \
    .order_by(asc(BlogTag.name))

This query only returns tuples meeting all criteria meaning that a post without tags will not show up in the result and a tag can never be None. Like in solution 1 we do post-processing by creating a dictionary with key blog_post_id and a value a list holding the tags using defaultdict. This is now much simpler than solution 1:

from collections import defaultdict

blog_post_id2blog_tags = defaultdict(list)
for blog_post_id, blog_tag in qry.all():
    blog_post_id2blog_tags[blog_post_id].append(blog_tag)

In the view function we pass the data like we did in solution 1:

def posts_list():

    ... 
    
    return render_template(
        ...
        blog_posts=blog_posts,
        blog_post_id2blog_tags=blog_post_id2blog_tags)

The template does not differ from solution 1:

    {% for blog_post in blog_posts %}
        <tr>
            <td>
                <a href="{{ url_for('blog.posts_post_edit', id=blog_post.id) }}">
                    <i class="fa fa-pencil-alt"></i> {{ blog_post.title }} 
                </a>
            </td>
            <td>
            {% if blog_post.id in blog_post_id2blog_tags %}
                {{ blog_post2blog_tags[blog_post.id]|join(', ', attribute='name') }}
            {% endif %}
            </td>
        </tr>
    {% endfor %}

Summary

Forget about the beautiful, elegant solution mentioned in the beginning, it only exists in very specific situations. Real life applications often require queries with multiple dynamic conditions.

With Flask, Jinja2 and SQLAlchemy think in tuples because these are the units that often will be returned. These tuples need some post-processing to create objects suitable for display in the template.

I do not like this post-processing, in my opinion a query should return all data that can be put in a template, but at the moment I do not know another way. I have no preference for any of the two solutions above although I tend to choose the two query SELECT-only solution for its simplicity. For the moment I will use this.

In case you want to try this, here is the full code:

from sqlalchemy import Table, Column, Integer, String, Boolean, BigInteger, DateTime, ForeignKey, func, and_, or_, desc, asc, create_engine, inspect, sql
from sqlalchemy.orm import relationship, Session, with_polymorphic, backref, contains_eager, Query
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func, label

import os
import sys


Base = declarative_base()

# many-to-many association table: blog_post - blog_tag
blog_post_mtm_blog_tag_table = Table('blog_post_mtm_blog_tag', Base.metadata,
    Column('blog_post_id', Integer, ForeignKey('blog_post.id')),
    Column('blog_tag_id', Integer, ForeignKey('blog_tag.id'))
)

class BlogPost(Base):
    __tablename__ = 'blog_post'

    id = Column(Integer, primary_key=True)
    deleted = Column(Boolean, default=False)
    title = Column(String)
    # many-to-many relationship with child
    blog_tags = relationship(
        'BlogTag', 
        secondary=blog_post_mtm_blog_tag_table,
        back_populates='blog_posts')

    def __repr__(self):
        return "%s(title=%r)" % (self.__class__.__name__, self.title) 


class BlogTag(Base):
    __tablename__ = 'blog_tag'

    id = Column(Integer, primary_key=True)
    deleted = Column(Boolean, default=False)
    name = Column(String)
    # many-to-many relationship with parent
    blog_posts = relationship('BlogPost',
        secondary=blog_post_mtm_blog_tag_table,
        back_populates='blog_tags')

    def __repr__(self):
        return "%s(name=%r)" % (self.__class__.__name__, self.name) 


# not show/show generated sql
engine = create_engine('sqlite://')
#engine = create_engine('sqlite://', echo=True)

Base.metadata.create_all(engine)

from sqlalchemy.orm import sessionmaker
session = sessionmaker()
session.configure(bind=engine)
db = session()

# posts
post_hello_mexico = BlogPost(title='Hello Mexico')
post_sunny_beach = BlogPost(title='Sunny beaches')
post_mountain_shoes = BlogPost(title='Mountain shoes')
post_drinks_on_the_house = BlogPost(title='Drinks on the house')

# tags
tag_mexico = BlogTag(name='Mexico')
tag_holiday = BlogTag(name='Holiday')
tag_travel = BlogTag(name='Travel')
tag_refreshment = BlogTag(name='refreshment')
tag_sandy = BlogTag(name='Sandy')

# post tags
post_hello_mexico.blog_tags.append(tag_mexico)
post_hello_mexico.blog_tags.append(tag_holiday)
post_sunny_beach.blog_tags.append(tag_travel)
post_sunny_beach.blog_tags.append(tag_holiday)
post_drinks_on_the_house.blog_tags.append(tag_refreshment)

db.add_all([post_hello_mexico, post_sunny_beach, post_mountain_shoes, post_drinks_on_the_house, 
    tag_mexico, tag_holiday, tag_travel, tag_refreshment, tag_sandy])
db.commit()


print("Beautiful, elegant, but realistic?")
qry = db.query(BlogPost) \
    .order_by(asc(BlogPost.title))

for blog_post in qry.all():
    print("blog_post {}".format(blog_post.title))
    if blog_post.blog_tags:
        for blog_tag in blog_post.blog_tags:
            print("- blog_tag = {}".format(blog_tag.name))


print("\n\nSolution 1: Single query with outerjoins")

print("Solution 1: query")
qry = db.query(BlogPost, BlogTag) \
    .outerjoin(blog_post_mtm_blog_tag_table) \
    .outerjoin(BlogTag, and_( (BlogTag.id == blog_post_mtm_blog_tag_table.c.blog_tag_id) )) \
    .order_by(asc(BlogPost.title)) \
    .order_by(asc(BlogTag.name))

print("Solution 1: qry.all() = {}".format(qry.all()))

print("Solution 1: create blog_posts and blog_tags dict")
blog_posts = []
blog_post_id2blog_tags = {}
for blog_post, blog_tag in qry.all():
    # add blog_post
    if not blog_post in blog_posts:
        blog_posts.append(blog_post)

    if blog_tag == None:
        if not blog_post.id in blog_post_id2blog_tags:
            blog_post_id2blog_tags[blog_post.id] = []
        continue
    if not blog_post.id in blog_post_id2blog_tags:
        blog_post_id2blog_tags[blog_post.id] = []
    blog_post_id2blog_tags[blog_post.id].append(blog_tag)

print("Solution 1: template")
for blog_post in blog_posts:
    print("blog_post {}".format(blog_post.title))
    if blog_post.id in blog_post_id2blog_tags:
        for blog_tag in blog_post_id2blog_tags[blog_post.id]:
            print("- blog_tag = {}".format(blog_tag.name))


print("\n\nSolution 2: Two query select-only, the first query selecting the posts, and the second query selecting tags for each post")

print("Solution 2: select-query-1 blog_posts")
blog_posts = db.query(BlogPost) \
    .order_by(asc(BlogPost.title))

print("Solution 2: select-query-2 blog_tags for all blog_posts")
qry = db.query(BlogPost.id, BlogTag) \
    .filter( and_( \
        (BlogPost.id == blog_post_mtm_blog_tag_table.c.blog_post_id), \
        (BlogTag.id == blog_post_mtm_blog_tag_table.c.blog_tag_id) \
        )) \
    .order_by(asc(BlogPost.title)) \
    .order_by(asc(BlogTag.name))

print("Solution 2: qry.all() = {}".format(qry.all()))

print("Solution 2: create blog_tags dict")
from collections import defaultdict

blog_post_id2blog_tags = defaultdict(list)
for blog_post_id, blog_tag in qry.all():
    blog_post_id2blog_tags[blog_post_id].append(blog_tag)

print("Solution 2: template")
for blog_post in blog_posts:
    print("blog_post {}".format(blog_post.title))
    if blog_post.id in blog_post_id2blog_tags:
        for blog_tag in blog_post_id2blog_tags[blog_post.id]:
            print("- blog_tag = {}".format(blog_tag.name))

Links / credits

Confused about Many-to-Many outerjoin
https://groups.google.com/forum/?fromgroups=#!topic/sqlalchemy/GNIBQMvMRg8

SqlAlchemy - Many to Many outer join with conditions on join
https://stackoverflow.com/questions/22909531/sqlalchemy-many-to-many-outer-join-with-conditions-on-join

SQLAlchemy: complex ON clause when performing LEFT JOIN on many-to-many relation
https://stackoverflow.com/questions/25868681/sqlalchemy-complex-on-clause-when-performing-left-join-on-many-to-many-relation

SQLAlchemy: querying all objects with no direct relationship
https://stackoverflow.com/questions/47918946/sqlalchemy-querying-all-objects-with-no-direct-relationship

Flask, Jinja2 and SLQAlchemy many-to-many relationship with conditions

The problem: conditions

Solution 1: Single query with outerjoins

Solution 2: Two query SELECT-only, the first query selecting the posts, and the second query selecting tags for each post

Summary

Links / credits

Read more

SQLAlchemy

Leave a comment

Comments

Leave a reply

Recent

Most viewed

Tags