Fixing synapse 1.128.0 with SQLite
Synapse 1.128.0 came with an unpleasant surprise; here is how I worked around it
The Problem
On 2025-04-08, Element HQ released the update to synapse 1.128.0.
Running my server the way I do, I got the update via my package manager like most other people. Also like most other people, I'm using the SQLite backend.
After the update my matrix clients reported that they were unable to connect and the synapse process used 100% of one CPU.
Not a great start for debugging, but fortunately I wasn't alone: other people had the same issue, also with SQLite, which got me looking at the changelog in detail for anything that could be database-related …
Long story short: I found out that the "Add a column participant to room_memberships table" background migration caused the problem with a VERY slow SQL query. On other databases this merely sucks up performance, but on SQLite it locks up the database thread and the service grinds to a halt.
The Solution
Almost every SQL performance problem can be worked around by throwing enough indices at it. This is the classic trade of storage space for compute time, and here it's worth it.
Also, as an admin, I have the benefit of being able to inject indices with the sqlite3 command without having to modify any application code.
So here are the three indices that speed the query up enough to not only let the server function, but also to finish the migration within a few minutes of a weekend (the index names are my own choice, pick whatever you like):

CREATE INDEX tmp_current_state_events_join
    ON current_state_events(room_id, state_key)
    WHERE membership = 'join';

CREATE INDEX tmp_events_messages
    ON events(room_id, sender)
    WHERE type = 'm.room.message' OR type = 'm.room.encrypted';

CREATE INDEX tmp_room_memberships_ordering
    ON room_memberships (event_stream_ordering, room_id, user_id);
You can check the status of the background jobs that synapse runs on the database with the following SQL query; an empty result means all jobs have finished:

SELECT * FROM background_updates;
Note: Whether you leave those indices in after the migration has finished is up to you; I've left them in.
The Problem in Detail
As said above, the problem is a slow query. I'll split this up into why slow queries are bad in SQLite, and then why this particular query is slow.
Why Slow Queries are bad in SQLite
SQLite is a simple and performant database. Unlike other databases, it can be included as a library that stores its data in a single file: no extra service needed, which is very welcome for spare-time admins who don't want to put in the time to keep a database service running.
Part of why SQLite is so simple yet fast is that it runs in a single thread: it does one thing start to finish, then the next, allowing it to mostly ignore the problem of multiple queries running into each other (note: oversimplified). The problem with that is: if one query takes very long, all the others have to wait very long. That is an issue when those other queries serve requests from clients with short timeouts, and the result looks like connectivity problems.
The Slow Query
Part of adding another column to the room_memberships table is populating it with the correct data, which is what the populate_participant_bg_update background update is for. Searching the code for that, one can quickly find out …
… it is mainly implemented using the following query:
UPDATE room_memberships
SET participant = True
FROM (
SELECT DISTINCT c.state_key, e.room_id
FROM current_state_events AS c
INNER JOIN events AS e ON c.room_id = e.room_id
WHERE c.membership = 'join'
AND c.state_key = e.sender
AND (
e.type = 'm.room.message'
OR e.type = 'm.room.encrypted'
)
) AS subquery
WHERE room_memberships.user_id = subquery.state_key
AND room_memberships.room_id = subquery.room_id
AND room_memberships.event_stream_ordering <= ?
AND room_memberships.event_stream_ordering > ?;
That's a little monster, but nothing too bad. The subquery is a potential source of slowness, but not in this case.
Speaking of the subquery, asking the database how it would execute the select …
EXPLAIN QUERY PLAN
SELECT DISTINCT c.state_key, e.room_id
FROM current_state_events AS c
INNER JOIN events AS e ON c.room_id = e.room_id
WHERE c.membership = 'join'
AND c.state_key = e.sender
AND (
e.type = 'm.room.message'
OR e.type = 'm.room.encrypted'
)
;
Result:
QUERY PLAN
|--SCAN c
|--SEARCH e USING INDEX events_room_stream (room_id=?)
`--USE TEMP B-TREE FOR DISTINCT
… tells us that it is:
- looking at all entries in the current_state_events table (alias c)
- querying the events table using an index that only contains the room id
- collecting all of that into a binary tree to return each result only once
For context: the current_state_events table contains around 64K entries and the events table about 1 million entries. (Yes, small server.)
The two problems here are:
- The scan on current_state_events happens without using an index, meaning the database has to look at every single entry.
- The events are only indexed by their room_id, leaving a lot of comparisons for the database to do on the fly (remember the high CPU usage?).
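This plan is easy to reproduce on a toy copy of the schema. The sketch below assumes stripped-down tables with only the columns the query touches; the real synapse tables are much wider, and the real events_room_stream index may include more columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE current_state_events (room_id TEXT, state_key TEXT, membership TEXT);
CREATE TABLE events (room_id TEXT, sender TEXT, type TEXT);
CREATE INDEX events_room_stream ON events(room_id);
""")

# Print only the human-readable detail column of each plan row.
plan = "\n".join(row[-1] for row in db.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT c.state_key, e.room_id
    FROM current_state_events AS c
    INNER JOIN events AS e ON c.room_id = e.room_id
    WHERE c.membership = 'join'
      AND c.state_key = e.sender
      AND (e.type = 'm.room.message' OR e.type = 'm.room.encrypted')
"""))
print(plan)
```

The exact wording varies between SQLite versions, but a full SCAN of one of the tables shows up either way.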
Speeding Things up
The most efficient way to speed this up is to custom-tailor two indices to those queries so that the database has to do as little as possible when doing the actual work, meaning we need covering indices.
What's a covering index? A covering index is an index that contains all the data the database needs for a given query. This has the benefit that the database never has to leave the index, saving a lookup in the actual table. That means less code running, and access patterns that are easier to predict and cache, making it quite fast.
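A quick way to see a covering index in action with a throwaway table: the query below only touches columns a and b, both of which live inside the index, so the plan reports a covering index and the table itself is never read.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT)")
db.execute("CREATE INDEX t_a_b ON t(a, b)")

# EXPLAIN QUERY PLAN returns one row here; keep its detail string.
(plan,) = [row[-1] for row in
           db.execute("EXPLAIN QUERY PLAN SELECT b FROM t WHERE a = 'x'")]
print(plan)
```

On recent SQLite versions this prints something along the lines of "SEARCH t USING COVERING INDEX t_a_b (a=?)".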
The first index is for the current_state_events table: it needs the state_key and the room_id, and only the rows where the membership column contains the string join:
CREATE INDEX tmp_current_state_events_join -- the name is my own choice
    ON current_state_events(room_id, state_key)
    WHERE membership = 'join';
I've put room_id first because that's needed for the join before the DISTINCT part happens. Maybe the other way around could have been optimized even better by SQLite, but the main point of this index is saving the comparison against the membership column.
The second index is for the events table: it needs the room_id for the join (which is why it comes first), the sender for filtering immediately after the join, and it should only contain the entries that the comparison against the type column is checking for:
CREATE INDEX tmp_events_messages -- the name is my own choice
    ON events(room_id, sender)
    WHERE type = 'm.room.message' OR type = 'm.room.encrypted';
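Re-running the query plan on a stripped-down toy schema (an assumption: only the columns the query touches, with index names of my own choice) shows the planner picking up the partial indices instead of scanning the raw table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE current_state_events (room_id TEXT, state_key TEXT, membership TEXT);
CREATE TABLE events (room_id TEXT, sender TEXT, type TEXT);
CREATE INDEX tmp_current_state_events_join
    ON current_state_events(room_id, state_key)
    WHERE membership = 'join';
CREATE INDEX tmp_events_messages
    ON events(room_id, sender)
    WHERE type = 'm.room.message' OR type = 'm.room.encrypted';
""")

plan = "\n".join(row[-1] for row in db.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT c.state_key, e.room_id
    FROM current_state_events AS c
    INNER JOIN events AS e ON c.room_id = e.room_id
    WHERE c.membership = 'join'
      AND c.state_key = e.sender
      AND (e.type = 'm.room.message' OR e.type = 'm.room.encrypted')
"""))
print(plan)
```

The partial indices are usable because the query's WHERE clause contains exactly the conditions the indices were built with.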
Those two indices speed the query up enough for synapse to do its job as a server again, but of course there is the outer query, which can be sped up too.
Looking at the schema of the room_memberships table reveals …
.schema room_memberships
Output: (reformatted for readability)
CREATE TABLE room_memberships (
    event_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    sender TEXT NOT NULL,
    room_id TEXT NOT NULL,
    membership TEXT NOT NULL,
    forgotten INTEGER DEFAULT 0,
    display_name TEXT,
    avatar_url TEXT,
    event_stream_ordering BIGINT REFERENCES events(stream_ordering),
    participant BOOLEAN DEFAULT FALSE,
    UNIQUE (event_id)
);
CREATE INDEX … ON room_memberships (room_id);
CREATE INDEX … ON room_memberships (user_id);
CREATE INDEX … ON room_memberships (user_id, room_id) WHERE forgotten = 1;
[… left out …]
CREATE INDEX … ON room_memberships (user_id, room_id);
CREATE INDEX … ON room_memberships (event_stream_ordering);
Quite some columns, a trigger we're not interested in, and a lot of indices, but none that combines event_stream_ordering with the needed user_id and room_id, so let's create one:
CREATE INDEX tmp_room_memberships_ordering -- the name is my own choice
    ON room_memberships (event_stream_ordering, room_id, user_id);
Now one can watch the background job eat through the remaining work within a few minutes.
Closing Notes

This was the first time the synapse server brought a surprise with an update, so far every other update just worked™.
This little exercise in database performance allowed me to avoid downgrading to an older version and got my chat working again, though I wish it hadn't been necessary.

If you're now interested in learning how to speed up SQLite, I recommend reading the SQLite Indexing Tutorial and the SQLite Query Optimizer Overview.