Comments & Replies
September 25, 2024
Introduction
In this blog article, we’ll explore how to model comments and replies in a relational database and how to efficiently fetch them. In the case of a commenting system, each post can have multiple comments. A comment consists of the comment text, the username of the commenter, their profile picture, the timestamp of creation, and the number of likes. Additionally, it should track whether the user viewing the comment has already liked it. Each comment made directly on a post (a parent comment) can have multiple replies. However, replies are not nested further - meaning a reply cannot have its own replies. Instead, replying to a reply is treated as a reply to the original parent comment, similar to how Instagram handles comments. Parent comments are ordered based on relevance, which is determined by a combination of the number of likes and the comment's age. Replies (child comments) are simply sorted by time. When viewing a post, only parent comments are initially loaded, and users can expand them to view replies. These are the key requirements for this use case, and we will explore how to model and retrieve this data efficiently in a relational database.
Schema Design
When designing the schema using normalization, you would create separate tables for comments and
replies. In this setup, each comment in the comments table can have multiple associated replies
stored in a replies table, with the relationship enforced by a foreign key
(comment_id
). To track which users have liked a particular comment or reply, additional
many-to-many relationship tables - such as relation_liked_comment
and
relation_liked_reply
- are introduced. The diagram below illustrates a normalized
design:

While the normalized approach is a valid way to model the data, it comes with some drawbacks. In
many cases, a single-table design using a self-referencing parent_comment_id
is
preferable
because comments and replies share nearly identical data and behavior, making them essentially the
same entity. This unified approach avoids splitting similar data across two tables, which can
increase the complexity of queries and application logic - especially if you later decide to support
nested replies beyond a single level. By consolidating everything into one table, you simplify data
management, reduce code duplication, and gain the flexibility to traverse comment threads using
recursive queries or common table expressions. In contrast, a two-table design introduces redundant
structures, adds join overhead, and limits the evolution of the comment system into deeper
hierarchies without a significant redesign.

Query Construction
Let’s begin by querying the parent comments for a specific post. The following query demonstrates how to fetch these top-level comments.
SELECT * FROM comments WHERE parent_id IS NULL AND post_id = :${POST_ID};
The initial query retrieves the comment text and its creation timestamp, but it lacks additional user details. To include information about the comment’s author, we join the comments with the users table.
SELECT c.comment, c.created_at, u.username, u.picture FROM comments c JOIN users u ON c.user_id = u.id WHERE c.parent_id IS NULL AND c.post_id = :${POST_ID};
Next, we want to display the number of likes each comment has received. This is done by joining the
relation_liked_comment
table, which records the many-to-many relationship between users
and comments, into our query.
SELECT c.comment, c.created_at, u.username, u.picture, COUNT(rlc) AS like_count FROM comments c JOIN users u ON c.user_id = u.id LEFT JOIN relation_liked_comment rlc ON rlc.comment_id = c.id WHERE parent_id IS NULL AND c.post_id = :${POST_ID} GROUP BY c.comment, c.created_at, u.username, u.picture;
To include the number of replies, we perform a self-join on the comments table using the
parent_id
field, which connects a comment to its replies.
SELECT c.comment, c.created_at, u.username, u.picture, COUNT(rlc) AS like_count, COUNT(c2.id) AS response_count FROM comments c JOIN users u ON c.user_id = u.id LEFT JOIN comments c2 ON c2.parent_id = c.id LEFT JOIN relation_liked_comment rlc ON rlc.comment_id = c.id WHERE c.parent_id IS NULL AND c.post_id = :${POST_ID} GROUP BY c.comment, c.created_at, u.username, u.picture;
Additionally, we need to determine whether the currently logged-in user has liked each comment. This
is achieved by leveraging the relation_liked_comment
table once more.
SELECT c.comment, c.created_at, u.username, u.picture, CASE WHEN rlc2.comment_id IS NOT NULL THEN true ELSE false END AS is_liked, COUNT(rlc) AS like_count, COUNT(c2.id) AS response_count FROM comments c JOIN users u ON c.user_id = u.id LEFT JOIN relation_liked_comment rlc ON rlc.comment_id = c.id LEFT JOIN relation_liked_comment rlc2 ON rlc2.comment_id = c.id AND rlc2.user_id = :${USER_ID} LEFT JOIN comments c2 ON c2.parent_id = c.id WHERE parent_id IS NULL AND c.post_id = :${POST_ID} GROUP BY c.comment, c.created_at, u.username, u.picture, is_liked;
The final piece is an ordering column that sorts comments based on relevance. The query below shows how this field is calculated and used.
SELECT c.comment, c.created_at, u.username, u.picture, CASE WHEN rlc2.comment_id IS NOT NULL THEN true ELSE false END AS is_liked, COUNT(rlc) AS like_count, COUNT(c2.id) AS response_count, (10 * COUNT(rlc.user_id)) / POWER((EXTRACT(EPOCH FROM (NOW() - c.created_at)) / 3600 + 2), 1.5) AS order_col FROM comments c JOIN users u ON c.user_id = u.id LEFT JOIN relation_liked_comment rlc ON rlc.comment_id = c.id LEFT JOIN relation_liked_comment rlc2 ON rlc2.comment_id = c.id AND rlc2.user_id = :${USER_ID} LEFT JOIN comments c2 ON c2.parent_id = c.id WHERE parent_id IS NULL AND c.post_id = :${POST_ID} GROUP BY c.comment, c.created_at, u.username, u.picture, is_liked ORDER BY order_col DESC, c.id DESC;
Here’s a breakdown of how the ordering column, order_col
, is computed:
- The
numerator multiplies the like count by 10, ensuring that likes have a significant impact.
- The
denominator is derived from the comment’s age in hours (calculated using
EXTRACT(EPOCH FROM (NOW() - c.created_at)) / 3600
), with a constant added to prevent
issues with very recent comments. An exponent of 1.5 controls how quickly the score decays over
time.
- This formula means that a comment with zero likes scores 0, a new comment with a few
likes scores higher than an older comment with the same number of likes, and popular comments remain
prominent even as they age.
- To ensure consistent ordering when scores match, the comment ID is
used as a secondary sort key.
Rather than retrieving all comments simultaneously, it's more efficient to use pagination. Since a
custom order column determines the sequence, keyset pagination is employed to consistently maintain
that order across pages.
SELECT c.comment, c.created_at, u.username, u.picture, CASE WHEN rlc2.comment_id IS NOT NULL THEN true ELSE false END AS is_liked, COUNT(rlc) AS like_count, COUNT(c2.id) AS response_count, (10 * COUNT(rlc.user_id)) / POWER((EXTRACT(EPOCH FROM (NOW() - c.created_at)) / 3600 + 2), 1.5) AS order_col FROM comments c JOIN users u ON c.user_id = u.id LEFT JOIN relation_liked_comment rlc ON rlc.comment_id = c.id LEFT JOIN relation_liked_comment rlc2 ON rlc2.comment_id = c.id AND rlc2.user_id = :${USER_ID} LEFT JOIN comments c2 ON c2.parent_id = c.id WHERE parent_id IS NULL AND c.post_id = :${POST_ID} GROUP BY c.comment, c.created_at, u.username, u.picture, is_liked HAVING ((10 * COUNT(rlc.user_id)) / POWER((EXTRACT(EPOCH FROM (NOW() - c.created_at)) / 3600 + 2), 1.5), c.id) < (:${ORDER_VALUE}, :${COMMENT_ID}) ORDER BY order_col DESC, c.id DESC FETCH FIRST :${PAGE_SIZE} ROWS ONLY;
With this query design, we use keyset pagination to ensure a consistent order of comments across
pages. For the first page, we start by using maximum possible values for the ordering fields (such
as the highest order_col
and comment ID) as our cursor, so that we retrieve the top
results. For
subsequent pages, the pagination query uses the order_col
and ID of the last comment
from the
previous page as the cursor in the HAVING
clause. This filters the results to only
those comments
that come after the last fetched entry, ensuring no duplicates and maintaining the intended order.
This is the complete query for fetching the parent comments. But how should we retrieve the child
comments associated with a parent? The following example illustrates the query for fetching them.
SELECT c.comment, c.created_at, u.username, u.picture, CASE WHEN rlc2.comment_id IS NOT NULL THEN true ELSE false END AS is_liked, COUNT(rlc) AS like_count FROM comments c JOIN users u ON c.user_id = u.id LEFT JOIN comments c2 ON c2.parent_id = c.id LEFT JOIN relation_liked_comment rlc ON rlc.comment_id = c.id LEFT JOIN relation_liked_comment rlc2 ON rlc2.comment_id = c.id AND rlc2.user_id = :${USER_ID} WHERE parent_id = :${PARENT_COMMENT_ID} AND (c.created_at, c.id) > (:${CREATED_AT}, :${REPLY_ID}) GROUP BY c.comment, c.created_at, u.username, u.picture, is_liked ORDER BY c.created_at, c.id FETCH FIRST :${PAGE_SIZE} ROWS ONLY;
Key differences include filtering the parent_id
by the specific comment id rather than
null.
Additionally, the reply count is removed because nested replies are not supported, and the
post_id
condition is omitted since unique comment ids already ensure the required filtering granularity.
Finally, the ordering is based solely on the natural sequence - displaying the oldest responses
first
and progressing to the newest.
Indexing
Proper indexing is critical for maintaining performance as your comments and replies dataset grows.
To ensure efficient query execution, create indexes on columns that are frequently used in filtering
(such as post_id
, parent_id
) and sorting (like created_at
).
In cases where
multiple columns are often queried together, consider composite indexes to optimize join and search
operations. Keep in mind that while indexes can significantly speed up data retrieval, they also add
overhead during write operations, so it’s important to balance indexing strategies with your update
frequency. Regularly review your query execution plans and monitor index usage to adjust or add
indexes as your application's query patterns evolve.
Conclusion
This example illustrates how a single SQL query can effectively retrieve comments, providing all the necessary details for display. It fetches parent comments along with associated user information, like counts, reply counts, and a relevance-based ordering mechanism, while also offering a method to retrieve child comments. This approach balances performance with flexibility, streamlining the management and pagination of comment threads - all without duplicating data, as both parent comments and replies are stored in a single table.