Replacing Your Database with Git: How Tobesee Stores Content in GitHub Repositories
August 14, 2024
A technical deep-dive into the architecture behind Tobesee — how JSON indexes and Markdown files in a Git repository replace traditional database tables for content management
Every CMS needs somewhere to put its data. For decades, that somewhere has been a relational database. Tobesee takes a different path: it stores content as files in a GitHub repository and accesses them through the GitHub REST API. This article explains the technical details of how this works, why it is a viable approach for content websites, and where the boundaries are.
The Data Model
Traditional CMS platforms model content as database rows. An article might be a row in an articles table with columns for title, body, slug, created_at, and updated_at. Relationships between content types are expressed through foreign keys and join tables.
Tobesee models content as files:
- JSON index files serve the role of database tables — they contain arrays of metadata objects that can be quickly scanned for listing and search operations
- Markdown files serve the role of individual records — each file contains the full content of one article, with YAML frontmatter for structured metadata
Here is a concrete example. The file `data/json/articles.json` contains:

```json
[
  {
    "id": "my-first-post",
    "title": "My First Post",
    "description": "An introduction to my blog",
    "date": "2026-01-15",
    "path": "data/md/my-first-post.md"
  }
]
```
And the corresponding Markdown file at `data/md/my-first-post.md`:

```markdown
---
title: "My First Post"
description: "An introduction to my blog"
date: "2026-01-15"
---

# My First Post

Welcome to my blog...
```
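If every index entry follows the shape shown above, it can be modeled with a small type and runtime guard. This is an illustrative sketch; the names `ArticleEntry` and `isArticleEntry` are mine, not Tobesee's:

```typescript
// A sketch of the index-entry shape from articles.json.
// The interface and guard names are illustrative, not Tobesee's actual types.
interface ArticleEntry {
  id: string;
  title: string;
  description: string;
  date: string; // ISO date, e.g. "2026-01-15"
  path: string; // repo path to the Markdown file
}

function isArticleEntry(value: unknown): value is ArticleEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return ["id", "title", "description", "date", "path"].every(
    (key) => typeof v[key] === "string"
  );
}
```

A guard like this is useful because the index is a plain file anyone can edit by hand; validating entries on read catches malformed commits early.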
How Reads Work
When a visitor opens the articles listing page, Tobesee makes a single GitHub API call to fetch `articles.json`. This returns the metadata for all articles — titles, descriptions, dates — without loading the full content of each one. The listing page renders from this lightweight index.
When a visitor clicks a specific article, Tobesee makes another API call to fetch the corresponding Markdown file. The `gray-matter` library extracts the YAML frontmatter, and `remark` converts the Markdown body to HTML. Next.js renders the result as a complete page.
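The core of that parsing step — splitting frontmatter from the body — can be sketched without the real libraries. This simplified stand-in for gray-matter handles only the flat `key: "value"` frontmatter shown earlier:

```typescript
// Simplified stand-in for gray-matter: splits YAML frontmatter from the body.
// Handles only flat `key: "value"` pairs, unlike the real library.
function parseFrontmatter(raw: string): { data: Record<string, string>; content: string } {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!match) return { data: {}, content: raw };
  const data: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim().replace(/^"|"$/g, "");
    data[key] = value;
  }
  return { data, content: raw.slice(match[0].length) };
}
```

In production the real gray-matter is the right choice — it handles nested YAML, arrays, and edge cases this sketch does not.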
The two-step approach — index for listings, full file for detail views — mirrors how databases use indexes to speed up queries. The difference is that the "index" is a JSON file and the "query" is a file read.
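The two-step read might look like the following sketch, where `fetchFile` stands in for the GitHub API call so the flow can run without network access:

```typescript
// Sketch of the two-step read: index for the listing, full file for detail.
// `fetchFile` stands in for a GitHub contents API call and is injected so the
// flow can be exercised without network access.
type FetchFile = (path: string) => Promise<string>;

async function listArticles(
  fetchFile: FetchFile
): Promise<{ id: string; title: string; path: string }[]> {
  // One API call returns metadata for every article.
  return JSON.parse(await fetchFile("data/json/articles.json"));
}

async function readArticle(fetchFile: FetchFile, id: string): Promise<string> {
  // A second call fetches only the one Markdown file the visitor asked for.
  const index = await listArticles(fetchFile);
  const entry = index.find((e) => e.id === id);
  if (!entry) throw new Error(`no article with id ${id}`);
  return fetchFile(entry.path);
}
```

Injecting the fetcher also makes the listing/detail split easy to unit-test with an in-memory map of files.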
How Writes Work
When an admin creates or edits an article through the dashboard, the write flow involves multiple API calls:
- Read the current index — fetch `articles.json` to get the current SHA (needed for updates)
- Write the Markdown file — create or update the `.md` file with the article content
- Update the index — add or modify the article entry in `articles.json`
Each write operation creates a Git commit. The GitHub API requires the current file SHA for updates, which prevents race conditions — if someone else modified the file between your read and write, the API returns a 409 conflict error.
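GitHub's documented `PUT /repos/{owner}/{repo}/contents/{path}` endpoint expects base64-encoded content and, for updates, the current file SHA. A helper for building that request body might look like this (a sketch, not Tobesee's actual code):

```typescript
import { Buffer } from "node:buffer";

// Builds the body for GitHub's `PUT /repos/{owner}/{repo}/contents/{path}` call.
// `sha` is required when updating an existing file and omitted when creating one.
function buildWritePayload(message: string, text: string, sha?: string) {
  const payload: { message: string; content: string; sha?: string } = {
    message, // becomes the Git commit message
    content: Buffer.from(text, "utf8").toString("base64"),
  };
  if (sha) payload.sha = sha; // a stale SHA makes the API return 409 Conflict
  return payload;
}
```

Because the SHA travels with the write, the API itself enforces optimistic concurrency — no extra locking layer is needed.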
Performance Characteristics
Read Latency
GitHub API responses typically arrive in 100-300ms from server-side code. Combined with Next.js server-side rendering, the total time to first byte for a Tobesee page is usually 200-500ms. This is comparable to a well-optimized database-backed CMS.
For frequently accessed pages, Next.js caching reduces this further. Static generation can pre-render pages at build time, serving them from the CDN edge in under 50ms.
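As an illustration, a Next.js App Router page can opt into periodic revalidation so repeat visitors are served from cache rather than from a fresh GitHub API call (the interval here is my choice, not Tobesee's):

```typescript
// Illustrative Next.js route-segment config: re-render the page at most
// every 5 minutes instead of hitting the GitHub API on every request.
export const revalidate = 300;

// Per-request alternative using Next.js's extended fetch options:
// const res = await fetch(url, { next: { revalidate: 300 } });
```

Either form trades up to five minutes of staleness for a large cut in API traffic, which is usually acceptable for published articles.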
Write Latency
Write operations are slower than reads because they involve multiple API calls and Git commit creation. A typical save operation takes 1-3 seconds. This is acceptable for content management but would be too slow for high-frequency writes.
Rate Limits
Authenticated GitHub API requests are limited to 5,000 per hour. For a content website with moderate traffic, this is more than sufficient. A site serving heavy traffic would need aggressive caching (which Next.js provides) to stay within limits.
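As a rough illustration of staying under that limit, an in-memory TTL cache can collapse repeated index fetches into one API call per window (a sketch; Next.js's built-in fetch caching plays this role in practice):

```typescript
// Minimal in-memory TTL cache to stretch the 5,000 requests/hour budget.
// Wraps any async fetcher; repeated calls within `ttlMs` reuse one promise.
function cached<T>(fetcher: () => Promise<T>, ttlMs: number): () => Promise<T> {
  let value: Promise<T> | undefined;
  let expires = 0;
  return () => {
    const now = Date.now();
    if (!value || now >= expires) {
      value = fetcher(); // only path that spends an API request
      expires = now + ttlMs;
    }
    return value;
  };
}
```

With a 60-second TTL, even a page receiving thousands of views per hour costs at most 60 index fetches per hour.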
Version Control as a Feature
The most significant advantage of storing content in Git is version control. Every article edit creates a commit with the exact changes (viewable as a diff), a timestamp, and author information.
This means you can roll back any article to a previous version, compare versions to see what changed, audit who modified what and when, and branch content to work on drafts without affecting the live site.
Trade-Offs to Consider
No Complex Queries
You cannot run SQL-style queries against file-based content. There is no `WHERE`, `JOIN`, or `GROUP BY`. For most content websites, the queries are simple — list all articles sorted by date, fetch one article by slug — and these work well with JSON indexes.
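Those two queries translate directly into array operations over the index. A sketch, with an illustrative `Entry` type:

```typescript
// The two queries a content site actually needs, expressed over the JSON index.
interface Entry {
  id: string;
  title: string;
  date: string; // ISO date strings sort correctly as plain strings
}

// "SELECT * ORDER BY date DESC" becomes a sort over the index:
const byDateDesc = (entries: Entry[]): Entry[] =>
  [...entries].sort((a, b) => b.date.localeCompare(a.date));

// "WHERE slug = ?" becomes a find:
const bySlug = (entries: Entry[], slug: string): Entry | undefined =>
  entries.find((e) => e.id === slug);
```

Because the whole index is already in memory after one API call, these scans are effectively free at content-site scale.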
File Size Limits
GitHub API has a 100MB file size limit and a 1MB limit for the contents API. For text-based content, these limits are rarely an issue. For large binary files, use a dedicated storage service.
Collaboration at Scale
Resolving merge conflicts in JSON files can be tricky when two admins edit simultaneously. Tobesee handles this with SHA-based conflict detection, but one admin might need to retry their save.
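One way to implement that retry is to re-read the file for a fresh SHA after a 409 and attempt the write again. This sketch uses a hypothetical `Api` interface, not Tobesee's actual client:

```typescript
// Sketch of SHA-based conflict handling: on a 409 Conflict, re-read the file
// to get the fresh SHA and retry once. `Api` is a stand-in, not Tobesee's code.
interface Api {
  getSha(path: string): Promise<string>;
  put(path: string, content: string, sha: string): Promise<{ status: number }>;
}

async function saveWithRetry(api: Api, path: string, content: string): Promise<void> {
  for (let attempt = 0; attempt < 2; attempt++) {
    const sha = await api.getSha(path);            // read the current SHA
    const res = await api.put(path, content, sha); // attempt the write
    if (res.status !== 409) return;                // success (or a non-conflict error)
  }
  throw new Error(`conflict persisted after retry: ${path}`);
}
```

Note that a blind retry last-writer-wins the other admin's index change; a production version would re-apply its edit on top of the freshly read file rather than overwrite it.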
When This Architecture Makes Sense
The file-based approach works well when content changes are infrequent, the content model is simple, version control is valuable, and cost matters. It does not work well for real-time data, massive content volumes, complex queries, or many concurrent writers.
Storing content in GitHub repositories is a deliberate architectural choice that trades database flexibility for simplicity, portability, and built-in version control. For the category of websites Tobesee targets, this trade-off is favorable.