We're excited to publish another guest post highlighting development in the atproto ecosystem. Spacecowboy is the builder behind the popular For You feed, which serves personalized content to tens of thousands of users every day. In this post, Spacecowboy explains how they serve the For You feed from their living room, using a combination of local infrastructure and a VPS as a proxy.
In the spirit of the post on how Graze.social serves their feeds, here is how I run 💖For You.
The logic of the feed is super simple: "It finds people who liked the same posts as you, and shows you what else they've liked recently"
For You is a single Go binary that does most things:
- consumes the firehose (Jetstream) of posts, likes, and reposts, and saves them into a sqlite database
- serves the feed
- serves the playground page https://foryou.club/playground
Being a single process makes it easy to manage.
Database
I store all data in sqlite. A key advantage of sqlite for me is testability - I can create the database in memory for unit testing purposes. I don't need to mock or fake my storage layer. The setup/teardown is instant.
For querying the data, I use the excellent sqlc.dev. It is a tool that gives me full control of the queries I want to run and takes care of the boilerplate through code generation.
To keep likes data memory-efficient (both on disk and in in-memory caches), I give each post AT URI an integer id:
sqlite> select * from items order by id limit 10;
id uid metadata
7 at://did:plc:x5qqhu6n6jrwp3vulwia6eee/app.bsky.feed.post/3lmn2andjik2x
10 at://did:plc:hn2uy22exqcvxsh7mnikeevz/app.bsky.feed.post/3lmmj3x2oos2i
11 at://did:plc:4wep2s4udx4bqv774cq5wlsk/app.bsky.feed.post/3llh66exc2c2r
12 at://did:plc:kydd7ppawtffeqfvek5omiz4/app.bsky.feed.post/3lmn6tijcgk2z
Similarly for the users stored in the raters table:
sqlite> select * from raters order by id limit 10;
id uid
1 did:plc:xmxy5pcqjbbjmyl2kfrmhswb
2 did:plc:nznmwgfiz7wnmzajxffpirmc
3 did:plc:6mn4v5ao2fggfbyeuysrmshq
4 did:plc:zreixrmbrxrpky6k6yi4yilq
5 did:plc:dlbzlumnyv2dabsinhacqyxu
Each post, like, and repost is stored as an entry in the ratings table:
sqlite> select * from ratings limit 10;
id item_id rater_id created_timestamp
5981931671 747239155 556296 1768688218
5981931672 747111170 3677445 1768688218
5981931673 747226521 114350 1768688218
5981931674 745813473 1751615 1768688218
5981931675 718027268 2796254 1768688218
5981931676 747220721 2526 1768688218
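The interning scheme above can be sketched as a small get-or-assign lookup. This is a hypothetical, in-memory stand-in (the real feed persists the mapping in the sqlite items table):

```go
package main

import "fmt"

// itemInterner maps AT URIs to compact integer ids, so the large
// ratings table (and the in-memory caches) store small integers
// instead of ~80-byte URI strings.
type itemInterner struct {
	ids  map[string]int64 // uri -> id
	next int64
}

func newItemInterner() *itemInterner {
	return &itemInterner{ids: make(map[string]int64), next: 1}
}

// internID returns the existing id for uri, or assigns the next one.
func (in *itemInterner) internID(uri string) int64 {
	if id, ok := in.ids[uri]; ok {
		return id
	}
	id := in.next
	in.next++
	in.ids[uri] = id
	return id
}

func main() {
	in := newItemInterner()
	a := in.internID("at://did:plc:x5qqhu6n6jrwp3vulwia6eee/app.bsky.feed.post/3lmn2andjik2x")
	b := in.internID("at://did:plc:hn2uy22exqcvxsh7mnikeevz/app.bsky.feed.post/3lmmj3x2oos2i")
	again := in.internID("at://did:plc:x5qqhu6n6jrwp3vulwia6eee/app.bsky.feed.post/3lmn2andjik2x")
	fmt.Println(a, b, again) // prints 1 2 1 - the same URI gets the same id back
}
```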
I open the sqlite database with multiple connections (5) to allow concurrent reads. To avoid the dreaded "busy" error on writes, I enforce that only one thread can perform a write using a Go mutex. If you search online, the most common suggestion for solving the "busy" error is db.SetMaxOpenConns(1), but that makes reads sequential too.
To put a limit on db size I store only the last 90 days of data. Every 24 hours a cleanup goroutine kicks in that deletes all ratings older than 90 days and then deletes all items that have no ratings pointing at them. The sqlite db file is still hefty at 419GB. I never run vacuum on it because it would be too slow (maybe a couple of hours) and would make the feed unavailable.
Logging
There is one more sqlite db - 200GB file storing response logs. It keeps track of which posts have been returned. When the user likes a post that was shown in For You, I notice this in the firehose and update the log entry. Same with "show more" and "show less" interactions. These logs are useful for analyzing how the feed is performing. And when I run an A/B test I can compare the performance of control vs treatment arms. I dump the logs from sqlite periodically into per-day parquet files like this:
2.6G Apr 18 08:16 logs-2026-04-17.parquet
(as these parquet files accumulate I should probably move them to HDD)
I use duckdb to query these parquet files and build graphs like this:
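A duckdb query over those files might look like this (the column names here are hypothetical - the post doesn't describe the log schema):

```sql
-- duckdb can read a glob of parquet files directly
SELECT day,
       count(*)   AS posts_shown,
       sum(liked) AS likes
FROM read_parquet('logs-*.parquet')
GROUP BY day
ORDER BY day;
```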
and to generate A/B test reports:
I have been running this test for more than a month and finished it today. The statistically significant result is that users are 2.6% less likely to press "show less like this" 🎉
- 5.7% more "show more" interactions: 41,425 -> 43,795 (+2,370)
- 5.2% fewer "show less": 169,848 -> 160,938 (-8,910)
— spacecowboy (@spacecowboy17.bsky.social) 2026-04-06T20:08:46.802Z
and to build the user stats dashboard:
Here is a very rough version of the For You Stats dashboard: linklonk.com/foryou It shows you 3 things:
- how much you've been using For You
- how many views and likes your posts got in For You
- how your likes/reposts helped surface posts to other people in For You
Feedback welcome
— spacecowboy (@spacecowboy17.bsky.social) 2026-02-15T15:41:16.214Z
Caching
For You makes very heavy use of in-process caches (https://github.com/hashicorp/golang-lru). I don't need a separate service like Redis because all writers and readers of this cache are in a single process. Such caching is fast - no interprocess communication or serialization/deserialization overhead. Almost 100% of the data that is necessary to generate recommendations comes from the cache and not from the database.
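The get-or-compute shape of those caches can be illustrated with a minimal stand-in. The real feed uses hashicorp/golang-lru; this sketch uses a plain map with no eviction to stay self-contained, and the field and function names are made up:

```go
package main

import (
	"fmt"
	"sync"
)

// likesCache memoizes per-user data inside the process, so the
// recommendation path rarely needs to touch sqlite. Everything stays
// in one address space - no network hop, no serialization.
type likesCache struct {
	mu    sync.RWMutex
	likes map[string][]int64 // user DID -> liked item ids
}

func newLikesCache() *likesCache {
	return &likesCache{likes: make(map[string][]int64)}
}

// get returns cached likes, falling back to load (e.g. a sqlite
// query) on a miss and caching the result for next time.
func (c *likesCache) get(did string, load func(string) []int64) []int64 {
	c.mu.RLock()
	v, ok := c.likes[did]
	c.mu.RUnlock()
	if ok {
		return v
	}
	v = load(did)
	c.mu.Lock()
	c.likes[did] = v
	c.mu.Unlock()
	return v
}

func main() {
	c := newLikesCache()
	loads := 0
	load := func(string) []int64 { loads++; return []int64{747239155, 747111170} }
	c.get("did:plc:example", load)
	c.get("did:plc:example", load) // second call is served from the cache
	fmt.Println(loads) // prints 1
}
```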
The downside of in-process caches is that whenever I restart the feed process it becomes super slow until the caches are sufficiently warmed up.
Hardware
The feed runs in my living room on my "gaming" PC attached to the TV. Some specs:
- CPU: AMD 9950X3D - a 16-core processor with extra L3 cache
  - upgraded from the 12-core 7900 in Dec 2025:
The For You feed is going to be down while I'm upgrading the CPU from AMD 7900 (12 core) to AMD 9950X3D (16 core). This should increase how many users it can serve by ~50%-100%. If all goes well it should be back up in an hour or two.
— spacecowboy (@spacecowboy17.bsky.social) 2025-12-28T16:40:05.960Z
  - The 3D cache gives ~25% performance boost to the CCD with the cache
- RAM: 96GB DDR5 6000MT/s
- upgraded from 32GB back in June 2025 before the price crunch
- Storage:
- 2TB NVMe for the database
- 2TB NVMe for the rest of the system
- Power backup: the PC, the Internet modem and the router are hooked up to an Ecoflow Delta 3, which would provide ~4-5 hours of runtime in case of a power outage
The load
Every day ~72K users load the feed at least once. On average, a user generates ~22 requests per day.
The traffic varies from 15 QPS to 25 QPS. At 25 QPS, the CPU is ~37% loaded (12 out of 32 threads).
The process consumes ~50GB of RAM - 99% of it is the caches.
Being CPU bound, this setup should be able to accommodate 3x more users.
Future growth
What if we get >3x more traffic? In that case, the feed notices that requests are taking longer to process and switches to a set of algorithm parameters that make the calculations significantly (>10x) cheaper with minimal quality degradation:
With these parameters the CPU load went way down: from 22/24 cores being loaded to only 7/24. This means there is a way to handle a lot more traffic.
— spacecowboy (@spacecowboy17.bsky.social) 2025-12-26T17:20:38.611Z
It means we could be serving 30x more users, which would be 72K*30 ≈ 2.1M. The best available proxy for active Bluesky users is the daily count of users who've liked something, and this metric has been stable at ~1M (https://bsky.jazco.dev/stats). That means the current setup could theoretically support all active users!
Exposing the feed to the internet
The local process serves http://localhost:8090 but you can't access it from outside. To be publicly visible, I rent a small VPS on OVH.
I use Nginx to handle the incoming requests to /xrpc/app.bsky.feed.getFeedSkeleton and /xrpc/app.bsky.feed.sendInteractions. Nginx then proxies the request to a small Go process I call "dispatch". Dispatch does a few things:
- It validates the JWT tokens. This process involves fetching the DID document. The host address of the document is controlled by the caller. I don't want to make these requests from my home PC and reveal my IP address.
- I run multiple feeds - they run on different ports on my local machine, but they all have to share the same external endpoint: /xrpc/app.bsky.feed.getFeedSkeleton. Dispatch proxies each request to the correct address. Both my home PC and the VPS are on the same Tailscale network, so dispatch makes requests to my PC as http://gaming:8090 for For You or http://gaming:8093 for Videos For You.
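The routing step of dispatch could be sketched like this. The feed AT URIs are hypothetical; the gaming:809x addresses follow the post, and in the real dispatch the chosen backend would be handed to an httputil.ReverseProxy:

```go
package main

import (
	"fmt"
	"strings"
)

// backendFor maps a requested feed AT URI (the "feed" query
// parameter of getFeedSkeleton) to the Tailscale address of the
// local process serving that feed.
func backendFor(feedURI string) (string, bool) {
	switch {
	case strings.HasSuffix(feedURI, "/for-you"):
		return "http://gaming:8090", true
	case strings.HasSuffix(feedURI, "/videos-for-you"):
		return "http://gaming:8093", true
	}
	return "", false // unknown feed: dispatch can answer with a canned post
}

func main() {
	addr, ok := backendFor("at://did:plc:example/app.bsky.feed.generator/for-you")
	fmt.Println(addr, ok) // prints http://gaming:8090 true
}
```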
When the PC is down for maintenance, I can tell dispatch to return a canned post for all feed requests. Example:
For You will be down for 10-20 minutes. I want to change my RAM speed in bios from 4800 to 6000 in case it helps the feed load a bit faster. For the last week I've been running Gnome from the fresh Ubuntu install and I need to go back to MATE.
— spacecowboy (@spacecowboy17.bsky.social) 2026-02-28T16:39:45.649Z
All VPS services are run through docker-compose.
Monitoring
After a few outages, I've set up free uptime monitoring using https://hetrixtools.com/. They try to access https://foryou.club/playground every minute, and if it becomes unavailable for >5 minutes, they send me an email, a phone call and a text. It's a great service.
The uptime since January has been 99.77%.
Running costs
- ~$20/month for the electricity - ~200W 24/7
- ~$7/month for the VPS
- ~$3/month for two domain names
If I were to rent a similar server, it would cost $245/month. But what fun is that?
I'm totally fine to cover the costs:
Thanks everyone for offering to pitch in to support the For You feed! I want to keep it as a pure hobby project with no financial side. I'm fine to do this indefinitely, so please don't worry about the sustainability.
— spacecowboy (@spacecowboy17.bsky.social) 2025-12-26T22:13:34.108Z
If you can, consider supporting other feed service maintainers who need it much more:
Folks have been reaching out about how to support Graze beyond our core product. We think that support needs to go beyond Graze. That's why we've teamed up with our direct competitors @skyfeed.app and @blueskyfeedcreator.com to jointly support all our work:
— Graze Social (@graze.social) 2026-04-01T16:19:22.122524+00:00
Hi! SkyFeed is struggling and I need to find a solution. If you don't know SkyFeed, it's a service I built 2 years ago, enabling anyone to build and publish feeds using a visual block-based editor. SkyFeed is still hosting the most feeds on Bluesky, but that comes at a cost [1/X]
— redsolver (@redsolver.dev) 2026-03-10T20:22:54.408Z
I think this provides a great example of how you can run a service on local infrastructure, combined with a VPS proxy. It's a cost-effective way to serve a feed to a large number of users without needing to invest in expensive cloud infrastructure.
Thanks for reading! And keep an eye out for more posts featuring feed builders.