Introducing Hubble: A Public Mirror for the Whole Atmosphere
This is a guest post from fig, creator and maintainer of the microcosm infrastructure for building great atproto apps, originally published on the microcosm blog. We're thrilled to be supporting fig's work building a more resilient ecosystem with Hubble.
Every account in the Atmosphere lives in a Personal Data Server (PDS) somewhere. Whether you run your own (like Cory Doctorow), team up with friends to share a server, or park your account at a PDS from Bluesky, Blacksky, Northsky, Eurosky, npmx.social, selfhosted.social, pckt.blog, margin.at, sprk.so, tngl.sh, teal(deep breath)— your app experience is the same. Your PDS logs you in, and holds your Bluesky posts, Leaflet blogs, and alllll your other public data for atproto apps.
But what happens when a PDS goes offline?
[MISSING]: Has anyone seen this PDS?
For you as a user, this is recoverable! …if you’ve taken some preparations. It's easier to move when your PDS is online, but with a rotation key and a backup of your data, you're in complete control. Tools like PDS MOOver—a community-built multi-purpose account management tool—have made this preparation much easier than it used to be.
For developers building in the Atmosphere, the answer is less comfortable. An offline PDS is like a hole cut out of the database. If your app has saved a copy of the data it needs from it, you might be fine. But if you need anything from it, or spin up a new app and backfill from scratch: unless you can find a copy from somewhere, you are simply out of luck. Your picture of the network will be a bit different from everyone else's, potentially permanently.
It wasn’t always like this.
Ye olde archival relay
In the bad old days of early Atmosphere, relays stored a local copy of every user’s data repository (“repo”). This was great for apps! Devs could backfill the entire network from a relay, no matter whether any PDS was online or not.
Unfortunately this setup was less great for the relays themselves, because keeping copies of tens of millions of repositories is expensive to store and run.
Last year the sync1.1 protocol update changed the firehose game: now you can fully authenticate repository data updates without keeping your own full copy for proof! Smart cryptography! All relays in the network now run this way, and are called non-archival: lightweight and cheap to run, but only retaining recent events, not the full history of the network.
These days, atproto apps backfill by crawling the live network themselves, instead of replaying from an archival relay. Initial repository contents are downloaded direct-from-PDS, and Bluesky’s Tap and other great tools have emerged which help developers manage the backfill process. But Tap can’t bring unreachable PDSes back from the dead.
When a PDS disappears, those repos are gone.
(dramatic voice) Until now
Hubble is a new open-source project to build and operate a whole-Atmosphere public data mirror, synchronizing every atproto repository in real-time, keeping public data available even when a PDS goes down.
I've wanted to build this for a while! I build and run various full-network indexes, relays, and other atproto infrastructure, including an identity-and-record edge cache (called Slingshot) which already helps a bit with PDS availability gaps—if it has already cached the data you want.
A full mirror of the network is a step up from that and a step-change in what it brings: account recovery for users, data availability for apps, and complete snapshots for researchers. But it’s a bigger, riskier project, with potentially high operating costs and no obvious direct revenue model. Hard to take the leap without some support.
That support came together as a $20,000 grant from Bluesky for Hubble’s development and one year of operation. The structure of the grant aligns incentives: operating costs come out of the total amount, so the more efficient I can make it, the better for me. (and if DRAM keeps rocketing in price, …uh oh). We set out some acceptance criteria for Hubble’s launch (and the final payout), and then I run it for 12 months.
What happens after 12 months? It’s all open-source, so anyone (including me!) will be able to keep a Hubble server running. And I plan to! But if after a year I can't continue operating it sustainably, Bluesky will pick up the live instance to keep it online. Hubble itself won’t go down like an offline PDS!
I think it’s a pretty neat funding arrangement! Substantial support without giving up independence, while keeping a collaborative view toward long-term stability for the ecosystem.
✨Authenticated✨ data mirror
One thing to underscore: because all public data in atproto is cryptographically signed by the account it’s from, Hubble cannot tamper with it. This is the Authenticated in the Authenticated Transfer Protocol: copies of public data you get from anywhere are verifiable, even if you can’t reach the original PDS it came from.
What’s mirrored (and what isn’t)
Hubble will mirror every public data repository in the Atmosphere, like relays did before the non-archival switch. Repositories contain almost everything you might think of as “your data” that you share publicly: your profile, all your likes, posts, replies, follows, recipes, RSVPs, …
But it’s worth noting a few things that Hubble will not store:
- No blobs: Images, videos, and other media live separately from repos. It takes a lot of storage to keep 43+ million repositories, and blobs take orders of magnitude more space than that! (think: petabytes). Future work for Hubble.
- No private app data: Things like mutes and bookmarks on Bluesky that aren’t part of your public data repository will not be stored by Hubble. (This isn’t just Hubble being virtuous, it literally cannot access any non-public data!)
Hubble will fully respect content deletion. Deleted records are deleted from the mirror, and deleting your Atmosphere account removes your entire data repository. Data from inactive accounts won't be accessible from unauthenticated public endpoints. Hubble mirrors the current state of the network, not its history.
How to get a repository from Hubble
For personal account recovery, Hubble will launch with a public website where you can find and download your data: hubble.microcosm.blue. The archive you’ll receive is exactly the same as what you get when exporting your data from the Bluesky app, for example. You can import this archive when moving to a new PDS to restore all your public data.
For developers, Hubble will implement standard protocol XRPC queries for programmatic access. The funding acceptance criteria for launch include three:
- com.atproto.sync.getRepo: Download a repository archive for a provided DID.
- com.atproto.sync.listRepos: Enumerate every account Hubble knows about.
- com.atproto.sync.getRepoStatus: Get the hosting status on Hubble for a DID.
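As a rough sketch of what enumerating accounts could look like: com.atproto.sync.listRepos is paginated with a cursor, so a client keeps requesting pages until no cursor comes back. The host `https://hubble.example` is a placeholder (this post doesn't state Hubble's API hostname), and `list_repos` and `http_get_json` are hypothetical helper names, not part of any published client.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Assumption: placeholder host. Substitute the real API host once it is published.
MIRROR_HOST = "https://hubble.example"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def list_repos(
    host: str,
    get_json: Callable[[str], dict] = http_get_json,
    limit: int = 500,
) -> Iterator[dict]:
    """Yield every entry from com.atproto.sync.listRepos, following cursors."""
    cursor: Optional[str] = None
    while True:
        params = {"limit": limit}
        if cursor:
            params["cursor"] = cursor
        query = urllib.parse.urlencode(params)
        page = get_json(f"{host}/xrpc/com.atproto.sync.listRepos?{query}")
        yield from page.get("repos", [])
        # A missing cursor means this was the last page.
        cursor = page.get("cursor")
        if not cursor:
            break
```

Usage would be something like `for repo in list_repos(MIRROR_HOST): print(repo["did"])`. The `get_json` parameter is only there so the HTTP call can be swapped out, for example in tests.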
You might notice that these are some of the same queries you can send a PDS or relay: that’s on purpose! Using Hubble can be just a matter of changing where you send a request when the original host is offline.
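Here is a minimal sketch of that "change where you send the request" idea: try the account's own PDS first, then fall back to a mirror. The hostnames are placeholders and `fetch_repo` is a hypothetical helper. The only thing taken from the protocol is the com.atproto.sync.getRepo query itself, which takes a `did` parameter and returns the repository as a CAR archive.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional, Sequence, Tuple


def get_repo_url(host: str, did: str) -> str:
    """Build the com.atproto.sync.getRepo URL for a DID on a given host."""
    query = urllib.parse.urlencode({"did": did})
    return f"{host}/xrpc/com.atproto.sync.getRepo?{query}"


def http_get_bytes(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_repo(
    did: str,
    hosts: Sequence[str],
    get_bytes: Callable[[str], bytes] = http_get_bytes,
) -> Tuple[str, bytes]:
    """Try each host in order; return (host, repo_bytes) from the first that answers."""
    last_error: Optional[Exception] = None
    for host in hosts:
        try:
            return host, get_bytes(get_repo_url(host, did))
        except OSError as err:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            last_error = err
    raise RuntimeError(f"no host could serve {did}") from last_error
```

Calling `fetch_repo(did, ["https://pds.example", "https://hubble.example"])` only reaches the mirror when the original host fails. This sketch does not verify signatures on the downloaded repository. A real client would still want to do that, since that check is what makes a copy from a mirror trustworthy.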
These three queries are the start (the minimum required) but not the end of what I have planned. More on this soon!
Who's building this
I’m fig (or Phil) (they/them), and I started microcosm.blue to build and run community infrastructure for the Atmosphere. In the year since launching its first service (Constellation universal backlinks), microcosm has grown to serve hundreds of requests per second from dozens of apps you might have heard of, like Blacksky, npmx.dev, and PDS MOOver.
Microcosm already hosts other independent protocol-level infrastructure—multiple relays, PLC mirrors—along with new full-network API services like Slingshot, UFOs, Spacedust, and more. Most of it runs on hardware that would embarrass a hyperscaler (might be a Raspberry Pi or three involved), and it’s all free to use thanks to direct community sponsorship helping cover costs.
Hubble fits right in with these projects! I’m so excited to keep building and being part of the growing community of independent infrastructure developers and operators in the Atmosphere.
Network resilience
More PDSes come online every week, and more people are using them! Even the number of different PDS implementations seems to grow by the day.
Chart showing non-Bluesky PDS usage, courtesy of https://blue.mackuba.eu/stats/
This diversification is really good news for the future of atproto!
And also, to be realistic about it: some of these PDSes might not have robust backup plans, or the people running them might get bored, or a new implementation might have bugs, and all of this might lead to more PDSes going offline. Maybe one day a PDS will even try to stop their users from moving away.
Hubble is one small step to counter these effects, in line with the atproto ethos. It fills the gap left by relays switching to non-archival mode; it’s a new baseline for account recovery and even adversarial PDS migration; and it’s a data availability backstop for developers building apps.
I’ll be publishing about the design and implementation as it takes shape, and you can subscribe to those at updates.microcosm.blue.
Questions or high-volume access inquiries: reach out to fig at @bad-example.com.