Skip to main content

· 4 min read

In March, we announced the AT Protocol Grant program, aimed at fostering the growth and sustainability of the atproto developer ecosystem, as well as the first three recipients.

We’re excited to share the second batch of grant recipients with you today!

In this batch, we distributed $4,800 total in grants. There is $2,200 remaining in the initial allocation of $10,000. Congratulations to all of the recipients so far, and thank you to everyone who has applied — we're very lucky to have such a great developer ecosystem.

Reminder: You can apply for a grant for your own project by filling out this form. We’re excited to hear what you’re working on!


Blacksky Algorithms@rudyfraser.com ($1000)

Blacksky is a suite of services on AT Protocol that all serve to amplify, protect, and provide moderation services for the network’s Black users. It also doubles as a working atproto implementation in Rust. AT Protocol creates the opportunity to build community in public (e.g. custom feeds, PDS-based handles) while retaining enough agency to build protective measures (e.g. blocking viewers of a feed, third-party labeling) for the unique issues that community may face such as anti-blackness. Rudy has worked on Blacksky for the last 10 months and has built a custom firehose subscriber, feedgen, and has almost completed a PDS implementation from scratch.

SkyBridge@videah.net ($800)

SkyBridge is a in-progress proxy web server that translates Mastodon API calls into appropriate Bluesky ones, with the main goal of making already existing Mastodon tools/apps/clients such as Ivory compatible with Bluesky. The project is currently going through a significant rewrite from Dart into Rust.

TOKIMEKI@holybea.social ($500)

TOKIMEKI is a third-party web client for Bluesky. It is one of the most popular 3rd party clients in Japan. Development and release began in March 2023. It supports multi-column and multi-accounts like TweetDeck, and features lightweight behavior, a variety of themes, and unique features such as bookmarks.

Bluesky Case BotsFree Law Project ($500)

Free Law Project is a small non-profit dedicated to making the legal sector more fair and competitive. We run a number of bots on Blue Sky that post whenever there are updates in important American legal cases, such as the Trump indictments, cases related to AI, and much more.

MorphoOrual ($500)

Morpho is a native Android app for Bluesky and the AT Protocol, written in Kotlin. The primary goals are to have improved performance and accessibility relative to the official app, and to accommodate de-Googled Android devices and security/privacy-minded users while retaining a full set of features.

hexPDSnova ($500)

hexPDS is intended to be a production-grade PDS, written in Elixir/Rust. So far, some DID:PLC related operations have been completed, and MST logic and block storage are in progress. The next milestone is loading a repo’s .CAR file and parsing the records.

ATrium@sugyan.com ($500)

ATrium is a collection of Rust libraries for implementing AT Protocol applications. It generates code from Lexicon schema and provides type definitions, etc. for use with Rust to handle XRPC requests.

SkeetStats@ameliamnesia.xyz ($500)

SkeetStats is currently running and used actively by 400+ users. It tracks account stats for opted-in users and displays them in a variety of charts and graphs, and has a bot that allows users to opt in/out.


Thank you to everyone who has applied for a grant so far! As mentioned above, we have $2,200 remaining in the initial allocation of $10,000 for this program. If you’re an atproto developer and haven’t applied for a grant yet, we encourage you to do so — the application is here.

· 12 min read

Moderation is a crucial aspect of any social network. However, traditional moderation systems often lack transparency and user control, leaving communities vulnerable to sudden policy changes and potential mismanagement. To build a better social media ecosystem, it is necessary to try new approaches.

Today, we’re releasing an open labeling system on Bluesky. “Labeling” is a key part of moderation; it is a system for marking content that may need to be hidden, blurred, taken down, or annotated in applications. Labeling is how a lot of centralized moderation works under the hood, but nobody has ever opened it up for anyone to contribute. By building an open source labeling system, our goal is to empower developers, organizations, and users to actively participate in shaping the future of moderation.

In this post, we’ll dive into the details on how labeling and moderation works in the AT Protocol.

An open network of services

The AT Protocol is an open network of services that anyone can provide, essentially opening up the backend architecture of a large-scale social network. The core services form a pipeline where data flows from where it’s hosted, through a data firehose, and out to the various application indexes.

Data flows from independent account hosts into a firehose and then to applications.

Data flows from independent account hosts into a firehose and then to applications.

This event-driven architecture is similar to other high-scale systems, where you might traditionally use tools like Kafka for your data firehose. However, our open system allows anyone to run a piece of the backend. This means that there can be many hosts, firehoses, and indexes, all operated by different entities and exchanging data with each other.

Account hosts will sync with many firehoses.

Account hosts will sync with many firehoses.

Why would you want to run one of these services?

  • You’d run a PDS (Personal Data Server) if you want to self-host your data and keys to get increased control and privacy.
  • You’d run a Relay if you want a full copy of the network, or to crawl subsets of the network for targeted applications or services.
  • You’d run an AppView if you want to build custom applications with tailored views and experiences, such as a custom view for microblogging or for photos.

So what if you want to run your own moderation?

Decentralized moderation

On traditional social media platforms, moderation is often tightly coupled with other aspects of the system, such as hosting, algorithms, and the user interface. This tight coupling reduces the resilience of social networks as businesses change ownership or as policies shift due to financial or political pressures, leaving users with little choice but to accept the changes or stop using the service.

Decentralized moderation provides a safeguard against these risks. It relies on three principles:

  1. Separation of roles. Moderation services operate separately from other services – particularly hosting and identity – to limit the potential for overreach.
  2. Distributed operation. Multiple organizations providing moderation services reduces the risk of a single entity failing to serve user interests.
  3. Interoperation. Users can choose between their preferred clients and associated moderation services without losing access to their communities.

In the AT Protocol, the PDS stores and manages user data, but it isn’t designed to handle moderation directly. A PDS could remove or filter content, but we chose not to rely on this for two main reasons. First, users can easily switch between PDS providers thanks to the account-migration feature. This means any takedowns performed by a PDS might only have a short-term effect, as users could move their data to another provider. Second, data hosting services aren't always the best equipped to deal with the challenges of content moderation, and those with local expertise and community building skills who want to participate in moderation may lack the technical capacity to run a server.

This is different from ActivityPub servers (Mastodon), which manage both data hosting and moderation as a bundled service, and do not make it as easy to switch servers as the AT Protocol does. By separating data storage from moderation, we let each service focus on what it does best.

Where moderation is applied

Moderation is done by a dedicated service called the Labeler (or “Labeling service”).

Labelers produce “labels” which are associated with specific pieces of user-generated content, such as individual posts, accounts, lists, or feeds. These labels make an assertion about the content, such as whether it contains sensitive material, is unpleasant, or is misleading.

These labels get synced to the AppViews where they can be attached to responses at the client’s request.

Labels are synced into AppViews where they can be attached to responses.

Labels are synced into AppViews where they can be attached to responses.

The clients read those labels to decide what to hide, blur, or drop. Since the clients choose their labelers and how to interpret the labels, they can decide which moderation systems to support. The chosen labels do not have to be broadcast, except to the AppView and PDS which fulfill the requests. A user subscribing to a labeler is not public, though the PDS and AppView can privately infer which users are subscribed to which services.

In the Bluesky app, we hardcode our in-house moderation to provide a strong foundation that upholds our community guidelines. We will continue to uphold our existing policies in the Bluesky app, even as this new architecture is made available. With the introduction of labelers, users will be able to subscribe to additional moderation services on top of the existing foundation of our in-house moderation.

The Bluesky application hardcodes its labeling and then stacks community labelers on top.

The Bluesky application hardcodes its labeling and then stacks community labelers on top.

For the best user experience, we suggest that clients in the AT Protocol ecosystem follow this pattern: have at least one built-in moderation service, and allow additional user-chosen mod services to be layered in on top.

The Bluesky app is a space that we create and maintain, and we want to provide a positive environment for our users, so our moderation service is built-in. On top of that, the additional services that users can subscribe to creates a lot of options within the app. However, if users disagree with Bluesky’s application-level moderation, they can choose to use another client on the network with its own moderation system. There are additional nuances to infrastructure-level moderation, which we will discuss below, but most content moderation happens at the application level.

How are labels defined?

A limited core set of labels are defined at the protocol level. These labels handle generic cases (“Content Warning”) and common adult content cases (“Pornography,” “Violence”).

A label can cover the content of a post with a warning.

A label can cover the content of a post with a warning.

Labelers may additionally define their own custom labels. These definitions are relatively straightforward; they give the label a localized name and description, and define the effects they can have.

interface LabelDefinition {
identifier: string
severity: 'inform' | 'alert'
blurs: 'content' | 'media' | 'none'
defaultSetting: 'hide' | 'warn' | 'ignore'
adultContent: boolean
locales: Record<string, LabelStrings>
}

interface LabelStrings {
name: string
description: string
}

Using these definitions, it’s possible to create labels which are informational (“Satire”), topical (“Politics”), curational (“Dislike”), or moderational (“Rude”).

Users can then tune how the application handles these labels to get the outcomes they want.

Users configure whether they want to use each label.

Users configure whether they want to use each label.

Learn more about label definitions in the API Docs on Labelers and Moderation.

Running a labeler

We recently open-sourced Ozone, our powerful open-source Labeler service that we use in-house to moderate Bluesky. This is a significant step forward in transparency and community involvement, as we're sharing the same professional-grade tooling that our own moderation team relies on daily.

Ozone is designed for traditional moderation, where a team of moderators receives reports and takes action on them, but you're free to apply other models with custom software. We recommend anyone interested in running a labeler try out Ozone, as it simplifies the process by helping labelers set up their service account, field reports, and publish labels from the same web interface used by Bluesky's own moderation team, ensuring that all the necessary technical requirements are met. Detailed instructions for setting up and operating Ozone can be found in the readme.

Beyond Ozone’s interface, you can also explore alternative ways of labeling content. A few examples we've considered:

  • Community-driven voting systems (great for curation)
  • Network analysis (e.g., for detecting botnets)
  • AI models

If you want to create your own labeler, you simply need to build a web service that implements two endpoints to serve its labels to the wider network:

  • com.atproto.label.subscribeLabels : a realtime subscription of all labels
  • com.atproto.label.queryLabels : for looking up labels you've published on user-generated content

Reporting content is an important part of moderation, and reports should be sent privately to whoever is able to act on them. The Labeling service's API includes a specific endpoint designed for this use case. To receive user reports, a labeler can implement:

  • com.atproto.report.createReport : files a report from a particular user

When a user submits a report, they choose which of their active Labelers to report to. This gives users the ability to decide who should be informed of an issue. Reports serve as an additional signal for labelers, and they can handle them in a manner that best suits their needs, whether through human review or automated resolution. Appeals from users are essentially another type of report that provides feedback to Labelers. It's important to note that a Labeler is not required to accept reports.

In addition to the technical implementation, your labeler should also have a dedicated Bluesky account associated with it. This account serves as your labeler's public presence within the Bluesky app, allowing you to share information about the types of labels you plan to publish and how users should interpret them. By publishing an app.bsky.labeler.service record, you effectively "convert" your account into a Bluesky labeler, enabling users to discover and subscribe to your labeling service.

More details about labels and labelers can be found in the atproto specs.

Infrastructure moderation

Labeling is the basic building block of composable moderation, but there are other aspects involved. In the AT Protocol network, various services, such as the PDS, Relay, and AppView, have ultimate discretion over what content they carry, though it's not the most straightforward avenue for content moderation. Services that are closer to users, such as the client and labelers, are designed to be more actively involved in community and content moderation. These service providers have a better understanding of the specific community norms and social dynamics within their user base. By handling content moderation at this level, clients and labelers can make more informed decisions that align with the expectations and values of their communities.

Infrastructure providers such as Relays play a different role in the network, and are designed to be a common service provider that serves many kinds of applications. Relays perform simple data aggregation, and as the network grows, may eventually come to serve a wide range of social apps, each with their own unique communities and social norms. Consequently, Relays focus on combating network abuse and mitigating infrastructure-level harms, rather than making granular content moderation decisions.

An example of harm handled at the infrastructure layer is content that is illegal to host, such as child sexual abuse material (CSAM). Service providers should actively detect and remove content that cannot be hosted in the jurisdictions in which they operate. Bluesky already actively monitors its infrastructure for illegal content, and we're working on systems to advise other services (like PDS hosts) about issues we find.

Labels drive moderation in the client. The Relay and Appview apply infrastructure moderation.

Labels drive moderation in the client. The Relay and Appview apply infrastructure moderation.

This separation between backend infrastructure and application concerns is similar to how the web itself works. The PDSs function like personal websites or blogs on the web, which are hosted by various hosting providers. Just as individuals can choose their hosting provider and move their website if needed, users on the AT Protocol can select their PDS and migrate their data if they wish to change providers. Multiple companies can then run Relays and AppViews over PDSs, which are similar to content delivery networks and search engines, that serve as the backbone infrastructure to aggregate and index information. To provide a unified experience to the end user, application and labeling systems then provide a robust, opinionated approach to content moderation, the way individual websites and applications set their own community guidelines.

In summary

Bluesky's open labeling system is a significant step towards a more transparent, user-controlled, and resilient way to do moderation. We’ve opened up the way centralized moderation works under the hood for anyone to contribute, and provided a seamless integration into the Bluesky app for independent moderators. In addition, by open sourcing our internal moderation tools, we're allowing anyone to use, run, and contribute to improving them.

This open labeling system is a fundamentally new approach that has never been tried in the realm of social media moderation. In an industry where innovation has been stagnant for far too long, we are experimenting with new solutions to address the complex challenges faced by online communities. Exploring new approaches is essential if we want to make meaningful progress in tackling the problems that plague social platforms today, and we have designed and implemented what we believe to be a powerful and flexible approach.

Our goal has always been to build towards a more transparent and resilient social media ecosystem that can better represent an open society. We encourage developers, users, and organizations to get involved in shaping the future of moderation on Bluesky by running their own labeling services, contributing to the open-source Ozone project, or providing feedback on this system of stackable moderation. Together, we can design a more user-controlled social media ecosystem that empowers individuals and communities to create better online spaces.

Additional reading:

· 3 min read

We’re excited to announce the AT Protocol Grants program, aimed at fostering the growth and sustainability of the atproto developer ecosystem.

In the first iteration of this program, we’ll distribute a total of $10,000 in microgrants of $500 to $2,000 per project based on factors like cost, usage, and more.

To apply for a grant, please fill out this form.

Program Details

Over the last few months, we’ve seen independent developers create projects ranging from browser extensions and clients to PDS implementations and atproto tooling. Many of them have become widely adopted in the Bluesky community, too! As we continue on our path toward sustainability, we’re launching this grants program to encourage and support developers building on the AT Protocol.

We will be distributing a total of $10,000, and will publicly announce all grant recipients. We have already distributed $3,000, and the recipients of those grants are detailed below. This is a rolling application, though we will announce when all $10,000 of the initial allocated amount has been distributed.

We’ll evaluate each application based on the submitted project plan and the potential impact. The project should be useful to some user group, whether its fellow developers or Bluesky users. To be eligible for a grant, your project must be open source. We pay out grants via public GitHub Sponsorships.

In addition, we’ve also partnered with Amazon Web Services (AWS) to offer $5,000 in AWS Activate1 credits to atproto developers as well. These credits are applied to your AWS bill to help cover costs from cloud services, including machine learning, compute, databases, storage, containers, dev tools, and more. Simply check a box in your atproto grant application if you’re interested in receiving these credits as well.

Initial AT Protocol Grant recipients

Ahead of Bluesky’s public launch in February, Bluesky PBC extended grants to three developers as a pilot program. We awarded $1,000 each to the following projects and developers:

AT Protocol Python SDK — Ilya Siamionau

AT Protocol Dart SDK — Shinya Kato

Listed on the homepage of the Bluesky API documentation site, these two SDKs have quickly become popular packages with atproto developers. We’re also especially impressed by their own documentation sites!

SkyFeed — redsolver

SkyFeed has helped bring Bluesky’s vision for custom feeds to life — now, there are more than 40,000 custom feeds that users can subscribe to, and a vast majority of them are built with SkyFeed.

Contact

We’re excited to continue to find ways to help developers make their projects built on atproto sustainable. Again, you can submit an application for an AT Protocol grant here.

Please feel free to leave questions or comments on the GitHub discussion for this announcement here.

Footnotes

  1. AWS Activate Credits are subject to the program's terms and conditions.

· 3 min read

This is a guest blog post by Skygaze, creators of the For You feed. You can check out For You, the custom feed that learns what you like, at https://skygaze.io/feed.

Last Sunday, 70 engineers came together at the YC office in San Francisco for the first Bluesky AI Hackathon. The teams took full advantage of Bluesky’s complete data openness to build 17 pretty spectacular projects, many of which genuinely surprised us. My favorites are below. Thank you to Replicate for donating $50 of LLM and image model credits to each participant and sponsoring a $1000 prize for the winning team!

The 17 projects covered a wide range of categories: location-based feeds, feeds with dynamic author lists, collaborative image generation, text moderation, NSFW image labeling, creator tools, and more. The top three stood out for their creativity, practicality, and completeness (despite having only ~6 hours to build), and we’ll share a bit about them below.

Convo Detox

@paritoshk.bsky.social and team came in first place with Convo Detox–a bot that predicts when a thread is at high risk of becoming toxic and interjects to diffuse tension. We were particularly impressed with the team’s use of a self-hosted model trained on Reddit data specifically to predict conversations that are likely to get heated. As a proof of concept they deployed it as a bot that can be summoned via mention, but in the near future this would make for a great third party moderation label.

SF IRL

This is a bot that detects and promotes tech events happening in SF. In addition to flagging events, it keeps track of the accounts posting about SF tech and serves a feed with all of the posts from those accounts. We think simple approaches to dynamic author lists is a very interesting 90/10 on customized feeds and (if designed reasonably) could be both easier for the feed maintainer and higher quality for the feed consumers.

NSFW Image Detection

On Bluesky, users can set whether they want adult content to show up in their app. Beyond this level of customization, whether or not an image is labeled as NSFW can be customized as well — people have a wide variety of preferences. This team trained a model to classify images into a large number of NSFW categories, which would theoretically fit nicely into the 3rd party moderation labeler interface. It’s neat that their choice of architecture extends naturally to processing text in tandem with images.

Other Projects

Other noteworthy projects included translation bots, deep fake detectors, a friend matchmaker, and an image generator tool that allowed people to build image generated prompts together in reply threads. It was genuinely incredibly impressive and exciting to see what folks with no previous AT Proto experience were able to put together (and often deploy !!!) in only a few hours.

Additional Resources

We prepared some starter templates for the hackathon, and want to share them below for anyone who couldn’t attend the event in person!

And if you’re interested in hosting your own bluesky hackathon but don’t know where to start, please feel free to copy all of our invite copy, starter repos, and datasets.

· 5 min read

For a high-level introduction to data federation, as well as a comparison to other federated social protocols, check out the Bluesky blog.

Today, we’re releasing an early-access version of federation intended for self-hosters and developers.

The atproto network is built upon a layer of self-authenticating data. This foundation is critical to guaranteeing the network’s long term integrity. But the protocol’s openness ultimately flows from a diverse set of hosts broadcasting this data across the network.

Up until now, every user on the network used a Bluesky PDS (Personal Data Server) to host their data. We’ve already federated our own data hosting on the backend, both to help operationally scale our service, and to prove out the technical underpinnings of an openly federated network. But today we’re opening up federation for anyone else to begin connecting with the network.

The PDS, in many ways, fulfills a simple role: it hosts your account and gives you the ability to log in, it holds the signing keys for your data, and it keeps your data online and highly available. Unlike a Mastodon instance, it does not need to function as a full-fledged social media service. We wanted to make atproto data hosting—like web hosting—into a fairly simple commoditized service. The PDS’s role has been limited in scope to achieve this goal. By limiting the scope, the role of a PDS in maintaining an open and fluid data network has become all the more powerful.

We’ve packaged the PDS into a friendly distribution with an installer script that handles much of the complexity of setting up a PDS. After you set up your PDS and join the PDS Admins Discord to submit a request for your PDS to be added to the network, your PDS’s data will get routed to other services in the network (like feed generators and the Bluesky Appview) through our Relay, the firehose provider. Check out our Federation Overview for more information on how data flows through the atproto network.

Early Access Limitations

As with many open systems, Relays will never be totally unconstrained in terms of what data they’re willing to crawl and rebroadcast. To prevent network and resource abuse, it will be necessary to place rate limits on the PDS hosts that they consume data from. As trust and reputation is established with PDS hosts, those rate limits will increase. We’ll gain capacity to increase the baseline rate limits we have in place for new PDSs in the network as we build better tools for detecting and mitigating abuse..

For a smooth transition into a federated network, we’re starting with some lower limits. Specifically, each PDS will be able to host 10 accounts and limited to 1500 evts/hr and 10,000 evts/day. After those limits are surpassed, we’ll stop crawling the PDS until the rate limit period resets. This is intended to keep the network and firehose running smoothly for everyone in the ecosystem.

These are early days, and we have some big changes still planned for the PDS distribution (including the introduction of OAuth!). The software will be updating frequently and things may break. We will not be breaking things indiscriminately. However, in this early period, in order to avoid cruft in the protocol and PDS distribution, we are not making promises of backwards compatibility. We will be supporting a migration path for each release, but if you do not keep your PDS distribution up to date, it may break and render the app unusable until you do so.

Because the PDS distribution is not totally settled, we want to have a line of communication with PDS admins in the network, so we’re asking any developer that plans to run a PDS to join the PDS Admins Discord. You’ll need to provide the hostname of your PDS and a contact email in order to get your PDS added to the Relay’s allowlist. This Discord will serve as a channel where we can announce updates about the PDS distribution, relay policy, and federation more generally. It will also serve as a community where PDS admins can experiment, chat, and help each other debug issues.

Account Migration

A major promise of the AT Protocol is the ability to migrate accounts between PDS hosts. This is an important check against potential abuse, and further safeguards the fluid open layer of data hosting that underpins the network.

The PDS distribution that we’re releasing has all of the facilities required to migrate accounts between servers. We’re also opening routes on our PDS that will allow you to migrate your account off our server. However — we do not yet provide the capability to migrate back to the Bluesky PDS, so for the time being, this is a one way street. Be warned: these migrations involve possibly destructive identity operations. While we have some guardrails in place, it may still be possible for you to break your account and we will not be able to help you recover it. So although it’s technically possible, we do not recommend migrating your main account between servers yet. We especially recommend against doing so if you do not have familiarity with how DID PLC operations work.

In the coming months we will be hardening this feature and making it safer and easier to do, including creating an in-app flow for moving between servers.

Getting Started

To get started, join the PDS Administrators Discord, and check out the bluesky-social/pds repo on Github. The README will provide all necessary information on getting your PDS setup and running.

· 3 min read

Bridgy Fed is a bridge between decentralized social networks that currently supports the IndieWeb and the Fediverse, a portmanteau of “federated” and “universe” that refers to a collection of networks including Mastodon. It's a work-in-progress by Ryan Barrett (@snarfed.org), who has already added initial Bluesky support, and is planning on launching it publicly once Bluesky launches federation early next year.

Bridgy Fed is open source, and Ryan has a guide on how IDs and handles are translated between networks. He welcomes feedback!

Screenshot of Bridgy Fed


Can you share a bit about yourself and your background?

I'm a dad, San Francisco resident, and stereotypical Silicon Valley engineer who's always been interested in owning his presence online.

What is Bridgy Fed?

Bridgy Fed is a bridge between decentralized social networks. It currently supports the IndieWeb and the Fediverse, and I soon plan to add other protocols like Bluesky and Nostr.

It's fully bidirectional; from any supported network, you can follow anyone on any other network, see their posts, reply or like or repost them, and those interactions flow across to their network and vice versa. More details here.

Initial Bluesky support is complete! All interactions are working, in both directions. I'm looking forward to launching it publicly after Bluesky federation itself launches!

What inspired you to build Bridgy Fed?

The very first time I posted on Facebook, back over 20 years ago when it was just for college students, I immediately understood that I didn't control or own that space. I had no guarantees as to whether my profile and posts would stay there, who'd see them, etc. I started posting to my website/blog first, and only afterward copied those posts to social networks like Facebook.

I've been working on this stuff ever since, including tools like Granary and Bridgy classic and the IndieWeb community, historical decentralized social protocols like OpenSocial and OStatus, and most recently ActivityPub and Bluesky’s AT Protocol.

What tech stack is Bridgy Fed built on?

Bridgy Fed runs on Google's App Engine serverless platform. It's written in Python, uses libraries like Granary, and leverages standards like webmention and microformats2 in addition to ActivityPub and atproto. I'd eventually like to migrate it to asyncio, but otherwise its stack is serving it well.

What's in the future for Bridgy Fed?

I can't wait to launch Bluesky support! Nostr too. I'm also looking forward to extending the current IndieWeb support to any web site, using standard metadata like OGP and RSS and Atom feeds.


You can follow Ryan on Bluesky here, find the Bridgy Fed GitHub repo here, and keep an eye out for Bridgy Fed’s launch next year!

Note: Use an App Password when logging in to third-party tools for account security and read our disclaimer for third-party applications.

· 9 min read

One of the core principles of the AT Protocol is simple access to public data, including posts, multimedia blobs, and social graph metadata. A user's data is stored in a repository, which can be efficiently exported all together as a CAR file (.car). This post will describe how to export and parse a data repository.

A user's data repository consists of individual records, each of which can be accessed in JSON format via HTTP API endpoints.

The example code in this post is in the Go programming language, and uses the atproto SDK packages from indigo. You can find the full source code in our example cookbook GitHub repository.

This post is written for a developer audience. We plan on adding a feature for users to easily export their own data from within the app in the future.

Privacy Notice

While atproto data is public, you should take care to respect the rights, intents, and expectations of others. The following examples work for downloading any account's public data.

This goes beyond following copyright law, and includes respecting content deletions and block relationships. Images and other media content does not come with any reuse rights, unless explicitly noted by the account holder.

Download a Repository

On Bluesky's Main PDS Instance

You can easily construct a URL to download a repository on Bluesky's main PDS instance. In this case, the PDS host is https://bsky.social, the Lexicon endpoint is com.atproto.sync.getRepo, and the account DID is passed as a query parameter.

As a result, the download URL for the @atproto.com account is:

https://bsky.social/xrpc/com.atproto.sync.getRepo?did=did:plc:ewvi7nxzyoun6zhxrhs64oiz

Note that this endpoint intentionally does not require authentication: content in a user's repository is public (much like a public website), and anybody can download it from the web. Such content includes posts and likes, but does not include content like mutes and list subscriptions.

If you navigate to that URL, you'll download the repository for the @atproto.com account on Bluesky. But if you try to open that file, it won't make sense yet. We'll show you how to parse the data later in this post.

On Another Instance

In the more general case, we start with any "AT Identifier" (handle or DID). We need to find account's PDS instance. This involves first resolving the handle or DID to the account's DID document, then parsing out the #atproto_pds service entry.

The github.com/bluesky-social/indigo/atproto/identity package handles all of this for us already:

import (
"fmt"
"context"

"github.com/bluesky-social/indigo/atproto/identity"
"github.com/bluesky-social/indigo/atproto/syntax"
"github.com/bluesky-social/indigo/xrpc"
)

func main() {
run()
}

func run() error {
ctx := context.Background()
atid, err := syntax.ParseAtIdentifier("atproto.com")
if err != nil {
return err
}

dir := identity.DefaultDirectory()
ident, err := dir.Lookup(ctx, *atid)
if err != nil {
return err
}

if ident.PDSEndpoint() == "" {
return fmt.Errorf("no PDS endpoint for identity")
}

fmt.Println(ident.PDSEndpoint())
}

Once we know the PDS endpoint, we can create an atproto API client, call the getRepo endpoint, and write the results out to a local file on disk:

carPath := ident.DID.String() + ".car"

xrpcc := xrpc.Client{
Host: ident.PDSEndpoint(),
}

repoBytes, err := comatproto.SyncGetRepo(ctx, &xrpcc, ident.DID.String(), "")
if err != nil {
return err
}

err = os.WriteFile(carPath, repoBytes, 0666)
if err != nil {
return err
}

The go-export-repo example from the cookbook repository implements this with the download-repo command:

> ./go-export-repo download-repo atproto.com
resolving identity: atproto.com
downloading from https://bsky.social to: did:plc:ewvi7nxzyoun6zhxrhs64oiz.car

Now you know that the @atproto.com account's PDS instance is at bsky.social, and we've downloaded the repository's CAR file.

Parse Records from CAR File as JSON

What's a CAR File?

CAR files are a standard file format from the IPLD ecosystem. They stand for "Content Addressable aRchives." They have a simple binary format, with a series of binary (CBOR) blocks concatenated together, not dissimilar to tar files or Git packfiles. They're well-suited for efficient data processing and archival storage, but they're not the most accessible to developers.

The Repository CAR File

The repository data structure is a key-value store, with the keys being a combination of a collection name (NSID) and "record key", separated by a slash (<collection>/<rkey>). The CAR file contains a reference (CID) pointing to a (signed) commit object, which then points to the top of the key-value tree. The commit object also has an atproto repo version number (at time of writing, currently 3), the account DID, and a revision string.

Let's load the repository tree structure out of the CAR file in to memory, and list all of the record paths (keys).


import (
"encoding/json"
"context"
"fmt"
"os"
"path/filepath"

"github.com/bluesky-social/indigo/repo"
)
func carList(carPath string) error {
ctx := context.Background()
fi, err := os.Open(carPath)
if err != nil {
return err
}

// read repository tree in to memory
r, err := repo.ReadRepoFromCar(ctx, fi)
if err != nil {
return err
}

// extract DID from repo commit
sc := r.SignedCommit()
did, err := syntax.ParseDID(sc.Did)
if err != nil {
return err
}
topDir := did.String()

// iterate over all of the records by key and CID
err = r.ForEach(ctx, "", func(k string, v cid.Cid) error {
fmt.Printf("%s\t%s\n", k, v.String())
return nil
})
if err != nil {
return err
}
return nil
}

Note that the ForEach iterator provides a record path string as a key, and a CID as the value, instead of the record data itself. If we want to get the record itself, we need to fetch the "block" (CBOR bytes) from the repository, using the CID reference.

Let's also convert the binary CBOR data in to a more accessible JSON format, and write out records to disk. The following code snippet could go in the ForEach function in the previous example:

// where to write out data on local disk
recPath := topDir + "/" + k
os.MkdirAll(filepath.Dir(recPath), os.ModePerm)
if err != nil {
return err
}


// fetch the record CBOR and convert to a golang struct
_, rec, err := r.GetRecord(ctx, k)
if err != nil {
return err
}

// serialize as JSON
recJson, err := json.MarshalIndent(rec, "", " ")
if err != nil {
return err
}

if err := os.WriteFile(recPath+".json", recJson, 0666); err != nil {
return err
}

return nil

In the cookbook repository, the go-repo-export example implements these as list-records and unpack-record:

> ./go-export-repo list-records did:plc:ewvi7nxzyoun6zhxrhs64oiz.car
=== did:plc:ewvi7nxzyoun6zhxrhs64oiz ===
key record_cid
app.bsky.actor.profile/self bafyreifbxwvk2ewuduowdjkkjgspiy5li2dzyycrnlbu27gn3hfgthez3u
app.bsky.feed.like/3jucagnrmn22x bafyreieohq4ngetnrpse22mynxpinzfnaw6m5xcsjj3s4oiidjlnnfo76a
app.bsky.feed.like/3jucahkymkk2e bafyreidqrmqvrnz52efgqfavvjdbwob3bc2g3vvgmhmexgx4xputjty754
app.bsky.feed.like/3jucaj3qgmk2h bafyreig5c2atahtzr2vo4v64aovgqbv6qwivfwf3ex5gn2537wwmtnkm3e
[...]

> ./go-export-repo unpack-records did:plc:ewvi7nxzyoun6zhxrhs64oiz.car
writing output to: did:plc:ewvi7nxzyoun6zhxrhs64oiz
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.actor.profile/self.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucagnrmn22x.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucahkymkk2e.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucaj3qgmk2h.json
[...]

If you were downloading and working with CAR in a higher-stakes situation than just running a one-off repository export, you would probably want to confirm the commit signature using the account's signing public key (included in the resolved identity metadata). Signing keys can change over time, meaning the signatures in old repo exports will no longer validate. It may be a good idea to keep a copy of the identity metadata along side the repository for long-term storage.

Downloading Blobs

An account's repository contains all the current (not deleted) records. These records include likes, posts, follows, etc. and may refer to images and other media "blobs" by hash (CID), but the blobs themselves aren't stored directly in the repository. So if you want a full public account data export, you also need to fetch the blobs.

It is possible to parse through all the records in a repository and extract all the blob references (tip: they all have $type: blob). But PDS instances also implement a helpful com.atproto.sync.listBlobs endpoint, which returns all the CIDs (blob hashes) for a specific account (DID).

The com.atproto.sync.getBlob endpoint is used to download the original blob itself.

Neither of these PDS endpoints require authentication, though they may be rate-limited by operators to prevent resource exhaustion or excessive bandwidth costs.

Note that the first part of the blob download function is very similar to the CAR download: resolving identity to find the account's PDS:

func blobDownloadAll(raw string) error {
ctx := context.Background()
atid, err := syntax.ParseAtIdentifier(raw)
if err != nil {
return err
}

// resolve the DID document and PDS for this account
dir := identity.DefaultDirectory()
ident, err := dir.Lookup(ctx, *atid)
if err != nil {
return err
}

// create a new API client to connect to the account's PDS
xrpcc := xrpc.Client{
Host: ident.PDSEndpoint(),
}
if xrpcc.Host == "" {
return fmt.Errorf("no PDS endpoint for identity")
}

topDir := ident.DID.String() + "/_blob"
os.MkdirAll(topDir, os.ModePerm)

// blob-specific part starts here!
cursor := ""
for {
// loop over batches of CIDs
resp, err := comatproto.SyncListBlobs(ctx, &xrpcc, cursor, ident.DID.String(), 500, "")
if err != nil {
return err
}
for _, cidStr := range resp.Cids {
// if the file already exists, skip
blobPath := topDir + "/" + cidStr
if _, err := os.Stat(blobPath); err == nil {
continue
}

// download the entire blob in to memory, then write to disk
blobBytes, err := comatproto.SyncGetBlob(ctx, &xrpcc, cidStr, ident.DID.String())
if err != nil {
return err
}
if err := os.WriteFile(blobPath, blobBytes, 0666); err != nil {
return err
}
}

// a cursor in the result means there are more CIDs to enumerate
if resp.Cursor != nil && *resp.Cursor != "" {
cursor = *resp.Cursor
} else {
break
}
}
return nil
}

In the cookbook repository, the go-repo-export example implements list-blobs and download-blobs commands:

> ./go-export-repo list-blobs atproto.com
bafkreiacrjijybmsgnq3mca6fvhtvtc7jdtjflomoenrh4ph77kghzkiii
bafkreib4xwiqhxbqidwwatoqj7mrx6mr7wlc5s6blicq5wq2qsq37ynx5y
bafkreibdnsisdacjv3fswjic4dp7tju7mywfdlcrpleisefvzf44c3p7wm
bafkreiebtvblnu4jwu66y57kakido7uhiigenznxdlh6r6wiswblv5m4py
[...]

> ./go-export-repo download-blobs atproto.com
writing blobs to: did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreiacrjijybmsgnq3mca6fvhtvtc7jdtjflomoenrh4ph77kghzkiii downloaded
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreib4xwiqhxbqidwwatoqj7mrx6mr7wlc5s6blicq5wq2qsq37ynx5y downloaded
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreibdnsisdacjv3fswjic4dp7tju7mywfdlcrpleisefvzf44c3p7wm downloaded
[...]

This will download blobs for a repository.

A more rigorous implementation should verify the blob CID (by hashing the downloaded bytes), at a minimum to detect corruption and errors.

· 2 min read

We believe that the future of social networking is open, and we want the AT Protocol, the open protocol that serves as the backbone to Bluesky’s open-source social network, to be your playground.

The AT Protocol is still in active development, but developers can already start building projects ranging from bots and clients to custom feeds, and eventually, entirely separate applications and experiences.

In this blog post, we’ll share what you can already build on atproto, and what you can expect soon.

Getting Started with the AT Protocol

In June of this year, we launched the federation sandbox environment for developers to begin exploring, as we prepare to launch federation in the production environment as well.

Currently, the AT Protocol is under active development, and the Bluesky beta app is a microblogging client built on top of it. We’re using the Bluesky beta to drive and inform protocol development.

While the production network isn’t federated quite yet, developers can already currently build:

  • Feed generators
    • Third-party developers have built algorithmic feeds, community-based feeds, topic-based feeds, and more that hundreds of thousands of users view every day.
    • Get started with our feed generator starter kit.
  • Bots
    • Over the last few months, developers have created MTA Alert bots, earthquake bots, chat bots, and other interactive experiences on the network.
    • Get started with a template for a Bluesky bot here.
  • Bluesky clients
    • Because the AT Protocol is open, third-party developers can create entirely separate frontends for the network too.
    • Here's a starter template for building a Bluesky client.

You can read about more third-party projects and check them out here. Let us know what you build!

What to Expect

We’ve still only scratched the surface of what’s possible on atproto. Currently, microblogging is the only content type that’s integrated, but we’re excited to for apps for forums, photo-centric platforms, long-form writing, and more to exist on atproto.

In the meantime, we’ve published a protocol roadmap here to give you a sense of our priorities and what’s coming in the future.

· 10 min read

This post lays out the current AT Protocol (atproto) development plan, through to a "version one" release. This document is written for developers already familiar with atproto concepts and terminology. The scope here is features of the underlying protocol itself, not any application or Lexicons on top of the protocol. In particular, this post doesn't describe any product features specific to the Bluesky microblogging application (app.bsky.* Lexicons).

At a high level:

  • We are pushing towards federation on the production network early next year (2024), if development continues as planned.
  • A series of related infrastructure and operational changes to Bluesky services are underway to ensure we can handle a large influx of content from federated instances in a resource and cost-efficient way.
  • We plan to submit future governance and formalization of the core protocol to an independent standards body. These are consensus processes relying on interest and support of a community of volunteers, and we have already done some initial outreach, but we expect to start conversations in earnest around the same time as opening up federation.

Feel free to share your thoughts via Github Discussions here.

Getting to Federation

The AT Protocol is designed as a federated social web protocol. While we have an open federated sandbox network for developers to experiment with, the main Bluesky services are currently still not federated. Our current development priority is to resolve any final protocol issues or decisions that would impede federation with independent PDS instances. Some of these points are listed below.

Account Migration: One of the differentiating features of atproto is seamless migration of both identity and public content from one PDS instance to another. The foundations for this feature were set in place from the start, but there are some details and behaviors to decide on and document.

Auth refactor: We want to improve both third-party auth flows (eg, OAuth2), and to support verifiable inter-service requests (eg, with UCANs). These involve both authentication (“who is this”) and authorization (“what is allowed”). This work will hopefully be a matter of integrating and adapting existing standards. There is a chance that this will be ready in time for federation, but it is not a hard requirement.

Repo Event Stream (Firehose) Iteration: A healthy ecosystem of projects has subscribed to the full stream of repo commits that the firehose provides. We have a few improvements and options in mind to help with efficiency and developer experience, without changing the underlying semantics and power dynamic (aka, the ability to replicate all public content). These include "epochs" of sequence numbers; sharding; the option to strip MST nodes ("proof" chain) to reduce bandwidth; and changes to account lifecycle events.

Third-party Labeling: This is the protocol-level ability to distribute and subscribe to labels from many sources, including cryptographic signatures for verification. The lexicons are mostly designed, but are not included in the AppView reference implementation, and documentation is needed.

Account Status Propagation: Actions like account takedowns and (self) deletions don't currently get broadcast to other parties in the network, though they can be publicly observed (i.e., content becomes inaccessible). Some form of explicit signal would likely be helpful. Additionally, we’d like to allow temporary account deactivation as an option, instead of only account deletion. Temporary deactivation will have behavior similar to an account takedown, with effects propagating through the network.

Refactor Moderation Actions: The com.atproto.admin.* Lexicons have a concept of "actions" which "resolve" specific "reports," which has become rigid to work with. A refactor will have more flexible "events" (including both reports and private flags or annotations) which update "subjects." The moderation report submission process should not be impacted, but other admin Lexicons may have breaking changes.

Legacy Record Cleanup: We have a short remaining pre-federation window during which we could, in theory, rewrite any records to remove legacy or invalid data. Specifically, we could try to remove all instances of the legacy "blob" schema, and fix many invalid datetimes. This would break a large number of strong references between records, in a way that is difficult to clean up, and would be an aggressive intervention to existing repository content, so there is a good chance this won't actually happen and these crufty records will be in the network forever.

Operational Changes for Federation

A few infrastructure changes are planned for coming weeks and months that will impact other folks in the ecosystem. We will try to communicate these ahead of time, but are balancing disruption to the existing developer and service ecosystem with federating the network quickly.

Multiple PDS instances: The Bluesky PDS (bsky.social) is currently a monolithic PostgreSQL database with over a million hosted repositories. We will be splitting accounts across multiple instances, using the protocol itself to help with scaling. Depending on details, this will probably not be very visible to end users or client developers, but will impact firehose consumers and some developers.

BGS Firehose: Most folks currently subscribe to the firehose coming from the bsky.social PDS directly. We have had a BGS instance running in production for some time, but it has not been promoted and has had occasional downtime. With the move to multiple PDSes, it will still technically be possible to subscribe to multiple individual PDS feeds, but most folks will want to shift to the unified BGS feed instead. This will involve a change in hostname, and the sequence numbers will be different.

Possible bulk did:plc updates: We are not yet certain when, but there is a good chance that DID documents will need to be updated for all (or a large fraction) of bsky.social accounts. This identity churn will be in-spec, but could cause disruptions to other services if they are not prepared.

Search updates: The search.bsky.social service, which was never a formal part of the Bluesky API (Lexicons) will be deprecated soon, with both actor (profile) and post search possible through app.bsky.* Lexicons, served by the AppView (and proxied by the PDS).

Spam mitigation systems: Anti-spam efforts will be a crucial part of opening federation. It is not a protocol-level change per-say, but we expect to deploy automated systems to combat spam by labeling posts and accounts at scale.

Network Growth Control: We don't have a specific proposal for feedback from the ecosystem yet, but expect to have some form of resource use limits in place as we open federation to prevent the social graph, firehose consumers, and human and technical systems from being overwhelmed.

New Application Development

Atproto has had a clear system for third-party application development and application-layer protocol extension built in from the beginning: the Lexicon schema system. The current PDS reference implementation restricts record creation for unknown schemas, even when the "skip validation" query parameter is set, but we expect to relax that constraint in coming months.

There is a lot of supporting framework and documentation needed to make atproto a developer-friendly foundation for application development, and a few protocol pieces still need to be worked out:

Lexicon schema resolution: There should be a clear automated method for resolving new and unknown NSIDs (schema names) to a Lexicon JSON object over the web.

Clarify validation failure behaviors: What each piece of infrastructure is expected to do when records fail to validate against the Lexicon schema needs to be specified in detail, with test cases provided.

Lexicon schema evolution and extension: The backwards-compatibility rules for changes to application Lexicon schemas are mostly in place, but they’re not well documented. There is also wiggle room for third parties to inject extension fields into third-party records, and the norms and best practices for this type of extension need documentation.

Further Down the Road

It is worth mentioning a few parts of the protocol that we do not expect to work on in the near future. We think these are pretty important (which is why we are mentioning them!), but our current priority is federation.

Private content and messaging (DMs): We intend to eventually incorporate group-private E2EE (end-to-end encryption) content in the protocol, as well as a robust E2EE messaging solution. These are important and rightfully expected features. We expect this to be a major additional component of atproto, not a simple bolt-on to the current public repository data structure, even if we adopt an existing standard like MLS, Matrix, or the Signal protocol. As a result, this will be a large body of integration work, and will not be a focus until after federation.

Record versioning: The low-level protocol allows updates to existing records. It should be possible to store "old" versions of records in the repo, allowing (optional) transparency into history and edits. This may involve additions to repository paths, AT URIs, and core repo read and update endpoints.

Floating point numbers: The data model currently only supports integers, not floats. The concerns around floats in content-addressed systems are well documented, and there are alternatives and work-arounds (like string encoding). But if an elegant solution presents itself, it would be nice to support floats as a first-class data type.

Static CAR file repository hosting: For many use cases, it would be convenient if atproto repositories could be served as static files (probably CAR files), instead of requiring a full-featured PDS instance.

Protocol Governance

Development of atproto to date has been driven by a single company, Bluesky PBC. Once the network opens to federation, protocol changes and improvements will still be necessary, but will impact multiple organizations, communities, and individuals, each with separate priorities and development interests. If the protocol is successful, there certainly will be disagreements and competitive tensions at play. Having a single company controlling the protocol will not work long-term.

The plan is to bring development and governance of the protocol itself to an established standards body around the time the network opens to federation. Our current hope is to bring this work to the IETF, likely as a new working group, which would probably be a multi-year process. If the IETF does not work out as a home, we will try again with other bodies. While existing work can be proposed exactly “as-is", it is common to have some evolution and breaking changes come out of the standardization processes.

Which parts of the protocol would be in-scope for independent standardization? Like many network protocols, atproto is abstracted into multiple layers, with distinctions between "application-specific" bits and the core protocol. The identity, auth, data model, repositories, streams, and Lexicon schema language are all part of the core protocol, and in-scope for standardization. Some API endpoints under the com.atproto.* namespace are also relatively core, and that namespace specifically could fall under independent governance.

Other application-specific APIs and namespaces, including the app.bsky.* microblogging application, would not be in-scope for standardization as part of the core protocol. The authority for these APIs and Lexicons is encoded in the name itself, and Bluesky PBC intends to retain control over future development of the bsky.app application. The API schemas (Lexicons) will be open, and the expectations and rules for third-party interoperability and extensions for any Lexicon will be part of the core protocol specification.

· 3 min read

We have a number of protocol and infrastructure changes rolling out in the next three months, and want to keep everybody in the loop.

This update was also emailed to the developer mailing list, which you can subscribe to here.

TL;DR

  • As of this week, the Bluesky AppView instance now consumes from a Bluesky BGS, instead of directly from the PDS. Devs can access the current streaming API at https://bsky.network/xrpc/com.atproto.sync.subscribeRepos or for WebSocket directly, wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos
    Your existing cursor for bsky.social will not be in sync with bsky.network, so check the live stream first to grab a recent seq before connecting!
  • We are updating the DID document public key syntax to “Multikey” format next week on the main network PLC directory (plc.directory). This change is already live on the sandbox PLC directory.

How will this affect me?

  • For today, if you're consuming the firehose, grab a new cursor from bsky.network and restart your firehose consumer pointed at bsky.network.

Bluesky BGS

The Bluesky services themselves are moving to a federated deployment, with multiple Bluesky (the company) PDS instances aggregated by a BGS, and the AppView downstream of that. As of yesterday, the Bluesky Appview instance (api.bsky.app) consumes from a Bluesky PBC BGS (bsky.network), which consumes from the Bluesky PDS (bsky.social). Until now, the AppView consumed directly from the PDS.

How close are we to federation?

Technically, the main network BGS could start consuming from independent PDS instances today, the same as the sandbox BGS does. We have configured it not to do so until we finish implementing some more details, and do our own round of security hardening. If you want to bang on the BGS implementation (written in Go, code in the indigo github repository), please do so in the sandbox environment, not the main network.

This change impacts devs in two ways:

  • In the next couple weeks, new Bluesky (company) PDS instances will appear in the main network. Our plan is to optionally abstract this away for most client developers, so they can continue to connect to bsky.social as a virtual PDS. But the actual PDS hostnames will be distinct and will show up in DID documents.
  • Firehose consumers (feed generators, mirrors, metrics dashboards, etc) will need to switch over and consume from the BGS instead of the PDS directly. If they do not, they will miss content from the new (Bluesky) PDS instances.

The firehose subscription endpoint, which works as of today, is https://bsky.network/xrpc/com.atproto.sync.subscribeRepos (or wss:// for WebSocket directly). Note that this endpoint has different sequence numbers. When switching over, we recommend folks consume from both the BGS and PDS for a period to ensure no events are lost, or to scroll back the BGS cursor to ensure there is reasonable overlap in streams.

We encourage developers and operators to switch to the BGS firehose sooner than later.

DID Document Formatting Changes

We also want to remind folks that we are planning to update the DID document public key syntax to “Multikey” format next week on the main network PLC directory (plc.directory). These changes are described here, with example documents for testing, and are live now on the sandbox PLC directory.