Work in progress

This document is not complete. Please check back soon for updates.

Authenticated Transfer Protocol#

Glossary#

  • Client: The application running on the user's device. Interacts with the network through a PDS.
  • Personal Data Server (PDS): A server hosting user data. Acts as the user's personal agent on the network.
  • Name server: A server that maps domains to DIDs via the com.atproto.handle.resolve() API. Often a PDS.
  • Crawling indexer: A service that crawls servers to produce aggregated views.

Wire protocol (XRPC)#

ATP uses a light wrapper over HTTPS called XRPC. XRPC uses Lexicon, a global schema system, to unify behaviors across hosts. The atproto.com lexicons enumerate all XRPC methods used in ATP.
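
As a rough illustration of the "light wrapper over HTTPS", the sketch below builds the URL for an XRPC call. It assumes that methods are exposed under an /xrpc/<NSID> path and that query parameters carry the arguments. The host name and the `handle` parameter name are hypothetical placeholders, not taken from this document.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def xrpc_url(host: str, method_nsid: str,
             params: Optional[Dict[str, str]] = None) -> str:
    """Build the HTTPS URL for an XRPC call.

    Assumption: methods live under /xrpc/<NSID> on the host.
    """
    url = f"https://{host}/xrpc/{method_nsid}"
    if params:
        # Arguments are passed as an ordinary URL query string.
        url += "?" + urlencode(params)
    return url


# Hypothetical host and parameter name, for illustration only.
print(xrpc_url("pds.example.com", "com.atproto.handle.resolve",
               {"handle": "alice.host.com"}))
```

Because the method is named by its NSID, the same URL shape would work against any host that implements the lexicon.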

Identifiers#

The following identifiers are used in ATP:

| Identifier | Usage |
| --- | --- |
| Domain names | A unique global identifier which weakly identifies repositories. |
| DID | A unique global identifier which strongly identifies repositories. |
| NSID | A unique global identifier which identifies record types and XRPC methods. |
| TID | A timestamp-based ID which identifies records. |

Domain names#

Domain names (also called "handles") weakly identify repositories. They are a convenience that should be used in UIs but only rarely within records to reference data, because they may change at any time. The repo DID is the preferred stable identifier.

DIDs#

DIDs are unique global identifiers which strongly identify repositories. They are considered "strong" because they should never change during the lifecycle of a user. They should rarely be used in UIs, but should always be used in records to reference data.

ATP supports two DID methods:

  • Web (did:web). Should be used only when the user is "self-hosting" and therefore directly controls the domain name & server. May also be used during testing.
  • Placeholder (did:plc). A method developed in conjunction with ATP to provide global secure IDs which are host-independent.

DIDs resolve to "DID Documents" which provide the address of the repo's host and the public key used to sign the repo's updates.
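
The sketch below splits a DID into its method and method-specific identifier, and shows where a did:web document would be fetched from for a bare domain. The did:web mapping to /.well-known/did.json follows the did:web method specification rather than this document. did:plc resolution is not covered, and the function names are illustrative.

```python
from typing import Tuple


def parse_did(did: str) -> Tuple[str, str]:
    """Split a DID into (method, method-specific identifier)."""
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did" or not parts[1] or not parts[2]:
        raise ValueError(f"not a DID: {did!r}")
    return parts[1], parts[2]


def did_web_document_url(did: str) -> str:
    """Return the DID Document URL for a did:web DID on a bare domain.

    Assumption: only the simplest did:web form (no path, no port) is handled.
    """
    method, identifier = parse_did(did)
    if method != "web" or ":" in identifier or "%" in identifier:
        raise ValueError("only bare-domain did:web DIDs are handled here")
    return f"https://{identifier}/.well-known/did.json"


print(parse_did("did:plc:bv6ggog3tya2z3vxsub7hnal"))
print(did_web_document_url("did:web:alice.host.com"))
```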

Timestamp IDs (TID)#

Describe TIDs

URI scheme#

ATP uses the at:// URI scheme, which is specified separately. Some example at:// URIs:

Repository at://alice.host.com
Repository at://did:plc:bv6ggog3tya2z3vxsub7hnal
Collection at://alice.host.com/io.example.song
Record at://alice.host.com/io.example.song/3yI5-c1z-cc2p-1a
Record Field at://bob.com/io.example.song/3yI5-c1z-cc2p-1a#/title
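
The examples above suggest a simple structure: an authority (a handle or a DID), an optional collection NSID, an optional record key, and an optional field fragment. The parser below is a minimal sketch based only on those examples, not on the full scheme specification. The `AtUri` type and its field names are illustrative.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    authority: str             # handle (domain name) or DID
    collection: Optional[str]  # NSID of the collection, if present
    record_key: Optional[str]  # key of the record, if present
    fragment: Optional[str]    # field path after '#', if present


def parse_at_uri(uri: str) -> AtUri:
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    # Split off the fragment first, since a DID authority contains ':'
    # and would confuse a generic URL parser.
    rest, _, fragment = uri[len(prefix):].partition("#")
    parts = rest.split("/")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"unexpected at:// URI shape: {uri!r}")
    collection = parts[1] if len(parts) > 1 and parts[1] else None
    record_key = parts[2] if len(parts) > 2 and parts[2] else None
    return AtUri(parts[0], collection, record_key, fragment or None)


print(parse_at_uri("at://bob.com/io.example.song/3yI5-c1z-cc2p-1a#/title"))
```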

Schemas#

ATP uses strict schema definitions for XRPC methods and record types. These schemas are identified using NSIDs and defined using Lexicon.

Repositories#

A "repository" is a collection of signed records.

A repository is implemented as a Merkle Search Tree (MST). The MST is an ordered, insert-order-independent, deterministic tree. Keys are laid out in alphabetic order. The key insight of an MST is that each key is hashed, and the leading zeros of the hash are counted to determine which layer the key falls on (5 zeros for ~32 fanout).
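
The layer assignment can be sketched as follows. Several details are assumptions made for illustration: SHA-256 as the hash function, zeros counted as leading zero bits, and one layer per 5 zero bits (which gives the ~32 fanout, since 2^5 = 32). The text above does not pin these down.

```python
import hashlib

ZEROS_PER_LAYER = 5  # assumption: 5 leading zero bits per layer (~32 fanout)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits before the first set bit in a digest."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        count += 8 - byte.bit_length()
        break
    return count


def mst_layer(key: str) -> int:
    """Return the layer a key falls on (0 is the bottom layer).

    Assumption: SHA-256 stands in for whatever hash the repo uses.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return leading_zero_bits(digest) // ZEROS_PER_LAYER
```

Because the layer depends only on the key's hash, two repositories holding the same keys arrive at the same tree shape regardless of insertion order.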

This is a Merkle tree, so each subtree is referred to by its hash (CID). When a leaf is changed, every subtree on the path to that leaf changes as well, which updates the root hash.
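
The Merkle property can be demonstrated with a much simpler structure than an MST. The sketch below uses a plain binary hash tree and raw SHA-256 digests in place of CIDs, both of which are simplifications for illustration. It shows that changing one leaf changes the root.

```python
import hashlib
from typing import List


def merkle_root(leaves: List[bytes]) -> bytes:
    """Compute the root hash of a simple binary Merkle tree.

    Note: this is NOT an MST; it only illustrates hash propagation.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        # Each parent hash covers the hashes of its (up to two) children.
        level = [
            hashlib.sha256(b"".join(level[i:i + 2])).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


before = merkle_root([b"record-a", b"record-b", b"record-c"])
after = merkle_root([b"record-a", b"record-B", b"record-c"])
print(before != after)  # the single changed leaf produces a new root
```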

Repo data layout#

Provide a more detailed description of the data layout and how the MST is organized.

The repository data layout establishes the units of network-transmissible data. It includes the following three major groupings:

| Grouping | Description |
| --- | --- |
| Repository | Repositories are the dataset of a single "user" in the ATP network. Every user has a single repository which is identified by a DID. |
| Collection | A collection is an ordered list of records. Every collection is identified by an NSID. Collections only contain records of the type identified by their NSID. |
| Record | A record is a key/value document. It is the smallest unit of data which can be transmitted over the network. Every record has a type and is identified by a TID. |

Every node is an IPLD object (specifically, dag-cbor) which is referenced by a CID.

Signed Root ("commit"): The Signed Root, or “commit”, is the topmost node in a repo. It contains:
  • root: The CID of the Root node.
  • sig: A signature.
Root: The Root node contains:
  • did: The DID of this repository.
  • prev: The CID(s) of the previous commit node(s) in this repository’s history.
  • data: The topmost node of the Merkle Search Tree.
  • auth_token: The JWT-encoded UCAN that gives authority to make the write which produced this root.
MST Node: The Merkle Search Tree Nodes contain:
  • l: (Optional) The CID of the leftmost subtree.
  • e: An array of MST Entries.
MST Entry: The Merkle Search Tree Entries contain:
  • p: The number of leading UTF-8 characters that this key shares with the previous key.
  • k: The rest of the key after the shared prefix.
  • v: The CID of the value of the entry.
  • t: (Optional) The CID of the next subtree (to the right of the leaf).
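
The p and k fields describe a prefix compression over sorted keys. The sketch below shows one way this could work, counting Python characters as a stand-in for UTF-8 characters and omitting the v and t fields. The function names and the sample keys are illustrative.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[int, str]]


def compress_keys(sorted_keys: List[str]) -> List[Entry]:
    """Turn sorted keys into entries holding a shared-prefix count and a suffix."""
    entries: List[Entry] = []
    prev = ""
    for key in sorted_keys:
        p = 0
        while p < min(len(key), len(prev)) and key[p] == prev[p]:
            p += 1
        entries.append({"p": p, "k": key[p:]})
        prev = key
    return entries


def expand_keys(entries: List[Entry]) -> List[str]:
    """Rebuild the full keys from the compressed entries."""
    keys: List[str] = []
    prev = ""
    for entry in entries:
        key = prev[: entry["p"]] + entry["k"]
        keys.append(key)
        prev = key
    return keys


entries = compress_keys(["apple", "apply", "banana"])
print(entries)
print(expand_keys(entries))
```

Keys within one collection tend to share long prefixes, so storing only the differing suffix keeps each node small.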

Repo encodings#

All data in the repository is encoded using CBOR. The following value types are supported:

  • null: A CBOR simple value (major type 7) with additional information 22 (null), encoded as the single byte 0xf6.
  • boolean: A CBOR simple value (major type 7) with additional information 21 (true) or 20 (false), encoded as the single byte 0xf5 or 0xf4.
  • integer: A CBOR integer (major type 0 or 1), using the shortest byte representation.
  • float: A CBOR floating-point number (major type 7). All floating-point values MUST be encoded as 64-bit (additional information 27), even for integral values.
  • string: A CBOR text string (major type 3).
  • list: A CBOR array (major type 4), where each element of the list is added, in order, as a value of the array according to its type.
  • map: A CBOR map (major type 5), where each entry is represented as a member of the CBOR map. Each entry key is expressed as a CBOR text string (major type 3).
Are we missing value types? Binary? CID/Link?
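
To make the encodings concrete, here is a minimal encoder sketch for the value types listed above. It follows the standard CBOR rules from RFC 8949, under which null, true, and false are the single bytes 0xf6, 0xf5, and 0xf4. Map entries are written in insertion order, because the normalization rules are not described in this document. The function names are illustrative.

```python
import struct


def _head(major: int, value: int) -> bytes:
    """Encode a major type with an unsigned argument in the shortest form."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if value < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise ValueError("integer out of range for this sketch")


def encode(value) -> bytes:
    if value is None:
        return b"\xf6"
    if value is True:   # checked before int: bool is a subclass of int
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # always 64-bit
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("map keys must be strings")
            out += encode(key) + encode(item)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(encode({"a": [1, 2.0, None]}).hex())
```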

Repo CBOR normalization#

Describe normalization algorithm

Repo records#

Repo records are CBOR-encoded objects (using only JSON-compatible CBOR types). Each record has a "type" which is defined by a lexicon. The type defines which collection will contain the record as well as the expected schema of the record.

ATP reserves dollar-prefixed ($) fields as system fields. The following fields have a system meaning:

| Field | Usage |
| --- | --- |
| $type | Declares the type of a record. (Required) |
| $ext | Contains extensions to a record's base schema. |
| $required | Used by extensions to flag whether their support is required. |
| $fallback | Used by extensions to give a description of the missing data. |
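
As a small illustration of the required $type field, the sketch below checks that a record declares its type and derives the collection it belongs to. The sample record and its `title` field are hypothetical. The helper relies on the statement that a collection's NSID matches the type of the records it contains.

```python
def collection_for(record: dict) -> str:
    """Return the collection NSID for a record, based on its $type field."""
    record_type = record.get("$type")
    if not isinstance(record_type, str) or not record_type:
        raise ValueError("record is missing the required $type field")
    # The collection is identified by the same NSID as the record type.
    return record_type


# Hypothetical record, for illustration only.
song = {"$type": "io.example.song", "title": "Example title"}
print(collection_for(song))
```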

Client-to-server API#

The client-to-server API drives communication between a client application and the user's PDS. The APIs are dictated by the lexicons implemented by the PDS. It's recommended that every PDS support the full atproto.com lexicon. Application-level lexicons such as bsky.app are also recommended.

Authentication#

Describe how the client authenticates with the PDS. (It's a simple JWT-based session.)

ATP core lexicon#

The com.atproto.* lexicons provide the following behaviors:

Additional lexicons#

For ATP to be practically useful, it needs to support a variety of sophisticated queries and behaviors. While these behaviors could be implemented on the user's device, doing so would be slower than running them on the server. Therefore, the PDS is expected to implement lexicons which provide higher-level APIs. The reference PDS created by Bluesky implements the bsky.app lexicon.

Server-to-server API#

The server-to-server APIs enable federation, event delivery, and global indexing. They may also be used to provide application behaviors such as mail delivery and form submission.

Authentication#

Describe how servers may authenticate with each other
