Labels

Labels are a form of metadata about any account or content in the atproto ecosystem.

They exist as free-standing, self-authenticated data objects, though they are also frequently distributed as part of API responses (in which context the signatures might not be included). Additionally, label "values" may be directly embedded in records themselves ("self-labels").

Labels primarily consist of a source (DID), a subject (URI), and a value. The value is a short string, similar to a tag or hashtag, which presumably has pre-defined semantics shared between the creator and consumer of the label. Additional metadata fields can give additional context, but at any point in time there should be only one coherent set of metadata for the combination of source, subject, and value. If there are multiple sets of metadata, the created-at timestamp is used to clarify which label is current.

The label concept and protocol primitive is flexible in scope, use cases, and data transport. One of the original design motivations was to enable some forms of composable moderation, with labels generated by moderation services. But labels are not moderation-exclusive, and may be reused freely for other purposes in atproto applications.

The core label schema is versioned, and this document describes labels version 1.

Schema and Data Model

Labels are protocol objects, similar to repository commits or MST nodes. They are canonically encoded as DAG-CBOR (a strict, normalized subset of CBOR) for signing (see sections below). There is a Lexicon definition (com.atproto.label.defs#label) which represents labels, but it has slightly different field requirements than the core protocol object: the version field (ver) and signature (sig) are both optional in the Lexicon version.

The fields on the label object are:

ver (integer, required): label schema version. Current version is always 1.
src (string, DID format, required): the authority (account) which generated this label
uri (string, URI format, required): the content that this label applies to. For a specific record, an at:// URI. For an account, the did:.
cid (string, CID format, optional): if provided, the label applies to a specific version of the subject uri
val (string, 128 bytes max, required): the value of the label. Semantics and preferred syntax discussed below.
neg (boolean, optional): if true, indicates that this label "negates" an earlier label with the same src, uri, and val.
cts (string, datetime format, required): the timestamp when the label was created. Note that timestamps in a distributed system are not trustworthy or verified by default.
exp (string, datetime format, optional): a timestamp at which this label expires (is no longer valid)
sig (bytes, optional): cryptographic signature bytes. Uses the bytes type from the Data Model, which encodes in JSON as a $bytes object with base64 encoding

When labels are being transferred as full objects between services, the ver and sig fields are required.

If the neg field is false, best practice is to simply not include the field at all.

The use of short three-character field names is different from most parts of atproto, and aligns more closely with JWTs. The motivation for this difference is to minimize the size of labels when many of them are included in network requests and responses.
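As an illustration of the field rules above, the following is a minimal sketch of building an unsigned version 1 label as a plain dictionary. The helper names (make_label, is_transferable), the example DID, and the example at:// URI are hypothetical and not part of any atproto library.

```python
from typing import Optional


def make_label(src: str, uri: str, val: str, cts: str,
               cid: Optional[str] = None, neg: bool = False,
               exp: Optional[str] = None) -> dict:
    """Build an unsigned version 1 label, leaving out unset optional fields."""
    if len(val.encode("utf-8")) > 128:
        raise ValueError("val must be at most 128 bytes")
    label = {"ver": 1, "src": src, "uri": uri, "val": val, "cts": cts}
    if cid is not None:
        label["cid"] = cid
    if neg:
        # Best practice: a false neg is omitted rather than written out.
        label["neg"] = True
    if exp is not None:
        label["exp"] = exp
    return label


def is_transferable(label: dict) -> bool:
    """Full objects passed between services need both ver and sig."""
    return "ver" in label and "sig" in label


example = make_label(
    src="did:web:mod.example.com",
    uri="at://did:plc:abc123/app.example.post/3kabc",  # hypothetical record
    val="spam",
    cts="2024-01-01T00:00:00.000Z",
)
```

The example label is not yet transferable between services, because it has no sig field until it has been signed.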

Value

The val field is core to the label. To keep the protocol flexible, and allow future development of ontologies, norms, and governance structures, very little about the semantics, behavior, and known values of this string are specified here.

The current expectation is that label value strings are "tokens" with fixed vocabulary. They are similar to hashtags.

At this time, we strongly recommend against the following patterns:

packing additional structure in to value fields. for example, base64-encoded data, key/value syntax, lists or arrays of values, etc
encoding arbitrary numerical values (eg, "scores" or "confidence")
using punctuation characters (like ., :, ;, #, _, ', >, or others) to structure the namespace of labels
using URLs or URIs in values
use of any whitespace
use of non-ASCII characters, including emoji

These are all promising ideas, but we hope to coordinate and more formally specify this sort of syntax extension.

One current convention is to use a bang punctuation character (!) as a prefix for system-level labels which specify an expected behavior on a subject, but don't describe the content or indicate a reason for the behavior. For example, !warn as a behavior, as opposed to scam as a descriptive label which might result in the same warning behavior.

The behavior, definition, meaning, and policies around labels are generally communicated elsewhere. The value does not need to be entirely descriptive.

Recommended String Syntax

The current recommended syntax for label strings is lower-case kebab-syntax (using - internally), using only ASCII letters. Specifically:

lower-case alphabetical ASCII letters (a to z)
dash (-) used for internal separation, but not as a first or last character
no other punctuation or whitespace
128 bytes maximum length. Shorter is better (try to keep labels to a couple dozen characters at most), while still being somewhat descriptive.
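The recommended syntax can be checked with a short regular expression. This is a sketch only: the function name is hypothetical, and rejecting consecutive dashes (such as a--b) is an assumption, since the rules above only speak to leading and trailing dashes.

```python
import re

# One or more lower-case ASCII words joined by single dashes.
# Assumption: doubled dashes are rejected; the text does not say either way.
LABEL_VALUE_RE = re.compile(r"[a-z]+(-[a-z]+)*")


def is_recommended_label_value(val: str) -> bool:
    """True if val follows the recommended kebab-case syntax and length limit."""
    if len(val.encode("utf-8")) > 128:
        return False
    return LABEL_VALUE_RE.fullmatch(val) is not None
```

System-level values with the bang prefix (such as !warn) intentionally fall outside this pattern, so a consumer would need to handle them separately.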

Label Lifecycle: Negation and Expiration

Labels are generally broadcast and persisted internally by receiving services. Some services may bulk re-broadcast or re-distribute labels to downstream services. They may ignore and drop any labels which are not relevant to their use-case. They may "hydrate" labels in to requests from clients.

When hydrating labels, services should generally only include "active" and relevant labels.

If the authoritative creator of a label wishes to retract or remove the label, they do so by publishing a new label with the same source, subject, and value, but with the negated field (neg) set to true, and a current timestamp (later than any previous timestamps). A negation label does not mean that the inverse of the label is “true”, only that the previous label has been retracted. For example, a label with value spam and neg true does not mean the subject is not spam, only that a previous spam label should be disregarded.

Receiving services that encounter a valid negation label may store the negation internally, and may re-broadcast the negation, but should not hydrate the negated label in API responses.

Likewise, services may continue to persist expired labels (after the expiration timestamp), but should not continue to hydrate them in API responses.
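The lifecycle rules above can be sketched as a filter that a service might run before hydration. It keeps only the most recent label for each combination of source, subject, and value (by cts), then drops negations and expired labels. The function names are hypothetical, labels are assumed to be plain dictionaries, and timestamps are assumed to carry a timezone.

```python
from datetime import datetime, timezone
from typing import List


def parse_ts(ts: str) -> datetime:
    # Rewrite a trailing "Z" so older Python versions can parse the string.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def active_labels(labels: List[dict], now: datetime) -> List[dict]:
    """Return the labels that should still be hydrated at time `now`."""
    latest = {}
    for label in labels:
        key = (label["src"], label["uri"], label["val"])
        current = latest.get(key)
        if current is None or parse_ts(label["cts"]) > parse_ts(current["cts"]):
            latest[key] = label
    result = []
    for label in latest.values():
        if label.get("neg", False):
            continue  # the latest statement retracts the label
        exp = label.get("exp")
        if exp is not None and parse_ts(exp) <= now:
            continue  # expired labels may be stored but are not hydrated
        result.append(label)
    return result
```

Negations and expired labels remain in the input list, so a service using this approach can still persist and re-broadcast them.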

Signatures

Labels are signed using public-key cryptography, similar to repository commit objects. Signatures should be validated when labels are transferred between services. It is assumed that most end-clients will not validate signatures themselves, and signatures may be removed from API responses sent to clients for network efficiency. Clients and other parties should have a mechanism to verify signatures, by querying individual labels from labeling authorities, and receiving back the full label, including signature.

The process to sign or verify a signature is to construct a complete version of the label, using only the specified schema fields, and not including the sig field. This means including the ver field, but not any $type field or other un-specified fields which may have been included in a Lexicon representation of the label. This data object is then encoded in CBOR, following the deterministic IPLD/DAG-CBOR normalization rules. The CBOR bytes are hashed with SHA-256, and then the direct hash bytes (not a hex-encoded string) are signed (or verified) using the appropriate cryptographic key. The signature bytes are stored in the sig field as bytes (see Data Model for details on representing bytes).
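The preparation steps above (strip to schema fields, encode deterministically, hash) can be sketched without any third-party dependency. The encoder below is a deliberately small stand-in that covers only the value types an unsigned label uses (integers, strings, booleans, and maps with string keys); a real implementation would more likely rely on a full DAG-CBOR library. All function names are hypothetical, and filling in ver as 1 when it is missing is an assumption.

```python
import hashlib

LABEL_FIELDS = ("ver", "src", "uri", "cid", "val", "neg", "cts", "exp")


def _head(major: int, n: int) -> bytes:
    """CBOR head byte(s), using the shortest possible length encoding."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("integer too large")


def encode_dag_cbor(value) -> bytes:
    """Deterministically encode the small subset of types used here."""
    if isinstance(value, bool):  # must be checked before int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, dict):
        # Map keys are sorted by encoded length first, then bytewise.
        items = sorted(((k.encode("utf-8"), v) for k, v in value.items()),
                       key=lambda kv: (len(kv[0]), kv[0]))
        out = _head(5, len(items))
        for raw_key, v in items:
            out += _head(3, len(raw_key)) + raw_key + encode_dag_cbor(v)
        return out
    raise TypeError("unsupported type: " + type(value).__name__)


def label_signing_digest(label: dict) -> bytes:
    """SHA-256 digest of the label without sig, $type, or unknown fields."""
    unsigned = {k: label[k] for k in LABEL_FIELDS if k in label}
    unsigned.setdefault("ver", 1)  # assumption: a missing ver means version 1
    return hashlib.sha256(encode_dag_cbor(unsigned)).digest()
```

The returned digest is raw bytes, which is what gets passed to the signing or verification routine of whichever cryptographic library handles the key.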

The key used for signing labels is found in the DID document for the issuing identity, and has fragment identifier #atproto_label. This key may have the same value as the #atproto signing key used for repository signatures. At this time, if an #atproto_label key is not found, implementations should not attempt to use other keys present in the DID document to verify the signature; they should simply consider the signature invalid and ignore the label.
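A key lookup along these lines might look like the sketch below. It assumes the DID document has already been resolved to a dictionary, that keys are listed under verificationMethod with a publicKeyMultibase value, and that the id may appear either as a bare fragment or prefixed with the DID. The function name is hypothetical.

```python
from typing import Optional


def find_label_key(did_doc: dict) -> Optional[str]:
    """Return the multibase-encoded #atproto_label key, or None if absent."""
    did = did_doc.get("id", "")
    for method in did_doc.get("verificationMethod", []):
        if method.get("id") in ("#atproto_label", did + "#atproto_label"):
            return method.get("publicKeyMultibase")
    # Deliberately no fallback to #atproto or any other key.
    return None
```

When this returns None, the caller would treat the signature as invalid and ignore the label rather than trying other keys.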

Signature Lifecycle

Signatures are verified at the time the label is received. They do not need to be re-verified before hydration in to API responses.

Signing key rotation can be difficult and disruptive for a large labeling service. The rough mechanism for doing a rotation is:

labeling services should persist signatures alongside labels, and also persist an indicator of which key was used to sign the label
when starting a rotation, pause creation and signing of new labels
update the DID document with the new key
resume signing new labels with the new key
when servicing queries for old labels, check which key was used for signing. if out of date, re-sign and persist the signatures for that batch of labels. the created-at timestamp should not be changed.
older labels in the label event stream backfill period may have invalid signatures; this is acceptable

When encountering a label with an invalid signature, a good practice is to re-resolve the issuer identity (DID document) and check if there is an updated signing key. If there is, validation should be retried.
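The retry practice described above can be sketched with the identity resolution and signature check passed in as callables. Both callables (resolve_key, verify) and the function name are hypothetical placeholders for whatever resolver and cryptographic library a service actually uses.

```python
from typing import Callable, Optional


def validate_with_refresh(
    label: dict,
    resolve_key: Callable[[str, bool], Optional[str]],
    verify: Callable[[str, dict], bool],
) -> bool:
    """Verify a label, re-resolving the issuer identity once on failure.

    resolve_key(did, refresh) returns the issuer's label signing key, from
    cache when refresh is False and from a fresh resolution when True.
    verify(key, label) checks the label signature against that key.
    """
    cached = resolve_key(label["src"], False)
    if cached is not None and verify(cached, label):
        return True
    fresh = resolve_key(label["src"], True)
    if fresh is None or fresh == cached:
        return False  # no updated key to retry with
    return verify(fresh, label)
```

Limiting the retry to a single fresh resolution keeps a stream of invalid labels from triggering repeated identity lookups.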

A downstream service can decide for itself whether to bulk query and receive updated signatures when an upstream key rotation has occurred; or to fetch updated signatures on demand; or to consider old labels still valid even if the signature no longer validates against the current label signing key.

Self-Labels in Records

One Lexicon design pattern is to include an array of label values inside a record. Downstream clients can interpret these as "self-labels", similarly to labels coming from external sources.

Note that the repository data storage mechanism provides context and lifecycle support similar to a full label object:

the source of the label is the account controlling the repository
the subject is the record itself, or possibly the overall repository account (depending on context)
the CID is the current version of the record
negation and expiration are not necessary, as the record can be deleted or updated to change the set of labels
the created-at timestamp would be the same as the record itself (eg, via a createdAt field)
authenticity (signature) is provided by the repository commit signing mechanism
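The mapping above can be sketched as a function that turns the self-labels in a record into label-like dictionaries. It assumes the record carries a labels field holding a values array of objects with a val key, and a createdAt field; other Lexicons may arrange this differently. The function name is hypothetical, and the output is not a signed label object.

```python
from typing import List


def self_labels(repo_did: str, record_uri: str, record_cid: str,
                record: dict) -> List[dict]:
    """Interpret embedded label values as label-like dictionaries."""
    # Assumed record shape: {"labels": {"values": [{"val": "..."}]}}
    values = (record.get("labels") or {}).get("values", [])
    out = []
    for entry in values:
        label = {
            "src": repo_did,    # the account controlling the repository
            "uri": record_uri,  # the record itself
            "cid": record_cid,  # the current version of the record
            "val": entry["val"],
        }
        if "createdAt" in record:
            label["cts"] = record["createdAt"]
        out.append(label)
    return out
```

There is no neg, exp, or sig in the output, since updates, deletion, and authenticity are all handled by the repository itself.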

Labeler Service Identity

Labeler services each have a service identity, meaning a DID document. This is the DID that appears in the label source (src) field.

The DID document will also have a key used for signing labels (with ID #atproto_label; see above for signature details), and a service endpoint (with ID #atproto_labeler and type AtprotoLabeler) which indicates the server URL (the URL includes scheme, hostname, and optional port, but no path segment at this time). Note that in real world use-cases, use of HTTPS on the default port (443) is strongly recommended and may be required by service operators.
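Locating the labeler endpoint in a resolved DID document might look like the following sketch. It assumes services are listed under a service array with id, type, and serviceEndpoint keys, and that the id may be a bare fragment or prefixed with the DID. Treating a lone trailing slash as "no path" is also an assumption. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def find_labeler_endpoint(did_doc: dict) -> Optional[str]:
    """Return the labeler server URL from a DID document, or None."""
    did = did_doc.get("id", "")
    for svc in did_doc.get("service", []):
        if svc.get("id") not in ("#atproto_labeler", did + "#atproto_labeler"):
            continue
        if svc.get("type") != "AtprotoLabeler":
            continue
        endpoint = svc.get("serviceEndpoint")
        if not isinstance(endpoint, str):
            continue
        parts = urlsplit(endpoint)
        # Expect scheme, hostname, optional port, and no path segment.
        if parts.scheme in ("http", "https") and parts.hostname and parts.path in ("", "/"):
            return endpoint
    return None
```

A service operator that requires HTTPS could tighten the scheme check to accept only https.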

Depending on the application, the identity may also have an atproto repository containing a “declaration record” which describes application-specific context about the labeler. This may be required for integration with a specific application, client, or AppView, but is not a requirement at the base atproto level.

Label Distribution Endpoints

Two Lexicon endpoints are defined for labeler services to distribute labels:

com.atproto.label.subscribeLabels: an event stream (WebSocket) endpoint, which broadcasts new labels. Implements the seq backfill mechanism, similar to repository event stream, but with some small differences: the “backfill” period may extend to cursor=0 (meaning that the full history of labels is available via the stream). Labels which have been redacted have the original label removed from the stream, but the negation remains.

com.atproto.label.queryLabels: a flexible query endpoint. Can be used to scroll over all labels (using a cursor parameter), or can filter to labels relating to a specific subject.
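As a rough sketch, a request URL for the query endpoint could be built as follows. The parameter names used here (uriPatterns, sources, limit, cursor) and the default limit reflect one reading of the Lexicon definition and should be checked against it; the /xrpc/ path prefix is the usual location for Lexicon endpoints. The function name and example hostname are hypothetical.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def query_labels_url(endpoint: str, uri_patterns: Sequence[str],
                     sources: Sequence[str] = (), limit: int = 50,
                     cursor: Optional[str] = None) -> str:
    """Build a queryLabels URL; parameter names are assumptions."""
    params = [("uriPatterns", p) for p in uri_patterns]
    params += [("sources", s) for s in sources]
    params.append(("limit", str(limit)))
    if cursor is not None:
        params.append(("cursor", cursor))
    return (endpoint.rstrip("/")
            + "/xrpc/com.atproto.label.queryLabels?"
            + urlencode(params))
```

To scroll over a larger result set, a caller would repeat the request with the cursor value returned in the previous response until no cursor comes back.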

Note that unlike public repository content, labels are not required to be publicly enumerable. It is acceptable for labeler services to make all labels publicly available using these endpoints, or to require authorization and access control, or to not implement these endpoints at all (if they have another mechanism for distributing labels).

Labeler HTTP Headers

Labels are often “hydrated” in to HTTP API responses by atproto services, such as AppViews. To give clients control over which label sources they want included, two special HTTP headers are used, which PDS implementations are expected to pass-through when proxying requests:

atproto-accept-labelers: used in requests. A list of labeler service DIDs, with optional per-DID flags.

atproto-content-labelers: used in responses. Same content, syntax, and semantics as the accept header, but indicates which labelers could actually be queried. Presence in this header doesn’t mean that any labels from a given DID were actually included, only that they would have been if such labels existed.

The syntax of these headers follows IETF RFC-8941 (“Structured Field Values for HTTP”), section 3.1.2 (“Parameters”). Values are separated by comma (ASCII , character), and values from repeated declarations of the header should be merged in to a single list. One or more optional parameters may follow the item value (the DID), separated by a semicolon (ASCII ; character). For boolean parameters, the full RFC syntax (such as param=?0 for false) is not currently supported. Instead, the presence of the parameter indicates it is “true”, and the absence indicates “false”. No other parameter value types (such as integers or strings) are supported at this time.

The only currently supported parameter is the boolean parameter redact. This flag indicates that the service hydrating labels should handle the special protocol-level label values !takedown and !suspend by entirely redacting content from the API response, instead of simply labeling it. This may result in an application-specific tombstone entry, which might indicate the Labeler responsible for the redaction, or could result in the content being removed without a tombstone.

Complete example syntax for these headers:

# on a request
atproto-accept-labelers: did:web:mod.example.com;redact, did:plc:abc123, did:plc:xyz789

# on a response:
atproto-content-labelers: did:web:mod.example.com;redact, did:plc:abc123, did:plc:xyz789

If the syntax of the request header is invalid or can not be parsed, the service should return an error instead of ignoring the header.

If a labeler DID is repeated in the header, parameters should be combined from each instance. For example, if a DID is included once with redact and once without, the service should treat this the same as if the DID was included once, with redact. The atproto-content-labelers response header should represent how the request header was de-duplicated and interpreted.

If the request header is not supplied at all, the service may substitute a default. This is distinct from supplying the header with no value, in which case the service should not hydrate or apply any labels.
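The parsing rules above can be sketched as follows. The function names are hypothetical, the DID check is a simple prefix test rather than full DID syntax validation, and raising ValueError stands in for whatever error response a service would return.

```python
from typing import Dict, List, Optional

SUPPORTED_PARAMS = {"redact"}


def parse_accept_labelers(header_values: Optional[List[str]]) -> Optional[Dict[str, bool]]:
    """Parse atproto-accept-labelers into {did: redact}.

    header_values holds one string per occurrence of the header. None means
    the header was absent, so the caller may substitute a default. A header
    present with no value yields an empty dict (hydrate no labels).
    """
    if header_values is None:
        return None
    labelers: Dict[str, bool] = {}
    for raw in header_values:
        if not raw.strip():
            continue
        for item in raw.split(","):
            parts = [p.strip() for p in item.split(";")]
            did, params = parts[0], parts[1:]
            if not did.startswith("did:"):
                raise ValueError("invalid labeler DID: %r" % did)
            for param in params:
                # Only bare flags are accepted; "redact=?0" is rejected.
                if param not in SUPPORTED_PARAMS:
                    raise ValueError("unsupported parameter: %r" % param)
            # A repeated DID combines parameters from every instance.
            labelers[did] = labelers.get(did, False) or "redact" in params
    return labelers


def format_content_labelers(labelers: Dict[str, bool]) -> str:
    """Render the de-duplicated list for atproto-content-labelers."""
    return ", ".join(did + (";redact" if redact else "")
                     for did, redact in labelers.items())
```

A service would typically remove any labelers it could not actually query from the parsed dictionary before formatting the response header.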

If any labeler DID is indicated with correct syntax, but the identity does not exist; does not include labeler service or key entries in the DID doc; has been taken down at the service level; or is otherwise inactive or non-functional; then that labeler should not be included in the atproto-content-labelers response header, but does not need to be treated as an error.

A service implementation may decide, as a policy matter, that specific conditions must be met or the request will error. For example, that a specific labeler DID must be included; or a minimum or maximum number of labelers can be included; or a minimum or maximum number of labelers with redact are included.

Security Considerations

Note that there is no "domain differentiation" of the signature, meaning that there is potential security risk of signing a label which is also a valid object (and signature) in an entirely different context, like an authentication bearer token. This makes it important to ensure that no additional or unexpected fields are included in the object that is being signed.

Usage and Implementation Guidelines

It is strongly recommended to stick to the “recommended string syntax” for label values at this time.

Possible Future Changes

More mature governance, namespacing, and style guide recommendations on label values.