Lexinomicon
Here are some recommended conventions and best practices for designing Lexicon schemas.
Name casing conventions:
- Schemas & attributes: Use `lowerCamelCase` capitalization for schemas and names (as opposed to `UpperCamelCase`, `snake_case`, `ALL_CAPS`, etc)
- API error names: `UpperCamelCase`
- Fixed strings (eg `knownValues`): `kebab-case`
Acceptable characters:
- Field names should stick to the same character set as schema names (NSID name segments): ASCII alphanumeric, first character not a digit, no hyphens, case-sensitive
- Exceptions may be justifiable in some situations, such as preservation of names in existing external schemas
- Data objects should never contain schema-specified field names starting with `$` at any level of nesting; these are reserved for future protocol-level extensions
Naming conventions:
- Use singular nouns for `record` schemas
  - eg `post`, `like`, `profile`
- Use “verb-noun” for `query` and `procedure` endpoints
  - eg `getPost`, `listLikes`, `putProfile`
  - Common verbs for `query` endpoints are: `get`, `list`, `search` (for full-text search), `query` (for flexible matching or filtering)
  - Common verbs for `procedure` endpoints: `create`, `update`, `delete`, `upsert`, `put`
- Use “subscribe-plural-noun” for `subscription` endpoints
  - eg `subscribeLabels`
- Conventions for `permission-set` schema naming have not been established yet, but will probably use an “auth” prefix (eg, `authBasic`)
- If an endpoint is experimental, unstable, or not intended for interoperability, indicate that in the NSID name
  - eg, include `.temp.` or `.unspecced.` in the NSID hierarchy
- Avoid generic names which conflict with popular programming language conventions
  - eg, avoid using `default` or `length` as schema names
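As an illustration of these naming conventions, here is a sketch of a hypothetical `com.example.bookmark` record schema (all names are made up for this example): the record type is a singular noun, and corresponding endpoints might be named `getBookmark` or `listBookmarks`.

```json
{
  "lexicon": 1,
  "id": "com.example.bookmark",
  "defs": {
    "main": {
      "type": "record",
      "description": "A bookmark of another record.",
      "key": "tid",
      "record": {
        "type": "object",
        "required": ["subject", "createdAt"],
        "properties": {
          "subject": { "type": "ref", "ref": "com.atproto.repo.strongRef" },
          "createdAt": { "type": "string", "format": "datetime" }
        }
      }
    }
  }
}
```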
Documentation and Completeness:
- Add a description to every `main` schema definition (records, API endpoints, etc)
  - for API endpoints, mention in the description if authentication is required, and whether responses will be personalized if authentication is optional
- Add descriptions to potentially ambiguous fields and properties. This is particularly important for fields with generic names like `uri` or `cid`: CID of what?
NSID namespace grouping:
- Many applications and projects will have multiple distinct functions or features, and schemas of all types can have that grouping represented in the NSID hierarchy
  - eg `app.bsky.feed.*`, `app.bsky.graph.*`
- Very simple applications can include all endpoints under a single NSID “group”
- Use a `.defs` schema for definitions which might be reused by multiple schemas in the same namespace, or by third parties
  - eg `app.bsky.feed.defs`
  - putting these in a separate schema file means that deprecation or removal of other schema files doesn’t impact reuse
- Avoid conflicts and confusion between groups, names, and definitions
  - eg `app.bsky.feed.post#main` vs `app.bsky.feed.post.main`, or `com.example.record#foo` and `com.example.record.foo`
  - or defining both `app.bsky.feed` (as a record) and `app.bsky.feed.post` (with `app.bsky.feed` as a group)
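A minimal sketch of a `.defs` schema file (the `com.example.feed.defs` NSID and `itemView` name are hypothetical): other schemas in the `com.example.feed.*` group, or third parties, can reference the definition as `com.example.feed.defs#itemView`.

```json
{
  "lexicon": 1,
  "id": "com.example.feed.defs",
  "defs": {
    "itemView": {
      "type": "object",
      "description": "Hydrated view of a feed item, reusable across multiple endpoints.",
      "required": ["uri"],
      "properties": {
        "uri": { "type": "string", "format": "at-uri" }
      }
    }
  }
}
```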
Other guidelines:
- Specify the format of string fields when appropriate
- String fields in records should almost always have a maximum length if they don’t have a format type
- Don’t redundantly specify both a format and length limits
- If limiting the length of a string for semantic or visual reasons, grapheme limits should be used to ensure a degree of consistency across human languages. A data size (bytes) limit should also be added in these cases. A ratio of 10 to 20 bytes per grapheme is recommended.
- The string and bytes record data types are intended for constrained data size use-cases. For text or binary data of larger size, including longer-form text and structured data, blob references should be used.
- Enum sets are “closed” and can not be updated or extended without breaking schema evolution rules. For this reason they should almost always be avoided.
  - For strings, `knownValues` provides a more flexible alternative
- String `knownValues` may include simple string constants, or may include schema references to a `token` (eg, the string `"com.example.defs#tokenOne"`)
  - Tokens provide an extension mechanism, and work well for values that have subjective definitions or may be expanded over time
  - See `com.atproto.moderation.defs#reasonType` and `com.atproto.sync.defs#hostStatus` for two contrasting instances, the former extensible and the latter more constrained
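A hypothetical sketch of the token pattern (the `com.example.defs` NSID and names are made up): a `token` definition is declared in a `defs` file, and a string field lists it in `knownValues` alongside a simple kebab-case constant. Clients should tolerate values outside this list, which is what makes the set extensible.

```json
{
  "lexicon": 1,
  "id": "com.example.defs",
  "defs": {
    "tokenOne": {
      "type": "token",
      "description": "One subjectively-defined value; more tokens may be added over time."
    },
    "status": {
      "type": "string",
      "knownValues": ["com.example.defs#tokenOne", "simple-constant"]
    }
  }
}
```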
- Take advantage of re-usable definitions, such as `com.atproto.repo.strongRef` (for versioned references to records) or `com.atproto.label.defs#label` (in an array, for hydrated labels)
- API endpoints which take an account identifier as an argument (eg, query parameter) should use `at-identifier`, so that clients which only have an account handle can avoid calling `resolveHandle`
- Record schemas should always use persistent identifiers (DIDs) for references to other accounts, instead of handles
- API endpoints should always specify an `output` with `encoding`, even if they have no meaningful response data
  - a good default is `application/json` with the schema being an object with no defined properties
- Optional `boolean` fields should be phrased such that `false` is the default and expected value
  - For example, if an endpoint can return a mix of “foo” and “bar”, and the common behavior is to include “foo” but not “bar”, then controlling parameters should be named `excludeFoo` (default `false`) and `includeBar` (default `false`), as opposed to `excludeBar` (default `true`)
- Content hashes (CIDs) may be represented as a string format or in binary encoding (`cid-link`)
  - In most situations, including versioned references between records, the string format is recommended.
  - Binary encoding is mostly used for protocol-level mechanisms, such as the firehose.
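As a sketch of the string guidance above (field names are hypothetical): a human-visible `summary` field gets a grapheme limit plus a byte limit at a 10:1 ratio, while `createdAt` has a format type and therefore no explicit length limits.

```json
{
  "summary": { "type": "string", "maxGraphemes": 300, "maxLength": 3000 },
  "createdAt": { "type": "string", "format": "datetime" }
}
```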
Schema Evolution and Extension
All schemas should be flexible to extension and evolution over time, without breaking the Lexicon schema evolution rules. This is particularly true for record schemas. Given the distributed storage model of atproto, developers do not have a reliable mechanism to update all data records in the network. Extensions could come from the original designer, or other developers and projects.
Experimental schemas and projects can use variant NSIDs (eg, including `.temp.` in the name hierarchy) to develop in the live network without committing to stable record data schemas.
Major non-backwards-compatible schema changes are possible by declaring a new schema. The current naming convention is to append “V2” to the original name (or “V3”, etc).
Design recommendations to make schemas flexible to future evolution and extension:
- do not mark data fields or API parameters as `required` unless they are truly required for functionality
  - `required` fields can not be made optional or deprecated under the evolution rules
- you can add new optional fields to a schema without breaking backwards compatibility or requiring a V2 schema, but you can’t add new `required` fields
- use object types containing a single element/field instead of atomic data types in arrays, to allow additional context to be included in the future
  - for example, in an API response listing accounts (DIDs), return an array of objects each with an `account` field listing the DID, instead of an array of strings
- make unions “open” in almost all situations, to allow future addition of types or values
  - open unions can be an extension mechanism for third parties to include self-defined data types
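The recommendations above can be sketched as a hypothetical `defs` fragment (all names invented for this example): the `accounts` array holds objects with a single `account` field rather than bare DID strings, so more fields can be added later; the `embed` union omits `"closed": true` (open is the default), so new types can be referenced in the future. The `#someView` ref is assumed to be defined elsewhere in the same file.

```json
{
  "accountList": {
    "type": "object",
    "required": ["accounts"],
    "properties": {
      "accounts": {
        "type": "array",
        "items": { "type": "ref", "ref": "#accountItem" }
      }
    }
  },
  "accountItem": {
    "type": "object",
    "required": ["account"],
    "properties": {
      "account": { "type": "string", "format": "did" },
      "embed": { "type": "union", "refs": ["#someView"], "closed": false }
    }
  }
}
```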
Design Patterns
- There is a basic convention for pagination of `query` API endpoints:
  - query parameters include an optional `limit` (integer) and an optional `cursor` (string)
  - the output body includes an optional `cursor` (string) and a required array of response objects (with a context-specific pluralized field name)
  - the initial client request does not define a `cursor`. If the response includes a `cursor`, then more results are available, and the client should query again with the new `cursor` to get more results
  - the `limit` value is an upper limit, and the response may include fewer (or even zero) results while further results are still available. It is the lack of a `cursor` in responses that indicates pagination is complete. The response set may have items removed if they are tombstoned or have been otherwise filtered out.
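This pagination convention can be sketched as a hypothetical `com.example.listWidgets` query schema (the NSID, `widgets` field, and `widgetView` ref are all made up):

```json
{
  "lexicon": 1,
  "id": "com.example.listWidgets",
  "defs": {
    "main": {
      "type": "query",
      "description": "List widgets. Does not require authentication.",
      "parameters": {
        "type": "params",
        "properties": {
          "limit": { "type": "integer", "minimum": 1, "maximum": 100, "default": 50 },
          "cursor": { "type": "string" }
        }
      },
      "output": {
        "encoding": "application/json",
        "schema": {
          "type": "object",
          "required": ["widgets"],
          "properties": {
            "cursor": { "type": "string" },
            "widgets": {
              "type": "array",
              "items": { "type": "ref", "ref": "com.example.defs#widgetView" }
            }
          }
        }
      }
    }
  }
}
```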
- There is also a convention for subscription endpoints which support “sequencing” and backfill cursors:
  - the endpoint has an optional `cursor` query parameter (integer)
  - all core message types include a `seq` field (integer). The `seq` of messages increases monotonically, though there may be gaps.
  - if the `cursor` is not provided, the server will start returning new messages from the current point forward
  - if the `cursor` is provided, the server will attempt to return historical messages starting with the matching `seq`, continuing through to the current stream
  - if the `cursor` is in the future (higher than the current sequence), an error is returned and the connection closed
  - if the `cursor` is older than the earliest available message (or is 0), the server returns an info message named `OutdatedCursor`, then returns messages starting from the oldest available
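A sketch of this convention as the `main` def of a hypothetical `subscribeWidgets` endpoint, loosely modeled on `com.atproto.sync.subscribeRepos`; the `#event` and `#info` message types are assumed to be defined elsewhere in the same file, each with a required `seq` field where applicable.

```json
{
  "type": "subscription",
  "description": "Stream of widget events.",
  "parameters": {
    "type": "params",
    "properties": {
      "cursor": {
        "type": "integer",
        "description": "The last known seq; omit to receive only new messages."
      }
    }
  },
  "message": {
    "schema": { "type": "union", "refs": ["#event", "#info"] }
  },
  "errors": [
    { "name": "FutureCursor" }
  ]
}
```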
- A common pattern in API responses is to include “hydrated views” of data records. For example, when viewing an account’s profile, the response might include CDN or thumbnail URLs for any media files, moderation labels, global aggregations, and viewer-specific social graph context.
- For detailed views, a best practice is to include the original record verbatim, instead of defining a new schema with a superset of fields. This is easier to maintain (there is no risk of forgetting to update fields), and ensures that any off-schema extension data is included.
- Viewer-specific metadata should be optional and either indicated in descriptions or grouped under a sub-object. This makes schemas reusable between “public” and “logged-in” views, and makes it clearer what information will be available when.
- A helpful pattern for application developers is to ensure there is an API endpoint that accepts a reference to a record (eg, an AT URI or equivalent; or multiple references) and returns the hydrated data object(s).
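A sketch of a hydrated view definition, loosely modeled on `app.bsky.feed.defs#postView` (the `#viewerState` ref is assumed to be defined elsewhere): the original record is included verbatim as an `unknown` field, and viewer-specific context is grouped under an optional `viewer` sub-object.

```json
{
  "postView": {
    "type": "object",
    "required": ["uri", "cid", "record"],
    "properties": {
      "uri": { "type": "string", "format": "at-uri" },
      "cid": { "type": "string", "format": "cid" },
      "record": { "type": "unknown", "description": "The original record data, included verbatim." },
      "likeCount": { "type": "integer" },
      "viewer": {
        "type": "ref",
        "ref": "#viewerState",
        "description": "Viewer-specific state; only present on authenticated requests."
      }
    }
  }
}
```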
- the `app.bsky.richtext.facet` system can be used to annotate short text strings in a way that is simpler and safer to work with than full-featured markup languages
  - for more details, see "Why RichText facets in Bluesky"
  - the feature type system is an open union which can be extended with additional types
  - more powerful systems like Markdown are more appropriate for long-form text
- One pattern for extending or supplementing a record is to define “sidecar” records in the same account repository with the same record key and different types (collections).
- Sidecar records can be defined and managed by the original Lexicon designer or by independent developers.
- The sidecar records can be updated (mutated) without breaking strong references to the original record.
- Sidecar context can be included in API responses.
- Because atproto accounts can be used flexibly with any application in the network, it can be ambiguous which accounts are participating in a particular app modality. This can be clarified if there is a known representative record type for the modality, and that clients create such a record for active accounts. Deletion of this record can be a way to indicate the user is no longer active. This works best if the record has a single known instance (fixed record key).
- For example, an app-specific “profile” or “declaration” record can indicate that the account has logged in to an associated app at least once, even if the record is “empty”.
- Backfill services can enumerate all accounts in the network with the given signaling record, and also process deletion of that record as deactivation of that modality.
- This design pattern is strongly recommended for new app modalities.
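A sketch of such a signaling record (the `com.example.actor.declaration` NSID is hypothetical): the `literal:self` record key ensures a single known instance per account, and the record body can be an empty object that is extended with optional fields later.

```json
{
  "lexicon": 1,
  "id": "com.example.actor.declaration",
  "defs": {
    "main": {
      "type": "record",
      "description": "Declares that this account actively participates in the app. Deleting this record indicates deactivation.",
      "key": "literal:self",
      "record": {
        "type": "object",
        "properties": {}
      }
    }
  }
}
```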