I’ve been writing on Leaflet lately — it’s quick, it lives on the atmosphere, and I don’t have to open my editor to jot something down. I’ve been putting my more informal / technical posts about the atmosphere there. The tradeoff is that those posts sit on keith.leaflet.pub instead of keith.is, which means my own site doesn’t know they exist. No RSS mixing, no tag pages, no sidenotes-treated-the-way-I-like-them. Just a separate little (cute) island.
So I wrote a content-layer loader for Astro that pulls my Leaflet posts down from my PDS at build time and drops them into the site as a real content collection. The posts still live canonically on Leaflet — which is the whole point of atproto: the data is mine, the renderer is Leaflet’s, and anything else that can read the lexicon (including a build script on my laptop) gets to join the party. This is just a mirror so they show up in the feed here too.
The shape of a Leaflet record
Leaflet publishes documents as ATProto records in the site.standard.document collection. Each record has a site field pointing at a site.standard.publication URI, a title, a publishedAt, optional tags, and a content.pages array of typed blocks (pub.leaflet.blocks.text, header, code, unorderedList, and so on). The interesting bit is that text blocks carry Bluesky-style rich-text facets — byte-offset ranges that say “this span is bold”, “this span is a link to X”. Not markdown. Not HTML. Byte offsets.
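Concretely, a text block with one bold span looks something like this. The shape and facet type names come from the lexicon described above; the actual values are illustrative, not pulled from a real record:

```typescript
// Illustrative only: a pub.leaflet.blocks.text block whose single facet
// marks the word "mine" as bold, via UTF-8 byte offsets into `plaintext`.
const sampleBlock = {
  $type: 'pub.leaflet.blocks.text',
  plaintext: 'the data is mine',
  facets: [
    {
      index: { byteStart: 12, byteEnd: 16 },
      features: [{ $type: 'pub.leaflet.richtext.facet#bold' }],
    },
  ],
};

// Decoding the facet's byte range recovers the span it annotates.
const bytes = new TextEncoder().encode(sampleBlock.plaintext);
const span = new TextDecoder().decode(bytes.slice(12, 16));
// span === 'mine'
```

Note that the offsets index into the UTF-8 encoding of `plaintext`, which matters as soon as the text contains anything outside ASCII (more on that below).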
So the loader has two jobs: pull the records from the right repo, and turn those typed blocks into something Astro’s markdown pipeline can render.
Resolving the PDS
ATProto DIDs don’t tell you where the data lives. You have to ask the PLC directory, which returns a DID document with a #atproto_pds service endpoint. That’s the host you actually listRecords against.
async function resolvePds(did: string): Promise<string> {
const res = await fetch(`https://plc.directory/${did}`);
if (!res.ok) throw new Error(`plc.directory: ${res.status}`);
const doc = await res.json();
const pds = doc.service?.find(s => s.id === '#atproto_pds')?.serviceEndpoint;
if (!pds) throw new Error(`no PDS endpoint for ${did}`);
return pds;
}
I hardcoded my own DID and publication URI at the top of the file. If you’re doing this for yourself, grab yours from https://plc.directory/did:plc:your-did-here and from Leaflet’s publication settings.
Paginating listRecords
Once you have the PDS, com.atproto.repo.listRecords is the unauthenticated endpoint you want. It’s cursor-paginated, so an async generator keeps the loader clean:
async function* iterRecords(pds: string, did: string) {
let cursor: string | undefined;
do {
const url = new URL(`${pds}/xrpc/com.atproto.repo.listRecords`);
url.searchParams.set('repo', did);
url.searchParams.set('collection', 'site.standard.document');
url.searchParams.set('limit', '100');
if (cursor) url.searchParams.set('cursor', cursor);
const res = await fetch(url);
if (!res.ok) throw new Error(`listRecords: ${res.status}`);
const body = await res.json();
for (const record of body.records) yield record;
cursor = body.cursor;
} while (cursor);
}
Every Leaflet document you’ve ever written lives in that collection, across all your publications. I filter to just this site by checking record.value.site against my publication URI.
The loader itself
Astro 5’s content layer expects a Loader with a load method. Inside load you get a store, a logger, a renderMarkdown helper, and a parseData function that runs your schema. The shape is wonderfully boring:
import type { Loader } from 'astro/loaders';
import { blocksToMarkdown } from './leaflet-blocks';
export function leafletLoader(): Loader {
return {
name: 'leaflet-loader',
async load({ store, logger, renderMarkdown, parseData }) {
store.clear();
const pds = await resolvePds(KEITH_DID);
for await (const record of iterRecords(pds, KEITH_DID)) {
if (record.value.site !== KEITH_SITE_URI) continue;
const rkey = record.uri.split('/').pop()!;
const id = `leaflet-${rkey}`;
const markdown = blocksToMarkdown(record.value.content?.pages);
const data = await parseData({
id,
data: {
title: record.value.title,
description: record.value.description,
date: record.value.publishedAt,
tags: [...(record.value.tags ?? []), 'atmosphere'],
leafletUri: record.uri,
leafletUrl: `https://keith.leaflet.pub/${rkey}`,
rkey,
},
});
store.set({
id,
data,
body: markdown,
rendered: await renderMarkdown(markdown),
});
}
},
};
}
Every post gets an automatic atmosphere tag so I can filter them on the tag page, and a leafletUrl so the layout can link back to the canonical copy.
Facets → markdown
The annoying part. Leaflet stores rich text as plaintext plus an array of facets over UTF-8 byte ranges. That’s the Bluesky convention, and it’s a reasonable choice when you don’t want to commit to markdown or HTML at the lexicon level — but it means I have to reassemble the string myself.
The core loop: sort facets by start offset, walk the bytes, wrap each facet span with whatever markdown the feature types call for, and escape the unfaceted slices so a stray * doesn’t accidentally become italic.
const encoder = new TextEncoder();
const decoder = new TextDecoder();
export function applyFacets(plaintext: string, facets: Facet[] = []): string {
if (!facets.length) return escapeMarkdown(plaintext);
const bytes = encoder.encode(plaintext);
const sorted = [...facets].sort((a, b) => a.index.byteStart - b.index.byteStart);
let result = '';
let cursor = 0;
for (const facet of sorted) {
const { byteStart, byteEnd } = facet.index;
if (byteStart < cursor) continue; // skip overlaps
if (byteStart > cursor) {
result += escapeMarkdown(decoder.decode(bytes.slice(cursor, byteStart)));
}
const inner = decoder.decode(bytes.slice(byteStart, byteEnd));
// inspect facet.features and wrap with **bold**, *italic*, `code`, [link](uri)
result += wrapFeatures(inner, facet.features);
cursor = byteEnd;
}
if (cursor < bytes.length) {
result += escapeMarkdown(decoder.decode(bytes.slice(cursor)));
}
return result;
}
Do the byte slicing with TextEncoder/TextDecoder, not string indexing. JavaScript strings are UTF-16 code units, and the facet offsets are UTF-8 bytes. The moment someone puts an emoji in a post, naïve substring calls start slicing inside a surrogate pair and everything downstream is wrong.
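Here's a standalone demonstration of the failure mode (not part of the loader):

```typescript
// '🎉' is one emoji but 4 UTF-8 bytes and 2 UTF-16 code units, so byte
// offsets and string indices diverge immediately after it.
const text = '🎉 party';
const utf8 = new TextEncoder().encode(text);

// Facet-style byte range for the word "party": bytes 5..10.
const viaBytes = new TextDecoder().decode(utf8.slice(5, 10));
// viaBytes === 'party'

// The same numbers used as string indices land in the wrong place.
const viaString = text.slice(5, 10);
// viaString === 'rty'
```

Every extra emoji shifts the two numbering schemes further apart, so the bug gets worse the more expressive the post is.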
Block rendering is a straightforward switch on $type: pub.leaflet.blocks.text gets faceted, header uses block.level, code fences with block.language, lists recurse. Horizontal rule is ---. Blockquote prefixes each line with > . I’m still punting on images until I decide whether to download blobs at build time or just hotlink to the PDS.
Wiring it into the collection config
The payoff is that once the loader exists, Leaflet posts become a first-class collection. Same shape as every other one:
import { defineCollection, z } from 'astro:content';
import { leafletLoader } from './loaders/leaflet';
const leaflet = defineCollection({
loader: leafletLoader(),
schema: z.object({
title: z.string(),
description: z.string().optional(),
date: z.coerce.date(),
tags: z.array(z.string()).default([]),
leafletUri: z.string(),
leafletUrl: z.string(),
rkey: z.string(),
}),
});
export const collections = { posts, leaflet, /* ... */ };
My posts page now does getCollection('posts') and getCollection('leaflet'), merges them, sorts by date, and renders the combined list. Tag pages pick up atmosphere automatically. The RSS feed includes both.
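The merge itself is nothing fancy. A helper along these lines — the name `mergeByDate` is mine, not Astro's — flattens and date-sorts whatever collections you hand it:

```typescript
// Hypothetical helper: merge any number of collection-entry arrays and
// sort newest-first by the `date` field the schemas share.
type DatedEntry = { data: { date: Date } };

function mergeByDate<T extends DatedEntry>(...lists: T[][]): T[] {
  return lists
    .flat()
    .sort((a, b) => b.data.date.getTime() - a.data.date.getTime());
}

// Usage on the posts page (sketch):
//   const all = mergeByDate(
//     await getCollection('posts'),
//     await getCollection('leaflet'),
//   );
```

Because the Leaflet schema coerces `publishedAt` into the same `date` field the file-backed posts use, the sort needs no special-casing.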
What got me
A few things, in the order they bit me:
- PDS discovery is a separate hop. I forgot the first time and hardcoded bsky.social as the host, which works for accounts whose data actually lives there and fails silently for the rest. Always resolve through plc.directory.
- The site filter is load-bearing. Every Leaflet doc you’ve ever drafted is in the same collection, including ones attached to other publications, test pubs, and old drafts. Without filtering by site, I was pulling in every stray scribble.
- Byte offsets, not character offsets. See above. Emoji will humble you.
- store.clear() at the top of load. Otherwise deleted Leaflet posts stick around in the content store across rebuilds, which is a very confusing debugging experience.
The whole thing is maybe 200 lines across leaflet.ts and leaflet-blocks.ts, and it rebuilds in a couple of seconds. Worth it to keep writing on Leaflet without abandoning my own feed.
Questions, corrections, and “you’re doing this the hard way” notes all welcome — I’m @keithlaugh.love on Bluesky.
Full source
If you want to drop this into your own Astro site, here are both files in full. Swap KEITH_DID and KEITH_SITE_URI for your own values.
src/content/loaders/leaflet.ts
// Astro content-layer loader that fetches Leaflet posts from the ATProto PDS.
// Flow: resolve DID → PDS, paginate listRecords, filter by publication URI,
// convert Leaflet's typed blocks to markdown, hand the result to Astro.
import type { Loader } from 'astro/loaders';
import { blocksToMarkdown } from './leaflet-blocks';
// Your DID (find it at https://plc.directory/ or via `bsky` CLI).
const KEITH_DID = 'did:plc:5qartdsce62n2wfyvtocmoob';
// Leaflet stores every document you've ever written in this one collection,
// across every publication you own. We filter to a single publication below.
const LEAFLET_COLLECTION = 'site.standard.document';
// The AT-URI of the specific Leaflet publication we want to mirror.
// Grab it from Leaflet's publication settings (or from any record's `site` field).
const KEITH_SITE_URI =
'at://did:plc:5qartdsce62n2wfyvtocmoob/site.standard.publication/3mdlag6wb3s2k';
// Used to build canonical URLs back to the Leaflet-hosted copy of each post.
const LEAFLET_HOST = 'https://keith.leaflet.pub';
// Automatically applied to every imported post so they're easy to filter/group.
const AUTO_TAG = 'atmosphere';
type PlcService = { id: string; type: string; serviceEndpoint: string };
type PlcDoc = { service: PlcService[] };
type LeafletRecord = {
uri: string;
cid: string;
value: {
$type: string;
title: string;
description?: string;
site?: string;
publishedAt: string;
tags?: string[];
content?: {
$type?: string;
pages?: any[];
};
};
};
type ListRecordsResponse = {
records: LeafletRecord[];
cursor?: string;
};
// ATProto DIDs don't embed their host. We ask the PLC directory where the
// account's data actually lives, then hit that host for listRecords.
async function resolvePds(did: string): Promise<string> {
const res = await fetch(`https://plc.directory/${did}`);
if (!res.ok) throw new Error(`plc.directory: ${res.status}`);
const doc = (await res.json()) as PlcDoc;
const pds = doc.service?.find((s) => s.id === '#atproto_pds')?.serviceEndpoint;
if (!pds) throw new Error(`no PDS endpoint for ${did}`);
return pds;
}
// Async generator so callers can stop early without buffering every page.
// listRecords is cursor-paginated; 100 is the max page size.
async function* iterRecords(pds: string, did: string) {
let cursor: string | undefined;
do {
const url = new URL(`${pds}/xrpc/com.atproto.repo.listRecords`);
url.searchParams.set('repo', did);
url.searchParams.set('collection', LEAFLET_COLLECTION);
url.searchParams.set('limit', '100');
if (cursor) url.searchParams.set('cursor', cursor);
const res = await fetch(url);
if (!res.ok) throw new Error(`listRecords: ${res.status}`);
const body = (await res.json()) as ListRecordsResponse;
for (const record of body.records) yield record;
cursor = body.cursor;
} while (cursor);
}
// The rkey is the final path segment of an AT-URI — we use it as a stable slug.
function rkeyFromUri(uri: string): string {
return uri.split('/').pop() ?? uri;
}
export function leafletLoader(): Loader {
return {
name: 'leaflet-loader',
async load({ store, logger, renderMarkdown, parseData }) {
// Clear on every build — otherwise deleted Leaflet posts linger
// in Astro's content store and re-render forever.
store.clear();
let pds: string;
try {
pds = await resolvePds(KEITH_DID);
} catch (err) {
// Don't fail the whole build just because plc.directory is flaky.
logger.warn(`[leaflet] could not resolve PDS: ${(err as Error).message}`);
return;
}
logger.info(`[leaflet] fetching records from ${pds}`);
let count = 0;
try {
for await (const record of iterRecords(pds, KEITH_DID)) {
// Every Leaflet doc you've ever written is in this collection,
// including ones attached to other publications. Filter hard.
if (record.value.site !== KEITH_SITE_URI) continue;
const rkey = rkeyFromUri(record.uri);
const id = `leaflet-${rkey}`;
const markdown = blocksToMarkdown(record.value.content?.pages);
const leafletUrl = `${LEAFLET_HOST}/${rkey}`;
// Dedupe via Set in case a user already tagged the post `atmosphere`.
const tags = Array.from(
new Set([...(record.value.tags ?? []), AUTO_TAG]),
);
const data = await parseData({
id,
data: {
title: record.value.title,
description: record.value.description,
date: record.value.publishedAt,
tags,
leafletUri: record.uri,
leafletUrl,
rkey,
},
});
// renderMarkdown runs Astro's full markdown pipeline (remark + rehype
// plugins, shiki highlighting, etc.), so Leaflet posts get the same
// treatment as file-backed ones.
const rendered = await renderMarkdown(markdown);
store.set({
id,
data,
body: markdown,
rendered,
});
count++;
}
} catch (err) {
logger.warn(`[leaflet] fetch failed: ${(err as Error).message}`);
return;
}
logger.info(`[leaflet] loaded ${count} post(s)`);
},
};
}
src/content/loaders/leaflet-blocks.ts
// Converts Leaflet's typed-block document structure into plain markdown,
// preserving rich-text formatting via Bluesky-style facets over UTF-8 byte ranges.
type Facet = {
index: { byteStart: number; byteEnd: number };
features: Array<{
$type: string;
uri?: string;
}>;
};
type TextLike = {
plaintext: string;
facets?: Facet[];
};
type Block = {
$type: string;
[key: string]: any;
};
type PageBlockWrapper = {
$type: string;
block: Block;
};
type Page = {
$type: string;
blocks?: PageBlockWrapper[];
};
// Facet offsets are UTF-8 bytes, not UTF-16 code units. A single TextEncoder/
// TextDecoder pair handles the conversion for the whole module.
const encoder = new TextEncoder();
const decoder = new TextDecoder();
// Escape characters markdown treats as syntax, so user text doesn't accidentally
// become bold/italic/etc. Applied only to unfaceted slices — faceted spans
// are already being wrapped, so double-escaping would break them.
function escapeMarkdown(text: string): string {
return text.replace(/([\\`*_{}\[\]()#+\-.!])/g, '\\$1');
}
// Walk facets left-to-right, slicing the byte buffer and wrapping each span
// with the markdown equivalent of its features. Overlapping facets are skipped
// (any facet whose start falls inside a span we've already emitted).
export function applyFacets(plaintext: string, facets: Facet[] = []): string {
if (!facets.length) return escapeMarkdown(plaintext);
const bytes = encoder.encode(plaintext);
const sorted = [...facets].sort((a, b) => a.index.byteStart - b.index.byteStart);
let result = '';
let cursor = 0;
for (const facet of sorted) {
const { byteStart, byteEnd } = facet.index;
if (byteStart < cursor) continue; // overlapping facet; skip
// Emit any plain text between the last facet and this one.
if (byteStart > cursor) {
result += escapeMarkdown(decoder.decode(bytes.slice(cursor, byteStart)));
}
const inner = decoder.decode(bytes.slice(byteStart, byteEnd));
let wrapped = inner;
let linkUri: string | undefined;
let isCode = false;
let isBold = false;
let isItalic = false;
// A single facet can carry multiple features (e.g. bold + link).
for (const feature of facet.features) {
switch (feature.$type) {
case 'pub.leaflet.richtext.facet#bold':
isBold = true;
break;
case 'pub.leaflet.richtext.facet#italic':
isItalic = true;
break;
case 'pub.leaflet.richtext.facet#code':
isCode = true;
break;
case 'pub.leaflet.richtext.facet#link':
linkUri = feature.uri;
break;
}
}
// Inline code wins over bold/italic — markdown can't combine them meaningfully.
if (isCode) {
wrapped = '`' + inner.replace(/`/g, '\\`') + '`';
} else {
wrapped = escapeMarkdown(inner);
if (isBold) wrapped = `**${wrapped}**`;
if (isItalic) wrapped = `*${wrapped}*`;
}
if (linkUri) {
wrapped = `[${wrapped}](${linkUri})`;
}
result += wrapped;
cursor = byteEnd;
}
// Trailing plain text after the last facet.
if (cursor < bytes.length) {
result += escapeMarkdown(decoder.decode(bytes.slice(cursor)));
}
return result;
}
function renderText(block: TextLike): string {
return applyFacets(block.plaintext ?? '', block.facets);
}
// Headers carry a `level` field — clamp it to 1–6 so we can't emit `#######`.
function renderHeader(block: Block): string {
const level = Math.min(Math.max(block.level ?? 2, 1), 6);
return '#'.repeat(level) + ' ' + renderText(block as TextLike);
}
// Code blocks arrive as raw plaintext with an optional `language` hint that
// maps straight onto a markdown fence.
function renderCode(block: Block): string {
const lang = block.language ?? '';
const content = block.plaintext ?? '';
return '```' + lang + '\n' + content + '\n```';
}
// Lists are recursive — children can themselves be lists, text blocks, etc.
// Indent by two spaces per depth level to keep markdown parsers happy.
function renderListItems(
items: any[] | undefined,
ordered: boolean,
depth: number,
): string {
if (!items?.length) return '';
const indent = ' '.repeat(depth);
const lines: string[] = [];
items.forEach((item, i) => {
const marker = ordered ? `${i + 1}.` : '-';
const content = item.content ? renderBlock(item.content) : '';
// Only the first line goes on the marker line; nested stuff recurses below.
const firstLine = content.split('\n')[0] ?? '';
lines.push(`${indent}${marker} ${firstLine}`);
if (item.children?.length) {
const nested = renderListItems(item.children, ordered, depth + 1);
if (nested) lines.push(nested);
}
});
return lines.join('\n');
}
// Dispatch on block type. Unknown types log a warning and render nothing
// rather than exploding the build.
function renderBlock(block: Block | undefined): string {
if (!block || typeof block !== 'object') return '';
const type = block.$type;
switch (type) {
case 'pub.leaflet.blocks.text':
return renderText(block as TextLike);
case 'pub.leaflet.blocks.header':
return renderHeader(block);
case 'pub.leaflet.blocks.code':
return renderCode(block);
case 'pub.leaflet.blocks.unorderedList':
return renderListItems(block.children, false, 0);
case 'pub.leaflet.blocks.orderedList':
return renderListItems(block.children, true, 0);
case 'pub.leaflet.blocks.horizontalRule':
return '---';
case 'pub.leaflet.blocks.blockquote':
return '> ' + renderText(block as TextLike).replace(/\n/g, '\n> ');
case 'pub.leaflet.blocks.image':
// TODO: decide whether to download blobs at build time or hotlink the PDS.
console.warn('[leaflet] image blocks not yet supported, skipping');
return '';
default:
if (type) console.warn('[leaflet] unknown block type:', type);
return '';
}
}
// Entry point: walk every page, unwrap each block, join with blank lines so
// markdown paragraphs separate cleanly.
export function blocksToMarkdown(pages: Page[] | undefined): string {
if (!pages?.length) return '';
const parts: string[] = [];
for (const page of pages) {
for (const wrapper of page.blocks ?? []) {
const rendered = renderBlock(wrapper.block);
if (rendered) parts.push(rendered);
}
}
return parts.join('\n\n');
}
src/content/config.ts (the relevant bit)
import { defineCollection, z } from 'astro:content';
import { leafletLoader } from './loaders/leaflet';
const leaflet = defineCollection({
loader: leafletLoader(),
schema: z.object({
title: z.string(),
description: z.string().optional(),
date: z.coerce.date(),
tags: z.array(z.string()).default([]),
leafletUri: z.string(),
leafletUrl: z.string(),
rkey: z.string(),
}),
});
export const collections = {
posts,
leaflet,
// ...the rest of your collections
};
That’s the whole thing. Drop both files into src/content/loaders/, register the collection, and your Leaflet posts show up in getCollection('leaflet') on the next build.