Introduction

This is the monorepo for the DSP Repository.

The DaSCH Service Platform (DSP) consists of two main components:

  • DSP VRE: The DSP Virtual Research Environment (VRE), where researchers can work on their data during the lifetime of the project. It consists of the DSP-APP, DSP-API and DSP-TOOLS.
    The DSP VRE is developed in various other git repositories.
  • DSP Repository: The DSP Repository is the long-term archive for research data. It consists of the DSP Archive and the Discovery and Presentation Environment (DPE).
    The DSP Repository is developed in this monorepo.

Additionally, the monorepo contains the Mosaic component library (design system).

For system architecture details, see DPE Architecture and Project Structure.

This documentation provides an overview of the project structure. It covers the components of the system architecture, the design system used for development, and the processes we follow when working on the DSP Repository, including onboarding information.

About this Documentation

This documentation is built using mdBook.

Prerequisites

Before contributing, please ensure you have the following installed:

Any further dependencies can be installed using just commands:

just install-requirements

Building and Serving the Documentation

To run the documentation locally, use:

just docs-serve

Contributing to the Documentation

mdBook uses Markdown for documentation.

The documentation is organized into chapters and sections, which are defined in the SUMMARY.md file. Each section corresponds to a Markdown file in the src directory.

To configure the documentation (e.g. adding plugins), modify the book.toml file.

Deployment

This documentation is deployed to GitHub Pages automatically on every push to main via the gh-pages.yml workflow.

Workflows and Conventions

Entry Points

The README file is the first entry point of this repository. It should tell anyone where to find the information they need.

For any interaction or coding-related workflow, the justfile is the primary source of truth. Run just without arguments to see all available commands with descriptions.

Key Development Commands

| Command | Description |
|---|---|
| just check | Run formatting and linting checks |
| just build | Build all targets |
| just test | Run all tests |
| just fmt | Format all Rust code (cargo fmt + leptosfmt) |
| just run | Run server (release mode) |
| just watch | Watch for changes and run tests |
| just watch-dpe | Run DPE with hot reload |
| just watch-mosaic-playground | Run Mosaic playground with hot reload |
| just install-requirements | Install all development dependencies |
| just install-e2e-requirements | Install Playwright browsers for E2E tests |
| just docs-serve | Serve documentation locally at localhost:3000 |
| just validate-data | Validate all data files in the default data directory |

Git Workflow

We use a rebase workflow. All changes are made on a branch, then rebased onto main before being merged. This keeps a clean, linear commit history.

  • Rebase-merge: PRs are integrated using rebase-merge (not squash or merge commits). Every commit on a branch becomes a commit on main.
  • Clean commit history: Before merging, clean up the branch so that each commit represents one logical unit of change. Squash fixups, reword messages, and reorder commits so the history reads well on main.

Commit Conventions

Follow Conventional Commits. Scopes match crate names: dpe-server, dpe-core, dpe-web, dpe-api-oai, mosaic-tiles, mosaic-playground, mosaic-playground-macro.

Types

| Prefix | Meaning | Changelog | Version bump |
|---|---|---|---|
| feat: | New user-visible functionality | Features | minor |
| fix: | Bug fix | Bug Fixes | patch |
| perf: | Performance improvement | Performance | patch |
| revert: | Revert a previous commit | Reverts | patch |
| refactor: | Code restructuring | hidden | none |
| test: | Tests | hidden | none |
| ci: | CI/CD | hidden | none |
| docs: | Documentation | hidden | none |
| build: | Build system, deps | hidden | none |
| style: | Formatting | hidden | none |
| chore: | Maintenance | hidden | none |
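The mapping from commit headers to changelog sections and version bumps can be illustrated with a minimal Rust sketch. It is not the parser Release Please uses: parse_header, classify and the Bump enum are hypothetical, only the header line is inspected (a BREAKING CHANGE footer is ignored), and mapping `!` to a major bump follows the Conventional Commits specification rather than the table above.

```rust
/// Version bump implied by a commit; ordered so that `max()` picks the largest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    NoBump,
    Patch,
    Minor,
    Major,
}

/// A parsed header such as `feat(dpe-web)!: add tabs`.
#[derive(Debug, PartialEq, Eq)]
pub struct Header<'a> {
    pub kind: &'a str,
    pub scope: Option<&'a str>,
    pub breaking: bool,
    pub description: &'a str,
}

/// Allowed scopes: one per crate, as listed in the commit conventions.
const SCOPES: &[&str] = &[
    "dpe-server", "dpe-core", "dpe-web", "dpe-api-oai",
    "mosaic-tiles", "mosaic-playground", "mosaic-playground-macro",
];

/// Splits `type(scope)!: description`; returns None for malformed headers
/// or scopes that are not crate names.
pub fn parse_header(line: &str) -> Option<Header<'_>> {
    let (prefix, description) = line.split_once(": ")?;
    // A trailing `!` before the colon marks a breaking change.
    let (prefix, breaking) = match prefix.strip_suffix('!') {
        Some(p) => (p, true),
        None => (prefix, false),
    };
    // Optional `(scope)` directly after the type.
    let (kind, scope) = match prefix.split_once('(') {
        Some((k, rest)) => (k, Some(rest.strip_suffix(')')?)),
        None => (prefix, None),
    };
    let scope_ok = scope.map_or(true, |s| SCOPES.contains(&s));
    if kind.is_empty() || !scope_ok {
        return None;
    }
    Some(Header { kind, scope, breaking, description })
}

/// Maps a header to (changelog section, bump); a `None` section means hidden.
/// Returns None for types that are not in the table.
pub fn classify(h: &Header<'_>) -> Option<(Option<&'static str>, Bump)> {
    let (section, bump) = match h.kind {
        "feat" => (Some("Features"), Bump::Minor),
        "fix" => (Some("Bug Fixes"), Bump::Patch),
        "perf" => (Some("Performance"), Bump::Patch),
        "revert" => (Some("Reverts"), Bump::Patch),
        "refactor" | "test" | "ci" | "docs" | "build" | "style" | "chore" => (None, Bump::NoBump),
        _ => return None,
    };
    // Per the Conventional Commits spec, `!` signals a breaking change.
    let bump = if h.breaking { Bump::Major } else { bump };
    Some((section, bump))
}
```

Because Bump is ordered from NoBump to Major, the bump for a whole release could be taken as the maximum over all commits.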

Commit Organization

Group commits by user-visible impact, not by implementation journey.

  1. Each feat: or fix: commit = one changelog entry visible to deployers
  2. Internal work (build:, ci:, refactor:, docs:, chore:, test:) is hidden from changelog — squash aggressively
  3. Ask: "would a developer deploying this care?" If yes → feat: or fix:. If no → hidden type.
  4. Debugging journeys (trial-and-error, reverts, iterative fixes) belong in the PR description, not the commit history

Pull Request Workflow

PR Template

Fixes LINEAR-ID, LINEAR-ID, ...

## Motivation
Why this work was needed. What problem it solves for users.

## Summary
1-3 bullet points of user-visible changes.

## Key Changes
### [Topic]
- change details

## Challenges and Decisions
What was tried, what failed, and key architecture decisions.
Structure as sub-sections when multiple challenges exist:

### [Challenge title]
**Problem:** description of the issue encountered
**Tried:** approaches that didn't work and why
**Solution:** what worked and why it's the right approach

## Gotchas
Things future developers should know. Each gotcha should be
actionable — not just "this is hard" but "do X instead of Y".

## Test Plan
- [ ] verification steps

Why This Format Matters

The "Challenges and Decisions" section captures the debugging journey that would otherwise be lost when commits are squashed. Well-structured challenges become high-quality learnings automatically.

PR Creation Process

  1. Create as draft: gh pr create --draft
  2. Assign to the requesting developer: gh pr edit [PR_NUMBER] --add-assignee [USERNAME]
  3. Include a "Review Notes" section pointing out that reviewing the commits one by one makes the review easier

What Goes Where

| Information | Put it in... |
|---|---|
| New feature / breaking change | Commit message (feat: / feat!:) |
| Bug fix | Commit message (fix:) |
| Build/CI/refactor details | Commit message (hidden type) |
| Why the work was needed | PR Motivation section |
| What was tried and failed | PR Challenges section |
| Architecture decisions + rationale | PR Challenges section |
| Things to watch out for | PR Gotchas section |
| Structured, searchable knowledge | Learnings doc (dasch-specs) |

Release Workflow

Releases are automated via Release Please. On every push to main, Release Please reads conventional commit messages and either creates or updates a release PR. Merging the release PR creates a GitHub Release with auto-generated release notes.

Code Review

See Review Guidelines for the review checklist.

CI/CD

GitHub Actions workflows run automatically on pushes and pull requests. See Release, Deployment and Versioning for details on the CI/CD pipelines.

Project Structure and Code Organization

Overview

This repository is a Rust workspace structured as a monorepo. All Rust crates are organized as subdirectories within the modules/ directory.

modules/
├── dpe/                       # Discovery and Presentation Environment
│   ├── core/                  # Pure domain types, repositories, data loading (crate: dpe-core)
│   ├── api-oai/               # OAI-PMH 2.0 API (crate: dpe-api-oai)
│   ├── web/                   # Web layer: Leptos components, pages (crate: dpe-web)
│   ├── server/                # Server binary: route composition, Datastar fragments (crate: dpe-server)
│   ├── telemetry/             # Telemetry types and validation (crate: dpe-telemetry)
│   ├── web-e2e-tests/         # Playwright E2E tests
│   ├── public/                # Static assets
│   ├── style/                 # CSS / Tailwind
│   └── Dockerfile             # Production container image
└── mosaic/                    # Mosaic component library (design system)
    ├── tiles/                 # Reusable Leptos UI components (crate: mosaic-tiles)
    ├── playground/            # Component playground application (crate: mosaic-playground)
    ├── playground_macro/      # Proc macro for playground page generation (crate: mosaic-playground-macro)
    └── playground-e2e-tests/  # Playwright E2E tests for the playground

Crate and Folder Naming Convention

Crate names follow the {module}-{role} pattern. Folder names strip the module prefix, keeping only the role part. Hyphens in crate names become underscores in folder names when needed for Rust compatibility (proc macro crates).

| Crate | Folder | Role |
|---|---|---|
| dpe-core | dpe/core | Pure domain types and data access (zero framework deps) |
| dpe-api-oai | dpe/api-oai | OAI-PMH 2.0 API (depends on dpe-core only) |
| dpe-web | dpe/web | Leptos SSR components, pages, #[server] functions |
| dpe-server | dpe/server | Server binary: composes all routes |
| dpe-telemetry | dpe/telemetry | Telemetry types, validation, and origin checking |
| mosaic-tiles | mosaic/tiles | Reusable UI component library |
| mosaic-playground | mosaic/playground | Component showcase application |
| mosaic-playground-macro | mosaic/playground_macro | Proc macro for playground page generation |
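The naming rule can be expressed as a small function. This is a hypothetical helper (folder_for is not part of the workspace); whether a crate is a proc macro is passed in explicitly, because the rule only applies the underscore form to proc macro crates.

```rust
/// Modules under `modules/`, taken from the project tree above.
const MODULES: &[&str] = &["dpe", "mosaic"];

/// Derives the folder (relative to `modules/`) for a crate named `{module}-{role}`.
/// Returns None if the name does not follow the pattern.
pub fn folder_for(crate_name: &str, is_proc_macro: bool) -> Option<String> {
    let (module, role) = crate_name.split_once('-')?;
    if !MODULES.contains(&module) || role.is_empty() {
        return None;
    }
    // Only proc macro crates switch hyphens to underscores in the folder name.
    let role = if is_proc_macro { role.replace('-', "_") } else { role.to_string() };
    Some(format!("{module}/{role}"))
}
```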

API Crate Pattern

Each API is a separate crate under modules/dpe/:

  • Naming: dpe-api-{name} (e.g., dpe-api-oai)
  • Dependencies: dpe-core for domain types; never depends on other API crates or dpe-web
  • Entry point: Exports a handler function (e.g., pub async fn oai_handler(...))
  • Composition: dpe-server wires the handler into the Axum router
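The dependency rule lends itself to an automated check. The sketch below assumes the workspace-internal dependencies of each crate have already been collected into a map (for example from the output of cargo metadata); check_api_crates and the dpe-api-foo crate in the test are hypothetical.

```rust
use std::collections::BTreeMap;

/// Enforces the API crate rule: a `dpe-api-*` crate must not depend on
/// `dpe-web` or on another API crate. `deps` maps each crate to its
/// workspace-internal dependencies; external crates are left out.
pub fn check_api_crates(deps: &BTreeMap<&str, Vec<&str>>) -> Result<(), String> {
    for (krate, its_deps) in deps {
        if !krate.starts_with("dpe-api-") {
            continue; // the rule only constrains API crates
        }
        for dep in its_deps {
            if *dep == "dpe-web" || dep.starts_with("dpe-api-") {
                return Err(format!("{krate} must not depend on {dep}"));
            }
        }
    }
    Ok(())
}
```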

For detailed crate responsibilities and the dependency graph, see DPE Project Structure.

Release, Deployment and Versioning

CI/CD Pipelines

All CI/CD workflows are defined as GitHub Actions in .github/workflows/.

Checks and Tests

Every push and pull request runs:

  • check.yml — Formatting (rustfmt, leptosfmt) and linting (clippy)
  • test.yml — Runs the full test suite
  • scout-dpe.yml / scout-mosaic-playground.yml — Docker image vulnerability scanning (see Security)

Accessibility Testing

Defined in a11y-dpe.yml.

Runs on PRs and pushes to main that touch DPE UI code (modules/dpe/web/, modules/dpe/style/, modules/dpe/public/). Builds the DPE, then runs Playwright accessibility tests with axe-core against WCAG 2.1 AA.

Fuzz Testing

Defined in fuzz.yml.

Runs nightly at 02:00 UTC (and on manual dispatch). Fuzzes tab_validation and query_params targets for 10 minutes each using cargo-fuzz on nightly Rust. Corpus is cached between runs. On crash, automatically creates a GitHub issue with reproduction instructions.

Reusable Actions

Common CI steps are extracted into composite actions in .github/actions/:

| Action | Purpose |
|---|---|
| build-dpe | Compile DPE (Rust musl binary + Leptos site assets) and stage artifacts |
| docker-publish | Set up Buildx, log in to Docker Hub, build and push an image |
| docker-scout | Run Docker Scout CVE scan and upload SARIF results |

Mosaic Playground

The Mosaic component library playground has two deployment paths:

PR Preview (Cloud Run)

Defined in cloud-run-mosaic-pull-request.yml.

When a pull request modifies files under modules/mosaic/, a preview of the Mosaic playground is automatically deployed to Google Cloud Run. The preview URL is posted as a comment on the PR and updated on each push.

  • Trigger: PRs that touch modules/mosaic/** (same-repo only, not forks)
  • Service: Ephemeral Cloud Run service per PR
  • Cleanup: The Cloud Run service and container image are deleted when the PR is closed or merged

Authentication uses Workload Identity Federation (keyless, OIDC-based).

Production (Docker Hub + Jenkins)

Defined in mosaic-docker-publish.yml.

When changes to modules/mosaic/ are merged to main, the playground image is built, pushed to Docker Hub, and a Jenkins webhook triggers the production deployment.

DPE

PR Preview (Cloud Run)

Defined in cloud-run-dpe-pull-request.yml.

When a pull request modifies files under modules/dpe/, a preview of the DPE is automatically deployed to Google Cloud Run. Works the same way as the Mosaic preview: ephemeral service per PR, cleaned up on close/merge.

Continuous Deployment (Docker Hub + Jenkins)

Defined in dpe-docker-publish.yml.

On every push to main:

  1. Builds site assets with cargo-leptos
  2. Builds a static musl-linked binary
  3. Pushes the Docker image to Docker Hub (daschswiss/dpe:{tag})
  4. Triggers a Jenkins webhook for DEV deployment

Release Publishing

Defined in dpe-release-publish.yml.

When a GitHub Release is published (tag starting with v), the workflow builds and pushes a release-tagged Docker image.

Release Please

Defined in release-please.yml.

On every push to main, Release Please reads conventional commit messages and creates or updates a release PR with auto-generated changelog. Merging the release PR creates a GitHub Release.

Configuration lives in .github/release-please/config.json and .github/release-please/manifest.json.

Documentation (GitHub Pages)

Defined in gh-pages.yml. The mdBook documentation is built and deployed to GitHub Pages on pushes to main.

Claude Code

Defined in claude.yml.

Responds to @claude mentions in PR comments and issue comments. Supports code review (@claude review) and general assistance. Runs with limited permissions (contents: read, pull-requests: write).

Security

Why Security Scanning Matters

Software depends on a deep stack of third-party components: base OS images, system libraries, language runtimes, and application dependencies. Vulnerabilities are regularly discovered in these components — the CVE database publishes thousands each year. A single unpatched dependency in a Docker image can become an entry point for attackers in production.

Manual tracking of vulnerabilities across all dependencies is not practical. Automated scanning integrates into the development workflow so that new vulnerabilities are surfaced early — ideally before code reaches production.

Container Image Scanning with Docker Scout

We use Docker Scout to scan our Docker images for known vulnerabilities (CVEs). Scout analyzes the Software Bill of Materials (SBOM) of each image — the full inventory of OS packages, libraries, and application dependencies — and matches them against vulnerability databases.

What Gets Scanned

| Image | Workflow | Trigger |
|---|---|---|
| DPE (daschswiss/dpe) | scout-dpe.yml | PRs touching modules/dpe/** or Cargo.lock |
| Mosaic Playground (daschswiss/mosaic-playground) | scout-mosaic-playground.yml | PRs touching modules/mosaic/** or Cargo.lock |

How It Works

Each Scout workflow:

  1. Builds the Docker image locally — the image is loaded into the runner's Docker daemon (load: true) but never pushed to a registry. This means Scout scans exactly what would be deployed, without exposing unreviewed images.

  2. Runs a CVE analysis — Docker Scout compares the image's SBOM against known vulnerability databases, filtering for critical and high severity issues.

  3. Posts a PR comment — a summary of findings is posted directly on the pull request, giving developers immediate visibility without leaving their review workflow.

  4. Uploads a SARIF report — results are uploaded to the GitHub Security tab in SARIF format (Static Analysis Results Interchange Format), the industry standard for security tool output. This integrates with GitHub's code scanning alerts.

What To Do With Results

Scout results are currently informational — they do not block merging. When a scan reports vulnerabilities:

  • Critical/High in base image — check if a newer base image version is available that patches the issue. For DPE (distroless), these are rare. For Mosaic (Debian-based), update the base image tag.
  • Critical/High in dependencies — check if a dependency update resolves the issue. Run cargo update and re-test.
  • False positives — some CVEs may not be exploitable in our context. Document the rationale if choosing to accept the risk.

Prerequisites

  • Docker Scout is enabled for the daschswiss Docker Hub organization
  • Repository secrets DOCKER_USER and DOCKER_HUB_TOKEN (shared with publish workflows)
  • GitHub Advanced Security or a public repository (for SARIF upload)

Future Enhancements

  • Production comparison — using Docker Scout's compare command to show only new vulnerabilities introduced by a PR (requires configuring Docker Scout environments on Docker Hub)
  • Main-branch scanning — continuous monitoring of production images
  • Blocking on critical CVEs — failing the PR check when critical vulnerabilities are detected

DPE Architecture

The Discovery and Presentation Environment (DPE) serves research project metadata as a web application.

Crate Structure

dpe-core          Pure domain types, repositories, data loading
                  Dependencies: serde, serde_json only
                       │
          ┌────────────┼────────────┐
          │            │            │
     dpe-api-oai   dpe-web     (future APIs)
     OAI-PMH 2.0  Leptos SSR
     + axum        + Datastar
          │            │
          └────────────┘
                 │
           dpe-server
           Route composition
           (binary: dpe)

  • dpe-core: Framework-free domain layer. All types, repository traits, Fs implementations, and data loading.
  • dpe-api-oai: OAI-PMH 2.0 endpoint (see OAI-PMH Endpoint). Depends only on dpe-core — no Leptos.
  • dpe-web: Leptos SSR components, pages, and #[server] wrappers. Re-exports dpe-core types for backward compatibility.
  • dpe-server: Thin composition root. Wires Leptos routes (dpe-web) and API handlers (dpe-api-oai) into a single Axum server.

Hypermedia-Driven Architecture

The DPE uses a hypermedia-driven architecture where the server is the single source of truth for UI state. Interactivity is provided by Datastar (~14KB JS) instead of client-side frameworks or WASM.

Why Datastar over Leptos islands:

  • No WASM compilation step (faster builds)
  • Smaller client-side footprint (~14KB vs ~200KB+ WASM)
  • Server controls all state (HATEOAS)
  • Graceful degradation — works as plain HTML links without JavaScript
  • Simpler mental model — HTML attributes, not reactive signals

Rendering Model

Pages are rendered server-side by Leptos SSR. Dynamic content updates (tab switching, search autocomplete) are handled by Datastar SSE fragments.

Initial page load:
  Browser → GET /projects/ABC1 → Leptos SSR → Full HTML page

Tab switch (with JS):
  Browser → GET /projects/ABC1/tab/publications (SSE)
         ← PatchElements (#project-tabs replacement)
         ← ExecuteScript (history.replaceState for URL)

Tab switch (without JS):
  Browser → GET /projects/ABC1?tab=publications → Full page reload

Fragment Route Convention

Fragment endpoints are pure Axum handlers (not Leptos routes) that render Leptos components to HTML strings and deliver them as Datastar SSE events.
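To show what delivering a fragment as an SSE event involves, the sketch below frames an HTML string as a Datastar patch event by hand. It assumes the Datastar v1 wire format (event name datastar-patch-elements, one `data: elements` line per line of HTML); a real handler would more likely use a Datastar SDK, and patch_elements_event is a hypothetical name. By default Datastar matches the top-level element of the fragment by its id, such as #project-tabs.

```rust
/// Frames an HTML fragment as a Datastar patch-elements SSE event.
/// Assumption: Datastar v1 format, one `data: elements` line per HTML line.
pub fn patch_elements_event(html: &str) -> String {
    let mut event = String::from("event: datastar-patch-elements\n");
    for line in html.lines() {
        event.push_str("data: elements ");
        event.push_str(line);
        event.push('\n');
    }
    // Per the SSE format, a blank line terminates the event.
    event.push('\n');
    event
}
```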

Route pattern: resource-action nesting

GET /projects/{id}              → Full page (Leptos SSR)
GET /projects/{id}/tab/{tab}    → SSE fragment (Axum + Datastar)

Because the two routes differ in path depth, they cannot conflict in Axum's radix trie, and no header-based discrimination is needed to tell full-page and fragment requests apart.
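A minimal sketch of the two path shapes, written as plain segment matching rather than through Axum's router; resolve_route and the Route enum are hypothetical.

```rust
/// What a request path resolves to under the resource-action nesting convention.
#[derive(Debug, PartialEq, Eq)]
pub enum Route<'a> {
    /// GET /projects/{id}: full page rendered by Leptos SSR.
    ProjectPage { id: &'a str },
    /// GET /projects/{id}/tab/{tab}: SSE fragment from a plain Axum handler.
    TabFragment { id: &'a str, tab: &'a str },
    NotFound,
}

/// Classifies a path (without query string) by its segments.
pub fn resolve_route(path: &str) -> Route<'_> {
    let segments: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    match segments.as_slice() {
        // Two segments: the full page.
        ["projects", id] if !id.is_empty() => Route::ProjectPage { id: *id },
        // Four segments: the tab fragment; the different depth avoids any overlap.
        ["projects", id, "tab", tab] if !id.is_empty() && !tab.is_empty() => {
            Route::TabFragment { id: *id, tab: *tab }
        }
        _ => Route::NotFound,
    }
}
```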

HATEOAS Tab Pattern

The server returns the complete tab component (tab bar + panel) in each SSE response. This means:

  • Server controls which tab is active (aria-selected)
  • Server controls which tabs are visible (e.g., hide Publications if none exist)
  • Server pushes the bookmarkable URL via ExecuteScript + history.replaceState

The client never needs to track tab state — the server-rendered HTML IS the state.
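A sketch of how the server might decide the tab state before rendering. The overview tab and the tab_state function are hypothetical; only the Publications tab and the rule of hiding it when empty come from the text above. Unknown or hidden tab names fall back to the first visible tab, so the tab query parameter is validated in the same place.

```rust
/// Tabs a project page can show. `overview` is a hypothetical default tab.
const ALL_TABS: &[&str] = &["overview", "publications"];

/// Server-side tab state: which tabs are visible and which one is active.
#[derive(Debug, PartialEq, Eq)]
pub struct TabState {
    pub visible: Vec<&'static str>,
    pub active: &'static str,
}

pub fn tab_state(requested: Option<&str>, has_publications: bool) -> TabState {
    // Hide Publications when the project has none.
    let visible: Vec<&'static str> = ALL_TABS
        .iter()
        .copied()
        .filter(|t| *t != "publications" || has_publications)
        .collect();
    // Only a visible tab can become active; anything else falls back to the first one.
    let active = requested
        .and_then(|r| visible.iter().copied().find(|t| *t == r))
        .unwrap_or(visible[0]);
    TabState { visible, active }
}
```

When rendering, each tab would get aria-selected="true" exactly when it equals active.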

Datastar Attribute Patterns

<!-- Tab link with Datastar enhancement -->
<a href="/projects/ABC1?tab=publications"
   role="tab" aria-selected="false"
   data-on:click__prevent="@get('/projects/ABC1/tab/publications', {retry: 'never'})"
   data-indicator:_tab_loading>
  Publications
</a>

<!-- SSE failure fallback on container -->
<div id="project-tabs"
     data-on:datastar-fetch="
       (evt.detail.type === 'error' || evt.detail.type === 'retries-failed')
       && evt.detail.el.closest('#project-tabs')
       && (window.location.href = evt.detail.el.getAttribute('href'))
     ">

Datastar Attribute Conventions

  • Signal naming: Use _ prefix for client-only signals (e.g., _tab_loading). The underscore excludes the signal from server payloads.
  • No __debounce on __prevent anchors: Do NOT combine __prevent with __debounce or __throttle on anchor elements — known Datastar timing issue.
  • retry: 'never': Use on @get() calls where fallback to full navigation is preferred over retrying.
  • Graceful degradation: Every Datastar-enhanced <a> must have a valid href for no-JS fallback.
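These conventions can be baked into one rendering helper so every tab link follows them. The hypothetical tab_link function below produces the same attributes as the example anchor above. It assumes project_id, tab and label are already validated, since it does no HTML escaping.

```rust
/// Renders a Datastar-enhanced tab link: a real href for the no-JS fallback,
/// __prevent without __debounce/__throttle, retry: 'never', and the
/// client-only `_tab_loading` indicator signal.
pub fn tab_link(project_id: &str, tab: &str, label: &str, selected: bool) -> String {
    // Full-page URL, used when JavaScript is unavailable.
    let page_url = format!("/projects/{project_id}?tab={tab}");
    // Fragment URL, fetched by Datastar as an SSE stream.
    let fragment_url = format!("/projects/{project_id}/tab/{tab}");
    format!(
        "<a href=\"{page_url}\" role=\"tab\" aria-selected=\"{selected}\" \
         data-on:click__prevent=\"@get('{fragment_url}', {{retry: 'never'}})\" \
         data-indicator:_tab_loading>{label}</a>"
    )
}
```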

See Also

Metadata Model (v2)

This is the metadata model that the DPE serves. It describes DaSCH research projects and their context as a hierarchy of entities. The model originated in the now-retired dsp-meta repository, where it was designed as the successor to the earlier v1 model. The metadata has since been migrated to this repository and converted to v2; the model is implemented as Rust types in the dpe-core crate (modules/dpe/core/src/) and the data lives under modules/dpe/server/data/.

note

Conceptual model vs. implementation. This page documents the conceptual v2 model — the full design as it was worked out. The implementation in dpe-core is a pragmatic subset: some entities and fields are not (yet) implemented, and a few differ in shape from the design. Each section carries an As implemented callout describing the current state. Treat the conceptual tables as the design intent and the callouts as the source of truth for what the code does today.

The enhancements over v1 are designed to better accommodate the inherent complexity of humanities projects, while still supporting simpler project structures. The two main additions are:

  • a hierarchical level above the research project — the project cluster — which represents overarching initiatives that span multiple projects over long periods;
  • collections, which allow more precise referencing and grouping of parts of the data, including cross-project and nested groupings. Collections replace the v1 dataset concept.

note

For each property, two cardinalities may be given:

  • the archival cardinality, which applies once the entity is finished/finalized for archival;
  • the in-progress (WIP) cardinality, which applies while the entity is still in progress.

If only one cardinality is given, it applies to both stages.
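To make the notation concrete, here is a sketch of a cardinality type that parses the forms used in the tables below ("1", "0-1", "1-2", "0-n", "1-n") and picks the value for the current stage. The names are hypothetical and not taken from dpe-core. For example, a project's shortDescription is 1 in the archival stage but 0-1 while in progress.

```rust
/// A cardinality `min..=max`, with `max = None` standing for "n".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cardinality {
    pub min: usize,
    pub max: Option<usize>,
}

impl Cardinality {
    /// Parses "1", "0-1", "1-2", "0-n", "1-n"; rejects malformed or inverted ranges.
    pub fn parse(s: &str) -> Option<Self> {
        let parse_bound = |b: &str| b.parse::<usize>().ok();
        match s.split_once('-') {
            None => parse_bound(s).map(|n| Cardinality { min: n, max: Some(n) }),
            Some((lo, "n")) => parse_bound(lo).map(|min| Cardinality { min, max: None }),
            Some((lo, hi)) => {
                let (min, max) = (parse_bound(lo)?, parse_bound(hi)?);
                (min <= max).then_some(Cardinality { min, max: Some(max) })
            }
        }
    }

    /// Whether a field holding `count` values satisfies this cardinality.
    pub fn allows(self, count: usize) -> bool {
        count >= self.min && self.max.map_or(true, |max| count <= max)
    }
}

/// Picks the archival or WIP cardinality; a single given value applies to both stages.
pub fn effective<'a>(archival: &'a str, wip: Option<&'a str>, in_progress: bool) -> &'a str {
    if in_progress { wip.unwrap_or(archival) } else { archival }
}
```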

Licensing

All metadata is considered public domain; by signing the deposit agreement, projects consent to this. Domain metadata is different: it is part of the project's data and can be licensed as the project wishes.

Whenever metadata is served to a client, it is served with legal information. Legal information on metadata, as everywhere else, consists of the license, the copyright holder and the authorship. For metadata the license is always "public domain", the copyright holder is always "DaSCH" and the authorship is always the project and DaSCH.

Metadata is always publicly available, even if the corresponding project, collection or record is not. This ensures the metadata stays findable and reusable even if the data itself is not. The only exception is the status "embargoed", during which the metadata is only available on the project level.
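The fixed legal information and the embargo rule are simple enough to state as code. A minimal sketch; LegalInfo, Level and the embargoed flag are hypothetical simplifications (the actual access rights model is described under Access Rights).

```rust
/// Legal information attached to served metadata.
#[derive(Debug, PartialEq, Eq)]
pub struct LegalInfo {
    pub license: &'static str,
    pub copyright_holder: &'static str,
    pub authorship: Vec<String>,
}

/// For metadata, the values are fixed: public domain, DaSCH, and the project plus DaSCH.
pub fn metadata_legal_info(project_name: &str) -> LegalInfo {
    LegalInfo {
        license: "public domain",
        copyright_holder: "DaSCH",
        authorship: vec![project_name.to_string(), "DaSCH".to_string()],
    }
}

/// Hierarchy levels at which metadata can be requested.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Project,
    Collection,
    Record,
}

/// Metadata is public at every level, except that an embargoed project
/// exposes metadata on the project level only.
pub fn metadata_visible(level: Level, embargoed: bool) -> bool {
    !embargoed || level == Level::Project
}
```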

Model Overview

The metadata model is a hierarchical structure of metadata elements.

flowchart TD
    projectCluster[Project Cluster]
    project[Research Project]
    record[Record]
    collection[Collection]

    projectCluster -->|0-n| project
    projectCluster -->|0-n| collection

    project -->|0-n| collection
    project -->|1-n| record
    collection --> |0-n| record
    collection --> |0-n| collection

  • A Project Cluster collects research projects (or nested project clusters). It is typically institutional in nature, not directly tied to a specific funding grant, and may be long-lived. Examples are EKWS/CAS, BEOL or LHTT.
  • A Research Project is the main entity of the model. It corresponds to a project in the DSP. It is typically tied to a specific funding grant and hence has a limited lifetime of ~3–5 years; multiple funding rounds and a longer lifetime are possible. A research project is part of 0–n project clusters and contains both collections and records. All records in the project are listed in the project's records array, regardless of collection membership.
  • A Collection is a flexible grouping of records that can span multiple projects or be nested within other collections. Collections enable cross-project organization and support subsetting and specialized access patterns. They may contain both individual records and nested collections.
  • A Record is a single entry within a project — the smallest unit that can meaningfully have an identifier. It maps to a knora-base:Resource (DSP-API) or an Asset (SIPI/Ingest) in the DSP. For DSP resources, the metadata of the record is the existence of the resource itself plus information such as label, access rights and provenance; the core data are the values on that resource. For assets, the metadata is the existence of the asset and its access rights; the core data is the binary content. A record is part of exactly 1 research project and may be part of 0–n collections.

Additionally, Person and Organization are entities independent of the project hierarchy, related to various entities within it (e.g. as contacts, contributors or funders).
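Two invariants follow from the overview: every record is listed in exactly one project's records array, and a record reached through a collection must still belong to a project. A sketch of checking both on simplified data; Project here is a hypothetical stand-in holding only an id and a record list, not the dpe-core type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a research project.
pub struct Project<'a> {
    pub id: &'a str,
    pub records: Vec<&'a str>,
}

/// Records listed more than once across all `records` arrays, sorted.
/// Every listing counts, so a duplicate within one project is reported too.
pub fn records_in_multiple_projects<'a>(projects: &[Project<'a>]) -> Vec<&'a str> {
    let mut owners: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for p in projects {
        for r in &p.records {
            owners.entry(*r).or_default().push(p.id);
        }
    }
    let mut dupes: Vec<&'a str> = owners
        .into_iter()
        .filter(|(_, ps)| ps.len() > 1)
        .map(|(r, _)| r)
        .collect();
    dupes.sort_unstable();
    dupes
}

/// Records referenced by a collection that no project lists.
pub fn orphaned<'a>(projects: &[Project<'a>], collection_records: &[&'a str]) -> Vec<&'a str> {
    collection_records
        .iter()
        .copied()
        .filter(|r| !projects.iter().any(|p| p.records.contains(r)))
        .collect()
}
```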

Entity Types

Project Cluster

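| Field | Type | Cardinality |
|---|---|---|
| id | internal_id | 1 |
| pid | string | 1 |
| name | string | 1 |
| projects | internal_id[] | 0-n |
| projectClusters | internal_id[] | 0-n |
| collections | internal_id[] | 0-n |
| description | lang_string | 0-1 |
| url | url | 0-1 |
| howToCite | string | 0-1 |
| alternativeNames | lang_string[] | 0-n |
| contactPoint | internal_id[] | 0-n |
| documentationMaterial | url[] | 0-n |
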
  • id: A unique internal identifier; not exposed to the user and not persistent.
  • pid: A unique persistent identifier (currently an ARK URL).
  • name: The name of the project cluster.
  • projects: Identifiers of the projects in the cluster.
  • projectClusters: Identifiers of nested project clusters.
  • description: The description of the cluster.
  • url: The URL to the web presence of the cluster.
  • howToCite: How to cite the cluster. If not provided, the standard form <name> (<year>). [Project Cluster]. DaSCH. <ARK> is used.
  • alternativeNames: Alternative names of the cluster.
  • contactPoint: Persons or organizations responsible for the cluster.
  • documentationMaterial: URLs pointing to documentation material.

Most fields are optional, to keep the entity flexible. There is no difference in cardinality between the archival and in-progress stages.

warning

As implemented (cluster.rs): a cluster only has id, name, description (a lang_string) and projects; pid is parsed but optional. The fields projectClusters (nested clusters), collections, url, howToCite, alternativeNames, contactPoint and documentationMaterial are not implemented. Within a project, a cluster is exposed as a lightweight reference (id, name, flattened description). There are currently 5 cluster files.
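Although howToCite is not implemented for clusters, the default citation form from the conceptual model is easy to sketch. Which year goes into the citation is not specified, so this hypothetical helper takes it as a parameter; treating a blank explicit value as absent is also an assumption, and the ARK in the test is a placeholder.

```rust
/// Returns the explicit howToCite if present, otherwise the standard form
/// `<name> (<year>). [Project Cluster]. DaSCH. <ARK>`.
pub fn cluster_citation(how_to_cite: Option<&str>, name: &str, year: u16, ark: &str) -> String {
    match how_to_cite {
        // An explicit, non-blank citation wins (blank is treated as absent: an assumption).
        Some(explicit) if !explicit.trim().is_empty() => explicit.to_string(),
        _ => format!("{name} ({year}). [Project Cluster]. DaSCH. {ark}"),
    }
}
```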

Project

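| Field | Type | Cardinality (archival) | Cardinality (WIP) |
|---|---|---|---|
| id | internal_id | 1 | 1 |
| pid | string | 1 | 1 |
| shortcode | string | 1 | 1 |
| officialName | string | 1 | 1 |
| status | string | 1 | 1 |
| name | string | 1 | 1 |
| shortDescription | string | 1 | 0-1 |
| description | lang_string | 1 | 1 |
| startDate | date | 1 | 0-1 |
| endDate | date | 1 | 0-1 |
| dataPublicationYear | date | 1 | 0-1 |
| url | url | 1-2 | 0-2 |
| howToCite | string | 1 | 1 |
| accessRights | accessRights | 1 | 1 |
| legalInfo | legalInfo[] | 1-n | 0-n |
| dataManagementPlan | string / url | 1 | 1 |
| typeOfData | string[] | 1-n | 0-n |
| dataLanguage | lang_string[] | 1-n | 0-n |
| collections | internal_id[] | 0-n | 0-n |
| records | internal_id[] | 0-n | 0-n |
| keywords | lang_string[] | 1-n | 0-n |
| disciplines | lang_string / authorityFileReference[] | 1-n | 0-n |
| temporalCoverage | lang_string / authorityFileReference[] | 1-n | 0-n |
| spatialCoverage | authorityFileReference[] | 1-n | 0-n |
| attributions | attribution[] | 1-n | 0-n |
| abstract | lang_string | 0-1 | 0-1 |
| contactPoint | internal_id[] | 0-n | 0-n |
| publications | publication[] | 0-n | 0-n |
| funding | string / grant[] | 1-n | 0-n |
| alternativeNames | lang_string[] | 0-n | 0-n |
| documentationMaterial | url[] | 0-n | 0-n |
| provenance | string | 0-1 | 0-1 |
| additionalMaterial | url[] | 0-n | 0-n |
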
  • id: A unique internal identifier; not exposed to the user and not persistent.
  • pid: A unique persistent identifier (currently an ARK URL).
  • shortcode: The project's DSP shortcode, internal only. Four hexadecimal characters, upper case.
  • officialName: The official name of the project.
  • status: The status of the project — either "Ongoing" or "Finished".
  • name: The name of the project.
  • shortDescription: A short teaser. Maximum length: 200 characters.
  • description: The full description of the project.
  • startDate: The start date of the project.
  • endDate: The end date of the project.
  • dataPublicationYear: The year the data is published — normally the year the project finishes and the data moves to the archive. Under embargo, the year the embargo is lifted. Projects published while in the VRE may set a specific year.
  • url: The URL(s) to the web presence of the project. The first should point to where the data is available; the second, optional, may point to the project website.
  • howToCite: How to cite the project. If not provided, the standard form <contributors> (<year>). <project name> [Database]. DaSCH. <ARK> is used.
  • accessRights: The access rights of the project (see Access Rights). Defines to what extent the project data is accessible in the DPE. If the project is embargoed, the metadata is only available on the project level.
  • legalInfo: Legal information about the project. Calculated from records; cannot be specified explicitly on the project.
  • dataManagementPlan: A data management plan (string or URL); use "not accessible" if not available.
  • typeOfData: The type(s) of data — "XML", "Text", "Image", "Video", "Audio". Computed from the records where available and optionally added manually.
  • dataLanguage: Languages contained in the project. Computed from the records where available and optionally added manually.
  • collections: Collection identifiers that optionally group project data.
  • records: Identifiers of all records that make up the project data. This is the canonical list of all records in the project.
  • keywords: Keywords describing the project.
  • disciplines: Disciplines the project relates to.
  • temporalCoverage: Epochs or time periods the project relates to.
  • spatialCoverage: References to spatial entities (places, regions, …).
  • attributions: Roles people/organizations have in the project. Entered manually, since there may be people without authorship (reviewers, organizers, …).
  • abstract: An abstract of the project.
  • contactPoint: Persons or organizations responsible for the project.
  • publications: Publications related to the project.
  • funding: Either a string ("No funding") or a list of grants.
  • alternativeNames: Alternative names of the project.
  • documentationMaterial: URLs pointing to documentation material.
  • provenance: The history of the project, if applicable.
  • additionalMaterial: Additional URLs related to the project.
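Several field descriptions above carry concrete constraints: the status values, the shortcode format, and the 200-character limit on shortDescription. A sketch of validating them; the functions are hypothetical (not the checks run by just validate-data), and counting characters as Unicode scalar values is an assumption.

```rust
/// Allowed project status values.
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    Ongoing,
    Finished,
}

/// Parses `status`; any other spelling is rejected.
pub fn parse_status(s: &str) -> Result<Status, String> {
    match s {
        "Ongoing" => Ok(Status::Ongoing),
        "Finished" => Ok(Status::Finished),
        other => Err(format!("invalid status: {other:?}")),
    }
}

/// `shortcode`: exactly four hexadecimal characters, upper case.
pub fn valid_shortcode(s: &str) -> bool {
    s.len() == 4 && s.chars().all(|c| c.is_ascii_digit() || ('A'..='F').contains(&c))
}

/// `shortDescription`: at most 200 characters (counted as Unicode scalar values).
pub fn valid_short_description(s: &str) -> bool {
    s.chars().count() <= 200
}
```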

note

All records of a project are referenced in its records array, regardless of collection membership — this is the canonical list.

warning

As implemented (project.rs): the project is the most complete entity. Notable differences from the conceptual table:

  • url is parsed into a single primary url plus a separate secondaryUrl, both authority file references. The legacy string-array form (["<data url>", "<website url>"]) is still accepted: the first element becomes url, the second secondaryUrl. Placeholder strings ("MISSING", "CALCULATED") are filtered out so they never render as live links.
  • clusters and collections are stored as ID lists and resolved to lightweight references on demand.
  • Many fields are optional in the code regardless of the archival cardinality above (e.g. dataManagementPlan, dataPublicationYear, typeOfData, dataLanguage, records, publications, provenance, additionalMaterial).
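A sketch of the legacy url handling described above. Whether placeholders are removed before or after the positional split is not stated, so this version assumes after: a placeholder in first position leaves url empty instead of promoting the second entry. It also returns plain strings rather than authority file references, and split_legacy_urls is a hypothetical name.

```rust
/// Placeholder strings that must never render as live links.
const PLACEHOLDERS: &[&str] = &["MISSING", "CALCULATED"];

/// Splits the legacy string array into (url, secondaryUrl).
pub fn split_legacy_urls(urls: &[&str]) -> (Option<String>, Option<String>) {
    // Assign by position first, then drop empty values and placeholders.
    let keep = |i: usize| {
        urls.get(i)
            .map(|u| u.trim())
            .filter(|u| !u.is_empty() && !PLACEHOLDERS.contains(u))
            .map(str::to_string)
    };
    (keep(0), keep(1))
}
```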

Collection

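| Field | Type | Cardinality (archival) | Cardinality (WIP) |
|---|---|---|---|
| id | internal_id | 1 | 1 |
| pid | string | 1 | 1 |
| name | string | 1 | 1 |
| accessRights | accessRights | 1 | 1 |
| legalInfo | legalInfo[] | 1-n | 1-n |
| howToCite | string | 1 | 1 |
| description | lang_string | 0-1 | 0-1 |
| typeOfData | string[] | 1-n | 0-n |
| dateCreated | date | 1 | 0-1 |
| dateModified | date | 0-1 | 0-1 |
| records | internal_id[] | 0-n | 0-n |
| collections | internal_id[] | 0-n | 0-n |
| languages | lang_string[] | 1-n | 0-n |
| additionalMaterial | url[] | 0-n | 0-n |
| provenance | string | 0-1 | 0-1 |
| keywords | lang_string[] | 0-n | 0-n |
| documentationMaterial | url[] | 0-n | 0-n |
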
  • id: A unique internal identifier; not exposed to the user and not persistent.
  • pid: A unique persistent identifier (currently an ARK URL).
  • name: The name of the collection.
  • accessRights: The access rights of the collection (see Access Rights).
  • legalInfo: Legal information about the collection. Calculated from records/sub-collections; may be added manually.
  • howToCite: How to cite the collection. If not provided, the standard form <contributors> (<year>). <collection name> [Collection]. DaSCH. <ARK> is used.
  • description: The description of the collection.
  • typeOfData: The type(s) of data — "XML", "Text", "Image", "Video", "Audio".
  • dateCreated: When the collection was created.
  • dateModified: When the collection was last modified.
  • records: Identifiers of the records in the collection.
  • collections: Identifiers of nested collections.
  • languages: Languages contained in the collection.
  • additionalMaterial: Additional URLs related to the collection.
  • provenance: The history of the collection, if applicable.
  • keywords: Keywords for search purposes.
  • documentationMaterial: URLs pointing to documentation material.

warning

As implemented (collection.rs): only a lightweight reference type exists (id, name, description). The full collection entity above is not implemented, and there are currently no collection data files — projects reference collections by ID, but no collections are populated. Collections remain part of the design but are unused in practice.

Record

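| Field | Type | Card. | WIP Card. |
|---|---|---|---|
| id | internal_id | 1 | 1 |
| pid | string | 1 | 1 |
| label | lang_string | 1 | 1 |
| accessRights | string | 1 | 1 |
| legalInfo | legalInfo | 1 | 1 |
| howToCite | string | 1 | 1 |
| publisher | string | 1 | 1 |
| source | string | 0-1 | 0-1 |
| description | lang_string | 0-1 | 0-1 |
| dateCreated | date | 0-1 | 0-1 |
| dateModified | date | 0-1 | 0-1 |
| datePublished | date | 0-1 | 0-1 |
| typeOfData | string | 0-1 | 0-1 |
| size | string | 0-1 | 0-1 |
| keywords | lang_string[] | 0-n | 0-n |
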
  • id: A unique identifier for the record.
  • pid: A unique persistent identifier (an ARK URL).
  • label: The label of the record. For assets, this may be the original file name. For IIIF URLs, the label keeps the record identifiable if the URL is no longer available. In the long run, referencing IIIF Manifests instead of image URLs would allow labels to be extracted from the manifest.
  • accessRights: The access rights of the record. Defines to what extent the record data is accessible in the DPE.
  • legalInfo: Legal information about the record.
  • howToCite: How to cite the record. If not provided, the standard form <label> (<creation year>). [Data Record]. DaSCH. <ARK> is used.
  • publisher: The publisher of the record. Literal "DaSCH"; required for OpenAIRE compliance.
  • source: The provenance of the record. Recommended for OpenAIRE: use only if the record is a digitization of a non-digital source, in which case it should identify the original source.
  • description: The description of the record. If the project does not want descriptions to be public domain and always open, it must not use this property but instead create a custom property.
  • dateCreated, dateModified: Creation and modification dates.
  • datePublished: When the record was made publicly available — normally when it moved to the archive, or when an embargo is lifted.
  • typeOfData: The type of data — "XML", "Text", "Image", "Video", "Audio".
  • size: The size of the record (OpenAIRE Size).
  • keywords: Keywords for search purposes.
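The default record citation from the howToCite bullet can be assembled as follows. The function name is hypothetical, and two details are assumptions: the creation year is taken from the first four characters of an ISO YYYY-MM-DD dateCreated, and no default is produced when dateCreated is missing (the field is 0-1).

```rust
/// Builds the default record citation
/// `<label> (<creation year>). [Data Record]. DaSCH. <ARK>`,
/// used when `howToCite` is not provided.
/// Assumptions: `date_created` is YYYY-MM-DD; without it, no default is produced.
pub fn default_record_citation(
    label: &str,
    date_created: Option<&str>,
    ark: &str,
) -> Option<String> {
    let year = date_created?.get(..4)?;
    if !year.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    Some(format!("{label} ({year}). [Data Record]. DaSCH. {ark}"))
}
```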

warning

As implemented (record.rs): records are implemented and the fields match the table closely. The pid is parsed into its ARK components (host, shortcode, record id). Note that record-level metadata — which the original design flagged as not yet feasible — is now implemented. Coverage is still partial: only a few projects currently have record files. Records are exposed through the OAI-PMH endpoint.
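A minimal sketch of splitting a record pid into host, shortcode and record id. It assumes ARK URLs of the form https://<host>/ark:/72163/1/<shortcode>/<record id>, extrapolated from the example ARKs in this chapter; the real layout and the parser in record.rs may differ. ArkParts and parse_record_ark are hypothetical names.

```rust
/// ARK components of a record `pid` (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ArkParts<'a> {
    pub host: &'a str,
    pub shortcode: &'a str,
    pub record_id: &'a str,
}

/// Parses `https://<host>/ark:/72163/1/<shortcode>/<record id>`.
/// Assumption: this URL layout; any other shape is rejected with `None`.
pub fn parse_record_ark(pid: &str) -> Option<ArkParts<'_>> {
    let rest = pid.strip_prefix("https://")?;
    let (host, path) = rest.split_once('/')?;
    let path = path.strip_prefix("ark:/72163/1/")?;
    let (shortcode, record_id) = path.split_once('/')?;
    if host.is_empty() || shortcode.is_empty() || record_id.is_empty() {
        return None;
    }
    Some(ArkParts { host, shortcode, record_id })
}
```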

Person

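| Field | Type | Card. |
|---|---|---|
| id | internal_id | 1 |
| pid | string | 1 |
| sameAs | authorityFileReference[] | 0-n |
| givenNames | string[] | 1-n |
| familyNames | string[] | 1-n |
| honoraryPrefix | string[] | 0-n |
| honorarySuffix | string[] | 0-n |
| affiliations | internal_id[] | 0-n |
| email | string | 0-n |
| address | address | 0-1 |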

Cardinality is the same for both stages.

  • id: A unique internal identifier; not exposed to the user and not persistent.
  • pid: A unique persistent identifier (currently an ARK URL).
  • sameAs: References to external authority files (ORCID, VIAF, GND, …).
  • givenNames: The given names of the person.
  • familyNames: The family names of the person.
  • honoraryPrefix: Honorary prefixes, e.g. "Prof. Dr.".
  • honorarySuffix: Honorary suffixes, e.g. "PhD", "MA".
  • affiliations: Organizations the person is affiliated with.
  • email: The email address of the person.
  • address: The postal address of the person — the address at their organization, not a personal address.

warning

As implemented (person.rs): the person has id, givenNames, familyNames, jobTitles (string[]), affiliations, sameAs and email. Differences from the conceptual table: there is no pid, honoraryPrefix, honorarySuffix or address, and there is an additional jobTitles field not in the original design. Note that project-contribution roles (e.g. "Project leader") belong in a project's attributions, not in jobTitles — a guard enforces this for an explicit list of role words.
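The jobTitles guard can be pictured as a filter over a person's titles. This sketch is hypothetical: ROLE_WORDS holds only role words quoted in this documentation, not the real JOB_TITLE_ROLE_WORDS list in dpe-core, and whole-title, case-insensitive matching is an assumption.

```rust
/// Illustrative role words only; the real list is JOB_TITLE_ROLE_WORDS in
/// dpe-core and may differ.
const ROLE_WORDS: &[&str] = &["project leader", "project staff", "creator"];

/// Returns the job titles that look like project-contribution roles and
/// therefore belong in a project's `attributions` instead.
/// Assumption: a title matches when, trimmed and lowercased, it equals a role word.
pub fn misplaced_roles(job_titles: &[String]) -> Vec<&str> {
    job_titles
        .iter()
        .map(String::as_str)
        .filter(|title| {
            let normalized = title.trim().to_lowercase();
            ROLE_WORDS.iter().any(|word| *word == normalized)
        })
        .collect()
}
```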

Organization

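| Field | Type | Card. |
|---|---|---|
| id | internal_id | 1 |
| pid | string | 1 |
| sameAs | authorityFileReference[] | 0-n |
| name | string | 1 |
| url | url | 1 |
| address | address | 0-1 |
| email | string | 0-1 |
| alternativeName | lang_string | 0-1 |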

Cardinality is the same for both stages.

  • id: A unique internal identifier; not exposed to the user and not persistent.
  • pid: A unique persistent identifier (currently an ARK URL).
  • sameAs: References to external authority files (e.g. ROR).
  • name: The name of the organization.
  • url: The URL of the organization.
  • address: The address of the organization.
  • email: The email address of the organization.
  • alternativeName: Alternative names of the organization.

warning

As implemented (organization.rs): matches the table except there is no pid.

Value Types

String with Language Tag (lang_string)

An object with ISO language codes as keys and strings as values:

{
  "en": "Lorem ipsum in English.",
  "de": "Lorem ipsum auf Deutsch."
}

A single lang_string value can hold multiple translations.
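Consumers usually need a single display string from a lang_string. The helper below is a hypothetical sketch, not dpe-core's lang_value(). It tries the requested language, then a caller-supplied fallback list, then any available translation. The fallback policy is an assumption.

```rust
use std::collections::BTreeMap;

/// A `lang_string`: ISO language code -> text.
pub type LangString = BTreeMap<String, String>;

/// Picks a translation: the requested language, then each fallback in order,
/// then any value (the lowest key, since BTreeMap iterates in sorted order).
pub fn pick_lang<'a>(value: &'a LangString, lang: &str, fallbacks: &[&str]) -> Option<&'a str> {
    std::iter::once(lang)
        .chain(fallbacks.iter().copied())
        .find_map(|code| value.get(code))
        .or_else(|| value.values().next())
        .map(String::as_str)
}
```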

Authority File Reference

An object representing a reference to an external authority file.

| Field | Type | Card. |
|---|---|---|
| type | string | 1 |
| url | url | 1 |
| text | string | 0-1 |

  • type: The type of the reference — e.g. 'Geonames', 'Pleiades', 'Skos', 'Periodo', 'Chronontology', 'GND', 'VIAF', 'Grid', 'ORCID', 'Creative Commons', 'COAR'. Used to determine the semantics of the URL. The implementation also uses 'URL' for plain links.
  • url: The URL itself.
  • text: A human-readable text for display.

PID

A persistent identifier — may be an ARK or a DOI. Used e.g. on publications.

| Field | Type | Card. |
|---|---|---|
| url | url | 1 |
| text | string | 0-1 |

Publication

| Field | Type | Card. |
|---|---|---|
| text | string | 1 |
| pid | pid | 0-1 |

  • text: The text of the publication.
  • pid: A URL to the publication, e.g. a DOI, if available.

Address

| Field | Type | Card. |
|---|---|---|
| street | string | 1 |
| postalCode | string | 1 |
| locality | string | 1 |
| country | string | 1 |
| canton | string | 0-1 |
| additional | string | 0-1 |

Grant

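| Field | Type | Card. | Restrictions |
|---|---|---|---|
| funders | internal_id[] | 1-n | Person or Organization IDs |
| number | string | 0-1 | |
| name | string | 0-1 | |
| url | url | 0-1 | |

Legal Info

| Field | Type | Card. |
|---|---|---|
| license | license | 1 |
| copyrightHolder | string | 1 |
| authorship | string[] | 1-n |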

License

| Field | Type | Card. |
|---|---|---|
| licenseIdentifier | string | 1 |
| licenseDate | date | 1 |
| licenseURI | url | 1 |

Attribution

Modelled according to the OpenAIRE guidelines.

| Field | Type | Card. |
|---|---|---|
| contributor | internal_id | 1 |
| contributorType | string[] | 1-n |

Access Rights

| Field | Type | Card. |
|---|---|---|
| accessRights | string | 1 |
| embargoDate | date | 0-1 |

  • accessRights: One of "Full Open Access", "Open Access with Restrictions", "Embargoed Access", "Metadata only Access".
  • embargoDate: The date when the embargo ends.

warning

As implemented: accessRights is one of the four literals above (the conceptual design referred to a COAR authority-file reference). The value is wrapped in an object, e.g. { "accessRights": "Full Open Access" }, with an optional embargoDate.
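A sketch of the implemented access-rights value as Rust types, assuming the four literals above are matched exactly. The type names and the embargo rule are assumptions for illustration: an embargo is treated as active strictly before embargoDate, and an embargo without a date counts as active.

```rust
use std::str::FromStr;

/// The four access-rights literals.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessRightsKind {
    FullOpenAccess,
    OpenAccessWithRestrictions,
    EmbargoedAccess,
    MetadataOnlyAccess,
}

impl FromStr for AccessRightsKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Full Open Access" => Ok(Self::FullOpenAccess),
            "Open Access with Restrictions" => Ok(Self::OpenAccessWithRestrictions),
            "Embargoed Access" => Ok(Self::EmbargoedAccess),
            "Metadata only Access" => Ok(Self::MetadataOnlyAccess),
            other => Err(format!("unknown access rights: {other:?}")),
        }
    }
}

/// The wrapper object: { "accessRights": ..., "embargoDate": ... }.
pub struct AccessRights {
    pub access_rights: AccessRightsKind,
    /// Embargo end date as YYYY-MM-DD, if any.
    pub embargo_date: Option<String>,
}

impl AccessRights {
    /// Whether the embargo still applies on `today` (YYYY-MM-DD).
    /// Same-length ISO dates compare correctly as strings.
    /// Assumption: an embargo without a date is treated as active.
    pub fn is_embargoed_on(&self, today: &str) -> bool {
        match (self.access_rights, self.embargo_date.as_deref()) {
            (AccessRightsKind::EmbargoedAccess, Some(end)) => today < end,
            (AccessRightsKind::EmbargoedAccess, None) => true,
            _ => false,
        }
    }
}
```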

Internal ID

An internal ID (internal_id) is a unique identifier for an entity within the system. It is not intentionally exposed to the user and is presented as a string.

OpenAIRE Mapping

The model includes a mapping to the OpenAIRE Guidelines for Data Archives, which are based on the DataCite Metadata Schema. Currently only projects are exposed as OpenAIRE datasets. For the endpoint that serves this, see the OAI-PMH Endpoint page.

The OpenAIRE Guidelines specify 18 fields, with cardinalities Mandatory (M), Recommended (R), Mandatory if Applicable (MA) and Optional (O):

  1. Identifier (M)
  2. Creator (M)
  3. Title (M)
  4. Publisher (M)
  5. PublicationYear (M)
  6. Subject (R)
  7. Contributor (MA/O)
  8. Date (M)
  9. Language (R)
  10. ResourceType (R)
  11. AlternateIdentifier (O)
  12. RelatedIdentifier (MA)
  13. Size (O)
  14. Format (O)
  15. Version (O)
  16. Rights (MA)
  17. Description (MA)
  18. GeoLocation (O)

Project → OpenAIRE Dataset Mapping

| Project Field | OpenAIRE Field | Mapping Notes |
|---|---|---|
| pid | Identifier (M) | Direct mapping |
| attributions (creator roles) | Creator (M) | Which roles count as creators is project-specific |
| name | Title (M) | Direct mapping |
| Fixed "DaSCH" | Publisher (M) | Static value |
| TBD date field | PublicationYear (M) | startDate or endDate year (project-specific) |
| keywords | Subject (R) | Direct mapping |
| attributions (non-creator roles) | Contributor (MA/O) | Remaining attributions |
| startDate, endDate | Date (M) | Multiple dates |
| Computed from records | Language (R) | Aggregated from project records |
| Fixed "Dataset" | ResourceType (R) | Static value for projects |
| shortcode | AlternateIdentifier (O) | DSP shortcode as alternate ID |
| collections refs | RelatedIdentifier (MA) | Collection relationships |
| Computed from records | Size (O) | Aggregated from project records |
| Computed from records | Format (O) | Aggregated typeOfData from records |
| Not applicable | Version (O) | Projects don't have versions |
| legalInfo | Rights (MA) | Direct mapping |
| description | Description (MA) | Direct mapping |
| spatialCoverage | GeoLocation (O) | Direct mapping |

Open questions in the design include which attribution roles map to Creator vs. Contributor, how PublicationYear should be derived per project, and whether collections should also be exposed as OpenAIRE datasets.
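To make the first two open questions concrete, the sketch below treats the set of creator roles as a per-project parameter and derives the year from whichever date the caller picks. It is hypothetical: the names are invented, and classifying an attribution as Creator when any of its roles is a creator role is only one possible rule.

```rust
/// An attribution: a contributor ID plus its roles (contributorType).
pub struct Attribution {
    pub contributor: String,
    pub contributor_type: Vec<String>,
}

/// Splits attributions into (Creator IDs, Contributor IDs).
/// `creator_roles` is project-specific; roles are compared ASCII case-insensitively.
/// Assumption: one creator role is enough to make an attribution a Creator.
pub fn split_creators<'a>(
    attributions: &'a [Attribution],
    creator_roles: &[&str],
) -> (Vec<&'a str>, Vec<&'a str>) {
    let (creators, contributors): (Vec<&Attribution>, Vec<&Attribution>) =
        attributions.iter().partition(|a| {
            a.contributor_type
                .iter()
                .any(|role| creator_roles.iter().any(|c| c.eq_ignore_ascii_case(role)))
        });
    let ids = |v: Vec<&'a Attribution>| -> Vec<&'a str> {
        v.into_iter().map(|a| a.contributor.as_str()).collect()
    };
    (ids(creators), ids(contributors))
}

/// PublicationYear from a YYYY-MM-DD date (startDate or endDate, chosen per project).
pub fn publication_year(date: &str) -> Option<u16> {
    date.get(..4)?.parse().ok()
}
```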

Examples

The following examples show the conceptual JSON shape. Real data files live under modules/dpe/server/data/ and may differ in the ways noted in the As implemented callouts above (e.g. the legacy url array form).

Project Cluster

{
  "id": "cluster-0001",
  "pid": "https://ark.dasch.swiss/ark:/72163/1/cluster-0001",
  "name": "Project Cluster Name",
  "projects": ["project-0001", "project-0002"],
  "projectClusters": ["cluster-0002"],
  "description": {
    "en": "Project Cluster Description",
    "de": "Projektcluster Beschreibung"
  },
  "url": "https://example.com/project-cluster",
  "howToCite": "Project Cluster Name (2025). [Project Cluster]. DaSCH. https://ark.dasch.swiss/ark:/72163/1/cluster-0001",
  "alternativeNames": [{ "en": "Alternative Name", "de": "Alternativer Name" }],
  "contactPoint": ["person-0001", "organization-0001"],
  "documentationMaterial": ["https://example.com/documentation"]
}

Project

{
  "id": "project-0001",
  "pid": "https://ark.dasch.swiss/ark:/72163/1/project-0001",
  "shortcode": "1234",
  "officialName": "Project Official Name",
  "status": "Ongoing",
  "name": "Project Name",
  "shortDescription": "Short description of the project.",
  "description": { "en": "Project Description", "de": "Projektbeschreibung" },
  "startDate": "2023-01-01",
  "endDate": "2028-01-01",
  "url": { "type": "URL", "url": "https://data.dasch.swiss/projects/project-0001" },
  "secondaryUrl": { "type": "URL", "url": "https://example.com/project-website" },
  "howToCite": "Project Name (2025). [Project]. DaSCH. https://ark.dasch.swiss/ark:/72163/1/project-0001",
  "accessRights": { "accessRights": "Full Open Access" },
  "legalInfo": [
    {
      "license": {
        "licenseIdentifier": "CC BY 4.0",
        "licenseDate": "2023-01-01",
        "licenseURI": "https://creativecommons.org/licenses/by/4.0/"
      },
      "copyrightHolder": "DaSCH",
      "authorship": ["Project XYZ"]
    }
  ],
  "dataManagementPlan": "https://example.com/dmp",
  "collections": ["collection-0001"],
  "records": ["record-0001", "record-0002"],
  "keywords": [{ "en": "Keyword 1", "de": "Stichwort 1" }],
  "disciplines": [{ "en": "Discipline 1", "de": "Disziplin 1" }],
  "temporalCoverage": [{ "en": "2006-2016", "de": "2006-2016" }],
  "spatialCoverage": [
    { "type": "Geonames", "url": "https://www.geonames.org/2658434/", "text": "Switzerland" }
  ],
  "attributions": [
    { "contributor": "person-0001", "contributorType": ["Project leader", "Data curator"] }
  ],
  "abstract": { "en": "Project Abstract", "de": "Projektzusammenfassung" },
  "contactPoint": ["person-0001", "organization-0001"],
  "publications": [{ "text": "Publication Title", "pid": { "url": "https://doi.org/10.1234/5678" } }],
  "funding": [
    { "funders": ["organization-0001"], "number": "123456", "name": "Grant Name", "url": "https://example.com/grant" }
  ],
  "alternativeNames": [{ "en": "Alternative Name", "de": "Alternativer Name" }]
}

Person

{
  "id": "person-0001",
  "givenNames": ["Jane"],
  "familyNames": ["Doe"],
  "jobTitles": ["Senior lecturer"],
  "affiliations": ["organization-0001"],
  "sameAs": [{ "type": "ORCID", "url": "https://orcid.org/0000-0000-0000-0000" }],
  "email": "jane.doe@example.org"
}

Organization

{
  "id": "organization-0001",
  "name": "Université de Lausanne",
  "url": "https://www.unil.ch/",
  "address": {
    "street": "Unicentre",
    "postalCode": "1015",
    "locality": "Lausanne",
    "country": "Switzerland"
  }
}

DPE Project Structure

Workspace Layout

modules/dpe/
├── core/             dpe-core          Pure domain (serde only)
├── telemetry/        dpe-telemetry     Telemetry types and validation (serde only)
├── api-oai/          dpe-api-oai       OAI-PMH 2.0 endpoint
├── web/              dpe-web           Leptos SSR + Datastar fragments
├── server/           dpe-server        Axum binary (composition root)
├── web-e2e-tests/                      Playwright E2E tests
├── public/                             Static assets
└── style/                              Tailwind CSS

Dependency Graph

dpe-core              ← pure domain, no framework deps
  ↑
  ├── dpe-api-oai     ← OAI-PMH endpoint
  ├── dpe-web         ← Leptos SSR pages + components
  └── dpe-server      ← composition root, Datastar fragment handlers
       ↑
       dpe-telemetry  ← telemetry types and validation (serde only)

Crate Responsibilities

dpe-core (core/)

Framework-free domain layer. Contains:

  • Domain types: Project, Record, Person, Organization, Attribution, etc.
  • Repository traits: ProjectRepository, RecordRepository
  • Fs implementations: FsProjectRepository, FsRecordRepository (backed by in-memory caches)
  • Data loading: Project and record caches (OnceLock<Vec<T>>) loaded from JSON on first access
  • Utilities: lang_value(), get_data_dir()

Dependencies: serde, serde_json only.
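The repository-plus-cache pattern can be sketched with std::sync::OnceLock. This illustration rests on several assumptions and is not dpe-core's code. The trait methods, the CachedProjectRepository type and the trimmed Project struct are hypothetical, and a plain loader function stands in for reading JSON with serde_json.

```rust
use std::sync::OnceLock;

/// Trimmed stand-in for the domain `Project` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Project {
    pub id: String,
    pub shortcode: String,
    pub name: String,
}

/// Hypothetical shape of a read-only repository trait.
pub trait ProjectRepository {
    fn all(&self) -> &[Project];

    /// Case-insensitive shortcode lookup built on `all()`.
    fn by_shortcode(&self, shortcode: &str) -> Option<&Project> {
        self.all().iter().find(|p| p.shortcode.eq_ignore_ascii_case(shortcode))
    }
}

/// Cache filled on first access by `loader` (in dpe-core, JSON files are read instead).
pub struct CachedProjectRepository {
    cache: OnceLock<Vec<Project>>,
    loader: fn() -> Vec<Project>,
}

impl CachedProjectRepository {
    pub fn new(loader: fn() -> Vec<Project>) -> Self {
        Self { cache: OnceLock::new(), loader }
    }
}

impl ProjectRepository for CachedProjectRepository {
    fn all(&self) -> &[Project] {
        // get_or_init runs the loader at most once, even across threads.
        self.cache.get_or_init(self.loader)
    }
}
```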

dpe-api-oai (api-oai/)

OAI-PMH 2.0 Data Provider. Implements the six required verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord). Usage is documented in OAI-PMH Endpoint.

Depends on dpe-core for domain types — no Leptos or web framework dependency.

dpe-web (web/)

Leptos SSR web layer. Contains:

  • Pages: home, about, project, projects (with filters and pagination)
  • Components: navbar, footer, project cards, tab panels, search input
  • Domain re-exports: domain/mod.rs re-exports dpe-core types for a single import path
  • Server functions: #[server] wrappers around dpe-core functions

dpe-telemetry (telemetry/)

Telemetry types and validation logic. Extracted as a library crate so fuzz targets can test the real code. Contains:

  • Beacon types: BeaconPayload, Signal, WebVitalSignal, ErrorSignal, etc. (serde deserialization for browser beacons)
  • Origin validation: is_allowed_origin() — validates dasch.swiss subdomains
  • URL normalization: normalize_page_url() — cardinality-safe page URL mapping
  • Traceparent validation: is_valid_traceparent() — W3C traceparent format validation

Dependencies: serde only.
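For orientation, here is a standalone sketch of two of the validators. The traceparent check follows the W3C Trace Context format and accepts only version 00: lowercase hex, with all-zero trace-id and parent-id rejected. The origin rule (HTTPS only, no port, dasch.swiss or any subdomain) is an assumption, and neither function is the dpe-telemetry implementation.

```rust
/// True if `s` has exactly `len` characters, all lowercase hex digits.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// True if `s` contains at least one non-zero digit.
fn not_all_zero(s: &str) -> bool {
    s.bytes().any(|b| b != b'0')
}

/// `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>` (version 00 only).
pub fn is_valid_traceparent(value: &str) -> bool {
    let parts: Vec<&str> = value.split('-').collect();
    let &[version, trace_id, parent_id, flags] = parts.as_slice() else {
        return false;
    };
    version == "00"
        && is_lower_hex(trace_id, 32)
        && not_all_zero(trace_id)
        && is_lower_hex(parent_id, 16)
        && not_all_zero(parent_id)
        && is_lower_hex(flags, 2)
}

/// Accepts https://dasch.swiss and https://<sub>.dasch.swiss.
/// Assumptions: HTTPS only, no port, no path, no empty labels.
pub fn is_allowed_origin(origin: &str) -> bool {
    let Some(host) = origin.strip_prefix("https://") else {
        return false;
    };
    if host.contains(|c| c == ':' || c == '/') {
        return false;
    }
    host == "dasch.swiss"
        || host
            .strip_suffix(".dasch.swiss")
            .is_some_and(|sub| !sub.is_empty() && sub.split('.').all(|label| !label.is_empty()))
}
```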

dpe-server (server/)

Composition root and Axum binary. Contains:

  • Route wiring: Leptos SSR routes, OAI-PMH handler, Datastar fragment endpoints, /healthz, /telemetry/collect
  • Fragment handlers: fragments.rs — pure Axum handlers that render Leptos components to HTML and return Datastar SSE events
  • Telemetry collector: telemetry_collector.rs — converts browser beacons to OTel metrics and structured logs (uses types from dpe-telemetry)
  • Configuration: config.rs — figment-based layered config (defaults → dpe.toml → DPE_* env vars)
  • Logging: OTel-aware subscriber via init-tracing-opentelemetry

Key Patterns

  • Domain types in dpe-core, not in web or API crates
  • API crates depend on dpe-core only, never on each other or on dpe-web
  • dpe-server contains no business logic — only route composition and fragment rendering
  • Fragment handlers use Owner::new() + view! { ... }.to_html() to render Leptos components from pure Axum handlers

DPE Testing Strategy

The DPE follows a 4-layer testing pyramid, adapted from the Sipi testing strategy. Target distribution: ~50% unit, ~30% E2E, ~15% snapshot, ~5% fuzz.

Testing Pyramid

          ╱╲
         ╱  ╲         Layer 4: Fuzz Testing (nightly CI)
        ╱────╲        cargo-fuzz, corpus persisted
       ╱      ╲
      ╱  E2E   ╲     Layer 3: E2E Tests (Playwright)
     ╱──────────╲     Tab switching, search, accessibility (axe-core)
    ╱            ╲
   ╱  Snapshots   ╲   Layer 2: Snapshot Tests (insta)
  ╱────────────────╲   SSR output, SSE fragments, ARIA attributes
 ╱                  ╲
╱    Unit Tests      ╲ Layer 1: Unit Tests (cargo test)
╱────────────────────╲ Fragment handlers, OAI protocol, domain logic

Layer 1: Unit Tests

  • Location: #[cfg(test)] modules in each crate
  • Runner: cargo test --workspace
  • Scope: Fragment handlers, OAI protocol, domain types, data loading, filtering/pagination
  • Crate: dpe-core tests run independently — cargo test -p dpe-core

Layer 2: Snapshot Tests (insta)

  • Dependency: insta with yaml and filters features
  • Location: Adjacent snapshots/ directories
  • CI: Set INSTA_UPDATE=new so failures produce .snap.new artifacts for review
  • Scope: SSR output, SSE fragment response bodies, ARIA attributes

Layer 3: E2E Tests (Playwright)

  • Location: modules/dpe/web-e2e-tests/
  • Runner: npx playwright test
  • Scope: Tab switching, search autocomplete, scroll preservation, accessibility (axe-core), visual regression
  • Accessibility: Full-page axe-core scans against WCAG 2.1 AA

Layer 4: Fuzz Testing

  • Tool: cargo-fuzz (nightly Rust)
  • Schedule: Nightly CI, 10 minutes per target
  • Targets: Tab name validation, SSE response construction, query parameter parsing
  • Corpus: Persisted between runs

CI Pipeline Budget

Target: ≤ 10 minutes wall-clock per PR.

Parallel job group 1 (~2 min):
  cargo fmt --check
  cargo clippy --all-targets -Dwarnings
  cargo-deny check

Parallel job group 2 (~5 min):
  cargo nextest run --workspace
  cargo leptos build --release
  cargo-llvm-cov (coverage → Codecov)

Parallel job group 3 (~5 min):
  Playwright E2E tests
  axe-core accessibility scans
  Lighthouse CI performance budgets

Testing Conventions

Test naming: Use descriptive names following the test_{what}_{condition}_{expected} pattern. For example: test_parse_project_missing_title_returns_error.

Test locations:

  • Unit tests: In-crate #[cfg(test)] modules or adjacent _tests.rs files
  • Snapshot files: .snap files committed to git in snapshots/ directories
  • E2E tests: web-e2e-tests/ for DPE, playground-e2e-tests/ for Mosaic
  • Fuzz corpus: Persisted in the repository under fuzz/corpus/

Test file naming: {feature}_tests.rs for Rust, {feature}.spec.ts for Playwright.

Snapshot tests: Use the insta crate. Use with_settings! for scrubbing dynamic values (timestamps, IDs). CI runs with INSTA_UPDATE=new so unexpected changes produce .snap.new files for review.

DPE Observability

Developer guide for working with DPE's observability instrumentation.

Overview

DPE uses OpenTelemetry for distributed tracing, metrics, and structured logging. The telemetry pipeline has two halves:

  • Server-side: OTel-native tracing via axum-tracing-opentelemetry middleware. Every HTTP request (except /healthz) produces W3C-compliant spans exported via OTLP.
  • Client-side: A lightweight JavaScript module (telemetry.js) captures Core Web Vitals, JS errors, Long Animation Frames, and navigation timing. Signals are sent via navigator.sendBeacon to a server-side collector (POST /telemetry/collect), which converts them into OTel metrics and structured logs flowing through the same OTLP pipeline.

Trace correlation between server and client uses the W3C traceparent standard: the server renders a <meta name="traceparent"> tag in the HTML shell, and the client includes it in every beacon payload.

Local Observability Stack

Run the Grafana LGTM (Loki, Grafana, Tempo, Mimir) all-in-one container alongside DPE:

# Terminal 1: Start local LGTM stack
just lgtm-up

# Terminal 2: Run DPE with OTel enabled (exports to localhost:4317)
just watch-dpe-otel

# Terminal 3: Generate traffic
curl http://localhost:4000/projects
curl "http://localhost:4000/dpe/oai?verb=Identify"
curl http://localhost:4000/healthz

watch-dpe-otel sets OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_SERVICE_NAME, OTEL_RESOURCE_ATTRIBUTES, and PYROSCOPE_ENDPOINT for you. Run just --show watch-dpe-otel to see the underlying commands.

Open http://localhost:3000 (no login required):

  • Tempo (Explore → Tempo): traces for /projects and /dpe/oai, none for /healthz
  • Service map: "dpe" service with Rust tech icon
  • Loki (Explore → Loki): OTel log records bridged from the tracing subscriber (severity, span context, structured fields)
  • Mimir (Explore → Mimir): browser telemetry metrics (browser.web_vital, browser.error, etc.)
  • Pyroscope (Explore → Pyroscope): CPU flame graphs for dpe-server

Adding Instrumentation

Use #[tracing::instrument] on new handler and service functions:

#[tracing::instrument(
    skip_all,
    fields(
        otel.kind = "internal",
        otel.name = "descriptive name",
    )
)]
pub async fn my_handler(/* Axum extractors */) -> impl axum::response::IntoResponse {
    // Handler body; `()` is a valid (empty 200 OK) response.
}
  • Use otel.kind = "internal" on handler-level spans — the OTel middleware (OtelAxumLayer) already creates the SPAN_KIND_SERVER span for the HTTP request.
  • Do not create nested "server" spans — that confuses the trace waterfall.

Client Telemetry

The telemetry.js module (served from /telemetry.js) captures:

  • Core Web Vitals (LCP, INP, CLS, TTFB, FCP) with attribution data
  • JavaScript errors and unhandled promise rejections
  • Datastar SSE errors
  • Long Animation Frames (LoAF, ≥200ms threshold)
  • Navigation timing breakdown

All signals are buffered and flushed via navigator.sendBeacon on visibilitychange (page hide). The server collector converts them to OTel metrics (bounded attributes only) and structured logs (high-cardinality attribution data).

Logging

  • Production (LEPTOS_ENV=PROD): JSON-formatted logs to stdout only. No OTel log export — traces and metrics are exported via OTLP, but logs stay on stdout.
  • Local development (LEPTOS_ENV=DEV with OTEL_EXPORTER_OTLP_ENDPOINT set): Logs go to both stdout and Loki via OTLP. An OpenTelemetryTracingBridge layer converts tracing events into OTel log records, which are batched and exported alongside traces and metrics. Query them in Grafana Explore → Loki.
  • Set RUST_LOG to control log levels. Use RUST_LOG=debug for verbose output.
  • When OTEL_EXPORTER_OTLP_ENDPOINT is not set, the OTel SDK falls back to no-op export — no traces, metrics, or logs are sent, but structured stdout logging still works.

DPE Operations Guide

Operations documentation for the DPE infrastructure team.

Docker Image

  • Base: gcr.io/distroless/static-debian12:nonroot
  • User: uid 65534 (nonroot, built-in to distroless)
  • Shell: None (distroless — no SSH possible)
  • Binary: Static musl-linked dpe (CLI with subcommands)

CLI Commands

The dpe binary provides three subcommands:

| Command | Description |
|---|---|
| dpe serve | Start the web server |
| dpe validate <data_dir> | Validate all data files under the given directory |
| dpe healthcheck [--url URL] | Check if the server is healthy (default: http://localhost:8080/healthz) |

dpe validate

Validates JSON data files for structural correctness and cross-reference integrity.

dpe validate ./data

What it checks:

  • JSON schema validity for all data file types (projects, persons, organizations, records, clusters, collections)
  • Cross-references between projects, persons, and organizations
  • Orphaned files that are not referenced by any parent entity
  • Project roles misplaced in a person's jobTitles (e.g. "Project Leader", "Project staff", "Creator"). Such a role belongs in the project's attributions (contributorType), where the OAI-PMH creator/contributor logic can read it. The role vocabulary is JOB_TITLE_ROLE_WORDS in dpe-core.

Exit codes:

  • 0 — all data files are valid
  • 1 — validation errors found (details printed to stderr)
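The cross-reference and orphan checks reduce to set lookups. The sketch below is hypothetical and simplified. It only covers project-to-person/organization references and takes pre-parsed ID lists instead of JSON files. Treating orphans as errors for the exit code is an assumption.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical outcome of a validation pass.
pub struct Report {
    /// (referencing project ID, missing entity ID)
    pub dangling: Vec<(String, String)>,
    /// Existing entity IDs that no project references.
    pub orphans: Vec<String>,
}

impl Report {
    /// 0 when everything is valid, 1 otherwise (orphans counted as errors: an assumption).
    pub fn exit_code(&self) -> i32 {
        if self.dangling.is_empty() && self.orphans.is_empty() {
            0
        } else {
            1
        }
    }
}

/// `project_refs`: project ID -> person/organization IDs it references.
/// `entities`: person/organization IDs that exist as data files.
pub fn check_references(
    project_refs: &HashMap<String, Vec<String>>,
    entities: &BTreeSet<String>,
) -> Report {
    let mut dangling = Vec::new();
    let mut referenced: BTreeSet<&str> = BTreeSet::new();
    // Sort by project ID so the report order is deterministic.
    let mut projects: Vec<_> = project_refs.iter().collect();
    projects.sort();
    for (project, refs) in projects {
        for id in refs {
            if entities.contains(id) {
                referenced.insert(id.as_str());
            } else {
                dangling.push((project.clone(), id.clone()));
            }
        }
    }
    // BTreeSet iterates in sorted order, so orphans come out sorted by ID.
    let orphans: Vec<String> = entities
        .iter()
        .filter(|id| !referenced.contains(id.as_str()))
        .cloned()
        .collect();
    Report { dangling, orphans }
}
```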

dpe healthcheck

Lightweight probe for Docker HEALTHCHECK or monitoring:

dpe healthcheck                                    # default: http://localhost:8080/healthz
dpe healthcheck --url http://localhost:9090/healthz # custom URL

Ports

| Port | Protocol | Purpose |
|---|---|---|
| 8080 | HTTP | Application server |

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| RUST_LOG | No | info | Log level filter (e.g., dpe_server=info,tower_http=debug) |
| DPE_DATA_DIR | No | modules/dpe/server/data | Path to project/record JSON data files. Legacy alias: DATA_DIR (checked if DPE_DATA_DIR is unset) |
| DPE_FATHOM_SITE_ID | No | (none) | Fathom Analytics site ID (not a secret) |
| DPE_SHOW_PLACEHOLDER_VALUES | No | false | Show placeholder values (MISSING, CALCULATED) in the UI, styled in red. Enable on DEV/STAGE for QA visibility. |
| DPE_OAI_BASE_URL | No | https://repository.dasch.swiss/dpe/oai | Public base URL emitted as the OAI-PMH baseURL and echoed in <request> elements. Set per environment to match the public endpoint (e.g. https://api.dev.dasch.swiss/dpe/oai on DEV, http://localhost:4000/dpe/oai locally). See OAI-PMH. |
| OTEL_EXPORTER_OTLP_ENDPOINT | No | (none) | OTLP gRPC endpoint (e.g., http://alloy:4317). When unset, OTel falls back to no-op export. |
| OTEL_SERVICE_NAME | No | (none) | Service name for OTel resource attributes (e.g., dpe) |
| OTEL_RESOURCE_ATTRIBUTES | No | (none) | Comma-separated OTel resource attributes (e.g., service.namespace=dpe,service.version=0.2.1,deployment.environment=prod) |
| PYROSCOPE_ENDPOINT | No | (none) | Pyroscope HTTP endpoint (e.g., http://pyroscope:4040). When unset, profiling is disabled. |
| LEPTOS_SITE_ADDR | No | 0.0.0.0:8080 | Listen address and port |
| LEPTOS_SITE_ROOT | No | site | Path to static site assets |
| LEPTOS_SITE_PKG_DIR | No | pkg | JS/CSS package subdirectory |
| LEPTOS_OUTPUT_NAME | No | dpe | CSS/JS output filename prefix |
| LEPTOS_ENV | No | (auto) | Leptos environment (DEV or PROD). Set automatically by cargo-leptos: watch = DEV, build --release = PROD. Do not set manually. |
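The DPE_DATA_DIR fallback chain and the documented defaults can be expressed as small pure functions. This is a sketch, not config.rs, which uses figment. The function names are hypothetical, and lookup abstracts std::env::var so the logic is testable. Empty values are not treated specially, which is an assumption.

```rust
/// Defaults taken from the environment-variable table.
const DEFAULT_DATA_DIR: &str = "modules/dpe/server/data";
const DEFAULT_OAI_BASE_URL: &str = "https://repository.dasch.swiss/dpe/oai";

/// DPE_DATA_DIR, then the legacy DATA_DIR alias, then the default.
pub fn resolve_data_dir(lookup: impl Fn(&str) -> Option<String>) -> String {
    lookup("DPE_DATA_DIR")
        .or_else(|| lookup("DATA_DIR"))
        .unwrap_or_else(|| DEFAULT_DATA_DIR.to_string())
}

/// DPE_OAI_BASE_URL, else the production default.
pub fn resolve_oai_base_url(lookup: impl Fn(&str) -> Option<String>) -> String {
    lookup("DPE_OAI_BASE_URL").unwrap_or_else(|| DEFAULT_OAI_BASE_URL.to_string())
}

/// Adapter for the real process environment, e.g. `resolve_data_dir(from_process_env)`.
pub fn from_process_env(name: &str) -> Option<String> {
    std::env::var(name).ok()
}
```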

Health Check

  • Endpoint: GET /healthz
  • Response: 200 OK (no body)
  • Purpose: Lightweight probe for Traefik/load balancers. Does not hit Leptos SSR.

Data Volume

  • Mount point: Value of DPE_DATA_DIR
  • Access: Read-only
  • Contents: Project metadata JSON files, organized by type (projects/, persons/, organizations/, clusters/, collections/, records/)

Resource Requirements

The DPE is lightweight — it serves static data with no database.

  • Memory: ~50-100 MB typical
  • CPU: Minimal (SSR rendering is fast, data is cached in-memory)
  • Disk: Data files + static assets (~50 MB)

Logging

Structured logging via init-tracing-opentelemetry (OTel-aware tracing subscriber). In production (LEPTOS_ENV=PROD), logs are JSON-formatted to stdout only. In local development (LEPTOS_ENV=DEV), logs are additionally exported via OTLP to Loki when OTEL_EXPORTER_OTLP_ENDPOINT is set. Configure levels with RUST_LOG:

# Default (info level)
RUST_LOG=info

# Debug HTTP requests
RUST_LOG=dpe_server=info,tower_http=debug

# Verbose debugging
RUST_LOG=debug

Observability

Fathom Analytics

Privacy-friendly, GDPR-compliant analytics. No cookies, no personal data collected.

Configuration: Set the DPE_FATHOM_SITE_ID environment variable to your Fathom site ID (not a secret). The tracking script is automatically injected into the HTML shell.

What gets tracked:

  • Page views
  • Tab switches (detected automatically via history.replaceState)

Disable: Omit the DPE_FATHOM_SITE_ID environment variable — no tracking script is rendered.

OpenTelemetry

DPE exports traces, metrics, and structured logs via OTLP gRPC. In production, the OTLP endpoint points to Grafana Alloy, which forwards to Grafana Cloud (Tempo for traces, Mimir for metrics, Loki for logs).

When OTEL_EXPORTER_OTLP_ENDPOINT is not set, the OTel SDK falls back to no-op export — the application runs normally without telemetry export. See docs/src/dpe/observability.md for the developer guide.

Continuous Profiling (Pyroscope)

CPU profiling via Grafana Pyroscope. Samples at 100Hz and pushes profiles to the configured endpoint.

Configuration: Set PYROSCOPE_ENDPOINT to the Pyroscope HTTP endpoint. When unset, no profiling agent runs and there is zero overhead.

What gets profiled:

  • CPU time per function (sampling-based, 100 samples/second)
  • Flame graphs viewable in Grafana (Explore > Pyroscope)

JSON API

Read-only JSON endpoints that expose DaSCH research project metadata. They serve the same project data as the DPE pages and the OAI-PMH endpoint, sourced from the in-process project cache that is loaded from the data directory at startup. The handlers live in the dpe-server crate (server/src/fragments.rs).

Endpoints

| Method | Path | Returns |
|---|---|---|
| GET | /dpe/api/v2/projects | JSON array of all projects |
| GET | /dpe/api/v2/projects/{shortcode} | A single project object |

  • Method: GET only.
  • Response: Content-Type: application/json.
  • Authentication: none.

| Environment | Base URL |
|---|---|
| Local development (just watch-dpe) | http://localhost:4000 |
| DEV | https://api.dev.dasch.swiss |
| Production | Not yet deployed |

List all projects

curl "https://api.dev.dasch.swiss/dpe/api/v2/projects"

Returns a JSON array containing every project. The list is not paginated and not filtered — the entire collection is returned in one response. (The HTML listing at /dpe/projects supports search and faceting; this JSON endpoint does not.)

Fetch a single project

The path segment is the project shortcode, not the id field. Matching is case-insensitive, so any case variant of an alphanumeric shortcode resolves to the same project (e.g. 080c and 080C).

curl "https://api.dev.dasch.swiss/dpe/api/v2/projects/0803"

| Status | Returned when |
|---|---|
| 200 OK | The shortcode resolves to a project. |
| 400 Bad Request | The shortcode is not alphanumeric (e.g. contains /, -, _, or other characters). Empty body. |
| 404 Not Found | The shortcode is well-formed but matches no project. Empty body. |
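The status rules in the table map onto a small lookup function. This is a hypothetical std-only sketch, not the handler in fragments.rs. Project is trimmed to what the example needs, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```rust
/// Trimmed stand-in for a stored project.
#[derive(Debug, PartialEq)]
pub struct Project {
    pub shortcode: String,
    pub name: String,
}

/// Outcome of GET /dpe/api/v2/projects/{shortcode}.
#[derive(Debug, PartialEq)]
pub enum Lookup<'a> {
    Found(&'a Project), // 200 OK
    BadRequest,         // 400, empty body
    NotFound,           // 404, empty body
}

impl Lookup<'_> {
    pub fn status(&self) -> u16 {
        match self {
            Lookup::Found(_) => 200,
            Lookup::BadRequest => 400,
            Lookup::NotFound => 404,
        }
    }
}

/// Rejects non-alphanumeric shortcodes before any lookup; matching ignores ASCII case.
pub fn lookup_project<'a>(projects: &'a [Project], shortcode: &str) -> Lookup<'a> {
    if shortcode.is_empty() || !shortcode.chars().all(|c| c.is_ascii_alphanumeric()) {
        return Lookup::BadRequest;
    }
    projects
        .iter()
        .find(|p| p.shortcode.eq_ignore_ascii_case(shortcode))
        .map_or(Lookup::NotFound, Lookup::Found)
}
```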

Response format

Both endpoints serialize the project metadata with camelCase keys. The shape mirrors the stored project JSON files; the single-project endpoint returns one object, the list endpoint an array of the same. For the meaning and cardinality of each field, see the Metadata Model (v2); the table below describes only the wire format.

| Key | Type | Notes |
|---|---|---|
| id | string | Internal project id |
| pid | string | Persistent identifier (ARK URL) |
| name | string | Display name |
| shortcode | string | Project shortcode (the lookup key for the single-project endpoint) |
| officialName | string | |
| status | string | "Ongoing" or "Finished" (PascalCase, unlike the rest of the payload) |
| shortDescription | string | |
| description | object | Language code → text |
| startDate, endDate | string | YYYY-MM-DD |
| url | object, array, or null | Raw stored value: either a structured reference object or a legacy string array; not normalized |
| secondaryUrl | object or null | Secondary reference (new-format files only) |
| howToCite | string | |
| accessRights | object | Access-rights type and optional embargo date |
| legalInfo | array | License, copyright holder, and authorship entries |
| dataManagementPlan | string or null | |
| dataPublicationYear | string or null | |
| typeOfData | array<string> or null | |
| dataLanguage | array<string> or null | |
| clusters | array<string> or null | Cluster ids |
| collections | array<string> or null | Collection ids |
| records | array<string> or null | Record ids |
| keywords | array<object> | Language maps |
| disciplines | array | |
| temporalCoverage | array | |
| spatialCoverage | array | Authority-file references |
| attributions | array | |
| abstract | object or null | Language map (the field is named abstract, not abstractText) |
| contactPoint | array<string> or null | |
| publications | array or null | |
| funding | object | |
| alternativeNames | array<object> or null | |
| documentationMaterial | array<string> or null | |
| provenance | string or null | |
| additionalMaterial | array<string> or null | |

Notes and limitations

  • Read-only. GET only; there are no create/update/delete operations.
  • No pagination or filtering on the list endpoint — it always returns the full collection.
  • Cached at startup. Data is read from the configured data directory when the server starts; changes to project files require a restart to surface.
  • url is a raw passthrough. Unlike the typed project model used to render pages, the JSON output emits the stored url value verbatim (structured object or legacy array), including any placeholder values.
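Because url is passed through verbatim, clients have to cope with all three shapes. The sketch below (hypothetical helper name) only normalizes the shape to a list; it does not interpret the fields of the structured object, and placeholder values still have to be filtered by the caller.

```python
def url_entries(value: object) -> list:
    """Normalize the raw url field (after json.loads) to a list, without interpreting it."""
    if value is None:
        return []  # JSON null
    if isinstance(value, list):
        return list(value)  # legacy string array
    if isinstance(value, dict):
        return [value]  # structured reference object, kept as-is
    raise TypeError(f"unexpected url value of type {type(value).__name__}")
```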

OAI-PMH Endpoint

Usage guide for the DPE OAI-PMH 2.0 data provider, which exposes DaSCH research project and record metadata for harvesting. The implementation lives in the dpe-api-oai crate (see Project Structure).

Endpoint

  • Path: GET /dpe/oai — GET only; POST requests are not supported
  • Response: always 200 OK with Content-Type: text/xml; charset=utf-8. Protocol errors are reported as OAI <error> elements inside the XML body, not as HTTP error codes.
  • Protocol version: 2.0
| Environment | Base URL |
|---|---|
| Local development (just watch-dpe) | http://localhost:4000/dpe/oai |
| DEV | https://api.dev.dasch.swiss/dpe/oai |
| Production | https://repository.dasch.swiss/dpe/oai |

The baseURL advertised by Identify (and echoed in every <request> element) is configured per environment via the DPE_OAI_BASE_URL environment variable, so it matches the URL harvesters actually use (see operations). If unset it defaults to the production endpoint above.

Verbs

All six OAI-PMH 2.0 verbs are implemented:

| Verb | Required arguments | Optional arguments |
|---|---|---|
| Identify | | |
| ListMetadataFormats | | identifier |
| ListSets | | |
| ListIdentifiers | metadataPrefix | from, until, set |
| ListRecords | metadataPrefix | from, until, set |
| GetRecord | identifier, metadataPrefix | |

Arguments outside these lists are rejected with badArgument, with two exceptions that are silently ignored: set on ListSets and metadataPrefix on ListMetadataFormats. A resumptionToken is rejected with badResumptionToken on every verb — see Known Limitations.

Identify

curl "https://api.dev.dasch.swiss/dpe/oai?verb=Identify"

Abbreviated example response (values vary by environment):

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <request verb="Identify">https://repository.dasch.swiss/dpe/oai</request>
  <Identify>
    <repositoryName>DaSCH Service Platform Repository</repositoryName>
    <baseURL>https://repository.dasch.swiss/dpe/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>info@dasch.swiss</adminEmail>
    <earliestDatestamp>2008-06-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>
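When configuring a harvester, it can be worth confirming that the advertised baseURL matches the URL actually requested, since OAI validators flag a mismatch (see Known Limitations). A minimal sketch using the standard library's ElementTree; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def identify_info(xml_text: str) -> dict[str, str]:
    """Flatten the children of <Identify> into a tag -> text dictionary."""
    root = ET.fromstring(xml_text)
    identify = root.find(f"{OAI_NS}Identify")
    if identify is None:
        raise ValueError("response contains no <Identify> element")
    # Strip the namespace from each tag; a repeated element keeps its last value.
    return {child.tag.removeprefix(OAI_NS): child.text or "" for child in identify}


def base_url_matches(xml_text: str, requested_url: str) -> bool:
    """Check the advertised baseURL against the URL the harvester actually used."""
    return identify_info(xml_text).get("baseURL") == requested_url
```

Against DEV, base_url_matches(response_text, "https://api.dev.dasch.swiss/dpe/oai") should be True once DPE_OAI_BASE_URL is set for that environment.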

ListMetadataFormats

curl "https://api.dev.dasch.swiss/dpe/oai?verb=ListMetadataFormats"

With an identifier argument, the supported formats for that item are returned. Note: identifiers are currently validated against projects only — a record identifier that works with GetRecord returns idDoesNotExist here.

ListSets

curl "https://api.dev.dasch.swiss/dpe/oai?verb=ListSets"

Abbreviated example response — the full response also contains one project:{shortcode} set per project and one cluster:{id} set per cluster:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <request verb="ListSets">https://repository.dasch.swiss/dpe/oai</request>
  <ListSets>
    <set>
      <setSpec>entityType:ProjectCluster</setSpec>
      <setName>Project Clusters</setName>
    </set>
    <set>
      <setSpec>entityType:ResearchProject</setSpec>
      <setName>Research Projects</setName>
    </set>
    <set>
      <setSpec>project:0803</setSpec>
      <setName>Die Bilderfolgen der Basler Frühdrucke</setName>
    </set>
    <set>
      <setSpec>cluster:cluster-003</setSpec>
      <setName>Bernoulli-Euler Online (BEOL)</setName>
    </set>
    <!-- further project: and cluster: sets omitted -->
  </ListSets>
</OAI-PMH>

The response is a single un-paginated list and always includes the static entityType:* sets, so it is never empty.
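Since every setSpec carries a kind:value prefix, a harvester can split a ListSets response into its three groups. A minimal sketch with hypothetical names, parsing the response with ElementTree:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def sets_by_kind(xml_text: str) -> dict[str, dict[str, str]]:
    """Group a ListSets response by setSpec prefix (entityType, project, cluster)."""
    grouped: defaultdict[str, dict[str, str]] = defaultdict(dict)
    for set_el in ET.fromstring(xml_text).iter(f"{OAI_NS}set"):
        spec = set_el.findtext(f"{OAI_NS}setSpec", default="")
        name = set_el.findtext(f"{OAI_NS}setName", default="")
        kind, _, value = spec.partition(":")  # "project:0803" -> ("project", "0803")
        grouped[kind][value] = name
    return dict(grouped)
```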

Metadata Formats

| metadataPrefix | Schema | Namespace |
|---|---|---|
| oai_dc | http://www.openarchives.org/OAI/2.0/oai_dc.xsd | http://www.openarchives.org/OAI/2.0/oai_dc/ |
| oai_datacite | http://schema.datacite.org/oai/oai-1.1/oai.xsd | http://schema.datacite.org/oai/oai-1.1/ |

The oai_datacite payload contains DataCite kernel 4 metadata (schemaVersion 4.6, datacentreSymbol DASCH.DSP), following the DaSCH Metadata to DataCite mapping specification. oai_dc is the simpler Dublin Core representation; oai_datacite carries richer structured metadata (contributors, related identifiers, rights, geolocations, funding references).

Temporal coverage

A project's temporalCoverage is emitted as a DataCite date element with dateType="Coverage". The DataCite schema requires a structured W3CDTF value here (a year or a start/end interval, e.g. 1250/1500, with negative years for BCE and .. for an open bound), so the human-readable period name is carried in the dateInformation attribute rather than in the element body:

<date dateType="Coverage" dateInformation="Late Middle Ages">1250/1500</date>

Ranges are resolved offline (no network or LLM calls at request time), in two tiers:

  1. ChronOntology references — entries with a https://chronontology.dainst.org/period/... URL resolve against modules/dpe/server/data/chronontology-periods.json (a slimmed mirror of ChronOntology timespans, regenerated by scripts/fetch-chronontology-periods.py).
  2. Everything else — free-text names resolve against modules/dpe/server/data/temporal-coverage-enrichment.json, a reviewed lookup table built by scripts/build-temporal-coverage-enrichment.py. The tool does not parse names itself; it collects each distinct name and assigns a range one of two ways. A name that carries a ChronOntology URL with a known timespan is resolved from the periods file (source: "chronontology"). Any other name is emitted as a skeleton row (date: null, source: "unresolved") for an operator or LLM agent to fill with a W3CDTF range and re-tag source: "llm". Names that are not time periods at all (e.g. a cultural style like "Swiss") keep date: null and stay source: "unresolved". Re-running is merge-preserving — existing rows are never overwritten, so reviewed rows survive; --check fails if a distinct dataset name has no row yet.
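The merge-preserving rebuild described in item 2 can be sketched as follows. This illustrates the rule under assumptions and is not the code of scripts/build-temporal-coverage-enrichment.py: the table is modelled as a dict from name to a row with date and source keys, and the hypothetical chrono_ranges argument stands for the names already resolved through the periods file.

```python
def merge_enrichment(
    existing: dict[str, dict],
    dataset_names: list[str],
    chrono_ranges: dict[str, str],  # hypothetical: name -> range from the periods file
) -> dict[str, dict]:
    """Add a row for every new name; never touch rows that already exist."""
    merged = dict(existing)
    for name in sorted(set(dataset_names)):
        if name in merged:
            continue  # reviewed rows survive re-runs
        if name in chrono_ranges:
            merged[name] = {"date": chrono_ranges[name], "source": "chronontology"}
        else:
            # Skeleton row for an operator or LLM agent to fill (then tagged "llm").
            merged[name] = {"date": None, "source": "unresolved"}
    return merged


def missing_rows(existing: dict[str, dict], dataset_names: list[str]) -> list[str]:
    """Names without a row yet; a --check style run fails if this is non-empty."""
    return sorted(name for name in set(dataset_names) if name not in existing)
```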

When no range resolves (a date: null row, or a name absent from both sources), the element is emitted with the dateInformation attribute only and an empty body, so the original label is never dropped.

Identifiers

OAI identifiers are derived from DaSCH ARK identifiers:

| Item type | Pattern | Example |
|---|---|---|
| Research project | oai:dasch.swiss:ark:/72163/1/{shortcode} | oai:dasch.swiss:ark:/72163/1/0803 |
| Record | oai:dasch.swiss:ark:/72163/1/{shortcode}/{record_id} | oai:dasch.swiss:ark:/72163/1/0803/lklK7rVuVOmpBZYWrF8o=gh |

These differ from the resolvable ARK URLs (https://ark.dasch.swiss/ark:/72163/1/...), which appear in the metadata payloads as dc:identifier / DataCite identifier.

Sets

Selective harvesting uses the set argument on ListIdentifiers and ListRecords. Two kinds of sets exist: static entity-type sets and dynamic project/cluster sets.

| setSpec | Contents | Notes |
|---|---|---|
| entityType:ResearchProject | All research project metadata entries | Advertised by ListSets |
| entityType:ProjectCluster | Project clusters | Advertised, but currently empty: always returns noRecordsMatch |
| entityType:Record | All record-level metadata entries | Accepted as a filter and stamped on record headers, but not advertised by ListSets; subject to change |
| project:{shortcode} | The Records belonging to one research project | One set per project, setName = project name. Shortcode matching is case-insensitive. |
| cluster:{id} | All entities under one project cluster: the cluster's research project metadata entries plus all of those projects' Records | One set per cluster, setName = cluster name. The {id} is the stable cluster id (e.g. cluster-003). |

The hierarchy is Cluster → Projects → Records, and the two dynamic set kinds deliberately differ in breadth: project:{shortcode} is a record-harvesting scope (it does not include the project's own metadata entry), while cluster:{id} is the discovery container for a whole cluster and therefore also surfaces the project metadata entries a harvester needs to navigate. To fetch a single project's metadata entry, use entityType:ResearchProject or the project's cluster:{id} set.

A project belongs to a cluster if the cluster's member list contains the project's shortcode (case-insensitive); records inherit their parent project's cluster membership.

Set membership on headers. Every item header lists its full set membership, so membership can be determined from an item alone: record headers carry entityType:Record, their project:{shortcode}, and a cluster:{id} for each cluster of the parent project; project headers carry entityType:ResearchProject, their own project:{shortcode}, and their cluster:{id} sets.

Set validation. An unrecognised set value — bad prefix, empty value, or a project:/cluster: value matching no known project or cluster — is rejected with badArgument. A recognised set that matches zero items (e.g. a known project with no records yet, possibly after date filtering) returns noRecordsMatch.

Harvesting

Full harvest of all items:

curl "https://api.dev.dasch.swiss/dpe/oai?verb=ListRecords&metadataPrefix=oai_dc"

Selective harvest of research projects only:

curl "https://api.dev.dasch.swiss/dpe/oai?verb=ListRecords&metadataPrefix=oai_datacite&set=entityType:ResearchProject"

Selective harvest of one project's records, or of everything under one cluster:

# All records of project 0803
curl "https://api.dev.dasch.swiss/dpe/oai?verb=ListRecords&metadataPrefix=oai_dc&set=project:0803"

# All project entries and records under cluster cluster-003
curl "https://api.dev.dasch.swiss/dpe/oai?verb=ListRecords&metadataPrefix=oai_dc&set=cluster:cluster-003"

Headers only (no metadata payloads):

curl "https://api.dev.dasch.swiss/dpe/oai?verb=ListIdentifiers&metadataPrefix=oai_dc"

List responses are returned complete and unpaged — there is no resumptionToken element, and its absence means the response is complete, not truncated.

Fetching a single item

# A research project
curl "https://api.dev.dasch.swiss/dpe/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:dasch.swiss:ark:/72163/1/0803"

# A record within a project
curl "https://api.dev.dasch.swiss/dpe/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:dasch.swiss:ark:/72163/1/0803/lklK7rVuVOmpBZYWrF8o=gh"

Abbreviated example response:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <request verb="GetRecord" identifier="oai:dasch.swiss:ark:/72163/1/0803" metadataPrefix="oai_dc">https://repository.dasch.swiss/dpe/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:dasch.swiss:ark:/72163/1/0803</identifier>
        <datestamp>2008-06-01</datestamp>
        <setSpec>entityType:ResearchProject</setSpec>
        <setSpec>project:0803</setSpec>
        <!-- plus a cluster:{id} setSpec for each cluster the project belongs to -->
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" ...>
          <dc:title>Die Bilderfolgen der Basler Frühdrucke</dc:title>
          <dc:publisher>DaSCH</dc:publisher>
          <dc:date>2008-06-01</dc:date>
          <dc:type>Project</dc:type>
          <dc:identifier>https://ark.dasch.swiss/ark:/72163/1/0803</dc:identifier>
          <!-- further Dublin Core elements omitted -->
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>

Datestamps and Date Filtering

ListIdentifiers and ListRecords accept from and until arguments. Both bounds are inclusive. Before relying on them, understand what the datestamps mean:

  • Project datestamp: the project's research start date (fallback 2015-01-01) — not the date the metadata was created or last modified.
  • Record datestamp: the record's dateModified, falling back to datePublished, then dateCreated, then 2015-01-01. Record datestamps may carry a full timestamp (e.g. 2012-06-19T14:33:33Z) even though Identify advertises YYYY-MM-DD granularity.

Filtering compares from/until against datestamps as plain strings (lexicographic comparison):

  • Use YYYY-MM-DD values. Other formats are not rejected with an error but compare incorrectly.
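  • Because record datestamps may include a time component, until=2012-06-19 excludes a record stamped 2012-06-19T14:33:33Z. To include a full day, set until to the following day; note that this also includes items whose datestamp is exactly that following date (e.g. a project stamped 2012-06-20).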
curl "https://api.dev.dasch.swiss/dpe/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2010-01-01&until=2020-12-31"

Incremental harvesting is not reliable with these semantics. Project datestamps reflect research start dates, so new or updated project metadata does not surface through from-based incremental harvests, and deletions are not tracked (deletedRecord: no). Harvesters should periodically re-harvest the full repository instead of relying on date-based increments.
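A periodic full re-harvest can still detect additions, removals, and changes by comparing snapshots. The sketch below fingerprints each metadata payload instead of relying on datestamps, because project datestamps do not change when metadata is edited. The function names are hypothetical.

```python
import hashlib


def fingerprint(metadata_xml: str) -> str:
    """Content hash of one item's metadata payload."""
    return hashlib.sha256(metadata_xml.encode("utf-8")).hexdigest()


def diff_harvests(previous: dict[str, str],
                  current: dict[str, str]) -> tuple[list[str], list[str], list[str]]:
    """Compare two full harvests, each mapping OAI identifier -> fingerprint."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())  # never announced by the server
    changed = sorted(k for k in current.keys() & previous.keys() if current[k] != previous[k])
    return added, removed, changed
```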

Errors

Errors are returned as OAI <error> elements with HTTP status 200:

| Code | Returned when |
|---|---|
| badVerb | The verb argument is missing or not one of the six verbs. The <request> element omits the verb attribute in this case. |
| badArgument | An argument is missing, repeated, or not allowed for the verb (e.g. set on GetRecord); also returned for an unrecognised set value (bad prefix, empty value, or unknown project/cluster). |
| badResumptionToken | A resumptionToken argument is supplied (resumption tokens are not supported). |
| cannotDisseminateFormat | metadataPrefix is not oai_dc or oai_datacite. |
| idDoesNotExist | The identifier does not resolve, including malformed identifiers (wrong prefix, bare shortcode), which are not reported as badArgument. |
| noRecordsMatch | A list request matches nothing: empty result, a recognised set with no items, or a from/until window with no matches. An empty list is never returned. |

Known Limitations

  • No resumption tokens. List responses are complete and unpaged; any resumptionToken argument yields badResumptionToken.
  • baseURL is a fixed configured value. It is set per environment via DPE_OAI_BASE_URL rather than derived from each incoming request. Deploying behind a hostname that does not match the configured value will make the advertised baseURL disagree with the request URL, which OAI validators flag — keep the env var in sync with the public endpoint.
  • No deleted-record tracking (deletedRecord: no). Items that disappear are not announced; see the re-harvesting recommendation above.
  • GET only. OAI-PMH 2.0 also requires POST; harvesters that default to POST will not get an OAI response.
  • ListMetadataFormats with a record identifier returns idDoesNotExist, even for records that GetRecord resolves.
  • entityType:ProjectCluster is advertised by ListSets but contains no items yet.

Discovery, Design and Development Process

Any work on the DSP Repository is done in collaboration between Management, Product Management (PM), the Research Data Unit (RDU) and Development (Dev). The process should look as outlined below, but may be adjusted to fit the needs of the project and the team.

In discovery, PM validates that an opportunity is aligned with DaSCH's strategy. In collaboration with the RDU, PM verifies that the opportunity can provide a desirable outcome.

PM will create a project description, including low-fidelity wireframes (Balsamiq or pen and paper), based on which they define user flows and journeys.

If any design components are needed, these will be added to the design system.

Finally, high-fidelity wireframes will be created in Figma, if needed.

Based on the project description and the wireframes, Dev will refine the project description, create Linear tickets and implement it accordingly.

When the implementation is done, PM will verify that the outcome was achieved and identify opportunities for further improvements.

Tech Stack

Core

| Technology | Purpose |
|---|---|
| Rust (Edition 2021) | Primary development language |
| Axum | HTTP web framework |
| Leptos | Reactive UI framework (SSR for DPE, islands for Mosaic) |
| Datastar | SSE-based interactivity for DPE (~14KB JS, no WASM) |
| Tailwind CSS v4 | Utility-first CSS framework |
| DaisyUI | Tailwind component plugin |
| Tokio | Async runtime |
| figment | Layered configuration (defaults → TOML → env vars) |

Data & Persistence

| Technology | Purpose |
|---|---|
| serde / serde_json | Serialization and deserialization |
| Static JSON files | Current data storage (database TBD) |

Testing & Quality

| Technology | Purpose |
|---|---|
| cargo test / nextest | Rust test runner |
| insta | Snapshot testing for SSR output |
| Playwright | End-to-end browser tests |
| axe-core | Accessibility scanning (WCAG 2.1 AA) |
| cargo-fuzz | Fuzz testing (nightly CI) |

Build & Development

| Technology | Purpose |
|---|---|
| cargo-leptos | Leptos build tool (handles Tailwind, WASM, site assets) |
| just | Command runner for development workflows |
| leptosfmt | Leptos-aware code formatter |
| Biome | Linter/formatter for E2E test TypeScript |

Documentation & Observability

| Technology | Purpose |
|---|---|
| mdBook + mdbook-alerts | Project documentation |
| Fathom Analytics | Privacy-friendly web analytics (GDPR-compliant, no cookies) |
| tracing + tracing-subscriber | Structured logging |

Architecture Principles

We keep the design evolutionary: we start from the simplest possible solution and iterate on it. At first, providing data from static JSON files is sufficient. Because we follow clean architecture principles, swapping out the persistence layer later is easy.

TypeScript is used exclusively for testing and development tooling, not for production runtime code. The core application remains purely Rust-based.

Testing and Quality Assurance

We follow the Testing Pyramid approach: the majority of tests are unit tests, with a smaller number of integration tests and a few end-to-end tests.

Unit and integration tests are written in Rust; end-to-end tests are written either in Rust or in TypeScript using Playwright.

Design System Testing

The design system playground includes comprehensive testing infrastructure:

Interactive Testing (MCP):

  • Start playground server: just watch-mosaic-playground
  • Use Claude Code with Playwright MCP commands for visual verification
  • Commands whitelisted in .claude/settings.json
  • Best for: Component development, design verification, manual testing

Automated Testing (CI/CD):

  • TypeScript-based Playwright setup in modules/mosaic/playground-e2e-tests/
  • Functional, accessibility, and responsive design testing in CI
  • HTML + JSON reporters for CI/CD integration
  • Best for: End-to-end user flows, automated regression detection

For single component interactions, prefer Rust tests. Playwright is for complete user flows.

Unit tests are the foundation of our testing strategy. They test individual components in isolation, ensuring that each part of the codebase behaves as expected. Unit tests are fast to write and to execute, and they provide immediate feedback on the correctness of the code.

Integration tests verify the interaction between different components, ensuring that they work together as expected. Integration tests may check the integration between the business logic and the presentation layer, or between the view and the business logic.

End-to-end tests verify the entire system. They simulate real user interactions and check that the system behaves as expected.

In addition to the functional tests, we also need to implement performance tests.

We aim to follow the practice of Test Driven Development (TDD), where tests are written before the code is implemented. This helps to ensure that the code is testable and meets the requirements.

Code Review Guidelines

Review checklist for the DSP Repository. Organized by priority.

Always Check

Fragment Endpoints

  • New fragment endpoints follow resource-action nesting convention (see DPE Architecture)
  • New Datastar interactions have <a href> fallback for graceful degradation
  • ARIA semantics present on interactive components (role, aria-selected, aria-controls)

Testing

  • insta snapshots added/updated for changed SSR output
  • E2E test covers the user-facing behavior
  • axe-core scan passes on affected pages
  • Unit tests for fragment handler edge cases (invalid tab, missing project, etc.)

Architecture

  • New API crates follow the dpe-api-{name} pattern with dpe-core as only domain dependency (see Project Structure)
  • dpe-core has no framework dependencies (no leptos, no axum)
  • Validate command covers all data file types (DPE)
  • E2E test directory naming: web-e2e-tests/ for DPE, playground-e2e-tests/ for Mosaic

CLI

  • CLI subcommands are documented in help text

Documentation

Commits

  • Commits follow conventional commits (correct prefix, scope matches crate name) — see Workflows and Conventions
  • One topic per commit — apply the "and" test
  • Each commit builds and passes tests

Security

  • No secrets in config files, Cargo.toml, or git
  • Path parameters validated before filesystem access

Style

  • Follow existing Datastar attribute patterns (signal naming with _ prefix) — see DPE Architecture
  • Fragment handlers in fragments/ module, not inline in main.rs
  • Domain types belong in dpe-core, not in web or API crates
  • API crate exposes a handler function (e.g., pub async fn oai_handler(...)) for composition in dpe-server
  • Leptos components use view! macro consistently
  • Test files follow naming convention: {feature}_tests.rs for Rust, {feature}.spec.ts for Playwright

Skip

  • Snapshot .snap file contents — verify accepted, don't review formatting
  • Formatting-only changes (cargo fmt / leptosfmt diffs)
  • Cargo.lock changes from dependency updates

Onboarding

Rust

The main technology we use is Rust. A solid understanding of Rust is needed, although frontend work in particular does not require deep knowledge of it.

Rust HTTP Server

We use Axum as our HTTP server.

Serialization and Deserialization

We use serde for serialization and deserialization of data.

Web UI

We use Leptos as our UI framework for building reactive web applications in Rust.

Leptos is a full-stack web framework that allows writing both server and client code in Rust. It provides reactive primitives and a component model similar to modern JavaScript frameworks.

Key features:

  • The islands Cargo feature is enabled workspace-wide (Leptos 0.8 build requirement)
  • Only the Mosaic component library uses actual island components with client-side WASM hydration
  • DPE uses SSR-only with Datastar for interactivity — no client-side WASM
  • The architecture follows the multi-page app (MPA) paradigm
  • Server-side rendering
  • Fine-grained reactivity
  • Component-based architecture
  • Full Rust syntax support

Architectural Design Patterns

We follow concepts such as Clean Architecture (there is also a book), Hexagonal Architecture or Onion Architecture. Familiarity with these concepts will be helpful.

Some of the patterns must be adapted to the idioms of Rust, but the general principles are the same.

Testing

We follow the Testing Pyramid approach: the majority of tests are unit tests, with a smaller number of integration tests and a few end-to-end tests.

Unit and integration tests are written in Rust, following the Rust testing best practices. End-to-end tests can be written using Playwright. Leptos has some built-in support for Playwright.

Domain Driven Design

We do not follow strict Domain Driven Design (DDD) principles, but we try to follow some of the concepts. In particular, we try to keep the language used in code aligned with the domain language.

Test Driven Development

We should absolutely practice both Test Driven Development (TDD) and Behavior Driven Development (BDD).

Database

We are still evaluating the database to use.

For the initial development, we work with static content or JSON files.

Mosaic Component Library

The Mosaic component library provides reusable UI components built with Leptos and Tailwind CSS.

Components are defined in modules/mosaic/tiles/ and can be previewed in the playground application at modules/mosaic/playground/.

To run the playground locally:

just watch-mosaic-playground

Pull requests that modify files in modules/mosaic/ automatically receive a Cloud Run preview deployment. The preview URL is posted as a comment on the PR.