Digital sovereignty for a fragmented life

Personal knowledge management has been a goal of mine since university, when the amount of digital information kept growing and the tools to handle and interconnect it often felt unsatisfying.

As a violin and music teacher, orchestral musician, software developer, and member of several community groups, I generate data on many devices and clouds across all of those roles.

For over a decade I have slowly been building the infrastructure that lets me actually keep track of it. semanticdesk.top is that work, made public: a personal semantic data fabric, open source and self-hosted, so the answer to “where is my data?” is mine to give.


Context

Fragmented tools, one life

Is this photo folder already backed up, or only on this SD card?
Which orchestra score is the canonical version, and who has access to it?
What was I working on last Tuesday at the music school?
Show me the most relevant resources for building the summer kitchen in the community garden.

These questions go unanswered today because each tool only sees the data it owns.

Architecture

A personal semantic fabric


Layer 0 — Data sources

Files
NAS, PC, SD, servers
Exists
Filesystems across workstation, laptop, SD cards, NAS, and long-lived servers. The fast scan pass records path, content hash, and MIME; deep plugins add richer meaning later.
Repositories
- graviola-semantic-file-analyzer
Prior art
- Strigi (Nepomuk era)
- Baloo (post-semantic KDE)
PIM
Mail, cal, tasks
Planned
Thunderbird mail, calendar, and tasks across many mailboxes and calendar sources, translated into a unified PIM view in the graph. Adapter work connects Gloda and sources to the RDF layer.
Prior art
- Akonadi (KDE PIM unification, non-RDF)
- NIE/NFO contact and calendar terms
Browser
Bookmarks, history, downloads
Planned
Browser plugin path for bookmarks, history, and downloads so web-origin artefacts can be correlated with file and PIM data in the same index.
Prior art
- Solid (WebID, user-controlled data)
- W3C Web Annotation
Messengers
Bridge → unified
Planned
Chat channels bridged (for example via Matrix) into a single normalized representation, so conversation artefacts participate in the same fabric as files and PIM.
Prior art
- XMPP/IRC bridging patterns
Context
eBPF, GPS, time
Experimental
Raw file-access events, coarse location, and timestamps: the stream that makes AccessEvent and context reconstruction possible, without pretending every event belongs in the triple store verbatim.
Repositories
- graviola-semantic-file-analyzer
Prior art
- eBPF observability
- Plasma Activities

Layer 1 — Ingestion pipeline

Stage 1: fast scan
Path, hash, MIME
Exists
First stage indexing optimised for speed: paths, content hashes, and MIME types so the system always knows what exists and where, before any expensive analyzer work.
Repositories
- graviola-semantic-file-analyzer
Prior art
- Strigi two-stage split
- This project’s 2014-era pipeline
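As an illustration of what a fast scan pass records, here is a minimal Python sketch. It is not the project's indexer: the names `ScanRecord`, `sha256_of`, and `fast_scan` are hypothetical, SHA-256 is assumed as the content hash, and the MIME type is guessed from the file extension only, where a real indexer would more likely inspect the content.

```python
import hashlib
import mimetypes
import os
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class ScanRecord:
    """One fast-scan result: where a file is, what its bytes hash to, its guessed type."""
    path: str
    sha256: str
    mime: Optional[str]


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fast_scan(root: str) -> Iterator[ScanRecord]:
    """Walk a directory tree and yield path, content hash, and MIME guess per file."""
    for directory, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            mime, _encoding = mimetypes.guess_type(path)  # extension-based guess only
            yield ScanRecord(path=path, sha256=sha256_of(path), mime=mime)
```

Two files with identical bytes produce the same `sha256` whatever their paths, which is what lets a later stage recognise copies of the same content in different places.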
Stage 2: deep plugins
EXIF, NLP, git, …
Exists
Second stage: pluggable extractors for EXIF, perceptual fingerprinting, NLP, PDF content identity, git detection, and other depth-first signals, composed without hard-wiring a single stack.
Prior art
- Strigi’s plugin model
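One way to picture pluggable extractors is a small registry keyed by MIME type prefix. The Python sketch below is an assumption about shape only: `extractor`, `run_deep_pass`, and the two sample plugins are hypothetical names, not the project's plugin API. It also shows one failing plugin being isolated rather than taking the whole pass down.

```python
from typing import Callable, Dict, List

# An extractor takes a file path and returns extra metadata as a flat dict.
Extractor = Callable[[str], Dict[str, str]]

# Hypothetical registry: MIME type prefix -> extractors registered for it.
_REGISTRY: Dict[str, List[Extractor]] = {}


def extractor(mime_prefix: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for MIME types starting with mime_prefix."""
    def register(func: Extractor) -> Extractor:
        _REGISTRY.setdefault(mime_prefix, []).append(func)
        return func
    return register


def run_deep_pass(path: str, mime: str) -> Dict[str, str]:
    """Run every matching extractor; a failing plugin is recorded, not propagated."""
    merged: Dict[str, str] = {}
    for prefix, extractors in _REGISTRY.items():
        if not mime.startswith(prefix):
            continue
        for func in extractors:
            try:
                merged.update(func(path))
            except Exception as error:  # isolate plugin failures
                merged[f"error:{func.__name__}"] = str(error)
    return merged


@extractor("text/")
def line_count(path: str) -> Dict[str, str]:
    """Sample plugin: count the lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return {"lineCount": str(sum(1 for _ in handle))}


@extractor("text/")
def always_fails(path: str) -> Dict[str, str]:
    """Sample plugin that crashes, to show the isolation in run_deep_pass."""
    raise RuntimeError("simulated crash")
```

Calling `run_deep_pass(path, "text/plain")` returns the line count together with an `error:always_fails` entry, while a MIME type with no registered prefix returns an empty dict.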
eBPF adapter
Events → AccessEvent
Experimental
Monitors file activity and process calls in selected directories using eBPF, producing the raw stream from which AccessEvent triples are derived. The prototype exists; throttling, aggregation, and RDF emission are the current development focus.
Repositories
- desktop-activity-watcher
Prior art
- Linux tracepoints
- eBPF BCC/CO-RE community
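The throttling and aggregation step can be sketched without any eBPF code. The Python below assumes a hypothetical `RawEvent` shape coming out of the tracer and merges bursts of events for the same process and path into one record. The 30-second gap is an arbitrary illustrative value, and the class names are not taken from the project.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical shape of one low-level event from the tracer."""
    timestamp: float  # seconds since epoch
    process: str
    path: str


@dataclass
class AccessEvent:
    """One aggregated access: first and last time a process touched a path, plus a count."""
    process: str
    path: str
    first_seen: float
    last_seen: float
    count: int = 1


def aggregate(events: Iterable[RawEvent], gap_seconds: float = 30.0) -> List[AccessEvent]:
    """Merge events for the same (process, path) that are at most gap_seconds apart."""
    open_events: Dict[Tuple[str, str], AccessEvent] = {}
    result: List[AccessEvent] = []
    for raw in sorted(events, key=lambda e: e.timestamp):
        key = (raw.process, raw.path)
        current = open_events.get(key)
        if current is not None and raw.timestamp - current.last_seen <= gap_seconds:
            # Same burst: extend the existing record instead of emitting a new one.
            current.last_seen = raw.timestamp
            current.count += 1
        else:
            current = AccessEvent(raw.process, raw.path, raw.timestamp, raw.timestamp)
            open_events[key] = current
            result.append(current)
    return result
```

A burst of hundreds of reads by one process then becomes a single record, which is far closer to what belongs in a triple store than the raw stream.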
PIM / cloud adapter
Gloda, NC API → RDF
Experimental
Maps Thunderbird Gloda, contacts, and calendar, plus where applicable Nextcloud-style APIs, into the vocabulary layer so PIM and files share a joined-up query surface.
Prior art
- NIE/NCO/NCAL usage in desktop stacks
- NEPOMUK-era mappings

Layer 2 — Semantic core

Triple store
SPARQL 1.1/1.2 · federated
Exists
The semantic spine of the system: any SPARQL 1.1–compliant endpoint will do. In production, read-optimised and federated setups are a deployment choice, and the file indexer outputs Turtle or SPARQL Update. The architecture is not locked in to a specific triple store product.
Repositories
- graviola-framework (sparql backend)
- graviola-semantic-file-analyzer
Prior art
- Nepomuk / Soprano + legacy triple stores (2007–2013)
- GNOME Tracker / TinySPARQL (2008–present)
- Solid / Comunica (federation on the web)
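To make "outputs SPARQL Update" concrete, here is a small Python sketch that builds one `INSERT DATA` request for a scanned file. The namespace is the vocabulary IRI given on this page. The property names `sd:path`, `sd:sha256`, and `sd:mimeType` are placeholders invented for the example and are not claimed to be terms of the published schema.

```python
from urllib.parse import quote

SD = "http://semanticdesk.top/ontology#"  # vocabulary IRI stated on this page


def literal(value: str) -> str:
    """Escape a Python string as a SPARQL double-quoted literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def file_iri(path: str) -> str:
    """Turn an absolute POSIX path into a percent-encoded file: IRI."""
    return "<file://" + quote(path) + ">"


def insert_data(path: str, sha256: str, mime: str) -> str:
    """Build one SPARQL 1.1 INSERT DATA request for a fast-scan record."""
    subject = file_iri(path)
    triples = [
        f"{subject} a sd:Manifestation",
        f"{subject} sd:path {literal(path)}",      # hypothetical property name
        f"{subject} sd:sha256 {literal(sha256)}",  # hypothetical property name
        f"{subject} sd:mimeType {literal(mime)}",  # hypothetical property name
    ]
    body = " .\n  ".join(triples)
    return f"PREFIX sd: <{SD}>\nINSERT DATA {{\n  {body} .\n}}"
```

Because the result is plain SPARQL 1.1 Update text, it can be sent to any compliant endpoint, which is the point of not binding the indexer to one store.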
Full-text index
Unified facade
Exists
Pluggable full-text over file bodies, email, and long-form metadata, driven from ontology flags so the index stays in sync with what matters semantically, without hand-maintained per-field config.
Repositories
- graviola-framework
Prior art
- Xapian in Baloo
- ES/Solr as commodity FTS
Vector index
IRI-aligned
Experimental
Vector similarity keyed to the same entity IRIs as the graph so hybrid recall can combine structure from SPARQL with neighbourhood similarity, without a second ID scheme.
Prior art
- GraphRAG patterns
- Vector stores as pluggable retrievers
GraphRAG
SPARQL + vectors → model
Experimental
Hybrid retrieval: a bounded subgraph from SPARQL with vector similarity to ground answers in the graph you own, for assistants that are allowed to be wrong only in prose, not in identity.
Prior art
- Retrieval augmented generation (community patterns)
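The combination of graph structure and vector similarity can be sketched in a few lines of Python. The sketch assumes that a SPARQL query has already produced a bounded list of candidate IRIs and that embeddings are stored in a dict keyed by the same IRIs. `hybrid_rank` is a hypothetical helper, not part of the project.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(
    graph_candidates: Iterable[str],
    vectors: Dict[str, Sequence[float]],
    query_vector: Sequence[float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank only the IRIs the graph query returned, by similarity to the query vector."""
    scored = [
        (iri, cosine(vectors[iri], query_vector))
        for iri in graph_candidates
        if iri in vectors  # entities without an embedding are skipped, not guessed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Since the vectors use the graph's own IRIs as keys, every ranked result is an entity that exists in the graph, and similarity only reorders what the structured query already allowed.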
MCP (SPARQL tools)
For LLM clients
Planned
Exposes a constrained SPARQL-shaped tool surface to MCP-compatible clients so local models can query your fabric with the same contract as your UI, not a bespoke API per app.
Prior art
- Model context protocol (MCP)
- Comunica as prior art in Linked Data clients
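What "constrained" could mean in practice is sketched below in Python: a guard that accepts only read-only `SELECT` queries and caps the number of rows. This is an illustrative assumption, not the planned server. The keyword filter is deliberately crude and over-rejects, for example a variable named `?add`. It only recognises a trailing `LIMIT`, so a query ending in `OFFSET` would need a real SPARQL parser.

```python
import re

# Keywords that introduce SPARQL Update operations are rejected outright.
# SERVICE is blocked too, so a client cannot route data to another endpoint.
_FORBIDDEN = re.compile(
    r"\b(INSERT|DELETE|LOAD|CLEAR|DROP|CREATE|COPY|MOVE|ADD|SERVICE)\b",
    re.IGNORECASE,
)
_LIMIT = re.compile(r"\bLIMIT\s+(\d+)\s*$", re.IGNORECASE)


def constrain_query(query: str, max_rows: int = 100) -> str:
    """Return a read-only query with a bounded LIMIT, or raise ValueError."""
    text = query.strip()
    if not re.search(r"\bSELECT\b", text, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    if _FORBIDDEN.search(text):
        raise ValueError("query contains a forbidden keyword")
    match = _LIMIT.search(text)
    if match is None:
        return f"{text}\nLIMIT {max_rows}"  # no trailing LIMIT: add one
    if int(match.group(1)) > max_rows:
        return text[: match.start()] + f"LIMIT {max_rows}"  # clamp an oversized LIMIT
    return text
```

Putting a guard like this in front of the endpoint means every client, whether an LLM tool call or the UI, goes through the same narrow contract.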

Layer 3 — Integration and presentation

KDE Plasma
KRunner, input
Planned
Desktop integration: KRunner to query the index from the system search bar, and input-method adjacent hooks where voice or special input can feed the same graph-backed actions.
Prior art
- KRunner runners
- Plasma Activities (context)
PKM plugins
Obsidian, Logseq, …
Planned
Note tools query the same semantic index so personal knowledge and indexed artefacts stay aligned instead of forking into yet another silo.
Prior art
- Local-first PKM
- RDF/JSON-LD round trips in the wild
Graviola UI
SemanticTable, forms
Exists
JSON Schema–driven forms and tables over SPARQL: the CRUD and exploration surface that makes the store legible, including fit testing between schema and data at deploy time, not only compile time.
Repositories
- graviola-framework
Prior art
- CRUD on RDF in research prototypes
- LinkML for schema-first data
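The idea of testing the fit between schema and data can be illustrated with a deliberately tiny checker. The Python sketch below covers only two JSON Schema keywords, `required` and primitive `type`, and is not how Graviola implements it. `fit_problems` is a hypothetical name, and a real deployment would use a full JSON Schema validator.

```python
from typing import Any, Callable, Dict, List

# Checks for the primitive JSON Schema types handled by this sketch.
_TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so it is excluded explicitly
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def fit_problems(schema: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """List where a stored record does not fit the schema (required and type only)."""
    problems: List[str] = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required property: {name}")
    for name, rule in schema.get("properties", {}).items():
        check = _TYPE_CHECKS.get(rule.get("type"))
        if name in record and check is not None and not check(record[name]):
            problems.append(f"wrong type for {name}: expected {rule['type']}")
    return problems
```

Running such a check over sampled records at deploy time surfaces data that drifted away from the schema before a form or table tries to render it.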
Voice assistant
Vox, MCP, local LLM
Experimental
Speech stack and MCP client so voice-driven queries can use the same tools as the desktop UI, against a local or self-hosted model boundary you control.
Repositories
- nix-vox (fork)
Prior art
- Whisper-family STT
- MCP for tool use
Mobile / remote
Android + API
Experimental
Android-side indexing where the platform only allows polling, plus a small REST API so other devices can query the fabric without mounting every disk on every machine.
Repositories
- wnix/packages (integration)
Prior art
- Nextcloud client patterns
- REST facades on SPARQL in enterprise search
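Where a platform offers no change notifications, polling reduces to comparing two snapshots of a directory tree. The Python sketch below shows that comparison on an ordinary filesystem. It does not use any Android API, and `take_snapshot` and `new_or_changed` are hypothetical names.

```python
import os
from typing import Dict, Set, Tuple

# path -> (size in bytes, modification time in nanoseconds)
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record size and modification time for every file below root."""
    snapshot: Snapshot = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            snapshot[path] = (info.st_size, info.st_mtime_ns)
    return snapshot


def new_or_changed(previous: Snapshot, current: Snapshot) -> Set[str]:
    """Paths that are new, or whose size or mtime differs from the previous poll."""
    return {path for path, state in current.items() if previous.get(path) != state}
```

Only the paths returned by `new_or_changed` need to be hashed and indexed on each poll, which keeps the cost of polling proportional to what actually changed.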


Lineage

Semantic history

This problem has been attempted before. Here is what happened and what was learned — including the TU Dresden / NubiSave / NubiVis / nubixtract thread that still runs in production code.

1999
Semantic Web vision — W3C / Tim Berners-Lee
RDF, OWL, SPARQL proposed as the foundation for machine-readable linked data
Outcome: survived
The foundation everything else is built on. Still the correct abstraction.
2004
NEPOMUK project — EU FP6 / multiple universities
Networked Environment for Personalized, Ontology-based Management of Unified Knowledge — a full semantic desktop layer for enterprise and personal use
Outcome: transformed
Produced the NIE/NFO/NCO/NMM ontology suite still used by TinySPARQL today. The vision was right; the implementation weight was too high for volunteer maintenance.
2007
Nepomuk-KDE — KDE
Full integration of Nepomuk semantic layer into KDE 4 — RDF-backed file metadata, semantic tags, relationship graphs, Virtuoso triple store
Outcome: retired
Retired in KDE SC 4.13 (2014). Resource consumption was too high for typical hardware of the era. Replaced by Baloo.
2007
Strigi indexer — KDE / Jos van den Oever
The file crawler and metadata extractor for Nepomuk-KDE. Plugin architecture for extracting metadata from files of different types.
Outcome: retired
Directly ancestral to the two-stage plugin indexer architecture in this project. The right shape; replaced by Baloo's simpler approach.
2008
GNOME Tracker — GNOME
SPARQL-backed file indexer for GNOME. Used the Nepomuk ontologies (NIE/NFO). D-Bus exposed SPARQL endpoint.
Outcome: survived
Still active as TinySPARQL (renamed 2024). The most direct living ancestor of the semantic desktop vision.
2012
Akonadi — KDE
PIM data storage framework for KDE — unified storage for email, contacts, calendar, notes via pluggable resource agents
Outcome: survived
Still the KDE PIM backend. Not RDF internally, but architecturally the closest thing to a unified PIM datastore on Linux.
2013 · Spillner et al., FGCS 29 (2013) 1062–1072
NubiSave — optimal cloud storage controller — TU Dresden / Spillner, Müller, Schill
RAID-like dispersion across cloud providers with a cloud storage ontology (WSML). First semantic modelling of storage provider properties in a personal storage context.
Outcome: published
Published in Future Generation Computer Systems. The cloud storage ontology is a direct conceptual ancestor of the Realm concept. Sebastian Tilsch worked in this lab.
2014 · Spillner, Tilsch, Schill, MobiQuitous 2014
NubiVis — personal cloud file explorer — TU Dresden / Spillner, Tilsch, Schill
Web-based file manager integrating NubiSave (cloud distribution) and Strigi (semantic metadata) to answer 'Where is my data?' Map, timeline, tree, and distribution views.
Outcome: published
Published at MobiQuitous 2014. Co-authored by the project owner. The inadequacy of Strigi as a metadata backend motivated building nubixtract as a replacement.
2014
Baloo — KDE
Replaced Nepomuk-KDE as KDE's file indexer. SQLite + Xapian full-text, deliberately NOT semantic — learned from Nepomuk's complexity.
Outcome: survived
The pragmatic retreat from semantics. Fast, reliable, but cannot answer 'which files did I work on during the Orchesterprobe (orchestra rehearsal) last month'.
2014
nubixtract — first commit — this project
RDF-native replacement for Strigi. Plugin architecture, pluggable triple-store connector, SPARQL query interface, WGS84 geo, PDF via Grobid, image classification, Android cross-compilation. First commit 4 December 2014 (TU Dresden).
Outcome: active
It started out of curiosity and as a learning exercise in building a robust, highly extensible file indexer and metadata extractor as an alternative to Strigi, which often crashed or hogged resources on my system.
2016
Solid project — MIT / Tim Berners-Lee
Personal Online Datastore — user-controlled RDF pods, linked data, decentralised identity (WebID)
Outcome: active
The web-oriented answer to personal data sovereignty. Strong on federation and access control; weaker on local/offline and desktop integration.
2022
graviola-crud-framework — this project
JSON Schema → SPARQL/RDF CRUD framework with auto-generated forms and tables. Built for a university library semantic data project.
Outcome: active
The UI layer that makes the semantic index navigable. SemanticTable, GenericForm, multiple store backends.
2024
Samsung Personal Data Engine — Samsung / Oxford Semantic Technologies
On-device RDFox-powered personal knowledge graph on Galaxy S25. Commercial validation of the personal RDF concept.
Outcome: active
The first major consumer deployment of personal RDF. Closed source. Confirms the thesis; does not address the open, cross-device, self-hosted case.
2024
graviola-semantic-file-analyzer — this project
Second-generation file indexer with RDF output, eBPF context tracking, location correlation, plugin pipeline for deep metadata extraction
Outcome: experimental
The current active development frontier. Exists, runs, needs stabilisation and documentation.
2025
semanticdesk.top (this project, named) — this project
The full vision named and made public: personal semantic data fabric for digital serenity across all devices, domains, and data sources
Outcome: active
You are here.

Vocabulary

Core concepts

Plain language first, then the formal definition. The terms resemble the NFO/NIE terms you may already know, but will move into this project’s own namespace over time.

Artifact

A piece of content with a stable identity regardless of where it lives.

Example: The PDF of “Träumerei” by Schumann is one Artifact. It might exist on three Nextclouds, an SD card, and a NAS — but it is one piece of content, with one identity.

Formal definition

A named individual in the semanticdesk.top ontology with one or more content hashes (SHA-256, perceptual hash, content-only hash), a MIME type, and semantic type annotations.

Manifestation

A specific occurrence of an Artifact at a particular location on a particular device.

Example: The copy of Träumerei.pdf on the Musikschule Nextcloud in the folder “/Scores/Piano/Schumann/” is one Manifestation. The copy on the personal NAS is another.

Formal definition

Links an Artifact to a filesystem path, a device IRI, a mount point at index time, and a Realm (sharing/access context).

Realm

A sharing and access domain — who can reach which files through which system.

Example: The school Nextcloud is a Realm. It has team folders; only members of the “music teachers” team folder can see certain scores. A colleague is either in that Realm or not. Your home folder on your laptop is also a Realm, and so is your home folder on your NAS.

Formal definition

A named individual representing a Nextcloud instance, a team folder, a local device, or any other access boundary. Manifestations in a Realm inherit its access rules.
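The three concepts defined so far are enough to answer a question like "is this only on the SD card?". The Python sketch below simplifies heavily: an Artifact is identified by a single content hash, where the formal definition allows several, and a Realm is a plain string. All names in the code are illustrative and not taken from the published schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Manifestation:
    """One occurrence of some content: which device, which path, which Realm."""
    content_hash: str  # stands in for the identity of the Artifact
    device: str
    path: str
    realm: str


def realms_per_artifact(items: Iterable[Manifestation]) -> Dict[str, Set[str]]:
    """Group manifestations by content hash and collect the Realms each Artifact is in."""
    realms: Dict[str, Set[str]] = defaultdict(set)
    for item in items:
        realms[item.content_hash].add(item.realm)
    return dict(realms)


def only_in(items: Iterable[Manifestation], realm: str) -> List[str]:
    """Content hashes whose every known copy sits in the given Realm."""
    grouped = realms_per_artifact(items)
    return sorted(h for h, found in grouped.items() if found == {realm})
```

With one score stored on both the school Nextcloud and the NAS and one photo stored only on an SD card, `only_in(copies, "sd-card")` returns just the photo's hash, which marks it as content without a second copy.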

AccessEvent

A recorded moment when a process touched a file, on a specific device, at a specific time.

Example: MuseScore opened Träumerei.pdf at 19:47 on a Tuesday, on the workstation, while the calendar showed “Orchesterprobe”. That is an AccessEvent.

Formal definition

Captured via eBPF kernel tracing. Triples: process IRI, file Manifestation IRI, timestamp, device IRI. Correlated with location and calendar data to derive Context.

Context / Domain

A named period of activity inferred from co-occurring signals.

Example: “Tuesday evening + location: Proberaum + calendar: Orchesterprobe + processes: MuseScore, Thunderbird” coheres into a Context. Files touched in that Context are likely orchestra-related.

Formal definition

Derived by statistical analysis of AccessEvent clusters correlated with calendar events, GPS location clusters, Plasma Activities, and time patterns. Assigned to a Domain (one of the owner’s life roles: violin teacher, school teacher, programmer, community gardener).
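As a first approximation of this correlation, the Python sketch below joins file accesses to calendar entries by time overlap. It is a much simpler technique than the statistical clustering described above and ignores location, Plasma Activities, and recurring patterns. The class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Sequence


@dataclass(frozen=True)
class CalendarEntry:
    title: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Access:
    path: str
    when: datetime


def files_by_context(
    accesses: Iterable[Access], calendar: Sequence[CalendarEntry]
) -> Dict[str, Counter]:
    """For each calendar entry, count how often each file was touched while it ran."""
    contexts: Dict[str, Counter] = {}
    for access in accesses:
        for entry in calendar:
            if entry.start <= access.when < entry.end:
                contexts.setdefault(entry.title, Counter())[access.path] += 1
    return contexts
```

Files that keep turning up under the same calendar title become candidates for that Context, and a later step could weigh them against location and process signals.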

Reality

What already exists

“Exists” is production on the author’s own machines; “experimental” is honest about rough edges; “planned” is designed but not shipped.

nubixtract (file indexer, C++)
Since 2014
C++ file indexer with RDF output, plugin architecture, multi-backend triple store support. Started 2014 as Strigi replacement.
Repository
Semantic file analyzer
RDF output, eBPF context, location correlation
Experimental
Repository
Graviola Framework
JSON Schema → SPARQL/RDF, SemanticTable, GenericForm, multiple store backends. Concept book: gravio-la.github.io/graviola-concept-documentation
Exists
Repository
nix-vox (Vox fork)
Rust STT/TTS/VAD pipeline, NixOS flake with CUDA support
Experimental
Repository
wnix/packages
Personal NixOS package collection integrating the above
Experimental
Repository
Core ontology (semanticdesk.top)
Published LinkML schema for Artifact, Manifestation, Realm, AccessEvent, Context, Hash. Vocabulary IRI: http://semanticdesk.top/ontology#
Exists
Repository
eBPF → RDF adapter
Monitors file activity and process calls in chosen directories via eBPF. Prototype exists; AccessEvent triple emission is the current development focus.
Experimental
Repository
NixOS SPARQL service module
Declarative NixOS module for the personal SPARQL endpoint (triple store choice is configurable in deployment)
Experimental
(no public repo yet)
Restricted MCP server for personal graph
Exposes parts of the personal graph as MCP tools for LLM assistants after user consent and with limited scope
Planned
(no public repo yet)
KRunner plugin
Query the semantic index from KDE’s universal search bar
Planned
(no public repo yet)
Thunderbird RDF adapter
Translates Gloda + contacts + calendar into NMO/NCO/NCAL triples
Experimental
(no public repo yet)
Android indexer
Polls for new files on Android (polling is required because of platform restrictions)
Experimental
Repository

Sustain

Support the work

This is built by one person in the gaps between teaching, rehearsals, and client work. The code is real and has been running in production for over a decade — making it properly visible, documented, and installable by others is what sponsorship buys back.

If the vision resonates with you, even a small recurring amount makes a concrete difference.

Sponsor on GitHub
