Dataset Monitor

The Dataset Monitor tracks changes to NFS-backed datasets with file-level granularity — similar to git log --follow for data files.

How It Works

Every 5 minutes (configurable):
    For each Dataset:
        1. Walk NFS path → build FileTree{path → {size, mtime, sha256}}
        2. Compare with previous snapshot
        3. Record added/modified/deleted events
        4. Bump dataset.version if changes found
        5. Write new snapshot record

Checksum Strategy

To avoid SHA256-hashing every large file on every scan, the monitor uses a two-phase approach:

Fast check — compare mtime + size. If unchanged, skip.
Slow verify — if mtime changed, compute SHA256 to confirm actual content change.

Manual Trigger

curl -X POST https://example.com/api/datasets/{id}/scan \
  -H "Authorization: Bearer $TOKEN"

History Retention

File history records accumulate over time. To prevent unbounded growth, configure MAX_HISTORY_DAYS (default: no limit). Pruning runs automatically when new scans are written.

UI: Single-File Timeline

The Datasets page provides a split-pane view:

Left: file tree of the current snapshot
Right: per-file change timeline showing added / modified / deleted events with checksum diffs and sizes

This gives a git log-style audit trail for any file in any dataset.

How It Works​

Checksum Strategy​

Manual Trigger​

History Retention​

UI: Single-File Timeline​

How It Works

Checksum Strategy

Manual Trigger

History Retention

UI: Single-File Timeline