Dataset Monitor
The Dataset Monitor tracks changes to NFS-backed datasets with file-level granularity — similar to git log --follow for data files.
How It Works
Every 5 minutes (configurable):
For each Dataset:
1. Walk NFS path → build FileTree{path → {size, mtime, sha256}}
2. Compare with previous snapshot
3. Record added/modified/deleted events
4. Bump dataset.version if changes found
5. Write new snapshot record
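The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the monitor's actual implementation: `FileEntry`, `build_tree`, and `diff_trees` are hypothetical names, and version bumping and snapshot persistence (steps 4–5) are omitted.

```python
import hashlib
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class FileEntry:
    size: int
    mtime: float
    sha256: str

def sha256_of(path: str) -> str:
    # Hash in chunks so large dataset files never load fully into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_tree(root: str) -> dict[str, FileEntry]:
    # Step 1: walk the dataset path and record size/mtime/sha256 per file.
    tree: dict[str, FileEntry] = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            tree[os.path.relpath(full, root)] = FileEntry(
                st.st_size, st.st_mtime, sha256_of(full)
            )
    return tree

def diff_trees(prev: dict[str, FileEntry], curr: dict[str, FileEntry]):
    # Steps 2-3: compare with the previous snapshot, record events.
    events = []
    for path in curr.keys() - prev.keys():
        events.append(("added", path))
    for path in prev.keys() - curr.keys():
        events.append(("deleted", path))
    for path in curr.keys() & prev.keys():
        if curr[path].sha256 != prev[path].sha256:
            events.append(("modified", path))
    return events
```

If `diff_trees` returns a non-empty list, the real monitor would then bump `dataset.version` and persist the new snapshot.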
Checksum Strategy
To avoid SHA256-hashing every large file on every scan, the monitor uses a two-phase approach:
- Fast check — compare mtime + size. If unchanged, skip.
- Slow verify — if mtime changed, compute SHA256 to confirm an actual content change.
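The two-phase check might look like this sketch; `file_changed` and its parameters are illustrative names rather than the monitor's real API:

```python
import hashlib
import os

def file_changed(path: str, prev_size: int, prev_mtime: float,
                 prev_sha256: str) -> bool:
    """Two-phase change check: cheap stat comparison first,
    SHA256 only when the stat fields differ (hypothetical helper)."""
    st = os.stat(path)
    # Phase 1: fast check — identical size and mtime means unchanged.
    if st.st_size == prev_size and st.st_mtime == prev_mtime:
        return False
    # Phase 2: slow verify — stat fields moved; confirm via content hash,
    # so a touched-but-identical file does not produce a false "modified".
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() != prev_sha256
```

The phase-2 confirmation matters because tools like `cp -p` or re-extraction can update mtime without changing content.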
Manual Trigger
curl -X POST https://example.com/api/datasets/{id}/scan \
-H "Authorization: Bearer $TOKEN"
History Retention
File history records accumulate over time. To prevent unbounded growth, configure MAX_HISTORY_DAYS (default: no limit). Pruning runs automatically when new scans are written.
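The pruning rule can be sketched as a simple retention filter; `prune_history` and its tuple shape are illustrative, and the real monitor presumably deletes rows in its backing store instead:

```python
import time

def prune_history(events, max_history_days):
    """Drop file-history events older than the retention window.
    `events` is a list of (timestamp, event) tuples; a None limit
    (the default) keeps everything. Hypothetical helper."""
    if max_history_days is None:
        return events
    cutoff = time.time() - max_history_days * 86400
    return [(ts, ev) for ts, ev in events if ts >= cutoff]
```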
UI: Single-File Timeline
The Datasets page provides a split-pane view:
- Left: file tree of the current snapshot
- Right: per-file change timeline showing added/modified/deleted events with checksum diffs and sizes
This gives a git log-style audit trail for any file in any dataset.