Metrics Reference

PrkDB exports Prometheus metrics from the shared registry in crates/prkdb/src/prometheus_metrics.rs.

Where Metrics Are Exposed

  • prkdb-server: http://<host>:<9090 + NODE_ID>/metrics (the port is 9090 plus the node ID, for example 9091 when NODE_ID is 1)
  • prkdb-cli serve --prometheus: http://<host>:<port>/metrics

The HTTP server now re-exports the real Prometheus registry instead of placeholder counters.
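
The port rule for prkdb-server can be sketched as a small shell helper. This is a minimal sketch, assuming the port is 9090 plus NODE_ID, which is how the first bullet above reads. The function name `prkdb_metrics_url` is hypothetical and not part of PrkDB.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with PrkDB): build the metrics URL for a
# prkdb-server node, assuming the port is 9090 + NODE_ID.
prkdb_metrics_url() {
  local host="$1" node_id="$2"
  echo "http://${host}:$((9090 + node_id))/metrics"
}

prkdb_metrics_url 127.0.0.1 1   # prints http://127.0.0.1:9091/metrics
```

The result can be passed straight to curl, for example `curl -s "$(prkdb_metrics_url 127.0.0.1 1)"`.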

Core Health Metrics

| Metric | Type | Description |
| --- | --- | --- |
| prkdb_up | Gauge | Server liveness by node_id |
| prkdb_collections_active | Gauge | Number of active collections |
| prkdb_collection_size_bytes | Gauge | Collection size by node_id and collection |

Raft Metrics

| Metric | Type | Description |
| --- | --- | --- |
| prkdb_raft_state | Gauge | Current Raft state by node_id and partition |
| prkdb_raft_term | Gauge | Current term |
| prkdb_raft_commit_index | Gauge | Commit index |
| prkdb_raft_leader_elections_total | Counter | Total leader elections |
| prkdb_raft_heartbeats_sent_total | Counter | Heartbeats sent to peers |
| prkdb_raft_heartbeats_failed_total | Counter | Failed heartbeats |
| prkdb_raft_append_entries_total | Counter | AppendEntries RPCs by result |
| prkdb_raft_snapshot_index | Gauge | Last snapshot index |

Snapshot Metrics

| Metric | Type | Description |
| --- | --- | --- |
| prkdb_snapshots_created_total | Counter | Snapshots created |
| prkdb_snapshot_creation_duration_seconds | Histogram | Snapshot creation latency |

Operation Metrics

| Metric | Type | Description |
| --- | --- | --- |
| prkdb_ops_total | Counter | Total operations |
| prkdb_reads_total | Counter | Total reads |
| prkdb_writes_total | Counter | Total writes |
| prkdb_deletes_total | Counter | Total deletes |
| prkdb_delete_batches_total | Counter | Delete batch operations |
| prkdb_operation_duration_seconds | Histogram | Operation latency |
| prkdb_read_duration_seconds | Histogram | Read latency |
| prkdb_write_duration_seconds | Histogram | Write latency |
| prkdb_batch_duration_seconds | Histogram | Batch write latency |

Cache Metrics

| Metric | Type | Description |
| --- | --- | --- |
| prkdb_cache_hits_total | Counter | Cache hits |
| prkdb_cache_misses_total | Counter | Cache misses |
| prkdb_cache_hit_ratio | Gauge | Cache hit ratio |
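
To cross-check the ratio gauge against the two counters, the calculation can be sketched as below. This assumes the ratio is hits divided by hits plus misses, which this reference does not spell out. The helper name `cache_hit_ratio` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hit ratio from a hit count and a miss count.
# Assumption: ratio = hits / (hits + misses). Prints 0 when there were no lookups.
cache_hit_ratio() {
  awk -v hits="$1" -v misses="$2" 'BEGIN {
    total = hits + misses
    if (total == 0) { print 0; exit }
    printf "%.4f\n", hits / total
  }'
}

cache_hit_ratio 90 10   # prints 0.9000
```

Feed it the current values of prkdb_cache_hits_total and prkdb_cache_misses_total. A large gap from prkdb_cache_hit_ratio may mean the gauge is computed over a different window or with a different formula.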

Example Checks

```bash
# Port 9091 is node 1 (9090 + NODE_ID); -s hides curl's progress meter.
curl -s http://127.0.0.1:9091/metrics | grep prkdb_up
curl -s http://127.0.0.1:9091/metrics | grep prkdb_raft_state
curl -s http://127.0.0.1:9091/metrics | grep prkdb_operation_duration_seconds
```
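
A plain grep also matches the `# HELP` and `# TYPE` comment lines, and any metric whose name merely starts with the pattern. The sketch below filters the Prometheus text format down to the sample lines of one exactly named metric. The function name `metric_samples` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text exposition on stdin and print only
# the sample lines whose metric name equals the first argument.
metric_samples() {
  awk -v name="$1" '
    /^#/ { next }                       # skip "# HELP" and "# TYPE" comments
    NF == 0 { next }                    # skip blank lines
    {
      brace = index($1, "{")            # position of the label set, if any
      metric = (brace > 0) ? substr($1, 1, brace - 1) : $1
      if (metric == name) print
    }'
}

# Usage (needs a running node):
#   curl -s http://127.0.0.1:9091/metrics | metric_samples prkdb_up
```

Because the match is exact, histogram series must be requested by their full series names, such as prkdb_operation_duration_seconds_bucket.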

Alert Ideas

Node down

```promql
prkdb_up == 0
```

Frequent leader elections

```promql
rate(prkdb_raft_leader_elections_total[5m]) > 0
```

Heartbeat failures

```promql
rate(prkdb_raft_heartbeats_failed_total[5m]) > 0
```

High write latency

```promql
histogram_quantile(0.99, sum by (le) (rate(prkdb_write_duration_seconds_bucket[5m])))
```
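
To build intuition for what the p99 query returns, here is a rough sketch of the linear interpolation that histogram_quantile applies to cumulative buckets. It is a simplification for illustration, not Prometheus's implementation. The function name `bucket_quantile` and the sample bucket values are made up.

```bash
#!/usr/bin/env bash
# Rough sketch of histogram_quantile-style interpolation (illustrative only).
# stdin: "<le> <cumulative_count>" per line, ascending by le, ending with +Inf.
# $1:    quantile between 0 and 1.
bucket_quantile() {
  awk -v q="$1" '
    { le[NR] = $1; count[NR] = $2 }
    END {
      total = count[NR]
      if (NR < 2 || total == 0) { print "NaN"; exit }
      rank = q * total
      # Find the first finite bucket whose cumulative count reaches the rank.
      for (i = 1; i < NR; i++) if (count[i] >= rank) break
      # Rank falls in the +Inf bucket: report the highest finite upper bound.
      if (i == NR) { print le[NR - 1]; exit }
      lower = (i > 1) ? le[i - 1] : 0   # lowest bucket is assumed to start at 0
      prev  = (i > 1) ? count[i - 1] : 0
      if (count[i] == prev) { print lower; exit }
      printf "%.4f\n", lower + (le[i] - lower) * (rank - prev) / (count[i] - prev)
    }'
}

# Made-up buckets: 50 writes <= 0.1s, 90 <= 0.5s, 100 <= 1s.
printf '%s\n' '0.1 50' '0.5 90' '1 100' '+Inf 100' | bucket_quantile 0.99   # prints 0.9500
```

The estimate can only be as precise as the bucket boundaries, so a p99 alert threshold is best placed at or near a boundary that the histogram actually defines.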