Optimizing Apache HoraeDB for High-Cardinality Metrics at AntGroup

Jiacai Liu(刘家财)

Senior Engineer @ Ant Group
Community Over Code NA, October 2024. PDF

About ME

Agenda

  1. What's time series
  2. What's Apache HoraeDB
    1. Core design
  3. Write path optimization
  4. Query path optimization
  5. Looking Forward

1. What's time series

  • A collection of time-based data points that can be connected into (time) lines.
  • Use tags to differentiate between lines

Characteristics

  • Vertical writes, horizontal(-ish) reads

Scenarios

  • IoT
  • APM (Application Performance Monitoring)
  • Weather Forecasting
  • Stock Market Analysis

Time series database

Specialized database that efficiently stores and retrieves time-stamped data

  • Prometheus
  • InfluxDB
  • TimescaleDB
  • Apache HoraeDB
  • ……

Challenge

  • High write throughput
  • High-Cardinality tags, lead to BIG index
  • Real-time OLAP like query pattern

Inverted Index

2. What's Apache HoraeDB

Distributed, cloud native time-series database

2.1. Core design

  • Separating compute from storage with object storage
  • FDAP stack
  • BRIN(Block range index)
    • Min/Max Index
    • Xor Filters(faster, smaller than bloom filter)

Storage-Compute Separation

  • Table data are distributed in shards(also known as tablets)
  • Each node has N shards
  • Mapping of table/shard/node are stored in horaemeta

LSM-like engine

FDAP stack

  • (Arrow) Flight, RPC framework based on Arrow data
  • DataFusion, Query engine
  • Arrow, Memory format
  • Parquet, Storage format

How it works for time series

Arrow: columnar memory format for flat and hierarchical data

DataFusion: LLVM-like Infrastructure for Databases

3. Write path optimization

  • Metrics are sharded with partitioned table
    • Hash
    • Range
    • Round-robin
  • Reduce IO as possible as we can
    • Group commit for WAL
    • Skip building the index for recently-written metrics

4. Query path optimization

  • Reduced IO without inverted index
    • Memtable
    • SST
  • Distributed query
    • Partitioned table routing
    • Distributed execution

Reduced IO – Memtable


  • Active: Write optimized
  • Frozen: Read optimized

Reduced IO – SST

Order is important!

Automatic clustering based on history queries

Distributed

Open as a "normal" table (single point hotspot)

Open as a "virtual" table

Aggregation Pushdown

SELECT
    time_bucket(`timestamp`, '5 min') AS `timestamp`,
    SUM(`value`) AS `value_sum`
FROM
    `table`
WHERE
    `timestamp` >= '2023-12-15 07:17:00'
    AND `timestamp` < '2023-12-14 08:17:00'
    AND ((`col2` IN ('T')))
GROUP BY
    time_bucket (`timestamp`, '5 min')

Simplified physical plan

pub enum AggregateMode {
    /// Partial aggregate that can be applied in parallel across input partitions
    Partial,
    /// Final aggregate that produces a single partition of output
    Final,
    ....
}

AggregateMode

5. Looking Forward

  • Build inverted index based on query histories
  • Teach query engine aware of data distribution patterns(TSID)
  • Release more ASF-compliant versions, growing community

Thanks