
XIndex: A Scalable Learned Index for Multicore Data Storage

등 긁는 봉황대 2023. 10. 3. 02:31

PPoPP '20

 

XIndex | Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '20), dl.acm.org

 

Learned Index


The Key Idea

  • Using simple learned models (e.g., linear models, single-layer NNs)
    (1) to approximate the CDF of the key distribution and (2) to predict record positions.
  • A staged model architecture (Recursive Model Index, RMI) to reduce the leaf-stage models’ computation cost.

 

  1. Train the model with records’ keys and their positions.
    • fixed-length key-value records
    • sorted in an array
  2. Use the model to predict the position of a given key.
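
To make this concrete, here is a minimal sketch (my own illustration, not the paper’s code): fit one linear model to the (key, position) pairs of a sorted key array, i.e., approximate the keys’ CDF, and record the maximum prediction error, which later bounds the search range.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct LinearModel { double slope = 0, intercept = 0; long max_err = 0; };

// Least-squares fit of position ≈ slope * key + intercept over a sorted key array.
LinearModel train(const std::vector<uint64_t>& keys) {
    double n = keys.size(), sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < keys.size(); ++i) {
        double x = static_cast<double>(keys[i]), y = static_cast<double>(i);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    LinearModel m;
    m.slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    m.intercept = (sy - m.slope * sx) / n;
    // The maximum prediction error defines the error-bounded search range at lookup time.
    for (size_t i = 0; i < keys.size(); ++i) {
        long pred = std::lround(m.slope * static_cast<double>(keys[i]) + m.intercept);
        m.max_err = std::max(m.max_err, std::labs(pred - static_cast<long>(i)));
    }
    return m;
}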

 

Limitations

  1. The learned index doesn’t support any modifications: inserts, updates, or removes.
    • → it needs to reconstruct the layout & retrain the model to handle every write request
    • → concurrency problems arise
  2. Its performance is closely tied to workload characteristics (both the data distribution and the query distribution).
    • The learned index assumes a relatively static query distribution,
      but queries in real-world workloads tend to be skewed.
    • The learned index trains and predicts using the model’s min & max prediction errors, which are fixed at training time.
  3. With small datasets, the learned index’s performance is limited by the model computation cost.

 

An Intuitive Solution to Support Writes:

  1. Buffer all writes in a delta index.
  2. Periodically compact it with the learned index. (compaction : merge the data → retrain the models)
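
A rough sketch of this naive scheme (my own illustration with assumed names, not from the paper): writes go into a delta index, and a periodic, blocking compaction merges it into the learned-indexed array and retrains the model.

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

using Record = std::pair<uint64_t, uint64_t>;   // (key, value)

struct NaiveStore {
    std::vector<Record> data;             // sorted array served by the learned index
    std::map<uint64_t, uint64_t> delta;   // delta index: buffers all writes

    void put(uint64_t k, uint64_t v) { delta[k] = v; }

    // Blocking compaction: merge the delta index into the array, then retrain.
    void compact() {
        std::vector<Record> merged;
        merged.reserve(data.size() + delta.size());
        std::merge(data.begin(), data.end(), delta.begin(), delta.end(),
                   std::back_inserter(merged),
                   [](const Record& a, const Record& b) { return a.first < b.first; });
        data.swap(merged);                // duplicates of updated keys ignored for brevity
        delta.clear();
        // retrain(data);  // hypothetical: rebuild the learned models over `data`
    }
};

A lookup now has to check `delta` first and only then fall back to `data`, which is exactly the slowdown pointed out next.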

 

but..

  1. Queries slow down severely because they must look up the delta index first and then the learned index.
  2. Concurrent requests are blocked by the compaction.

 

Needs in-place updates with a non-blocking compaction scheme!

but.. correctness issues arise

 

 

 

XIndex


  • A scalable and concurrent learned index.
  • Data structure adjustment according to runtime workload characteristics.

 

Two-layer Architecture Design

  • The root node indexes all group nodes in the bottom layer with a learned RMI model.
  • Data is divided into groups by range partitioning,
    and each group node uses learned linear models to index its data.

 

write

  • Updates in-place
  • Associates each group with a delta index to buffer insertions.

 

 

// Per-group fields referenced in this post (layout is a sketch; other fields are omitted)
struct group_t {
    record_t *data_array;   // sorted key-value records, indexed by the linear models
    model_t  *models;       // learned linear models over data_array

    buffer_t *buf;          // delta index: buffers all insertions
    bool_t    buf_frozen;   // set to true during compaction
    buffer_t *tmp_buf;      // temporary delta index:
                            // buffers insertions that arrive during compaction
    group_t  *next;         // used by the group split operation
};

 

Basic Operations

  1. Use the root to find the corresponding group.
    1. Predict a group number with the RMI model.
    2. Correct the group number with a binary search on root.pivots within an error-bounded range.
  2. Look up the position of the requested record in data_array with the given key.
    1. Find the correct linear model for the prediction.
      (scan group.models → use the first model whose smallest key is not larger than the target key)
    2. Use the model to predict a position in group.data_array.
    3. Correct the position with a binary search in a range bounded by the model’s error.
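
A self-contained sketch of the in-group part of this lookup (assumed type and field names, not XIndex source): pick the model covering the key, predict a slot, then binary-search only within the model’s error bound.

#include <algorithm>
#include <cstdint>
#include <vector>

struct LinearModel {
    uint64_t smallest_key;           // first key this model covers
    double slope, intercept;
    size_t err;                      // max prediction error recorded at training time
    size_t predict(uint64_t k) const {
        double p = slope * static_cast<double>(k) + intercept;
        return p < 0 ? 0 : static_cast<size_t>(p);
    }
};

struct Group {
    std::vector<uint64_t> data_keys;   // keys of data_array, sorted
    std::vector<LinearModel> models;   // ordered by smallest_key (assumed non-empty)
};

long lookup_in_group(const Group& g, uint64_t key) {
    // 2.1: pick the model covering the key (the last one whose smallest key
    //      is not larger than it, with models ordered ascending)
    const LinearModel* m = &g.models.front();
    for (const auto& cand : g.models) {
        if (cand.smallest_key <= key) m = &cand;
        else break;
    }
    // 2.2: predict a position; 2.3: correct it within [pred - err, pred + err]
    size_t pred = std::min(m->predict(key), g.data_keys.size());
    size_t lo = pred > m->err ? pred - m->err : 0;
    size_t hi = std::min(g.data_keys.size(), pred + m->err + 1);
    auto it = std::lower_bound(g.data_keys.begin() + lo, g.data_keys.begin() + hi, key);
    if (it != g.data_keys.begin() + hi && *it == key)
        return static_cast<long>(it - g.data_keys.begin());
    return -1;   // not in data_array: the caller would fall back to the delta index
}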

 

get

: perform the basic lookup in data_array → search buf → search tmp_buf (if it is not null)

 

put & remove

If a matching record is found,

  1. Try to update/remove in-place.
  2. If that fails, operate on buf, or on tmp_buf if the buf_frozen flag is true.
    (If no matching record exists, the insertion goes directly to the delta index.)
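
A simplified sketch of this write path (assumed field names and a stubbed in-place update, not the actual XIndex code):

#include <cstdint>
#include <map>

struct Group {
    bool buf_frozen = false;                // set to true during compaction
    std::map<uint64_t, uint64_t> buf;       // delta index
    std::map<uint64_t, uint64_t> tmp_buf;   // temporary delta index used during compaction

    // Placeholder: the real version locks the matching record in data_array,
    // checks that it is still writable, and updates it in place.
    bool try_update_in_place(uint64_t key, uint64_t value) { (void)key; (void)value; return false; }
};

void put(Group& g, uint64_t key, uint64_t value) {
    if (g.try_update_in_place(key, value)) return;   // 1. in-place update succeeded
    if (g.buf_frozen) g.tmp_buf[key] = value;        // 2. compaction in progress: use tmp_buf
    else              g.buf[key] = value;            //    otherwise buffer the write in buf
}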

 

 

scan

: locate the smallest record whose key is ≥ the requested key → read n consecutive records
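
Over a single sorted array the scan boils down to the following sketch (my simplification; the real scan also has to merge in records from buf/tmp_buf):

#include <algorithm>
#include <cstdint>
#include <vector>

using Record = std::pair<uint64_t, uint64_t>;   // (key, value)

std::vector<Record> scan(const std::vector<Record>& data, uint64_t key, size_t n) {
    // Locate the smallest record whose key is >= the requested key...
    auto it = std::lower_bound(data.begin(), data.end(), key,
                               [](const Record& rec, uint64_t k) { return rec.first < k; });
    // ...then read up to n consecutive records.
    std::vector<Record> out;
    for (; it != data.end() && out.size() < n; ++it) out.push_back(*it);
    return out;
}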

 


 

  • Fine-grained concurrency control
  • Readers can always fetch a consistent result
    with fine-grained read-write locks & version-number matching in data_array & the delta indexes (buf, tmp_buf).
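
As one illustration of the version-matching part (a generic optimistic-read pattern with assumed field names, not XIndex’s exact protocol): a reader retries whenever the record’s version changes underneath it, so it never blocks on writers.

#include <atomic>
#include <cstdint>

struct Record {
    std::atomic<uint64_t> version{0};   // odd while a writer holds the record locked
    std::atomic<uint64_t> value{0};
    std::atomic<bool> removed{false};   // logical-removal flag
};

// Returns false if the record is logically removed; retries on concurrent writes.
bool read_consistent(const Record& r, uint64_t& out) {
    while (true) {
        uint64_t v1 = r.version.load();
        if (v1 & 1) continue;                 // a writer is mid-update, retry
        uint64_t value = r.value.load();
        bool removed = r.removed.load();
        if (r.version.load() == v1) {         // version unchanged: the snapshot is consistent
            if (removed) return false;
            out = value;
            return true;
        }
    }
}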

 

 

Two-Phase Compaction

  • to compact the delta index conditionally
  • performed in the background → concurrent operations are not blocked

 

1. merge phase

: current data array + delta index → new sorted array
(maintaining data references, i.e., the values are pointers to the existing records)

  • During merging, XIndex skips the logically removed records.

 

2. copy phase

: replaces each reference in the new data array with the latest record value (the real data)

  • The copy phase is performed only after it is ensured, through an RCU barrier, that there is no access to the old data array.
  • XIndex waits for each worker to process one more request, so the old group will not be accessed after the barrier.
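
Putting the two phases together, a sketch with assumed types (rcu_barrier() below is only a stand-in for XIndex’s RCU-style wait):

#include <cstdint>
#include <map>
#include <vector>

struct Record { uint64_t key; uint64_t value; bool removed; };

struct Entry {               // slot in the new data array
    uint64_t key;
    Record  *ref;            // the merge phase stores a reference...
    uint64_t value;          // ...the copy phase fills in the real value
};

// Stand-in: the real barrier waits until every worker has processed one more
// request, so no one can still be reading the old data array.
void rcu_barrier() {}

std::vector<Entry> compact(const std::vector<Record*>& old_array,
                           const std::map<uint64_t, Record*>& delta) {
    // 1. merge phase: old data array + delta index -> new sorted array of references,
    //    skipping logically removed records (duplicate keys ignored for brevity).
    std::vector<Entry> merged;
    auto it = delta.begin();
    size_t i = 0;
    while (i < old_array.size() || it != delta.end()) {
        Record* r;
        if (it == delta.end() ||
            (i < old_array.size() && old_array[i]->key < it->first)) {
            r = old_array[i++];
        } else {
            r = it->second; ++it;
        }
        if (!r->removed) merged.push_back({r->key, r, 0});
    }
    // 2. copy phase: once the old array is unreachable, replace references with values.
    rcu_barrier();
    for (Entry& e : merged) e.value = e.ref->value;
    return merged;
}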

 

 

 

Structure Update: Adjusting XIndex at Runtime

XIndex adjusts its structure according to runtime workload characteristics to improve lookup efficiency.

 

  • Basic idea : keep both the error bound and the delta index size small with structure update operations.
  • Several background threads periodically check them and perform structure update operations.

 

 

  • e : error bound threshold
  • m : model number threshold
  • f ∈ (0, 1) : tolerance factor
  • s : delta index size threshold

 

Model split / merge

If a group incurs a high prediction error, more linear models are added to that group (model split); conversely, if the error is well below the threshold, models are merged to reduce their number.

 

Group split

If a group has too many models or its delta index is too large, replace the group with two new groups.

 

  1. step 1 : Create two logical groups that share the data array & delta index, plus a temporary delta index for each.
  2. step 2
    1. merge phase :
      data_array + buf → new array → split it at the key of g_b.pivot → initialize two groups (g_a, g_b)
    2. copy phase :
      The references in each data_array are replaced with real values.

 

 

Group merge

  1. merge phase
  2. copy phase

 

Root update

If any group has been created or removed, a root update is performed.

  • average error bound > e → increase the number of 2nd-stage models in the root’s 2-stage RMI
  • average error bound ≤ e × f → reduce the number of 2nd-stage models
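
A tiny sketch of this rule (the doubling/halving factors are my own assumption, not from the paper):

#include <algorithm>
#include <cstddef>

// Decide the next number of 2nd-stage models in the root's RMI.
//   e : error bound threshold, f : tolerance factor in (0, 1)
size_t next_model_count(size_t cur, double avg_err, double e, double f) {
    if (avg_err > e)      return cur * 2;                       // errors too large: add models
    if (avg_err <= e * f) return std::max<size_t>(1, cur / 2);  // comfortably accurate: shrink
    return cur;                                                 // within tolerance: keep as is
}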

 

If there are too many groups,

  1. flatten the root’s group pointers to reduce the pointer-access cost (create a new root node with a flattened group array), and
  2. retrain the RMI model to improve its accuracy.

 

 

Evaluation


I’ll write this part tomorrow~

 

 
