봉황대 in CS

Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree 본문

Data Centric Computing

Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree

등 긁는 봉황대 2023. 9. 7. 17:05

2022 IEEE 38th IDCE

 

Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree

LSM-Tree is widely adopted for storing time-series data in Internet of Things. According to conventional policy (denoted by <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\pi_{c}$</tex> ), when writing, the

ieeexplore.ieee.org

 

Apache IoTDB

- leveled LSM-Tree based, high-performance data engine tailored for time-series data

- separation policy (πs, MemTable : 1/2 C_seq, 1/2 C_nonseq)

C_seq    : in-order MemTable
C_nonseq : out-of-order MemTable

 

Two policies on MemTable

πc : conventional policy
- C_0 : 기존 LSM-Tree의 방식과 동일 (하나의 MemTable)

πs : separation policy
- C_seq    : stores in-order data points
- C_nonseq : stores out-of-order data pionts

 

 

하나의 append-only Skiplist overhead가 많이 큰가??

 

Separation Policy

- advantage : only requires compaction when C_nonseq is full

- problem : if there is only a few out-of-order data points, WA will increase than using conventional policy

 

 

Write amplification, WA

(data가 disk에 실제로 write 된 횟수) / (user가 data write을 요청한 횟수)

 

Time-series data points

p = <tg, ta, v>

tg : timestamp when the data point is generated
     unique → identifies a specific data point

ta : timestamp when the data point arrives in the database

v  : value

→ SST’s entries are sorted by generation time

 

Time-series S = {p1, p2, ... , pi, ...}
: collection of time-series data points
time interval to generate data points : Δt

 

 

Delay

p.td = p.ta - p.tg

in terms of generation time, time-series data are unordered because there are various delays

caused by clock skew, network delays, system recovery from failure, etc.

 

LAST(R)

- level L1 (notated in this paper) means disk level 0 = run = R

- LAST(R) : the entry with the latest generation time in R

즉, disk에 저장된 entry 중 generation time이 가장 느린(가장 최근에 생성된) data point를 말한다.

 

 

In-order data points

if new-arriving data point is generated later than all of the data points on the disk

p.tg > LAST(R).tg

 

Out-of-order data points

if new-arriving data point is generated earlier than all of the data points on the disk

p.tg < LAST(R).tg

 

 

 

Factors affecting WA

1. MemTable policy

choosing the policy should be dependent on the intensity of disorder of the writing workload

 

Experiments

: 선택한 정책과 workload 상황에 따라서 WA와 query performance가 어떻게 달라지는가?

 

[ datasets ]

µ : 평균
µ가 증가할수록 delay의 정도가 커진다는 것이다.

σ : 표준편차
σ가 증가할수록 delay의 정도가 많이 퍼져있다는 것이다. 즉, 제각각이다.

 

[ WA ]

WA.πs < WA.πc

 

[ Query performance ]

πc is better than πs

 

why?

- in πs, the SSTables included in the query range are fewer,

- but as the size decreases, the files to read increases

 

1. recent data query

select *
from TS
where time > (max_time - window)

 

2. historical data query

select *
from TS
where time > rand_value and time < rand_value + window

 

2. Capacity of C_nonseq

the total MemTable capacity is not tunable (n_0 = n_seq + n_nonseq)

if C_nonseq is too small,
C_nonseq will fill more often, causing frequent compaction

if C_nonseq is too large,
C_seq will fill more often, causing more data points treated as out of order
(because it updates the max generation time on the disk frequently)

 

Separation or not

main idea : predict the WA under two policies & select the best policy at that point

 

1. What is the WA under πc?

2-1. Under πs, how does WA change with different capacities of C_seq?

2-2. What is the minimum WA in this case? (minimum WA under πs)

 

 

πs(n*_seq)
means the separation policy, where the capacity of C_seq is set to be n*_seq

 


 

How can we predict the WAs??

: by predicting the number of subsequent data points

 

Subsequent data points

p   : on-disk data point
C_0 : in-memory MemTable

if there is any data point q ∈ C_0, p.tg > q.tg,
then p is a subsequent data point of C_0

즉, 현재 MemTable에서 가지고 있는 어떤 key q가 disk에 저장되어 있는 어떤 key p보다 늦게 생성된 놈일 때(→ q is out-of-order data point), p를 subsequent data point라고 부른다.

 

During compaction, any SSTable containing subsequent data points should be rewritten

 

rc : WA under πc

- n : the capactiy of C_0

- ζ(n) : the expectation of the number of subsequent data points

(P(Bi) : the probability of being a subsequent data point)

 

rs : WA under πs

- N_cur(n_seq) : number of data points to rewrite = 전체 - out-of-order - last-flushed SSTable(→ 왜 빼는가???)

- n'_seq : number of data points in the last-flushed SSTable

- unit : a phase (= C_nonseq의 주기)

 

 

위 수식에서 (n-n_seq+n'_seq) / N_arrive(n_seq) 부분은 이해하지 못했다....

(N_cur를 통해서 정리된 부분인 것 같은데 왜 저렇게 이어지는지 모르겠음)

 

Experiment - correctness

 

→ the model can accurately predict the WA & choose the better policy so that the WA is reduced

 

 

반응형
Comments