Estimation from Partially Sampled Distributed Traces

arXiv preprint arXiv:2107.07703, 2021

Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used head-based sampling approach. Sampling rates can be chosen individually and independently for every span, which allows span attributes and local resource constraints to be taken into account. The resulting traces are often only partially rather than completely sampled, which complicates statistical analysis. To exploit all of the available information, we present an unbiased estimation algorithm. Even though it does not need to know whether traces are complete, it reduces the estimation error in many cases compared to considering only complete traces.
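To illustrate why per-span sampling rates yield partial traces, and what the "complete traces only" baseline looks like, here is a minimal sketch. It assumes consistent sampling: every span of a trace compares one shared, trace-level uniform random value r against its own sampling probability. The abstract does not specify this mechanism, so treat it as an assumption. The estimator shown is a plain Horvitz-Thompson estimate over complete traces only. It is not the paper's algorithm, which also exploits partially sampled traces. The names `sample_trace` and `estimate_trace_count` are hypothetical.

```python
import random


def sample_trace(span_probs, r):
    """Return the indices of the spans kept for one trace.

    Assumption (hypothetical model): consistent sampling, where all spans of
    a trace share the same uniform random value r in [0, 1), and span i is
    kept iff r < span_probs[i].
    """
    return [i for i, p in enumerate(span_probs) if r < p]


def estimate_trace_count(traces, rng):
    """Estimate the number of traces using complete traces only.

    Under the model above, a trace is completely sampled iff
    r < min(span_probs), which happens with probability min(span_probs).
    Weighting each complete trace by the inverse of that probability
    (Horvitz-Thompson) gives an unbiased estimate of the trace count.
    Also returns how many traces were only partially sampled; this baseline
    discards them.
    """
    estimate = 0.0
    partial = 0
    for span_probs in traces:
        r = rng.random()  # one shared random value per trace
        kept = sample_trace(span_probs, r)
        if len(kept) == len(span_probs):
            estimate += 1.0 / min(span_probs)
        elif kept:
            partial += 1
    return estimate, partial


if __name__ == "__main__":
    # 10,000 traces, each with three spans sampled at different rates.
    traces = [[1.0, 0.5, 0.25]] * 10_000
    est, partial = estimate_trace_count(traces, random.Random(42))
    print(f"estimated traces: {est:.0f}, partially sampled: {partial}")
```

In this example only about a quarter of the traces are complete, while about three quarters are partial (the first span, sampled with probability 1, is always kept). The baseline estimate is still unbiased but ignores the information carried by those partial traces.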