Abstract

Kernel density estimation is a non-parametric method for estimating probability densities and is often used in statistical inference, especially in scientific fields such as high energy physics. Multiple new implementations of univariate kernel density estimation are proposed, based on TensorFlow (a just-in-time compiled mathematical Python library for CPU and GPU) and zfit (a highly scalable and customizable model manipulation and fitting library based on TensorFlow). Starting from the exact algorithm, several optimizations from recent papers are implemented to improve its efficiency. These optimizations include linear binning, Fast Fourier Transformed kernel functions, the Improved Sheather-Jones algorithm proposed by Botev et al., and specialized kernel functions as proposed by Hofmeyr. The accuracy and efficiency of the proposed implementation suite are then compared to existing implementations in Python and shown to be competitive. The proposed univariate kernel density estimation suite achieves state-of-the-art accuracy as well as efficiency, especially for large numbers of samples (\(n \geq 10^8\)).