HRTnomaly: Historical, Relational, and Tail Anomaly-Detection Algorithms
The presence of outliers in a dataset can substantially bias the results of statistical analyses. To correct for
outliers, micro edits are manually performed on all records. A
set of constraints and decision rules is typically used to aid
the editing process. However, straightforward decision rules
might overlook anomalies arising from disruption of linear
relationships. Computationally efficient methods are provided
to identify historical, tail, and relational anomalies at the
data-entry level (Sartore et al., 2024;
<doi:10.6339/24-JDS1136>). A score statistic is developed for
each anomaly type, using a distribution-free approach motivated
by the Bienaymé-Chebyshev inequality, and fuzzy logic is used
to detect cellwise outliers resulting from different types of
anomalies. Each data entry is individually scored and
individual scores are combined into a final score to determine
anomalous entries. As alternatives to fuzzy logic, a Bayesian
bootstrap and a Bayesian test based on empirical likelihoods
are also provided, as studied by Sartore et al. (2024;
<doi:10.3390/stats7040073>). These algorithms allow for a more
nuanced approach to outlier detection, as they can identify
outliers at the data-entry level which are not obviously distinct
from the rest of the data. --- This research was supported in
part by the U.S. Department of Agriculture, National
Agricultural Statistics Service. The findings and conclusions in
this publication are those of the authors and should not be
construed to represent any official USDA or U.S. Government
determination or policy.
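
As a rough illustration of the distribution-free scoring idea described above, the following Python sketch turns the Bienaymé-Chebyshev bound P(|X - mu| >= k*sigma) <= 1/k^2 into a per-entry score and combines two scores with a fuzzy OR (taken here as the maximum). This is not the package's implementation: the function names (chebyshev_score, fuzzy_or), the use of the plain mean and standard deviation, the ratio to a previous period as a "historical" signal, and the 0.5 cutoff are all assumptions made for this example. The actual score statistics are defined in the cited papers.

```python
import statistics


def chebyshev_score(values):
    """Hypothetical per-entry score in [0, 1).

    For each value, k is its distance from the mean in standard
    deviations. Chebyshev bounds the chance of being at least that
    far out by 1/k^2 (for k > 1), so 1 - 1/k^2 grows with extremeness.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    scores = []
    for x in values:
        if sigma == 0:
            # No spread: nothing can be flagged as extreme.
            scores.append(0.0)
            continue
        k = abs(x - mu) / sigma
        # The bound is only informative for k > 1; cap it at 1 otherwise.
        bound = 1.0 if k <= 1 else 1.0 / (k * k)
        scores.append(1.0 - bound)
    return scores


def fuzzy_or(*score_lists):
    """Combine scores entry by entry with a fuzzy OR (assumed: maximum)."""
    return [max(entry_scores) for entry_scores in zip(*score_lists)]


# Toy data (made up): current and previous values for five records.
current = [10.0, 11.0, 9.0, 10.5, 50.0]
previous = [10.0, 10.0, 10.0, 10.0, 10.0]

# Tail score: how extreme each current value is among its peers.
tail = chebyshev_score(current)

# Historical score (assumption): how extreme each ratio to the
# previous period is among all ratios.
ratios = [c / p for c, p in zip(current, previous)]
historical = chebyshev_score(ratios)

# Final score per entry, and the entries exceeding an assumed cutoff.
final = fuzzy_or(tail, historical)
flagged = [i for i, s in enumerate(final) if s > 0.5]
print(flagged)
```

The maximum is only one of several possible fuzzy operators, and the mean and standard deviation used here are themselves inflated by the outlier, which caps the attainable score on small samples. A robust center and scale would avoid that, at the cost of no longer matching the inequality exactly.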