Can We Apply Benford's Law To Check Quality Of Cancer Incidence Data?
Emanuele CROCETTI, European Commission, DG Joint Research Centre (JRC), Italy
RANDI G. 1, DYBA T. 1, CARVALHO R. 1, GIUSTI F. 1, MARTOS C. 1, ROONEY R. 1, VOTI L. 1, BETTIO M. 1
1 European Commission, DG Joint Research Centre, Institute for Health and Consumer Protection, Public Health Policy Support Unit, Ispra (VA), Italy
Purpose
Benford's law states that, in many large collections of numbers, the first significant digit (FSD) is not uniformly distributed: smaller digits occur more often than larger ones, with 1 as the leading digit in about 30% of cases. The aim of this study was to evaluate whether population-based cancer incidence rates follow Benford's law and whether this property can be used in their data quality checking process.
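As an illustration of the law itself, the expected frequency of leading digit d is P(d) = log10(1 + 1/d) for d = 1, ..., 9. The sketch below computes these expected frequencies and extracts the FSD of a number. The function names are hypothetical and are not taken from the study; reading the digit from Python's shortest float representation is one possible approach among several.

```python
import math


def benford_expected():
    """Expected Benford frequency for each leading digit 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_significant_digit(x):
    """Return the first non-zero digit of x, or None if there is none.

    Assumption: the digit is read from repr(), the shortest string that
    round-trips the float (e.g. '0.0345', '1e-05'). Zero, NaN and
    infinity contain no digit 1..9 and therefore yield None.
    """
    for ch in repr(abs(float(x))):
        if ch in "123456789":
            return int(ch)
    return None


if __name__ == "__main__":
    for digit, prob in benford_expected().items():
        print(digit, round(prob, 4))
    print(first_significant_digit(0.0345))
```

A rate of zero has no significant digit, so such values would need to be excluded before tabulating the FSD distribution.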
Methods
Detailed databases from six population-based cancer registries (from Africa, North and South America, Asia, Europe and Oceania) were retrieved from the Cancer Incidence in 5 Continents-X website. These datasets consisted of 244 combinations of topography and morphological groups, 18 age groups and two sexes. The distribution of FSD was evaluated for the whole dataset and for subgroups defined by cancer registry, cancer type and sex.
Several statistics, including Pearson's correlation coefficient, distance measures and specific tests, were applied to check the consistency between the observed FSD frequency distribution and the theoretical Benford distribution.
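The comparison described above could be sketched as follows. This is a minimal illustration, not the study's actual procedure: the abstract does not name the distance measures or tests used, so the Euclidean distance, the mean absolute deviation and the Pearson chi-square statistic shown here are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter

# Theoretical Benford proportions for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_significant_digit(x):
    """First non-zero digit of x, or None (zero, NaN, infinity)."""
    for ch in repr(abs(float(x))):
        if ch in "123456789":
            return int(ch)
    return None


def fsd_counts(values):
    """Counts of leading digits 1..9; values without an FSD are ignored."""
    counts = Counter(first_significant_digit(v) for v in values)
    return [counts.get(d, 0) for d in range(1, 10)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def euclidean_distance(obs, exp):
    """Euclidean distance between two vectors of proportions."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, exp)))


def mean_absolute_deviation(obs, exp):
    """Average absolute difference between observed and expected proportions."""
    return sum(abs(o - e) for o, e in zip(obs, exp)) / len(exp)


def compare_to_benford(values):
    """Summary statistics comparing the FSD distribution with Benford's law."""
    counts = fsd_counts(values)
    n = sum(counts)
    if n == 0:
        raise ValueError("no values with a first significant digit")
    props = [c / n for c in counts]
    # Pearson chi-square statistic (8 degrees of freedom for 9 digits).
    chi2 = sum((c - n * e) ** 2 / (n * e) for c, e in zip(counts, BENFORD))
    return {
        "n": n,
        "pearson_r": pearson_r(props, BENFORD),
        "euclidean": euclidean_distance(props, BENFORD),
        "mad": mean_absolute_deviation(props, BENFORD),
        "chi_square": chi2,
    }


if __name__ == "__main__":
    # Illustrative input only: powers of 2 are a classic Benford-like sequence.
    print(compare_to_benford([2.0 ** k for k in range(1000)]))
```

In practice the input would be the incidence rates for one registry, cancer type or sex, so that each subgroup gets its own set of statistics.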
Results
The distribution of FSD, calculated for each combination, consistently showed mean values greater than the medians and was positively skewed. For the whole dataset (22,180 observations) and for single cancer registries (from 1,546 to 6,296 observations), the correlation coefficient was high, ranging from 0.918 to 0.997. The distance measures were also very low. Very similar results were obtained for major cancer sites and for each sex. The need for statistical tests that are not influenced by sample size was confirmed.
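The sample-size issue can be illustrated with the Pearson chi-square statistic, which for a fixed set of observed proportions grows linearly with the number of observations. The sketch below uses a hypothetical deviation (one percentage point moved from digit 1 to digit 9, not a result from the study) and the smallest and largest sample sizes reported above; the chi-square test is assumed here only as an example of a size-sensitive test.

```python
import math

# Theoretical Benford proportions for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Critical value of the chi-square distribution, 8 degrees of freedom, alpha = 0.05.
CHI2_CRITICAL_8DF = 15.507


def chi_square(props, n):
    """Pearson chi-square statistic for observed proportions and sample size n."""
    return n * sum((p - e) ** 2 / e for p, e in zip(props, BENFORD))


# Hypothetical observed proportions: Benford with 0.01 shifted from digit 1 to digit 9.
observed = list(BENFORD)
observed[0] -= 0.01
observed[8] += 0.01

if __name__ == "__main__":
    for n in (1546, 22180):
        stat = chi_square(observed, n)
        print(n, round(stat, 2), "reject" if stat > CHI2_CRITICAL_8DF else "do not reject")
```

The same small deviation is not flagged at the smaller sample size but is flagged at the larger one, which is why measures that do not scale with the number of observations are useful alongside such a test.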
Conclusions
The data analyzed in this study had already been checked and approved for publication in Cancer Incidence in 5 Continents-X. Therefore, their quality was expected to be good. This study demonstrated that cancer incidence rates follow Benford's law. This suggests that the adherence of the FSD distribution of incidence rates to Benford's law can be used as a quick tool in their quality evaluation, in order to identify possible deviations for further investigation.