Quantifying Color and Distortion Biases in the NCT-CRC-HE-100K Histopathology Dataset
Rodrigues, G. A. P., Serrano, A. L. M., Filho, G. P. R., Bonacin, R., Gonçalves, V. P., Rajarajan, M. (ORCID: 0000-0001-5814-9922) & Meneguette, R. I. (2026). Quantifying Color and Distortion Biases in the NCT-CRC-HE-100K Histopathology Dataset. Journal of the Brazilian Computer Society, 32(1), pp. 1317-1330. doi: 10.5753/jbcs.2026.7045
Abstract
Colorectal cancer (CRC) represents a persistent challenge for healthcare systems, and the development of reliable deep learning systems for histopathology depends on unbiased datasets. The widely used NCT-CRC-HE-100K dataset has been shown to contain color inconsistencies, distortion artifacts, and corrupted patches, yet prior analyses offered only limited quantitative evidence. In this work, we extend these observations by evaluating color signatures, stain-normalization behavior, and class-dependent image quality variations. We compare classical and deep-learning-based stain normalization methods to assess their impact on image quality metrics and their potential to reduce class-specific biases in computational pathology. Our results show that while normalization reduces color-based class distinguishability, none of the evaluated methods completely eliminates tissue-specific color signatures. Additionally, distortion artifacts disproportionately affect one class in the dataset, introducing technical biases unrelated to morphology. Furthermore, a CNN classifier trained on each normalized dataset, as well as on the unnormalized dataset, shows no significant change in performance across normalization methods, despite the reductions in color-based separability. Overall, our study provides quantitative evidence that color, saturation, and distortion biases persist across normalization techniques, emphasizing the need for caution when using NCT-CRC-HE-100K to assess histopathology models.
| Publication Type: | Article |
|---|---|
| Additional Information: | Copyright (c) 2026 Gabriel Arquelau Pimenta Rodrigues, André Luiz Marques Serrano, Geraldo Pereira Rocha Filho, Rodrigo Bonacin, Vinícius Pereira Gonçalves, Muttukrishnan Rajarajan, Rodolfo Ipolito Meneguette. This work is licensed under a Creative Commons Attribution 4.0 International License. |
| Publisher Keywords: | Bias analysis, colorectal cancer, histopathology, stain normalization |
| Subjects: | R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer) |
| Departments: | School of Science & Technology; School of Science & Technology > Department of Engineering |
Available under License Creative Commons Attribution.