City Research Online

Quantifying Color and Distortion Biases in the NCT-CRC-HE-100K Histopathology Dataset

Rodrigues, G. A. P., Serrano, A. L. M., Filho, G. P. R., Bonacin, R., Gonçalves, V. P., Rajarajan, M. ORCID: 0000-0001-5814-9922 & Meneguette, R. I. (2026). Quantifying Color and Distortion Biases in the NCT-CRC-HE-100K Histopathology Dataset. Journal of the Brazilian Computer Society, 32(1), pp. 1317-1330. doi: 10.5753/jbcs.2026.7045

Abstract

Colorectal cancer (CRC) represents a persistent challenge for healthcare systems, and the development of reliable deep learning systems for histopathology depends on unbiased datasets. The widely used NCT-CRC-HE-100K dataset has been shown to contain color inconsistencies, distortion artifacts, and corrupted patches, yet prior analyses offered only limited quantitative evidence. In this work, we extend these observations by evaluating color signatures, stain-normalization behavior, and class-dependent image quality variations. We compare classical and deep learning-based stain normalization methods to assess their impact on image quality metrics and their potential to reduce class-specific biases in computational pathology. Our results show that while normalization reduces color-based class distinguishability, none of the evaluated methods completely eliminates tissue-specific color signatures. Additionally, this work demonstrates that distortion artifacts disproportionately affect one class in the dataset, introducing technical biases unrelated to morphology. Furthermore, CNN classifiers trained on each normalized dataset, as well as on the unnormalized dataset, show no significant change in performance across normalization methods, despite reductions in color-based separability. Overall, our study provides quantitative evidence that color, saturation, and distortion biases persist across normalization techniques, emphasizing the need for caution when using NCT-CRC-HE-100K to assess histopathology models.

Publication Type: Article
Additional Information: Copyright (c) 2026 Gabriel Arquelau Pimenta Rodrigues, André Luiz Marques Serrano, Geraldo Pereira Rocha Filho, Rodrigo Bonacin, Vinícius Pereira Gonçalves, Muttukrishnan Rajarajan, Rodolfo Ipolito Meneguette. This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher Keywords: Bias analysis, colorectal cancer, histopathology, stain normalization
Subjects: R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer)
Departments: School of Science & Technology; School of Science & Technology > Department of Engineering
Text - Published Version
Available under License Creative Commons Attribution.