City Research Online

Persistent Homology and Gabor Features Reveal Inconsistencies Between Widely Used Colorectal Cancer Training and Testing Datasets

Brito-Pacheco, D., Ibadulla, R. ORCID: 0000-0002-0359-0830, Fernández, X., Giannopoulos, P. ORCID: 0000-0002-6261-1961 & Reyes-Aldasoro, C. C. ORCID: 0000-0002-9466-2018 (2026). Persistent Homology and Gabor Features Reveal Inconsistencies Between Widely Used Colorectal Cancer Training and Testing Datasets. In: Medical Image Understanding and Analysis. 29th Annual Conference, MIUA 2025, 15-17 Jul 2025, Leeds, UK. doi: 10.1007/978-3-031-98688-8_7

Abstract

Recent work on computer vision and image processing has relied substantially on open datasets, which allow for an objective comparison of techniques and methodologies. In the area of computational pathology and, more specifically, in colorectal cancer, the dataset NCT-CRC-HE-100K, which consists of 100,000 patches of human tissue stained with Haematoxylin and Eosin, has been widely used as a training set for deep learning studies. The patches are grouped into 9 classes of tissue (adipose, background, debris, lymphocytes, mucus, smooth muscle, normal colon mucosa, cancer-associated stroma, colorectal adenocarcinoma epithelium). The set is released with a separate set (CRC-VAL-HE-7K) of 7,180 patches that is commonly used for testing. In this work, features were extracted from both sets, first with Persistent Homology and then with Gabor filters, to reveal that the training set presents a rather different distribution from the testing set. Namely, the distribution of features in the 7K-set presents a much lower class overlap than that in the 100K-set, which would imply a much higher separability in the testing set than in the training set.

Publication Type: Conference or Workshop Item (Paper)
Additional Information: © 2026 The Author(s), under exclusive license to Springer Nature Switzerland AG. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-98688-8_7
Publisher Keywords: Persistent Homology, Gabor Features, Class separability
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer)
Departments: School of Science & Technology
School of Science & Technology > Department of Computer Science
School of Science & Technology > Department of Mathematics
This document is not freely accessible until 17 July 2026 due to copyright restrictions.
