Persistent Homology and Gabor Features Reveal Inconsistencies Between Widely Used Colorectal Cancer Training and Testing Datasets
Brito-Pacheco, D., Ibadulla, R. (ORCID: 0000-0002-0359-0830), Fernández, X., Giannopoulos, P. (ORCID: 0000-0002-6261-1961) & Reyes-Aldasoro, C. C. (ORCID: 0000-0002-9466-2018) (2026). Persistent Homology and Gabor Features Reveal Inconsistencies Between Widely Used Colorectal Cancer Training and Testing Datasets. In: Medical Image Understanding and Analysis. 29th Annual Conference, MIUA 2025, 15-17 Jul 2025, Leeds, UK. doi: 10.1007/978-3-031-98688-8_7
Abstract
Recent work in computer vision and image processing has relied substantially on open datasets, which allow techniques and methodologies to be compared objectively. In computational pathology, and more specifically in colorectal cancer, the dataset NCT-CRC-HE-100K, which consists of 100,000 patches of human tissue stained with Haematoxylin and Eosin, has been widely used as a training set for deep learning studies. The patches are grouped into 9 tissue classes (adipose, background, debris, lymphocytes, mucus, smooth muscle, normal colon mucosa, cancer-associated stroma, and colorectal adenocarcinoma epithelium). The set is released together with a separate set (CRC-VAL-HE-7K) of 7,180 patches that is commonly used for testing. In this work, features were extracted from both sets, first with Persistent Homology and then with Gabor filters, revealing that the training set has a rather different distribution from the testing set. Namely, the distribution of features in the 7K-set presents a much lower class overlap than that in the 100K-set, which would imply a much higher separability in the testing set than in the training set.
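The abstract does not state the filter parameters or the separability measure used. The Python sketch below only illustrates the general idea behind the Gabor part of such an analysis, under loudly stated assumptions: a bank of real Gabor kernels (frequencies, orientations and `sigma` chosen arbitrarily here), the mean and standard deviation of the response magnitude as per-patch features, and a nearest-centroid misassignment rate as a crude, hypothetical class-overlap proxy. It is not the authors' pipeline, it omits the Persistent Homology features entirely, and the helper names (`gabor_kernel`, `convolve_same`, `gabor_features`, `centroid_overlap`) are hypothetical.

```python
import numpy as np


def gabor_kernel(frequency, theta, sigma, half_size=None):
    """Real (even) part of a 2-D Gabor kernel: Gaussian envelope times a cosine carrier."""
    if half_size is None:
        half_size = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along the direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    kernel = envelope * np.cos(2 * np.pi * frequency * x_rot)
    # Remove the DC component so flat regions give (almost) no response.
    return kernel - kernel.mean()


def convolve_same(image, kernel):
    """Linear 2-D convolution via zero-padded FFTs, cropped to the input size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape)))
    top, left = kh // 2, kw // 2
    return full[top:top + h, left:left + w]


def gabor_features(image, frequencies=(0.1, 0.25), n_orientations=4, sigma=3.0):
    """Mean and std of the response magnitude for every (frequency, orientation) pair.
    Parameter values are illustrative assumptions, not taken from the paper."""
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            mag = np.abs(convolve_same(image, gabor_kernel(f, theta, sigma)))
            feats.extend([mag.mean(), mag.std()])
    return np.array(feats)


def centroid_overlap(features, labels):
    """Hypothetical overlap proxy: fraction of samples closer (on z-scored features)
    to another class's centroid than to their own. Higher means more overlap."""
    X = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
    classes = np.unique(labels)
    centroids = np.stack([X[labels == c].mean(axis=0) for c in classes])
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    predicted = classes[dists.argmin(axis=1)]
    return float(np.mean(predicted != labels))


if __name__ == "__main__":
    # Synthetic stand-in for two tissue classes: horizontal vs vertical stripes plus noise.
    rng = np.random.default_rng(0)
    rows = np.arange(64)[:, None] * np.ones((1, 64))
    stripes = np.sin(2 * np.pi * 0.25 * rows)
    patches, labels = [], []
    for i in range(20):
        label = i % 2
        base = stripes if label == 0 else stripes.T
        patches.append(base + 0.1 * rng.standard_normal(base.shape))
        labels.append(label)
    features = np.stack([gabor_features(p) for p in patches])
    print(features.shape, centroid_overlap(features, np.array(labels)))
```

Under this proxy, a higher value means more class overlap and therefore lower separability; computing it separately on the features of two datasets that share the same class labels gives one number per dataset, which is the kind of comparison the abstract describes.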
| Publication Type: | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information: | © 2026 The Author(s), under exclusive license to Springer Nature Switzerland AG. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-98688-8_7 |
| Publisher Keywords: | Persistent Homology, Gabor Features, Class separability |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer) |
| Departments: | School of Science & Technology; School of Science & Technology > Department of Computer Science; School of Science & Technology > Department of Mathematics |
This document is not freely accessible until 17 July 2026 due to copyright restrictions.