KG2Tables: A Domain-Specific Tabular Data Generator to Evaluate Semantic Table Interpretation Systems
Abdelmageed, N., Jiménez-Ruiz, E. ORCID: 0000-0002-9083-4599, Hassanzadeh, O. & König-Ries, B. (2025).
KG2Tables: A Domain-Specific Tabular Data Generator to Evaluate Semantic Table Interpretation Systems.
Transactions on Graph Data and Knowledge (TGDK),
Abstract
Tabular data, often in the form of CSV files, plays a pivotal role in data analytics pipelines. Understanding this data semantically, a task known as Semantic Table Interpretation (STI), is crucial but challenging because of factors such as label ambiguity. As a result, STI has gained increasing attention from the community in the past few years. Evaluating STI systems requires well-established benchmarks. Most existing large-scale benchmarks are derived from general-domain sources and focus on ambiguity, while domain-specific benchmarks are relatively small. This paper introduces KG2Tables, a framework that can construct large-scale, domain-specific benchmarks from a Knowledge Graph (KG). KG2Tables leverages the internal hierarchy of the relevant KG concepts and their properties. As a proof of concept, we have built large datasets in the food, biodiversity, and biomedical domains. The resulting datasets, tFood, tBiomed, and tBiodiv, have been made publicly available in the ISWC SemTab challenge (2023 and 2024 editions). We include the evaluation results of top-performing STI systems on tFood. These results underscore its potential as a robust evaluation benchmark for challenging STI systems. We demonstrate the data quality of the generated benchmarks using a sample-based approach that includes, for example, an assessment of how realistic the tables are. Finally, we provide an extensive discussion of KG2Tables, explaining how it could be used to create benchmarks for any domain of interest and covering its key features and limitations, with suggestions to overcome them.
Publication Type: | Article |
---|---|
Additional Information: | © Nora Abdelmageed, Ernesto Jiménez-Ruiz, Oktie Hassanzadeh, and Birgitta König-Ries - CC-BY all rights reserved; licensed under Creative Commons License CC-BY 4.0 |
Publisher Keywords: | Semantic Table Interpretation (STI), Knowledge Graph (KG), STI Benchmark, Food, Biodiversity, Biomedical |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science |
Departments: | School of Science & Technology; School of Science & Technology > Computer Science |
Available under License Creative Commons: Attribution International Public License 4.0.