SemTab Challenge
This page presents the ongoing SemTab Challenge, a direct continuation of SemTab 2025 within the long-running SemTab series on semantic annotation and table-to-knowledge-graph matching. New submissions are welcome and actively encouraged. All participating systems will be evaluated on a rolling basis, and top-performing solutions will be invited to submit a manuscript and present their work at an upcoming venue (TBD — either the TaDA Workshop at VLDB 2026 or the Ontology Matching (OM) Workshop at ISWC 2026).
MammoTab is a large-scale benchmark designed to provide realistic and complex scenarios, including tables affected by typical challenges of web and Wikipedia data.
This leaderboard has been generated using the MammoTab sample dataset, which consists of 870 tables containing a total of 84,907 distinct mentions.
About the Challenge
This challenge is managed by:
Requirements
Only approaches based on Large Language Models are allowed, either:
- in fine-tuning settings, or
- using Retrieval-Augmented Generation strategies.
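As an illustration of the second setting, the sketch below shows one way a Retrieval-Augmented Generation prompt for CEA could be assembled. It is a minimal sketch only: the candidate entries and the `call_llm` function are hypothetical placeholders, not part of the challenge infrastructure or any prescribed method.

```python
# Minimal RAG-style CEA prompt sketch (illustrative only).
# `call_llm` is a hypothetical stand-in for whatever LLM API a participant uses.

def build_cea_prompt(mention: str, row: list[str], candidates: list[dict]) -> str:
    """Ask the model to pick the Wikidata QID that matches a table cell."""
    lines = [
        f"Table row: {', '.join(row)}",
        f"Cell mention to annotate: {mention}",
        "Candidate Wikidata entities:",
    ]
    for cand in candidates:
        lines.append(f"- {cand['id']}: {cand['label']} ({cand['description']})")
    lines.append("Answer with the single best QID, or NIL if no candidate matches.")
    return "\n".join(lines)

# Usage sketch: candidates would come from a retriever (see the Wikidata lookup
# sketch further below); the entries here are placeholders, not verified records.
# prompt = build_cea_prompt(
#     "Hollywood Boulevard",
#     ["1976", "Hollywood Boulevard", "Joe Dante"],
#     [{"id": "Q...", "label": "Hollywood Boulevard", "description": "1976 film"}],
# )
# answer = call_llm(prompt)  # hypothetical model call
```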
The evaluation will focus on the Cell Entity Annotation (CEA) task, but will also take into account the ability of the proposed approaches to effectively deal with the following key challenges:
Participants are expected to demonstrate not only strong CEA performance, but also robustness and versatility across all these dimensions, which are critical for real-world table interpretation scenarios.
Evaluation Details
Task Focus
The evaluation will focus on the Cell Entity Annotation (CEA) task using the Wikidata KG (v. 20240720).
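For orientation, the sketch below retrieves candidate entities for a cell mention from the public Wikidata API. This is only an illustration: the official evaluation uses the 20240720 Wikidata dump, so results from the live endpoint may differ, and a local index built over that dump would be the faithful setup.

```python
# Candidate retrieval sketch against the live Wikidata API (illustrative only).
import requests

def wikidata_candidates(mention: str, limit: int = 5) -> list[dict]:
    """Return candidate entities (QID, label, description) for a cell mention."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": mention,
            "language": "en",
            "format": "json",
            "limit": limit,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return [
        {
            "id": hit["id"],
            "label": hit.get("label", ""),
            "description": hit.get("description", ""),
        }
        for hit in resp.json().get("search", [])
    ]

# e.g. wikidata_candidates("Hollywood Boulevard")
```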
Dataset Structure
The test set is not included in the dataset in order to preserve the impartiality of the final evaluation and to discourage ad-hoc solutions.
Targets Format
CEA task
filename, row id (0-indexed), column id (0-indexed), entity id
Annotation:
LYQZQ0T5,1,1,Q3576864
Table LYQZQ0T5:
col0,col1,col2
1976,Eat My Dust!,Charles Byron Griffith
1976,Hollywood Boulevard,Joe Dante
1976,Hollywood Boulevard,Allan Arkush
1977,Grand Theft Auto,Ron Howard
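A minimal sketch of writing a submission file in this format is shown below; the output filename `cea_submission.csv` is an arbitrary choice for illustration, and the single annotation is the example given above.

```python
# Write CEA annotations in the target format: filename, row id, column id, entity id.
import csv

annotations = [
    # (table filename, 0-indexed row id, 0-indexed column id, Wikidata QID)
    ("LYQZQ0T5", 1, 1, "Q3576864"),
]

with open("cea_submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for filename, row_id, col_id, qid in annotations:
        writer.writerow([filename, row_id, col_id, qid])
```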
Evaluation Criteria
Precision, Recall and F1 Score are calculated as follows:
- Precision = (# correct annotations) / (# submitted annotations)
- Recall = (# correct annotations) / (# target cells in the ground truth)
- F1 Score = (2 × Precision × Recall) / (Precision + Recall)
Notes:
- # denotes the number of annotations or cells in the formulas above.
- F1 Score is used as the primary score, and Precision is used as the secondary score.
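The sketch below mirrors these definitions for local sanity checks; it assumes at most one annotation per target cell and does not reproduce any NIL-specific or other rules of the official scorer.

```python
# Local Precision/Recall/F1 sketch following the definitions above (illustrative only).

def cea_scores(submitted: dict, ground_truth: dict) -> tuple[float, float, float]:
    """Keys are (filename, row id, column id) tuples; values are Wikidata QIDs."""
    correct = sum(1 for cell, qid in submitted.items() if ground_truth.get(cell) == qid)
    precision = correct / len(submitted) if submitted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example with the single annotation shown earlier:
# cea_scores({("LYQZQ0T5", 1, 1): "Q3576864"},
#            {("LYQZQ0T5", 1, 1): "Q3576864"})  # -> (1.0, 1.0, 1.0)
```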
How to Participate
Leaderboard
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| ADFr | 0.758 | 0.758 | 0.758 |
| RAGDify | 0.603 | 0.603 | 0.603 |
| ditlab | 0.549 | 0.549 | 0.549 |
| Kepler-aSI | 0.403 | 0.157 | 0.226 |