SemTab Challenge
This page presents the ongoing SemTab Challenge, a direct continuation of SemTab 2025 within the long-running SemTab series on semantic annotation and table-to-knowledge-graph matching. New submissions are welcome and actively encouraged. All participating systems will be evaluated on a rolling basis, and top-performing solutions will be invited to submit a manuscript and present their work at an upcoming venue (TBD — either the TaDA Workshop at VLDB 2026 or the Ontology Matching (OM) Workshop at ISWC 2026).
MammoTab is a large-scale benchmark designed to provide realistic and complex scenarios, including tables affected by typical challenges of web and Wikipedia data.
This leaderboard has been generated using the MammoTab sample dataset, which consists of 870 tables containing a total of 84,907 distinct mentions.
About the Challenge
This challenge is managed by:
Requirements
Only approaches based on Large Language Models are allowed, either:
- in fine-tuning settings, or
- using Retrieval-Augmented Generation strategies.
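As an illustration of the second setting, the sketch below shows one way a Retrieval-Augmented Generation prompt for CEA could be assembled. It is a minimal sketch only: the candidate entries and the `call_llm` function are hypothetical placeholders, not part of the challenge infrastructure or any prescribed method.

```python
# Minimal RAG-style CEA prompt sketch (illustrative only).
# `call_llm` is a hypothetical stand-in for whatever LLM API a participant uses.

def build_cea_prompt(mention: str, row: list[str], candidates: list[dict]) -> str:
    """Ask the model to pick the Wikidata QID that matches a table cell."""
    lines = [
        f"Table row: {', '.join(row)}",
        f"Cell mention to annotate: {mention}",
        "Candidate Wikidata entities:",
    ]
    for cand in candidates:
        lines.append(f"- {cand['id']}: {cand['label']} ({cand['description']})")
    lines.append("Answer with the single best QID, or NIL if no candidate matches.")
    return "\n".join(lines)

# Usage sketch: candidates would come from a retriever (see the Wikidata lookup
# sketch further below); the entries here are placeholders, not verified records.
# prompt = build_cea_prompt(
#     "Hollywood Boulevard",
#     ["1976", "Hollywood Boulevard", "Joe Dante"],
#     [{"id": "Q...", "label": "Hollywood Boulevard", "description": "1976 film"}],
# )
# answer = call_llm(prompt)  # hypothetical model call
```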
The evaluation will focus on the Cell Entity Annotation (CEA) task, but will also take into account the ability of the proposed approaches to effectively deal with the following key challenges:
Participants are expected to demonstrate not only strong CEA performance, but also robustness and versatility across all these dimensions, which are critical for real-world table interpretation scenarios.
Evaluation Details
Task Focus
The evaluation will focus on the Cell Entity Annotation (CEA) task using the Wikidata KG (v. 20240720).
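For orientation, the sketch below retrieves candidate entities for a cell mention from the public Wikidata API. This is only an illustration: the official evaluation uses the 20240720 Wikidata dump, so results from the live endpoint may differ, and a local index built over that dump would be the faithful setup.

```python
# Candidate retrieval sketch against the live Wikidata API (illustrative only).
import requests

def wikidata_candidates(mention: str, limit: int = 5) -> list[dict]:
    """Return candidate entities (QID, label, description) for a cell mention."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": mention,
            "language": "en",
            "format": "json",
            "limit": limit,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return [
        {
            "id": hit["id"],
            "label": hit.get("label", ""),
            "description": hit.get("description", ""),
        }
        for hit in resp.json().get("search", [])
    ]

# e.g. wikidata_candidates("Hollywood Boulevard")
```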
Dataset Structure
The test set is not included in the dataset in order to preserve the impartiality of the final evaluation and to discourage ad-hoc solutions.
Targets Format
CEA task
filename, row id (0-indexed), column id (0-indexed), entity id
Annotation:
LYQZQ0T5,1,1,Q3576864
Table LYQZQ0T5:
col0,col1,col2
1976,Eat My Dust!,Charles Byron Griffith
1976,Hollywood Boulevard,Joe Dante
1976,Hollywood Boulevard,Allan Arkush
1977,Grand Theft Auto,Ron Howard
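A minimal sketch of writing a submission file in this format is shown below; the output filename `cea_submission.csv` is an arbitrary choice for illustration, and the single annotation is the example given above.

```python
# Write CEA annotations in the target format: filename, row id, column id, entity id.
import csv

annotations = [
    # (table filename, 0-indexed row id, 0-indexed column id, Wikidata QID)
    ("LYQZQ0T5", 1, 1, "Q3576864"),
]

with open("cea_submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for filename, row_id, col_id, qid in annotations:
        writer.writerow([filename, row_id, col_id, qid])
```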
Evaluation Criteria
Precision, Recall and F1 Score are calculated as follows:
- Precision = (# correct annotations) / (# submitted annotations)
- Recall = (# correct annotations) / (# target cells in the ground truth)
- F1 Score = (2 × Precision × Recall) / (Precision + Recall)
Notes:
- # denotes the number of annotations or cells in the formulas above.
- F1 Score is used as the primary score, and Precision is used as the secondary score.
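The sketch below mirrors these definitions for local sanity checks; it assumes at most one annotation per target cell and does not reproduce any NIL-specific or other rules of the official scorer.

```python
# Local Precision/Recall/F1 sketch following the definitions above (illustrative only).

def cea_scores(submitted: dict, ground_truth: dict) -> tuple[float, float, float]:
    """Keys are (filename, row id, column id) tuples; values are Wikidata QIDs."""
    correct = sum(1 for cell, qid in submitted.items() if ground_truth.get(cell) == qid)
    precision = correct / len(submitted) if submitted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example with the single annotation shown earlier:
# cea_scores({("LYQZQ0T5", 1, 1): "Q3576864"},
#            {("LYQZQ0T5", 1, 1): "Q3576864"})  # -> (1.0, 1.0, 1.0)
```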
How to Participate
Leaderboard
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| ADFr | 0.758 | 0.758 | 0.758 |
| RAGDify | 0.603 | 0.603 | 0.603 |
| ditlab | 0.549 | 0.549 | 0.549 |
| Kepler-aSI | 0.403 | 0.157 | 0.226 |