MammoTab

MammoTab is a dataset composed of 1M Wikipedia tables extracted from over 20M Wikipedia pages and annotated through Wikidata.
The lack of datasets of this kind in the state of the art makes MammoTab a good resource for testing and training Semantic Table Interpretation (STI) approaches.
The dataset has been designed to cover several key challenges, such as disambiguation, homonymy, and NIL-mentions.


MammoTab 2026
V3 (Next Release)
A new version of MammoTab will be launched in 2026, bringing major updates in data coverage, semantic linking, and multilingual support. This release will integrate advanced LLM-based entity disambiguation, improved NIL handling, and expanded Wikidata synchronization. Stay tuned for the official release!
MammoTab 2025
🏆 MammoTab 25 won the Best Resource Award • ISWC 2025
V2
Introducing the enhanced MammoTab 2.0! This version features a fully refactored codebase, resulting in a reduced number of tables. Thanks to advanced data cleaning techniques, the annotations are now of higher quality. In addition, each table comes with comprehensive metadata describing its features, which addresses the key challenges of STI.
Download
Number of tables: 888.372
Entities: 40.702.248
Classes: 4.937.828
Properties: 24.193
NIL: 4.121.995
Total rows: 21.731.092
Min rows: 4
Max rows: 24.193
Total cols: 5.030.655
Min cols: 1
Max cols: 1.000
Evaluation
Dataset: 511 tables, 9741 mentions.

APPROACHES	CEA
Zang 2023	0.86
Deng 2022	0.31
Avogadro 2023	0.62
Find on Zenodo
Find on Github
Upgrades
Add table metadata

Column classification (NIL, Ne)

Domain classification

Add table context

Add classification for key STI challenges

Add export material
MammoTab 2024
V2-alpha (SemTab)
This version was created for SemTab 2024 and derives from the preliminary version of V2. The annotations within MammoTab 24 are derived from Wikidata v. 20240401 and follow the structure used in the SemTab challenge. Each table is stored in a separate CSV file, where each line in the file corresponds to a row in the table. Target columns for annotation, CTA, and CEA are saved in separate CSV files.
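As a rough illustration of the layout described above, the sketch below reads one table CSV and one CEA target file and looks up the cell text at each target position. The file contents, the table identifier `TABLE01`, the helper names, and the `(table_id, row, column)` column order of the target file are all assumptions made for this example; check the files in the actual release before relying on them.

```python
import csv
import io

# Hypothetical table file: one CSV line per table row (header on line 0).
TABLE_CSV = """col0,col1,col2
Rome,Italy,2873000
Paris,France,2161000
"""

# Hypothetical CEA target file: table id, row index, column index.
# This column order is an assumption, not taken from the MammoTab release.
CEA_TARGETS_CSV = """TABLE01,1,0
TABLE01,1,1
TABLE01,2,0
"""


def read_table(text):
    """Parse a table CSV into a list of rows (each row is a list of cells)."""
    return list(csv.reader(io.StringIO(text)))


def read_cea_targets(text):
    """Parse CEA targets into (table_id, row, col) tuples with integer indices."""
    return [(tid, int(row), int(col)) for tid, row, col in csv.reader(io.StringIO(text))]


def target_mentions(table_id, rows, targets):
    """Return (row, col, cell text) for every target that belongs to table_id."""
    return [
        (row, col, rows[row][col])
        for tid, row, col in targets
        if tid == table_id
    ]


rows = read_table(TABLE_CSV)
targets = read_cea_targets(CEA_TARGETS_CSV)
mentions = target_mentions("TABLE01", rows, targets)
print(mentions)
```

With real data, the two strings would be replaced by the contents of the table file and the target file; the lookup logic stays the same as long as the index convention matches.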
Download
Evaluation
Dataset: 511 tables, 9741 mentions.

APPROACHES	CEA
Zang 2023	0.86
Deng 2022	0.31
Avogadro 2023	0.62
Find on Zenodo
Find on Github
Upgrades
Greater accuracy in annotations

New annotations for CPA (Columns Predicate Annotations)
MammoTab 2022
V1
The annotations within MammoTab 22 are derived from Wikidata v. 20220511 and follow the structure used in the SemTab challenge. Each table is stored in a separate CSV file, where each line in the file corresponds to a row in the table. Target columns for annotation, CTA, and CEA are saved in separate CSV files.
Download
Number of tables: 980.254
Entities: 43.661.125
Classes: 5.541.283
Total rows: 23.229.899
Min rows: 4
Max rows: 14.436
Total cols: 5.638.191
Min cols: 1
Max cols: 1.0100.012
Evaluation
MTab performance
DATASET	CEA	CTA	CPA
Semtab2019 R4	0.983	-	0.832
Semtab2020 R4	0.907	0.993	0.997
Semtab2020 2T	0.907	0.728	-
Semtab2021 R3	0.968	0.984	0.993
MammoTab 22	0.853	0.659	-
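
The page does not say which metric the scores above report. SemTab-style evaluations conventionally use precision, recall, and F1 over the target cells, so the sketch below shows how such scores could be computed for CEA under that assumption. The function name `cea_scores`, the dictionary layout, and the toy gold and predicted annotations are illustrative only and are not taken from MammoTab.

```python
def cea_scores(gold, predicted):
    """Compute (precision, recall, F1) for cell-entity annotations.

    Both arguments map a (table_id, row, col) cell to a Wikidata entity id.
    A prediction counts as correct only if it equals the gold entity.
    """
    correct = sum(1 for cell, entity in predicted.items() if gold.get(cell) == entity)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy gold standard with four annotated cells (illustrative values).
gold = {
    ("TABLE01", 1, 0): "Q220",  # Rome
    ("TABLE01", 1, 1): "Q38",   # Italy
    ("TABLE01", 2, 0): "Q90",   # Paris
    ("TABLE01", 2, 1): "Q142",  # France
}

# Toy system output: three cells annotated, one of them wrongly.
predicted = {
    ("TABLE01", 1, 0): "Q220",
    ("TABLE01", 1, 1): "Q38",
    ("TABLE01", 2, 0): "Q142",  # wrong: the gold entity is Q90
}

precision, recall, f1 = cea_scores(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Cells the system leaves unannotated lower recall but not precision, which is why the two are reported separately before being combined into F1.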
Find on Zenodo
Find on Bitbucket
