
LLM Leaderboard

Compare the performance of different language models on the MammoTab dataset

This leaderboard has been generated using the MammoTab sample dataset, which consists of 870 tables containing a total of 84,907 distinct mentions.

This leaderboard is managed by:

Marco Cremaschi, Federico Belotti, and Matteo Palmonari from the University of Milano-Bicocca, and Jennifer D'Souza from TIB Leibniz Information Centre for Science and Technology

| Model | Parameters | Max Context | Status | System | Total Time | Accuracy | Total Correct | Out Context Prompt | NILs | Acronyms | Typos | Aliases | Generic Types | Specific Types | Single Domain | Multi Domain | Small % Cols | Medium % Cols | Large % Cols | Small % Rows | Medium % Rows | Large % Rows |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| google/gemma-2-2b-it | 2B | 8192 | Done | NVIDIA RTX A6000 | 69.67h | 0.407 | 34608 | 142 | 9545[64%] | 2323[66%] | 8168[67%] | 4957[70%] | 58[60%] | 529[68%] | 226[52%] | 361[83%] | 69[45%] | 354[76%] | 164[65%] | 234[59%] | 185[80%] | 168[69%] |
| meta-llama/Llama-3.2-1B | 1B | 131072 | Done | NVIDIA RTX A6000 | 23.12h | 0.048 | 4142 | 0 | 1625[11%] | 30[1%] | 98[1%] | 60[1%] | 4[4%] | 18[2%] | 7[2%] | 15[3%] | 1[1%] | 13[3%] | 8[3%] | 17[4%] | 2[1%] | 3[1%] |
| meta-llama/Llama-3.2-3B | 3B | 131072 | Done | NVIDIA RTX A6000 | 58.73h | 0.303 | 25771 | 0 | 7599[51%] | 2171[62%] | 7144[59%] | 3692[52%] | 43[45%] | 461[60%] | 205[47%] | 299[69%] | 64[42%] | 317[68%] | 123[49%] | 177[45%] | 169[73%] | 158[65%] |
| microsoft/Phi-3-mini-4k-instruct | 3.8B | 4096 | Done | NVIDIA RTX A6000 | 102.76h | 0.281 | 23881 | 3944 | 7045[47%] | 2085[59%] | 6871[57%] | 3660[51%] | 37[39%] | 482[62%] | 207[48%] | 312[72%] | 58[38%] | 318[68%] | 143[57%] | 208[53%] | 166[72%] | 145[59%] |
| Qwen/Qwen2-0.5B | 0.5B | 131072 | Done | NVIDIA RTX A6000 | 14.63h | 0.044 | 3741 | 0 | 1103[7%] | 6[0%] | 5[0%] | 3[0%] | 2[2%] | 17[2%] | 5[1%] | 14[3%] | 0[0%] | 15[3%] | 4[2%] | 17[4%] | 1[0%] | 1[0%] |
| Qwen/Qwen2-1.5B | 1.5B | 131072 | Done | NVIDIA RTX A6000 | 27.36h | 0.166 | 14124 | 0 | 4491[30%] | 1429[41%] | 4468[37%] | 1691[24%] | 23[24%] | 285[37%] | 147[34%] | 161[37%] | 42[28%] | 209[45%] | 57[23%] | 90[23%] | 110[47%] | 108[44%] |
| 01-ai/Yi-1.5-6B | 6B | 4096 | Done | NVIDIA RTX A6000 | 51.96h | 0.068 | 5832 | 3543 | 1626[11%] | 545[15%] | 756[6%] | 370[5%] | 6[6%] | 71[9%] | 31[7%] | 46[11%] | 1[1%] | 51[11%] | 25[10%] | 39[10%] | 24[10%] | 14[6%] |
| Qwen/Qwen2.5-0.5B | 0.5B | 32768 | Done | NVIDIA RTX A6000 | 14.35h | 0.015 | 1329 | 0 | 452[3%] | 0[0%] | 1[0%] | 0[0%] | 0[0%] | 2[0%] | 1[0%] | 1[0%] | 0[0%] | 2[0%] | 0[0%] | 1[0%] | 1[0%] | 0[0%] |
| Qwen/Qwen2.5-7B | 7B | 131072 | Done | NVIDIA RTX A6000 | 47.99h | 0.510 | 43321 | 0 | 12705[86%] | 2568[73%] | 9314[77%] | 6101[86%] | 74[77%] | 607[78%] | 265[61%] | 416[96%] | 76[50%] | 383[82%] | 222[88%] | 298[76%] | 198[85%] | 185[76%] |
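
The page does not define how the Accuracy column is computed. The reported values are consistent with Total Correct divided by the 84,907 mentions in the sample dataset, truncated (not rounded) to three decimals. The sketch below checks that reading against a few rows; the formula is an inference from the numbers, and the names `TOTAL_MENTIONS`, `TOTAL_CORRECT`, and `accuracy` are hypothetical helpers, not part of any MammoTab tooling.

```python
# Assumption: Accuracy = Total Correct / total mentions, truncated to 3 decimals.
# This is inferred from the leaderboard values, not stated on the page.

TOTAL_MENTIONS = 84_907  # mention count given in the dataset description

# Total Correct values copied from the leaderboard rows.
TOTAL_CORRECT = {
    "google/gemma-2-2b-it": 34608,
    "meta-llama/Llama-3.2-1B": 4142,
    "meta-llama/Llama-3.2-3B": 25771,
    "01-ai/Yi-1.5-6B": 5832,
    "Qwen/Qwen2.5-7B": 43321,
}


def accuracy(correct: int, total: int = TOTAL_MENTIONS, digits: int = 3) -> float:
    """Return correct/total truncated (floored) to `digits` decimal places."""
    scale = 10 ** digits
    # Integer floor division avoids floating-point surprises before truncation.
    return (correct * scale // total) / scale


if __name__ == "__main__":
    # Print models from highest to lowest recomputed accuracy.
    ranked = sorted(TOTAL_CORRECT.items(), key=lambda kv: kv[1], reverse=True)
    for model, correct in ranked:
        print(f"{model}: {accuracy(correct):.3f}")
```

Truncation rather than rounding is what makes rows such as meta-llama/Llama-3.2-1B (4142 correct, listed as 0.048) and 01-ai/Yi-1.5-6B (5832 correct, listed as 0.068) match the published figures.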