LLM Leaderboard

Compare the performance of language models on the MammoTab dataset (continuously updated)

This leaderboard has been generated using the MammoTab sample dataset, which consists of 870 tables containing a total of 84,907 distinct mentions.

This leaderboard is managed by:

Marco Cremaschi, Federico Belotti, and Matteo Palmonari from the University of Milano-Bicocca, and Jennifer D'Souza from the TIB Leibniz Information Centre for Science and Technology.
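The reported Accuracy values are consistent with Total Correct divided by the 84,907 distinct mentions, truncated (not rounded) to three decimals. A minimal sketch of that relationship; the `accuracy` helper is ours, not part of the leaderboard code:

```python
import math

# Distinct mentions in the MammoTab sample dataset (stated above).
TOTAL_MENTIONS = 84_907

def accuracy(total_correct: int) -> float:
    """Accuracy as published: correct mentions / all mentions,
    truncated to three decimal places."""
    return math.floor(total_correct / TOTAL_MENTIONS * 1000) / 1000

# Spot-checks against reported rows:
# tiiuae/Falcon3-7B-Instruct:       23508 correct -> 0.276
# mistralai/Mistral-Large-Instruct: 62503 correct -> 0.736
```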
Category cells show the number of correctly annotated mentions, with the corresponding percentage in brackets.

| Model | Parameters | Status | GPU | Total Time | Accuracy | Total Correct | NILs | Acronyms | Generic Types | Specific Types | Single Domain | Multi Domain | Small % Cols | Medium % Cols | Large % Cols | Small % Rows | Medium % Rows | Large % Rows |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tiiuae/Falcon3-7B-Instruct | 7B | Done | NVIDIA L40S | 18.58h | 0.276 | 23508 | 576 [4%] | 2192 [62%] | 58 [60%] | 517 [67%] | 228 [52%] | 347 [80%] | 66 [43%] | 314 [68%] | 195 [77%] | 260 [66%] | 162 [70%] | 153 [63%] |
| google/gemma-2-2b-it | 2B | Done | NVIDIA A6000 | 69.67h | 0.407 | 34608 | 680 [5%] | 2323 [66%] | 58 [60%] | 529 [68%] | 226 [52%] | 361 [83%] | 69 [45%] | 354 [76%] | 164 [65%] | 234 [59%] | 185 [80%] | 168 [69%] |
| google/gemma-2-9b-it | 9B | Done | NVIDIA L40S | 39.26h | 0.401 | 34063 | 676 [5%] | 2226 [63%] | 56 [58%] | 542 [70%] | 241 [55%] | 357 [82%] | 71 [47%] | 359 [77%] | 168 [66%] | 255 [65%] | 176 [76%] | 167 [68%] |
| google/gemma-3-27b-it | 27B | Done | NVIDIA A40 | 87.84h | 0.377 | 32037 | 2859 [19%] | 2248 [64%] | 59 [61%] | 523 [68%] | 252 [58%] | 330 [76%] | 85 [56%] | 323 [69%] | 174 [69%] | 230 [58%] | 182 [78%] | 170 [70%] |
| meta-llama/Llama-3.2-1B | 1B | Done | NVIDIA A6000 | 23.12h | 0.048 | 4142 | 170 [1%] | 30 [1%] | 4 [4%] | 18 [2%] | 7 [2%] | 15 [3%] | 1 [1%] | 13 [3%] | 8 [3%] | 17 [4%] | 2 [1%] | 3 [1%] |
| meta-llama/Llama-3.2-3B | 3B | Done | NVIDIA A6000 | 58.73h | 0.303 | 25771 | 610 [4%] | 2171 [62%] | 43 [45%] | 461 [60%] | 205 [47%] | 299 [69%] | 64 [42%] | 317 [68%] | 123 [49%] | 177 [45%] | 169 [73%] | 158 [65%] |
| meta-llama/Llama-3.3-70B-Instruct | 70B | Done | NVIDIA L40S | 235.66h | 0.629 | 53473 | 1700 [11%] | 2570 [73%] | 71 [74%] | 644 [83%] | 303 [70%] | 412 [95%] | 92 [61%] | 402 [86%] | 221 [87%] | 309 [78%] | 206 [89%] | 200 [82%] |
| meta-llama/Llama-3.1-8B-Instruct | 8B | Done | NVIDIA L40S | 92.12h | 0.453 | 38488 | 500 [3%] | 2245 [64%] | 67 [70%] | 503 [65%] | 222 [51%] | 348 [80%] | 69 [45%] | 321 [69%] | 180 [71%] | 245 [62%] | 166 [72%] | 159 [65%] |
| mistralai/Mistral-Large-Instruct | 123B | Done | NVIDIA L40S | 102.57h | 0.736 | 62503 | 6308 [42%] | 2703 [77%] | 88 [92%] | 655 [85%] | 342 [79%] | 401 [92%] | 119 [78%] | 400 [86%] | 224 [89%] | 303 [77%] | 214 [92%] | 226 [93%] |
| mistralai/Mistral-7B-Instruct-v0.3 | 7B | Done | NVIDIA A6000 | 97.63h | 0.533 | 45265 | 729 [5%] | 2517 [72%] | 68 [71%] | 610 [79%] | 270 [62%] | 408 [94%] | 78 [51%] | 387 [83%] | 213 [84%] | 295 [75%] | 199 [86%] | 184 [75%] |
| microsoft/Phi-3-mini-4k-instruct | 3.8B | Done | NVIDIA A6000 | 102.76h | 0.281 | 23881 | 650 [4%] | 2085 [59%] | 37 [39%] | 482 [62%] | 207 [48%] | 312 [72%] | 58 [38%] | 318 [68%] | 143 [57%] | 208 [53%] | 166 [72%] | 145 [59%] |
| microsoft/Phi-3-mini-128k-instruct | 3.8B | Done | NVIDIA A6000 | 111.50h | 0.285 | 24247 | 524 [4%] | 2199 [63%] | 41 [43%] | 507 [66%] | 221 [51%] | 327 [75%] | 63 [41%] | 343 [74%] | 142 [56%] | 208 [53%] | 178 [77%] | 162 [66%] |
| Qwen/Qwen2-0.5B | 0.5B | Done | NVIDIA A6000 | 16.64h | 0.044 | 3750 | 47 [0%] | 10 [0%] | 2 [2%] | 17 [2%] | 6 [1%] | 13 [3%] | 0 [0%] | 14 [3%] | 5 [2%] | 16 [4%] | 2 [1%] | 1 [0%] |
| Qwen/Qwen2-1.5B | 1.5B | Done | NVIDIA A6000 | 27.36h | 0.166 | 14124 | 447 [3%] | 1429 [41%] | 23 [24%] | 285 [37%] | 147 [34%] | 161 [37%] | 42 [28%] | 209 [45%] | 57 [23%] | 90 [23%] | 110 [47%] | 108 [44%] |
| Qwen/Qwen2-7B | 7B | Done | NVIDIA A6000 | 78.72h | 0.289 | 24546 | 448 [3%] | 1727 [49%] | 48 [50%] | 378 [49%] | 179 [41%] | 247 [57%] | 55 [36%] | 267 [57%] | 104 [41%] | 136 [35%] | 155 [67%] | 135 [55%] |
| Qwen/Qwen2.5-0.5B | 0.5B | Done | NVIDIA A6000 | 14.35h | 0.015 | 1329 | 70 [0%] | 0 [0%] | 0 [0%] | 2 [0%] | 1 [0%] | 1 [0%] | 0 [0%] | 2 [0%] | 0 [0%] | 1 [0%] | 1 [0%] | 0 [0%] |
| Qwen/Qwen2.5-7B | 7B | Done | NVIDIA A40 | 47.99h | 0.510 | 43321 | 1531 [10%] | 2568 [73%] | 74 [77%] | 607 [78%] | 265 [61%] | 416 [96%] | 76 [50%] | 383 [82%] | 222 [88%] | 298 [76%] | 198 [85%] | 185 [76%] |
| osunlp/TableLlama | based on Llama 7B | Done | NVIDIA A6000 | 78.44h | 0.731 | 62116 | 2472 [17%] | 2556 [73%] | 82 [85%] | 656 [85%] | 306 [70%] | 432 [99%] | 91 [60%] | 413 [89%] | 234 [92%] | 332 [84%] | 209 [90%] | 197 [81%] |
| 01-ai/Yi-1.5-6B | 6B | Done | NVIDIA A6000 | 51.96h | 0.068 | 5832 | 24 [0%] | 545 [15%] | 6 [6%] | 71 [9%] | 31 [7%] | 46 [11%] | 1 [1%] | 51 [11%] | 25 [10%] | 39 [10%] | 24 [10%] | 14 [6%] |
| 01-ai/Yi-1.5-9B | 9B | Done | NVIDIA A6000 | 77.20h | 0.177 | 15080 | 512 [3%] | 1769 [50%] | 16 [17%] | 296 [38%] | 153 [35%] | 159 [37%] | 38 [25%] | 207 [45%] | 67 [26%] | 86 [22%] | 119 [51%] | 107 [44%] |
| microsoft/Phi-3-small-8k-instruct | 7B | Done | NVIDIA A6000 | 280.07h | 0.392 | 33332 | 205 [1%] | 2376 [68%] | 63 [66%] | 580 [75%] | 249 [57%] | 394 [91%] | 74 [49%] | 370 [80%] | 199 [79%] | 283 [72%] | 184 [79%] | 176 [72%] |
| deepseek-ai/DeepSeek-R1 | 671B | Done | NVIDIA A40 | 1860.55h | 0.693 | 58921 | 6179 [42%] | 2775 [79%] | 93 [97%] | 703 [91%] | 365 [84%] | 431 [99%] | 127 [84%] | 429 [92%] | 240 [95%] | 332 [84%] | 225 [97%] | 239 [98%] |
| Qwen/Qwen2.5-72B-Instruct | 73B | Done | NVIDIA A40 | 113.89h | 0.735 | 62439 | 7262 [49%] | 2704 [77%] | 92 [96%] | 686 [89%] | 356 [82%] | 422 [97%] | 125 [82%] | 417 [90%] | 236 [93%] | 328 [83%] | 221 [95%] | 229 [94%] |
| microsoft/Phi-4-mini-instruct | 4B | Done | NVIDIA A6000 | 66.67h | 0.288 | 24533 | 105 [1%] | 2183 [62%] | 42 [44%] | 491 [63%] | 219 [50%] | 314 [72%] | 66 [43%] | 338 [73%] | 129 [51%] | 194 [49%] | 173 [75%] | 166 [68%] |
| deepseek-r1-distill-llama-70b | 70B | Done | NVIDIA A6000 | 348.73h | 0.333 | 28280 | 3465 [23%] | 2457 [70%] | 62 [65%] | 632 [82%] | 308 [71%] | 386 [89%] | 110 [72%] | 387 [83%] | 197 [78%] | 272 [69%] | 205 [88%] | 217 [89%] |
| microsoft/Phi-4-mini-reasoning | 4B | Done | NVIDIA A6000 | 37.60h | 0.008 | 757 | 24 [0%] | 1 [0%] | 0 [0%] | 1 [0%] | 1 [0%] | 0 [0%] | 0 [0%] | 1 [0%] | 0 [0%] | 0 [0%] | 1 [0%] | 0 [0%] |
| qwen3-30b-a3b-thinking-2507 | 31B | Done | NVIDIA L40S | 273.90h | 0.626 | 53179 | 5905 [40%] | 2733 [78%] | 90 [94%] | 696 [90%] | 356 [82%] | 430 [99%] | 126 [83%] | 418 [90%] | 242 [96%] | 325 [82%] | 220 [95%] | 241 [99%] |
| qwen3-235b-a22b | 235B | Done | NVIDIA L40S | 996.48h | 0.578 | 49141 | 7190 [48%] | 2511 [71%] | 91 [95%] | 645 [83%] | 341 [78%] | 395 [91%] | 121 [80%] | 391 [84%] | 224 [89%] | 299 [76%] | 208 [90%] | 229 [94%] |
| mistral-large-3-675b-instruct-2512 | 675B | Done | NVIDIA L40S | 124.62h | 0.739 | 62752 | 6762 [46%] | 2707 [77%] | 84 [88%] | 670 [87%] | 352 [81%] | 402 [92%] | 123 [81%] | 409 [88%] | 222 [88%] | 311 [79%] | 210 [91%] | 233 [95%] |
| Qwen/Qwen3-0.6B | 0.6B | In progress | NVIDIA A6000 | 20.56h | 0.181 | 15412 | 4273 [29%] | 1151 [33%] | 29 [30%] | 325 [42%] | 151 [35%] | 203 [47%] | 49 [32%] | 238 [51%] | 67 [26%] | 116 [29%] | 121 [52%] | 117 [48%] |

Pending and queued runs

| Model | Status | Priority |
|---|---|---|
| openai-gpt-oss-120b | In progress | High |
| qwen3-30b-a3b-instruct-2507 | To do | High |
| qwen3-32b | To do | High |
| glm-4.7 | To do | High |
| apertus-70b-instruct-2509 | To do | High |
| openai/gpt-oss-20b | To do | High |
| RUCKBReasoning/TableLLM-13b | To do | High |
| Qwen/Qwen3-Next-80B-A3B-Instruct | To do | High |
| qwen3-coder-30b-a3b-instruct | To do | Medium |
| qwen3-vl-30b-a3b-instruct | To do | Medium |
| qwen3-omni-30b-a3b-instruct | To do | Medium |
| internvl3.5-30b-a3b | To do | Medium |
| moonshotai/Kimi-K2-Instruct-0905 | To do | Medium |
| Qwen/Qwen3-4B-Instruct-2507 | To do | Medium |
| zai-org/GLM-4.7-Flash | To do | Medium |
| devstral-2-123b-instruct-2512 | To do | Low |
| medgemma-27b-it | To do | Low |
| teuken-7b-instruct-research | To do | Low |
| llama-3.1-sauerkrautlm-70b-instruct | To do | Low |