LLM Leaderboard

Compare the performance of different language models on the MammoTab dataset

This leaderboard has been generated using the MammoTab sample dataset, which consists of 870 tables containing a total of 84,907 distinct mentions.

This leaderboard is managed by:

Marco Cremaschi, Federico Belotti and Matteo Palmonari from the University of Milano-Bicocca, and Jennifer D'Souza from TIB Leibniz Information Centre for Science and Technology.

| Model | Parameters | Status | GPU | Total Time | Accuracy | Total Correct | NILs | Acronyms | Generic Types | Specific Types | Single Domain | Multi Domain | Small % Cols | Medium % Cols | Large % Cols | Small % Rows | Medium % Rows | Large % Rows |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tiiuae/Falcon3-7B-Instruct | 7B | Done | NVIDIA L40S | 18.58h | 0.276 | 23508 | 576[4%] | 2192[62%] | 58[60%] | 517[67%] | 228[52%] | 347[80%] | 66[43%] | 314[68%] | 195[77%] | 260[66%] | 162[70%] | 153[63%] |
| google/gemma-2-2b-it | 2B | Done | NVIDIA A6000 | 69.67h | 0.407 | 34608 | 680[5%] | 2323[66%] | 58[60%] | 529[68%] | 226[52%] | 361[83%] | 69[45%] | 354[76%] | 164[65%] | 234[59%] | 185[80%] | 168[69%] |
| google/gemma-2-9b-it | 9B | Done | NVIDIA L40S | 39.26h | 0.401 | 34063 | 676[5%] | 2226[63%] | 56[58%] | 542[70%] | 241[55%] | 357[82%] | 71[47%] | 359[77%] | 168[66%] | 255[65%] | 176[76%] | 167[68%] |
| meta-llama/Llama-3.2-1B | 1B | Done | NVIDIA A6000 | 23.12h | 0.048 | 4142 | 170[1%] | 30[1%] | 4[4%] | 18[2%] | 7[2%] | 15[3%] | 1[1%] | 13[3%] | 8[3%] | 17[4%] | 2[1%] | 3[1%] |
| meta-llama/Llama-3.2-3B | 3B | Done | NVIDIA A6000 | 58.73h | 0.303 | 25771 | 610[4%] | 2171[62%] | 43[45%] | 461[60%] | 205[47%] | 299[69%] | 64[42%] | 317[68%] | 123[49%] | 177[45%] | 169[73%] | 158[65%] |
| meta-llama/Llama-3.3-70B-Instruct | 70B | Done | NVIDIA L40S | 235.66h | 0.629 | 53473 | 1700[11%] | 2570[73%] | 71[74%] | 644[83%] | 303[70%] | 412[95%] | 92[61%] | 402[86%] | 221[87%] | 309[78%] | 206[89%] | 200[82%] |
| meta-llama/Llama-3.1-8B-Instruct | 8B | Done | NVIDIA L40S | 92.12h | 0.453 | 38488 | 500[3%] | 2245[64%] | 67[70%] | 503[65%] | 222[51%] | 348[80%] | 69[45%] | 321[69%] | 180[71%] | 245[62%] | 166[72%] | 159[65%] |
| mistralai/Mistral-Large-Instruct | 123B | Done | NVIDIA L40S | 102.57h | 0.736 | 62503 | 6308[42%] | 2703[77%] | 88[92%] | 655[85%] | 342[79%] | 401[92%] | 119[78%] | 400[86%] | 224[89%] | 303[77%] | 214[92%] | 226[93%] |
| mistralai/Mistral-7B-Instruct-v0.3 | 7B | Done | NVIDIA A6000 | 97.63h | 0.533 | 45265 | 729[5%] | 2517[72%] | 68[71%] | 610[79%] | 270[62%] | 408[94%] | 78[51%] | 387[83%] | 213[84%] | 295[75%] | 199[86%] | 184[75%] |
| microsoft/Phi-3-mini-4k-instruct | 3.8B | Done | NVIDIA A6000 | 102.76h | 0.281 | 23881 | 650[4%] | 2085[59%] | 37[39%] | 482[62%] | 207[48%] | 312[72%] | 58[38%] | 318[68%] | 143[57%] | 208[53%] | 166[72%] | 145[59%] |
| microsoft/Phi-3-mini-128k-instruct | 3.8B | Done | NVIDIA A6000 | 111.50h | 0.285 | 24247 | 524[4%] | 2199[63%] | 41[43%] | 507[66%] | 221[51%] | 327[75%] | 63[41%] | 343[74%] | 142[56%] | 208[53%] | 178[77%] | 162[66%] |
| Qwen/Qwen2-0.5B | 0.5B | Done | NVIDIA A6000 | 16.64h | 0.044 | 3750 | 47[0%] | 10[0%] | 2[2%] | 17[2%] | 6[1%] | 13[3%] | 0[0%] | 14[3%] | 5[2%] | 16[4%] | 2[1%] | 1[0%] |
| Qwen/Qwen2-1.5B | 1.5B | Done | NVIDIA A6000 | 27.36h | 0.166 | 14124 | 447[3%] | 1429[41%] | 23[24%] | 285[37%] | 147[34%] | 161[37%] | 42[28%] | 209[45%] | 57[23%] | 90[23%] | 110[47%] | 108[44%] |
| Qwen/Qwen2-7B | 7B | Done | NVIDIA A6000 | 78.72h | 0.289 | 24546 | 448[3%] | 1727[49%] | 48[50%] | 378[49%] | 179[41%] | 247[57%] | 55[36%] | 267[57%] | 104[41%] | 136[35%] | 155[67%] | 135[55%] |
| Qwen/Qwen2.5-0.5B | 0.5B | Done | NVIDIA A6000 | 14.35h | 0.015 | 1329 | 70[0%] | 0[0%] | 0[0%] | 2[0%] | 1[0%] | 1[0%] | 0[0%] | 2[0%] | 0[0%] | 1[0%] | 1[0%] | 0[0%] |
| Qwen/Qwen2.5-7B | 7B | Done | NVIDIA A40 | 47.99h | 0.510 | 43321 | 1531[10%] | 2568[73%] | 74[77%] | 607[78%] | 265[61%] | 416[96%] | 76[50%] | 383[82%] | 222[88%] | 298[76%] | 198[85%] | 185[76%] |
| osunlp/TableLlama | based on Llama 7B | Done | NVIDIA A6000 | 78.44h | 0.731 | 62116 | 2472[17%] | 2556[73%] | 82[85%] | 656[85%] | 306[70%] | 432[99%] | 91[60%] | 413[89%] | 234[92%] | 332[84%] | 209[90%] | 197[81%] |
| 01-ai/Yi-1.5-6B | 6B | Done | NVIDIA A6000 | 51.96h | 0.068 | 5832 | 24[0%] | 545[15%] | 6[6%] | 71[9%] | 31[7%] | 46[11%] | 1[1%] | 51[11%] | 25[10%] | 39[10%] | 24[10%] | 14[6%] |
| 01-ai/Yi-1.5-9B | 9B | Done | NVIDIA A6000 | 77.20h | 0.177 | 15080 | 512[3%] | 1769[50%] | 16[17%] | 296[38%] | 153[35%] | 159[37%] | 38[25%] | 207[45%] | 67[26%] | 86[22%] | 119[51%] | 107[44%] |
| microsoft/Phi-3-small-8k-instruct | 7B | Done | NVIDIA A6000 | 280.07h | 0.392 | 33332 | 205[1%] | 2376[68%] | 63[66%] | 580[75%] | 249[57%] | 394[91%] | 74[49%] | 370[80%] | 199[79%] | 283[72%] | 184[79%] | 176[72%] |
| Qwen/Qwen3-235B-A22B | 235B | Unusable results | NVIDIA A40 | | | | | | | | | | | | | | | |
| deepseek-ai/DeepSeek-R1 | 671B | Unusable results | NVIDIA A40 | | | | | | | | | | | | | | | |
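
The page does not state how Accuracy is computed. The listed values appear consistent with Total Correct divided by the 84,907 mentions and truncated (not rounded) to three decimals: for example, 34,608 / 84,907 ≈ 0.4076 is listed as 0.407, and 62,116 / 84,907 ≈ 0.7316 is listed as 0.731. The minimal Python sketch below reproduces the column under that assumption. It also splits the category cells, which use a `count[percent%]` format. The function names `accuracy` and `parse_cell` are hypothetical helpers, not part of any MammoTab tooling.

```python
import re

# Number of distinct mentions in the MammoTab sample dataset, as stated on this page.
TOTAL_MENTIONS = 84_907

# Matches category cells such as "576[4%]" (count, then a percentage in brackets).
CELL_RE = re.compile(r"^\s*(\d+)\s*\[(\d+)%\]\s*$")


def accuracy(total_correct: int, total_mentions: int = TOTAL_MENTIONS) -> float:
    """Assumed definition: correct / mentions, truncated to three decimals."""
    # Integer floor division truncates instead of rounding, matching the listed values.
    return (total_correct * 1000 // total_mentions) / 1000


def parse_cell(cell: str) -> tuple[int, int]:
    """Split a category cell like '576[4%]' into (count, percent)."""
    match = CELL_RE.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Total Correct values copied from the table above.
    rows = {
        "google/gemma-2-2b-it": 34608,
        "meta-llama/Llama-3.3-70B-Instruct": 53473,
        "mistralai/Mistral-Large-Instruct": 62503,
    }
    for model, correct in rows.items():
        print(f"{model}: {accuracy(correct):.3f}")
    print(parse_cell("576[4%]"))
```

Rounding instead of truncating would give 0.408 for gemma-2-2b-it and 0.630 for Llama-3.3-70B-Instruct, which do not match the table. That is why the sketch uses floor division.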