LLM Leaderboard
Compare the performance of different language models on the MammoTab dataset
This leaderboard has been generated using the MammoTab sample dataset, which consists of 870 tables containing a total of 84,907 distinct mentions.
This leaderboard is managed by:
Marco Cremaschi, Federico Belotti, and Matteo Palmonari from the University of Milano-Bicocca, and Jennifer D'Souza from TIB Leibniz Information Centre for Science and Technology
Model | Parameters | Status | GPU | Total Time | Accuracy | Total Correct | NILs | Acronyms | Generic Types | Specific Types | Single Domain | Multi Domain | Small % Cols | Medium % Cols | Large % Cols | Small % Rows | Medium % Rows | Large % Rows |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tiiuae/Falcon3-7B-Instruct | 7B | Done | NVIDIA L40S | 18.58h | 0.276 | 23508 | 576[4%] | 2192[62%] | 58[60%] | 517[67%] | 228[52%] | 347[80%] | 66[43%] | 314[68%] | 195[77%] | 260[66%] | 162[70%] | 153[63%] |
google/gemma-2-2b-it | 2B | Done | NVIDIA A6000 | 69.67h | 0.407 | 34608 | 680[5%] | 2323[66%] | 58[60%] | 529[68%] | 226[52%] | 361[83%] | 69[45%] | 354[76%] | 164[65%] | 234[59%] | 185[80%] | 168[69%] |
google/gemma-2-9b-it | 9B | Done | NVIDIA L40S | 39.26h | 0.401 | 34063 | 676[5%] | 2226[63%] | 56[58%] | 542[70%] | 241[55%] | 357[82%] | 71[47%] | 359[77%] | 168[66%] | 255[65%] | 176[76%] | 167[68%] |
meta-llama/Llama-3.2-1B | 1B | Done | NVIDIA A6000 | 23.12h | 0.048 | 4142 | 170[1%] | 30[1%] | 4[4%] | 18[2%] | 7[2%] | 15[3%] | 1[1%] | 13[3%] | 8[3%] | 17[4%] | 2[1%] | 3[1%] |
meta-llama/Llama-3.2-3B | 3B | Done | NVIDIA A6000 | 58.73h | 0.303 | 25771 | 610[4%] | 2171[62%] | 43[45%] | 461[60%] | 205[47%] | 299[69%] | 64[42%] | 317[68%] | 123[49%] | 177[45%] | 169[73%] | 158[65%] |
meta-llama/Llama-3.3-70B-Instruct | 70B | Done | NVIDIA L40S | 235.66h | 0.629 | 53473 | 1700[11%] | 2570[73%] | 71[74%] | 644[83%] | 303[70%] | 412[95%] | 92[61%] | 402[86%] | 221[87%] | 309[78%] | 206[89%] | 200[82%] |
meta-llama/Llama-3.1-8B-Instruct | 8B | Done | NVIDIA L40S | 92.12h | 0.453 | 38488 | 500[3%] | 2245[64%] | 67[70%] | 503[65%] | 222[51%] | 348[80%] | 69[45%] | 321[69%] | 180[71%] | 245[62%] | 166[72%] | 159[65%] |
mistralai/Mistral-Large-Instruct | 123B | Done | NVIDIA L40S | 102.57h | 0.736 | 62503 | 6308[42%] | 2703[77%] | 88[92%] | 655[85%] | 342[79%] | 401[92%] | 119[78%] | 400[86%] | 224[89%] | 303[77%] | 214[92%] | 226[93%] |
mistralai/Mistral-7B-Instruct-v0.3 | 7B | Done | NVIDIA A6000 | 97.63h | 0.533 | 45265 | 729[5%] | 2517[72%] | 68[71%] | 610[79%] | 270[62%] | 408[94%] | 78[51%] | 387[83%] | 213[84%] | 295[75%] | 199[86%] | 184[75%] |
microsoft/Phi-3-mini-4k-instruct | 3.8B | Done | NVIDIA A6000 | 102.76h | 0.281 | 23881 | 650[4%] | 2085[59%] | 37[39%] | 482[62%] | 207[48%] | 312[72%] | 58[38%] | 318[68%] | 143[57%] | 208[53%] | 166[72%] | 145[59%] |
microsoft/Phi-3-mini-128k-instruct | 3.8B | Done | NVIDIA A6000 | 111.50h | 0.285 | 24247 | 524[4%] | 2199[63%] | 41[43%] | 507[66%] | 221[51%] | 327[75%] | 63[41%] | 343[74%] | 142[56%] | 208[53%] | 178[77%] | 162[66%] |
Qwen/Qwen2-0.5B | 0.5B | Done | NVIDIA A6000 | 16.64h | 0.044 | 3750 | 47[0%] | 10[0%] | 2[2%] | 17[2%] | 6[1%] | 13[3%] | 0[0%] | 14[3%] | 5[2%] | 16[4%] | 2[1%] | 1[0%] |
Qwen/Qwen2-1.5B | 1.5B | Done | NVIDIA A6000 | 27.36h | 0.166 | 14124 | 447[3%] | 1429[41%] | 23[24%] | 285[37%] | 147[34%] | 161[37%] | 42[28%] | 209[45%] | 57[23%] | 90[23%] | 110[47%] | 108[44%] |
Qwen/Qwen2-7B | 7B | Done | NVIDIA A6000 | 78.72h | 0.289 | 24546 | 448[3%] | 1727[49%] | 48[50%] | 378[49%] | 179[41%] | 247[57%] | 55[36%] | 267[57%] | 104[41%] | 136[35%] | 155[67%] | 135[55%] |
Qwen/Qwen2.5-0.5B | 0.5B | Done | NVIDIA A6000 | 14.35h | 0.015 | 1329 | 70[0%] | 0[0%] | 0[0%] | 2[0%] | 1[0%] | 1[0%] | 0[0%] | 2[0%] | 0[0%] | 1[0%] | 1[0%] | 0[0%] |
Qwen/Qwen2.5-7B | 7B | Done | NVIDIA A40 | 47.99h | 0.510 | 43321 | 1531[10%] | 2568[73%] | 74[77%] | 607[78%] | 265[61%] | 416[96%] | 76[50%] | 383[82%] | 222[88%] | 298[76%] | 198[85%] | 185[76%] |
osunlp/TableLlama | based on Llama 7B | Done | NVIDIA A6000 | 78.44h | 0.731 | 62116 | 2472[17%] | 2556[73%] | 82[85%] | 656[85%] | 306[70%] | 432[99%] | 91[60%] | 413[89%] | 234[92%] | 332[84%] | 209[90%] | 197[81%] |
01-ai/Yi-1.5-6B | 6B | Done | NVIDIA A6000 | 51.96h | 0.068 | 5832 | 24[0%] | 545[15%] | 6[6%] | 71[9%] | 31[7%] | 46[11%] | 1[1%] | 51[11%] | 25[10%] | 39[10%] | 24[10%] | 14[6%] |
01-ai/Yi-1.5-9B | 9B | Done | NVIDIA A6000 | 77.20h | 0.177 | 15080 | 512[3%] | 1769[50%] | 16[17%] | 296[38%] | 153[35%] | 159[37%] | 38[25%] | 207[45%] | 67[26%] | 86[22%] | 119[51%] | 107[44%] |
microsoft/Phi-3-small-8k-instruct | 7B | Done | NVIDIA A6000 | 280.07h | 0.392 | 33332 | 205[1%] | 2376[68%] | 63[66%] | 580[75%] | 249[57%] | 394[91%] | 74[49%] | 370[80%] | 199[79%] | 283[72%] | 184[79%] | 176[72%] |
Qwen/Qwen3-235B-A22B | 235B | Unusable results | NVIDIA A40 | | | | | | | | | | | | | | | |
deepseek-ai/DeepSeek-R1 | 671B | Unusable results | NVIDIA A40 | | | | | | | | | | | | | | | |
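The Accuracy column appears to equal Total Correct divided by the 84,907 mentions in the sample dataset, truncated rather than rounded to three decimals. For example, 23,508 / 84,907 ≈ 0.2769, which is listed as 0.276 for Falcon3-7B-Instruct. The following minimal sketch checks that assumption against a few rows copied from the table. The truncation rule and the helper name `accuracy_truncated` are inferred from the published numbers for illustration only. They are not a documented part of the leaderboard's scoring.

```python
# Minimal sketch (assumption): Accuracy = Total Correct / 84,907 mentions,
# truncated (not rounded) to three decimals. Inferred from the table, not documented.
TOTAL_MENTIONS = 84_907  # distinct mentions in the MammoTab sample dataset


def accuracy_truncated(total_correct: int, total: int = TOTAL_MENTIONS) -> str:
    """Return total_correct / total as a string truncated to three decimals."""
    # Integer floor division avoids floating-point rounding noise.
    milli = total_correct * 1000 // total
    return f"{milli // 1000}.{milli % 1000:03d}"


# (model, Total Correct, published Accuracy) copied from the leaderboard above
ROWS = [
    ("tiiuae/Falcon3-7B-Instruct", 23508, "0.276"),
    ("google/gemma-2-2b-it", 34608, "0.407"),
    ("meta-llama/Llama-3.3-70B-Instruct", 53473, "0.629"),
    ("mistralai/Mistral-Large-Instruct", 62503, "0.736"),
    ("osunlp/TableLlama", 62116, "0.731"),
    ("Qwen/Qwen2.5-0.5B", 1329, "0.015"),
]

if __name__ == "__main__":
    for model, correct, published in ROWS:
        truncated = accuracy_truncated(correct)
        rounded = f"{correct / TOTAL_MENTIONS:.3f}"  # conventional rounding, for comparison
        print(f"{model:36s} published={published} truncated={truncated} rounded={rounded}")
```

Running the script prints the published, truncated, and rounded values side by side. Rounding disagrees with several published values, for example 0.277 instead of 0.276 for Falcon3-7B-Instruct and 0.016 instead of 0.015 for Qwen2.5-0.5B. Truncation matches every row listed, which is why the sketch assumes it.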