LLM Leaderboard
Compare the performance of different language models on the MammoTab dataset
This leaderboard has been generated using the MammoTab sample dataset, which consists of 870 tables containing a total of 84,907 distinct mentions.
This leaderboard is managed by Marco Cremaschi, Federico Belotti, and Matteo Palmonari from the University of Milano-Bicocca, and Jennifer D'Souza from TIB Leibniz Information Centre for Science and Technology.
Model | Parameters | Max Context (tokens) | Status | System | Total Time | Accuracy | Total Correct | Out-of-Context Prompts | NILs | Acronyms | Typos | Aliases | Generic Types | Specific Types | Single Domain | Multi Domain | Small % Cols | Medium % Cols | Large % Cols | Small % Rows | Medium % Rows | Large % Rows |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
google/gemma-2-2b-it | 2B | 8192 | Done | NVIDIA RTX A6000 | 69.67h | 0.407 | 34608 | 142 | 9545[64%] | 2323[66%] | 8168[67%] | 4957[70%] | 58[60%] | 529[68%] | 226[52%] | 361[83%] | 69[45%] | 354[76%] | 164[65%] | 234[59%] | 185[80%] | 168[69%] |
meta-llama/Llama-3.2-1B | 1B | 131072 | Done | NVIDIA RTX A6000 | 23.12h | 0.048 | 4142 | 0 | 1625[11%] | 30[1%] | 98[1%] | 60[1%] | 4[4%] | 18[2%] | 7[2%] | 15[3%] | 1[1%] | 13[3%] | 8[3%] | 17[4%] | 2[1%] | 3[1%] |
meta-llama/Llama-3.2-3B | 3B | 131072 | Done | NVIDIA RTX A6000 | 58.73h | 0.303 | 25771 | 0 | 7599[51%] | 2171[62%] | 7144[59%] | 3692[52%] | 43[45%] | 461[60%] | 205[47%] | 299[69%] | 64[42%] | 317[68%] | 123[49%] | 177[45%] | 169[73%] | 158[65%] |
microsoft/Phi-3-mini-4k-instruct | 3.8B | 4096 | Done | NVIDIA RTX A6000 | 102.76h | 0.281 | 23881 | 3944 | 7045[47%] | 2085[59%] | 6871[57%] | 3660[51%] | 37[39%] | 482[62%] | 207[48%] | 312[72%] | 58[38%] | 318[68%] | 143[57%] | 208[53%] | 166[72%] | 145[59%] |
Qwen/Qwen2-0.5B | 0.5B | 131072 | Done | NVIDIA RTX A6000 | 14.63h | 0.044 | 3741 | 0 | 1103[7%] | 6[0%] | 5[0%] | 3[0%] | 2[2%] | 17[2%] | 5[1%] | 14[3%] | 0[0%] | 15[3%] | 4[2%] | 17[4%] | 1[0%] | 1[0%] |
Qwen/Qwen2-1.5B | 1.5B | 131072 | Done | NVIDIA RTX A6000 | 27.36h | 0.166 | 14124 | 0 | 4491[30%] | 1429[41%] | 4468[37%] | 1691[24%] | 23[24%] | 285[37%] | 147[34%] | 161[37%] | 42[28%] | 209[45%] | 57[23%] | 90[23%] | 110[47%] | 108[44%] |
01-ai/Yi-1.5-6B | 6B | 4096 | Done | NVIDIA RTX A6000 | 51.96h | 0.068 | 5832 | 3543 | 1626[11%] | 545[15%] | 756[6%] | 370[5%] | 6[6%] | 71[9%] | 31[7%] | 46[11%] | 1[1%] | 51[11%] | 25[10%] | 39[10%] | 24[10%] | 14[6%] |
Qwen/Qwen2.5-0.5B | 0.5B | 32768 | Done | NVIDIA RTX A6000 | 14.35h | 0.015 | 1329 | 0 | 452[3%] | 0[0%] | 1[0%] | 0[0%] | 0[0%] | 2[0%] | 1[0%] | 1[0%] | 0[0%] | 2[0%] | 0[0%] | 1[0%] | 1[0%] | 0[0%] |
Qwen/Qwen2.5-7B | 7B | 131072 | Done | NVIDIA RTX A6000 | 47.99h | 0.510 | 43321 | 0 | 12705[86%] | 2568[73%] | 9314[77%] | 6101[86%] | 74[77%] | 607[78%] | 265[61%] | 416[96%] | 76[50%] | 383[82%] | 222[88%] | 298[76%] | 198[85%] | 185[76%] |
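The Accuracy column appears to equal Total Correct divided by the 84,907 distinct mentions in the sample dataset, truncated (not rounded) to three decimals. This is inferred from the table values, not stated by the maintainers. The minimal sketch below checks that assumption against every row. The names `TOTAL_MENTIONS`, `ROWS`, and `accuracy` are illustrative and not part of any MammoTab tooling.

```python
# Number of distinct mentions in the MammoTab sample dataset (stated in the text).
TOTAL_MENTIONS = 84_907

# (model, Total Correct, reported Accuracy), copied from the leaderboard table.
ROWS = [
    ("google/gemma-2-2b-it", 34608, 0.407),
    ("meta-llama/Llama-3.2-1B", 4142, 0.048),
    ("meta-llama/Llama-3.2-3B", 25771, 0.303),
    ("microsoft/Phi-3-mini-4k-instruct", 23881, 0.281),
    ("Qwen/Qwen2-0.5B", 3741, 0.044),
    ("Qwen/Qwen2-1.5B", 14124, 0.166),
    ("01-ai/Yi-1.5-6B", 5832, 0.068),
    ("Qwen/Qwen2.5-0.5B", 1329, 0.015),
    ("Qwen/Qwen2.5-7B", 43321, 0.510),
]


def accuracy(total_correct: int, total: int = TOTAL_MENTIONS) -> float:
    """Fraction of mentions annotated correctly, truncated to 3 decimals.

    Truncation rather than rounding is an assumption inferred from the table.
    """
    # Integer floor division performs the truncation exactly,
    # avoiding floating-point error before scaling back to a fraction.
    return (total_correct * 1000 // total) / 1000


if __name__ == "__main__":
    for model, correct, reported in ROWS:
        computed = accuracy(correct)
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{model:35s} {computed:.3f} (reported {reported:.3f}) {status}")
```

Truncation is what makes the values line up. Ordinary rounding would give 0.408 for google/gemma-2-2b-it (34608 / 84907 ≈ 0.4076) instead of the reported 0.407.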