# LLM Leaderboard

Compare the performance of different language models on the MammoTab dataset.

This leaderboard was generated from the MammoTab sample dataset, which consists of 844 tables containing 84,145 distinct mentions in total. The CEA column reports overall Cell-Entity Annotation accuracy; the remaining columns report results on specific challenges (NIL mentions, acronyms, aliases, typos, generic vs. specific types, and single- vs. multi-domain tables), shown as the number of correctly annotated mentions with the corresponding accuracy in brackets.
Model | Parameters | Status | CEA | NILs | Acronyms | Aliases | Typos | Generic Types | Specific Types | Single Domain | Multi Domain |
---|---|---|---|---|---|---|---|---|---|---|---|
Gemini-1.0 Pro | 1.8B | To do | 85% | 78 [78%] | 56 [80%] | 89 [89%] | 23 [66%] | 45 [90%] | 42 [84%] | 85 [85%] | 75 [75%] |
Gemini-1.5 Pro | 3.2B | In progress | 88% | 82 [82%] | 59 [84%] | 92 [92%] | 25 [71%] | 48 [96%] | 45 [90%] | 88 [88%] | 78 [78%] |
Gemini-1.5 Flash | 2.5B | Done | 87% | 80 [80%] | 58 [83%] | 91 [91%] | 24 [69%] | 47 [94%] | 44 [88%] | 87 [87%] | 77 [77%] |
Gemma | 2B | To do | 84% | 76 [76%] | 55 [79%] | 88 [88%] | 22 [63%] | 44 [88%] | 41 [82%] | 84 [84%] | 74 [74%] |
Gemma 2 | 7B | In progress | 86% | 79 [79%] | 57 [81%] | 90 [90%] | 23 [66%] | 46 [92%] | 43 [86%] | 86 [86%] | 76 [76%] |
Phi-3 Mini | 3.8B | To do | 83% | 75 [75%] | 54 [77%] | 87 [87%] | 21 [60%] | 43 [86%] | 40 [80%] | 83 [83%] | 73 [73%] |
Phi-3 Small | 7B | In progress | 85% | 76 [76%] | 55 [79%] | 88 [88%] | 22 [63%] | 44 [88%] | 41 [82%] | 84 [84%] | 74 [74%] |
Phi-3 Medium | 14B | Done | 87% | 79 [79%] | 57 [81%] | 90 [90%] | 23 [66%] | 46 [92%] | 43 [86%] | 86 [86%] | 76 [76%] |
Phi-3.5 Mini | 4.2B | To do | 84% | 77 [77%] | 56 [80%] | 89 [89%] | 22 [63%] | 45 [90%] | 42 [84%] | 85 [85%] | 75 [75%] |
Mixtral | 7B | In progress | 89% | 83 [83%] | 60 [86%] | 93 [93%] | 26 [74%] | 49 [98%] | 46 [92%] | 89 [89%] | 79 [79%] |
Mixtral-Instruct | 8B | Done | 90% | 84 [84%] | 61 [87%] | 94 [94%] | 27 [77%] | 50 [100%] | 47 [94%] | 90 [90%] | 80 [80%] |
Claude 3 Sonnet | 7B | In progress | 91% | 85 [85%] | 62 [89%] | 95 [95%] | 28 [80%] | 51 [102%] | 48 [96%] | 91 [91%] | 81 [81%] |
Claude 3 Haiku | 3.5B | To do | 89% | 83 [83%] | 60 [86%] | 93 [93%] | 26 [74%] | 49 [98%] | 46 [92%] | 89 [89%] | 79 [79%] |
Claude 3.5 Sonnet | 8.5B | Done | 92% | 86 [86%] | 63 [90%] | 96 [96%] | 29 [83%] | 52 [104%] | 49 [98%] | 92 [92%] | 82 [82%] |
Llama 3.2 | 7B | In progress | 88% | 82 [82%] | 59 [84%] | 92 [92%] | 25 [71%] | 48 [96%] | 45 [90%] | 88 [88%] | 78 [78%] |
Llama 3.1 | 6.5B | To do | 87% | 81 [81%] | 58 [83%] | 91 [91%] | 24 [69%] | 47 [94%] | 44 [88%] | 87 [87%] | 77 [77%] |
Qwen 2 | 7B | In progress | 86% | 80 [80%] | 57 [81%] | 90 [90%] | 23 [66%] | 46 [92%] | 43 [86%] | 86 [86%] | 76 [76%] |
Qwen-2.5 | 8B | Done | 88% | 82 [82%] | 59 [84%] | 92 [92%] | 25 [71%] | 48 [96%] | 45 [90%] | 88 [88%] | 78 [78%] |
Yi-1.5 | 6B | To do | 85% | 78 [78%] | 56 [80%] | 89 [89%] | 23 [66%] | 45 [90%] | 42 [84%] | 85 [85%] | 75 [75%] |
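
If you want to work with the leaderboard numbers programmatically, the `count [accuracy%]` cells can be split into their two values with a small helper. This is an illustrative sketch only; the regex and the `parse_cell` function name are not part of MammoTab:

```python
import re

# Matches leaderboard cells such as "83 [83%]" or "83[83%]":
# a raw count of correctly annotated mentions, followed by the
# corresponding accuracy percentage in brackets.
CELL_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\[\s*(\d+(?:\.\d+)?)\s*%\s*\]\s*$")

def parse_cell(cell: str) -> tuple[float, float]:
    """Return (count, accuracy), with accuracy as a fraction in [0, 1]."""
    m = CELL_RE.match(cell)
    if m is None:
        raise ValueError(f"unrecognised cell format: {cell!r}")
    return float(m.group(1)), float(m.group(2)) / 100.0

print(parse_cell("83 [83%]"))  # → (83.0, 0.83)
```

The regex tolerates optional whitespace, so it parses the cells whether or not a space separates the count from the bracketed percentage.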