LLM Leaderboard

Compare the performance of different language models on the MammoTab dataset

This leaderboard has been generated using the MammoTab sample dataset, which consists of 844 tables containing a total of 84,145 distinct mentions.

| Model | Parameters | Status | CEA | NILs | Acronyms | Aliases | Typos | Generic Types | Specific Types | Single Domain | Multi Domain |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini-1.0 Pro | 1.8B | To do | 0.85[85%] | 78[78%] | 56[80%] | 89[89%] | 23[66%] | 45[90%] | 42[84%] | 85[85%] | 75[75%] |
| Gemini-1.5 Pro | 3.2B | In progress | 0.88[88%] | 82[82%] | 59[84%] | 92[92%] | 25[71%] | 48[96%] | 45[90%] | 88[88%] | 78[78%] |
| Gemini-1.5 Flash | 2.5B | Done | 0.87[87%] | 80[80%] | 58[83%] | 91[91%] | 24[69%] | 47[94%] | 44[88%] | 87[87%] | 77[77%] |
| Gemma | 2B | To do | 0.84[84%] | 76[76%] | 55[79%] | 88[88%] | 22[63%] | 44[88%] | 41[82%] | 84[84%] | 74[74%] |
| Gemma 2 | 7B | In progress | 0.86[86%] | 79[79%] | 57[81%] | 90[90%] | 23[66%] | 46[92%] | 43[86%] | 86[86%] | 76[76%] |
| Phi-3 Mini | 3.8B | To do | 0.83[83%] | 75[75%] | 54[77%] | 87[87%] | 21[60%] | 43[86%] | 40[80%] | 83[83%] | 73[73%] |
| Phi-3 Small | 7B | In progress | 0.85[85%] | 76[76%] | 55[79%] | 88[88%] | 22[63%] | 44[88%] | 41[82%] | 84[84%] | 74[74%] |
| Phi-3 Medium | 14B | Done | 0.87[87%] | 79[79%] | 57[81%] | 90[90%] | 23[66%] | 46[92%] | 43[86%] | 86[86%] | 76[76%] |
| Phi-3.5 Mini | 4.2B | To do | 0.84[84%] | 77[77%] | 56[80%] | 89[89%] | 22[63%] | 45[90%] | 42[84%] | 85[85%] | 75[75%] |
| Mixtral | 7B | In progress | 0.89[89%] | 83[83%] | 60[86%] | 93[93%] | 26[74%] | 49[98%] | 46[92%] | 89[89%] | 79[79%] |
| Mixtral-Instruct | 8B | Done | 0.9[90%] | 84[84%] | 61[87%] | 94[94%] | 27[77%] | 50[100%] | 47[94%] | 90[90%] | 80[80%] |
| Claude 3 Sonnet | 7B | In progress | 0.91[91%] | 85[85%] | 62[89%] | 95[95%] | 28[80%] | 51[102%] | 48[96%] | 91[91%] | 81[81%] |
| Claude 3 Haiku | 3.5B | To do | 0.89[89%] | 83[83%] | 60[86%] | 93[93%] | 26[74%] | 49[98%] | 46[92%] | 89[89%] | 79[79%] |
| Claude 3.5 Sonnet | 8.5B | Done | 0.92[92%] | 86[86%] | 63[90%] | 96[96%] | 29[83%] | 52[104%] | 49[98%] | 92[92%] | 82[82%] |
| Llama 3.2 | 7B | In progress | 0.88[88%] | 82[82%] | 59[84%] | 92[92%] | 25[71%] | 48[96%] | 45[90%] | 88[88%] | 78[78%] |
| Llama 3.1 | 6.5B | To do | 0.87[87%] | 81[81%] | 58[83%] | 91[91%] | 24[69%] | 47[94%] | 44[88%] | 87[87%] | 77[77%] |
| Qwen 2 | 7B | In progress | 0.86[86%] | 80[80%] | 57[81%] | 90[90%] | 23[66%] | 46[92%] | 43[86%] | 86[86%] | 76[76%] |
| Qwen-2.5 | 8B | Done | 0.88[88%] | 82[82%] | 59[84%] | 92[92%] | 25[71%] | 48[96%] | 45[90%] | 88[88%] | 78[78%] |
| Yi-1.5 | 6B | To do | 0.85[85%] | 78[78%] | 56[80%] | 89[89%] | 23[66%] | 45[90%] | 42[84%] | 85[85%] | 75[75%] |
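
Each score cell in the leaderboard pairs a number with a bracketed percentage, for example 56[80%]. The page does not state what the two parts mean, so the following is only a minimal Python sketch that assumes the layout is "value[percent%]" and splits it accordingly. The helper names parse_cell and out_of_range are hypothetical and are not part of any MammoTab tooling. The second helper lists columns whose bracketed percentage falls outside 0 to 100, which may be useful as a sanity check before comparing models.

```python
import re

# Assumed cell layout (not documented by the leaderboard): a number
# followed by a bracketed percentage, e.g. "56[80%]" or "0.85[85%]".
CELL_RE = re.compile(r"^(\d+(?:\.\d+)?)\[(\d+(?:\.\d+)?)%\]$")


def parse_cell(cell: str) -> tuple[float, float]:
    """Split a leaderboard cell into (value, percent)."""
    match = CELL_RE.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def out_of_range(cells: dict[str, str]) -> list[str]:
    """Return the column names whose bracketed percentage is outside 0-100."""
    return [
        name
        for name, cell in cells.items()
        if not 0.0 <= parse_cell(cell)[1] <= 100.0
    ]


# Three cells copied from the Claude 3.5 Sonnet row of the leaderboard.
row = {"CEA": "0.92[92%]", "Typos": "29[83%]", "Generic Types": "52[104%]"}

print(parse_cell(row["CEA"]))  # (0.92, 92.0)
print(out_of_range(row))       # ['Generic Types']
```

Running the check over every row would surface the same pattern wherever a bracketed percentage exceeds 100; how such cells should be interpreted is left to the leaderboard's maintainers.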