LLM Leaderboard

Compare the performance of language models (continuously updated) on the MammoTab dataset

This leaderboard has been generated using the MammoTab sample dataset, which consists of 870 tables containing a total of 84,907 distinct mentions.

This leaderboard is managed by:

Marco Cremaschi, Federico Belotti, Matteo Palmonari from the University of Milano-Bicoccaand Jennifer D'Souza from TIB Leibniz Information Centre for Science and Technology

Model	Parameters	Status ↑	GPU	Total Time	Accuracy	Total Correct	NILs	Acronyms	Generic Types	Specific Types	Single Domain	Multi Domain	Small % Cols	Medium % Cols	Large % Cols	Small % Rows	Medium % Rows	Large % Rows
tiiuae/Falcon3-7B-Instruct	7B	Done	NVIDIA L40S	18.58h	0.276	23508	576[4%]	2192[62%]	58[60%]	517[67%]	228[52%]	347[80%]	66[43%]	314[68%]	195[77%]	260[66%]	162[70%]	153[63%]
google/gemma-2-2b-it	2B	Done	NVIDIA A6000	69.67h	0.407	34608	680[5%]	2323[66%]	58[60%]	529[68%]	226[52%]	361[83%]	69[45%]	354[76%]	164[65%]	234[59%]	185[80%]	168[69%]
google/gemma-2-9b-it	9B	Done	NVIDIA L40S	39.26h	0.401	34063	676[5%]	2226[63%]	56[58%]	542[70%]	241[55%]	357[82%]	71[47%]	359[77%]	168[66%]	255[65%]	176[76%]	167[68%]
google/gemma-3-27b-it	27B	Done	NVIDIA A40	87.84h	0.377	32037	2859[19%]	2248[64%]	59[61%]	523[68%]	252[58%]	330[76%]	85[56%]	323[69%]	174[69%]	230[58%]	182[78%]	170[70%]
meta-llama/Llama-3.2-1B	1B	Done	NVIDIA A6000	23.12h	0.048	4142	170[1%]	30[1%]	4[4%]	18[2%]	7[2%]	15[3%]	1[1%]	13[3%]	8[3%]	17[4%]	2[1%]	3[1%]
meta-llama/Llama-3.2-3B	3B	Done	NVIDIA A6000	58.73h	0.303	25771	610[4%]	2171[62%]	43[45%]	461[60%]	205[47%]	299[69%]	64[42%]	317[68%]	123[49%]	177[45%]	169[73%]	158[65%]
meta-llama/Llama-3.3-70B-Instruct	70B	Done	NVIDIA L40S	235.66h	0.629	53473	1700[11%]	2570[73%]	71[74%]	644[83%]	303[70%]	412[95%]	92[61%]	402[86%]	221[87%]	309[78%]	206[89%]	200[82%]
meta-llama/Llama-3.1-8B-Instruct	8B	Done	NVIDIA L40S	92.12h	0.453	38488	500[3%]	2245[64%]	67[70%]	503[65%]	222[51%]	348[80%]	69[45%]	321[69%]	180[71%]	245[62%]	166[72%]	159[65%]
mistralai/Mistral-Large-Instruct	123B	Done	NVIDIA L40S	102.57h	0.736	62503	6308[42%]	2703[77%]	88[92%]	655[85%]	342[79%]	401[92%]	119[78%]	400[86%]	224[89%]	303[77%]	214[92%]	226[93%]
mistralai/Mistral-7B-Instruct-v0.3	7B	Done	NVIDIA A6000	97.63h	0.533	45265	729[5%]	2517[72%]	68[71%]	610[79%]	270[62%]	408[94%]	78[51%]	387[83%]	213[84%]	295[75%]	199[86%]	184[75%]
microsoft/Phi-3-mini-4k-instruct	3.8B	Done	NVIDIA A6000	102.76h	0.281	23881	650[4%]	2085[59%]	37[39%]	482[62%]	207[48%]	312[72%]	58[38%]	318[68%]	143[57%]	208[53%]	166[72%]	145[59%]
microsoft/Phi-3-mini-128k-instruct	3.8B	Done	NVIDIA A6000	111.50h	0.285	24247	524[4%]	2199[63%]	41[43%]	507[66%]	221[51%]	327[75%]	63[41%]	343[74%]	142[56%]	208[53%]	178[77%]	162[66%]
Qwen/Qwen2-0.5B	0.5B	Done	NVIDIA A6000	16.64h	0.044	3750	47[0%]	10[0%]	2[2%]	17[2%]	6[1%]	13[3%]	0[0%]	14[3%]	5[2%]	16[4%]	2[1%]	1[0%]
Qwen/Qwen2-1.5B	1.5B	Done	NVIDIA A6000	27.36h	0.166	14124	447[3%]	1429[41%]	23[24%]	285[37%]	147[34%]	161[37%]	42[28%]	209[45%]	57[23%]	90[23%]	110[47%]	108[44%]
Qwen/Qwen2-7B	7B	Done	NVIDIA A6000	78.72h	0.289	24546	448[3%]	1727[49%]	48[50%]	378[49%]	179[41%]	247[57%]	55[36%]	267[57%]	104[41%]	136[35%]	155[67%]	135[55%]
Qwen/Qwen2.5-0.5B	0.5B	Done	NVIDIA A6000	14.35h	0.015	1329	70[0%]	0[0%]	0[0%]	2[0%]	1[0%]	1[0%]	0[0%]	2[0%]	0[0%]	1[0%]	1[0%]	0[0%]
Qwen/Qwen2.5-7B	7B	Done	NVIDIA A40	47.99h	0.510	43321	1531[10%]	2568[73%]	74[77%]	607[78%]	265[61%]	416[96%]	76[50%]	383[82%]	222[88%]	298[76%]	198[85%]	185[76%]
osunlp/TableLlama	based on LLama 7B	Done	NVIDIA A6000	78.44h	0.731	62116	2472[17%]	2556[73%]	82[85%]	656[85%]	306[70%]	432[99%]	91[60%]	413[89%]	234[92%]	332[84%]	209[90%]	197[81%]
01-ai/Yi-1.5-6B	6B	Done	NVIDIA A6000	51.96h	0.068	5832	24[0%]	545[15%]	6[6%]	71[9%]	31[7%]	46[11%]	1[1%]	51[11%]	25[10%]	39[10%]	24[10%]	14[6%]
01-ai/Yi-1.5-9B	9B	Done	NVIDIA A6000	77.20h	0.177	15080	512[3%]	1769[50%]	16[17%]	296[38%]	153[35%]	159[37%]	38[25%]	207[45%]	67[26%]	86[22%]	119[51%]	107[44%]
microsoft/Phi-3-small-8k-instruct	7B	Done	NVIDIA A6000	280.07h	0.392	33332	205[1%]	2376[68%]	63[66%]	580[75%]	249[57%]	394[91%]	74[49%]	370[80%]	199[79%]	283[72%]	184[79%]	176[72%]
google/gemma-3-27b-it	27B	Done	NVIDIA A40	87.84h	0.377	32037	2859[19%]	2248[64%]	59[61%]	523[68%]	252[58%]	330[76%]	85[56%]	323[69%]	174[69%]	230[58%]	182[78%]	170[70%]
deepseek-ai/DeepSeek-R1	651B	Done	NVIDIA A40	1860.55h	0.693	58921	6179[42%]	2775[79%]	93[97%]	703[91%]	365[84%]	431[99%]	127[84%]	429[92%]	240[95%]	332[84%]	225[97%]	239[98%]
Qwen/Qwen2.5-72B-Instruct	73B	Done	NVIDIA A40	113.89h	0.735	62439	7262[49%]	2704[77%]	92[96%]	686[89%]	356[82%]	422[97%]	125[82%]	417[90%]	236[93%]	328[83%]	221[95%]	229[94%]
Qwen/Qwen2.5-72B-Instruct	73B	Done	NVIDIA A40	113.89h	0.735	62439	7262[49%]	2704[77%]	92[96%]	686[89%]	356[82%]	422[97%]	125[82%]	417[90%]	236[93%]	328[83%]	221[95%]	229[94%]
microsoft/Phi-4-mini-instruct	4B	Done	NVIDIA A6000	66.67h	0.288	24533	105[1%]	2183[62%]	42[44%]	491[63%]	219[50%]	314[72%]	66[43%]	338[73%]	129[51%]	194[49%]	173[75%]	166[68%]