I've attempted several runs to replicate the results of Qwen/Qwen2.5-Math-7B-Instruct on College Math dataset but I'm getting ~41.8 which is too far off from the 46.8 as reported (despite using the ...