2601.21937v1_Retrieval-Infused_Reasoning_Sandbox_A_Benchmark