Researchers at USC and the University of Wisconsin-Madison have developed a four-tier memory hierarchy that lowers the cost of LLM reasoning by not keeping every chain-of-thought token in expensive high-bandwidth memory (HBM). This matters to technologists because it could substantially reduce the hardware cost of running large language models at scale.