2 items across 2 digests
A system equipped with 768 GB of Intel Optane DIMM memory ran a 1-trillion-parameter LLM on a single GPU at 4 tokens per second. The result shows how alternative memory architectures can enable deployment of large AI models without massive GPU clusters, potentially reducing AI infrastructure costs.
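To put the 768 GB figure in perspective, the back-of-the-envelope sketch below estimates how many bits per parameter are available if all 1 trillion weights had to fit in the Optane memory alone. This is an assumption for illustration: the item does not state the weight precision, whether additional DRAM or GPU memory held part of the model, or whether "GB" is decimal or binary. The function names are hypothetical.

```python
# Rough memory-budget arithmetic for the figures quoted above.
# Assumption (not stated in the item): every weight lives in the
# 768 GB of Optane memory, with no help from DRAM or GPU memory.

GB = 10**9    # decimal gigabyte
GIB = 2**30   # binary gibibyte, in case "GB" was meant loosely

N_PARAMS = 10**12  # 1 trillion parameters


def bits_per_parameter(capacity_bytes: int, n_params: int) -> float:
    """Average number of bits available to store each parameter."""
    return capacity_bytes * 8 / n_params


def weights_fit(bits_per_weight: int, capacity_bytes: int = 768 * GB,
                n_params: int = N_PARAMS) -> bool:
    """True if n_params weights at the given precision fit in capacity."""
    return bits_per_weight * n_params <= capacity_bytes * 8


if __name__ == "__main__":
    print(f"decimal GB budget: {bits_per_parameter(768 * GB, N_PARAMS):.3f} bits/param")
    print(f"binary GiB budget: {bits_per_parameter(768 * GIB, N_PARAMS):.3f} bits/param")
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights fit: {weights_fit(bits)}")
```

Under these assumptions the budget is a little over 6 bits per parameter, so 16-bit and 8-bit weights would not fit in that memory by themselves while 4-bit weights would. The item itself does not say which precision was used.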
German researchers developed a new Transformer architecture designed to handle both mathematical reasoning, which requires thinking time, and everyday knowledge, which requires memory. It targets a key limitation of current AI systems, which struggle to balance the different computational approaches these two kinds of problems call for.