MEME (May 12, 2026) introduces six memory evaluation tasks spanning multi-entity and evolving dimensions, including three not scored by prior work: Cascade (downstream effects of a state change), Absence (reasoning over removed entities), and Deletion (post-removal state). In an evaluation of six memory systems across three paradigms on 100 controlled episodes, all systems fail on dependency reasoning despite adequate static retrieval: average Cascade accuracy is 3% and Absence accuracy is 1%. The authors conclude that current memory paradigms do not support the relational updating required for robust long-horizon agent operation.
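
To make the three newly scored task types concrete, here is a toy sketch of what each one asks of a memory store. It is an illustration under stated assumptions, not MEME's episode format or scoring code: the `ToyMemory` class, its methods, and the entity names are all hypothetical, and the real benchmark evaluates memory systems through natural-language episodes rather than an explicit dependency graph.

```python
class ToyMemory:
    """Hypothetical entity store with explicit dependency edges (not MEME's API)."""

    def __init__(self):
        self.values = {}      # entity -> current value
        self.deps = {}        # entity -> set of entities it depends on
        self.removed = set()  # entities that have been deleted

    def write(self, entity, value, depends_on=()):
        # Create or update an entity; a rewrite replaces its dependency set.
        self.values[entity] = value
        self.deps[entity] = set(depends_on)
        self.removed.discard(entity)

    def delete(self, entity):
        # Drop the entity's value and remember that it was removed.
        self.values.pop(entity, None)
        self.deps.pop(entity, None)
        self.removed.add(entity)

    def downstream(self, entity):
        """Return every entity that transitively depends on `entity`."""
        found = set()
        frontier = [entity]
        while frontier:
            current = frontier.pop()
            for other, parents in self.deps.items():
                if current in parents and other not in found:
                    found.add(other)
                    frontier.append(other)
        return found


mem = ToyMemory()
mem.write("office", "Berlin")
mem.write("standup_room", "Berlin-3F", depends_on=["office"])
mem.write("badge", "Berlin-access", depends_on=["office"])
mem.write("laptop", "MacBook")

# Cascade: a state change upstream makes dependent facts stale.
mem.write("office", "Lisbon")
stale = mem.downstream("office")

# Absence: the store should know that a removed entity is gone.
mem.delete("laptop")
laptop_is_absent = "laptop" in mem.removed

# Deletion: after removal, the old value must no longer be returned.
laptop_value = mem.values.get("laptop")
```

In this sketch, `stale` contains `standup_room` and `badge`, `laptop_is_absent` is true, and `laptop_value` is `None`. A store that only retrieves the most similar stored fact would answer the static lookup for `office` correctly but has no mechanism for the other three queries, which is the gap the reported Cascade and Absence scores point to.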