Shanghai AI Lab Open-Sources MemVerse: Giving Agents a "Hippocampus" for Multimodal Memory

Published: December 17, 2025
Reading Time: 1 min read

Shanghai AI Lab open-sources MemVerse, a bionic multimodal memory framework that equips AI agents with hippocampus-like lifelong memory, slashing response times and boosting cross-modal performance.

Shanghai Artificial Intelligence Laboratory has open-sourced MemVerse, billed as the first universal multimodal memory framework for AI agents, addressing the modality isolation and slow response times that limit traditional memory systems. Agents gain cross-modal memory spanning images, audio, and video: lifelong memory that grows over time, can be internalized into model weights, and responds within seconds.

Conventional AI memory is mostly text-based, relying on mechanical retrieval with no grasp of spatiotemporal logic or cross-modal semantics. MemVerse instead employs a three-layer bionic architecture: a central coordinator acts as the "prefrontal cortex" for active scheduling; short-term memory uses sliding windows to keep conversations coherent; and long-term memory builds multimodal knowledge graphs, categorized into core memory (user profiles), episodic memory (event timelines), and semantic memory (abstract concepts), fundamentally mitigating hallucinations.
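
To make the layering concrete, here is a minimal Python sketch of that three-layer design, assuming plain in-memory stores. The class names (`MemoryCoordinator`, `LongTermMemory`), the promotion rule, and the keyword-based recall are all illustrative stand-ins, not MemVerse's actual API; the real framework builds multimodal knowledge graphs and does graph-based cross-modal retrieval.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class LongTermMemory:
    """Long-term store split into the three categories described above.
    MemVerse uses multimodal knowledge graphs; plain containers stand in here."""
    core: dict = field(default_factory=dict)      # user profiles
    episodic: list = field(default_factory=list)  # event timelines
    semantic: dict = field(default_factory=dict)  # abstract concepts

class MemoryCoordinator:
    """'Prefrontal cortex' role: actively schedules reads and writes
    between the short-term sliding window and the long-term stores."""
    def __init__(self, window_size: int = 8):
        self.short_term = deque(maxlen=window_size)  # sliding window
        self.long_term = LongTermMemory()

    def observe(self, turn: dict) -> None:
        # Recent turns stay in the window for conversation coherence.
        self.short_term.append(turn)
        # Hypothetical promotion rule: tagged events go to episodic memory.
        if turn.get("is_event"):
            self.long_term.episodic.append(turn)

    def recall(self, query: str) -> list:
        # Naive keyword recall over both layers, purely for illustration.
        hits = [t for t in self.short_term if query in t.get("text", "")]
        hits += [t for t in self.long_term.episodic
                 if query in t.get("text", "") and t not in hits]
        return hits

coordinator = MemoryCoordinator()
coordinator.observe({"text": "booked flight to Tokyo", "is_event": True})
print(coordinator.recall("Tokyo"))
```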

A pioneering "parameterized distillation" technique periodically fine-tunes high-value long-term knowledge into dedicated small models, boosting retrieval speed by more than 10x.
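The article does not detail the distillation pipeline, but the mechanic it describes (periodically select high-value long-term entries and fine-tune them into a small model, so answers come from weights rather than retrieval) can be sketched as below. The `value` scoring field, `select_high_value`, and `distill_into_small_model` are hypothetical stand-ins, not MemVerse's published interface.

```python
def select_high_value(memories: list[dict], threshold: float = 0.8) -> list[dict]:
    """Pick long-term entries worth internalizing. Scoring by a stored
    'value' field is an assumption; MemVerse's actual selection
    criteria are not given in this article."""
    return [m for m in memories if m.get("value", 0.0) >= threshold]

def distill_into_small_model(memories: list[dict]) -> None:
    """Stand-in for the fine-tuning step: high-value knowledge becomes
    training pairs baked into a dedicated small model, so later queries
    skip retrieval entirely (the source of the claimed speedup)."""
    pairs = [(m["text"], m.get("answer", "")) for m in memories]
    print(f"fine-tuning small model on {len(pairs)} distilled pairs")

# Toy long-term store; texts and scores are invented for illustration.
long_term = [
    {"text": "user prefers metric units", "answer": "use km", "value": 0.95},
    {"text": "one-off smalltalk", "answer": "", "value": 0.10},
]

batch = select_high_value(long_term)
distill_into_small_model(batch)  # in MemVerse this runs on a periodic schedule
```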

Benchmarks show strong results: on ScienceQA, GPT-4o-mini with MemVerse jumped from 76.82 to 85.48; MSR-VTT text-to-video R@1 recall hit 90.4%, far surpassing CLIP (29.7%) and the dedicated large model ExCae (67.7%). Memory compression and distillation cut token usage by 90%, balancing accuracy and cost. MemVerse is now fully open-source.

Source: Liangziwei
