MultiNet: Evaluating Multimodal Generalization in Next Generation Models
Research and Software for evaluating Next Generation Model Generalization on Diverse Multimodal Tasks
Multinet is a comprehensive benchmarking initiative for evaluating multimodal models across vision, language, and action tasks. It provides:
- A well-reasoned, curated set of evaluation datasets for assessing the multimodal understanding and action-taking capabilities of SoTA multimodal models
- Varied evaluation tasks, including Common Sense Reasoning, Object Detection in the wild, Spatial Reasoning, Visual Question Answering, Robotics, and complex Multi-Agent game playing
- An open-source toolkit that standardizes obtaining and utilizing data from varied sources and formats
- Open-source adaptations of diverse models to various out-of-distribution task domains
- Comprehensive evaluation protocols for profiling SoTA VLMs, VLAs, and generalist models on benchmark datasets
Multinet v1.0 Leaderboard
Evaluation across seven diverse tasks spanning robotics, digital control, and multimodal reasoning.
Leaderboard Legend and Notes
Explore our Research
Previous Benchmark Releases
Comprehensive benchmarks for evaluating multimodal models across various modalities and tasks
Our benchmarking efforts:
- v0.2 - Gameplay: Benchmarking models in procedurally generated game environments
- v0.1 - Robotics: Evaluating models on real-world robotics tasks
Software Releases
Open-source tools and frameworks for building and evaluating multimodal models of the future
Our software contributions:
- Eval Harness: Systematic evaluation framework for multimodal models
- Toolkit: Data curation SDK for evaluation datasets
- GenESIS: Framework for mapping language models to actions
News
Multinet v1.0 released! We release our most comprehensive benchmark yet, evaluating SoTA VLMs, VLAs, and generalist models on a wide variety of multimodal understanding and action datasets. Read more here.
Multinet v0.2 released! We systematically profile how state-of-the-art VLAs and VLMs perform in procedurally generated OOD game environments. Read more about our recent release here.
Paper accepted at ICML 2025! Our paper detailing the open-source contributions of Multinet that benefit the AI community has been accepted at the CodeML Workshop at ICML 2025! Read our paper here.
Multinet v0.1 released! How well do state-of-the-art VLMs and VLAs perform on real-world robotics tasks? Read more on our release page.
Introducing Multinet! A comprehensive benchmark for evaluating multimodal generalization in next generation models. Learn more here.
Research Talks & Demos
Explore our collection of presentations showcasing Multinet's vision, progress, and development journey!
Citation
@article{guruprasad2025multinet,
  author  = {Pranav Guruprasad and Sudipta Chowdhury and Harshvardhan Sikka and Mridul Sharma and Helen Lu and Sean Rivera and Aryan Khurana and Yangyue Wang},
  title   = {MultiNet v1.0: A Comprehensive Benchmark for Evaluating Multimodal Reasoning and Action Models Across Diverse Domains},
  journal = {Manifold Research Publications},
  year    = {2025},
  note    = {https://multinet.ai/static/pages/Multinetv1.html},
  doi     = {10.5281/zenodo.17404313}
}