Here are 23 public repositories matching this topic...
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reaches 69% accuracy on MMMU.
Updated
Jun 3, 2026
Python
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
Updated
Aug 2, 2025
Python
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
Updated
Jan 16, 2026
Python
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Updated
Mar 30, 2026
Python
Updated
Mar 9, 2026
Python
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal LLMs
[ICLR 2026 Oral & ICML 2026] Generative Universal Verifier as Multimodal Meta-Reasoner
Updated
May 29, 2026
Python
Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
Updated
Feb 4, 2026
Python
[NeurIPS 2025] Official implementation for the paper "SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning"
Updated
Sep 19, 2025
Python
[CVPR 2026] AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
Updated
Apr 27, 2026
Python
ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model
Updated
Apr 30, 2026
Python
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
Updated
Jun 10, 2025
Python
[ICLR 2026] SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
Updated
Mar 22, 2026
Python
[ICLR 2026]🌴 ARES is an open-source framework for adaptive multimodal reasoning, featuring a two-stage pipeline—Adaptive Cold-Start and Entropy-Shaped Policy Optimization—to balance reasoning depth and efficiency.
Updated
Feb 3, 2026
Python
Project COLON-X (Shaping the neXt frontier in intelligent COLONoscopy)
Updated
May 2, 2026
Python
Repository to train multimodal latent reasoning tokens for Qwen 2.5 VL.
Updated
Apr 29, 2026
Python
Code and dataset for TurtleBench: A Visual Programming Benchmark in Turtle Geometry
Updated
Jan 30, 2025
Python
Paragraph-level Policy Optimization for Vision-Language Deepfake Detection - ICML 2026
Updated
May 8, 2026
Python
CRYSTAL: Beyond Final Answers: Benchmark for Transparent Multimodal Reasoning Evaluation | arXiv 2603.13099