Large multimodal models (LMMs) are being released at an increasing pace, but finetuning them is not always straightforward. This codebase aims to provide a unified, minimal ...
Abstract: We introduce WildVideo, an open-world benchmark dataset designed to assess hallucination in Large Multi-modal Models (LMMs) when understanding video-language interaction in the ...
LLaVA-OneVision-1.5-RL introduces a training recipe for multimodal reinforcement learning, building on LLaVA-OneVision-1.5. This framework is designed to democratize access to ...