OCC-MLLM-Alpha: Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning

Shuxin Yang* and Xinhan Di

Abstract

There is a gap in the understanding of occluded objects in existing large-scale visual-language multi-modal models. Current state-of-the-art multi-modal models fail to provide satisfactory results when describing occluded objects through universal visual encoders and supervised learning strategies. We therefore introduce a multi-modal large language model framework and a corresponding self-supervised test-time learning strategy supported by 3D generation. We begin our experiments by comparing against state-of-the-art models on the large-scale SOMVideo dataset. Initial results demonstrate an improvement of 16.92% over state-of-the-art VLM models.
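The abstract does not spell out the adaptation procedure, but the general idea of self-supervised test-time learning with 3D-generation support can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in rather than the authors' implementation: the module names (VisualEncoder, Reconstruct3DHead), the reconstruction loss, and all hyperparameters are assumptions. The sketch adapts the visual encoder on each occluded test image against a self-supervised signal (here, a synthesized de-occluded view standing in for the 3D-generation output) before the adapted features would be passed to the language model.

```python
# Minimal sketch of self-supervised test-time adaptation for occluded-object
# understanding. All names (VisualEncoder, Reconstruct3DHead, test_time_adapt)
# and hyperparameters are hypothetical illustrations, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualEncoder(nn.Module):
    """Stand-in for the universal visual encoder of a multi-modal LLM."""
    def __init__(self, in_dim=3 * 64 * 64, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class Reconstruct3DHead(nn.Module):
    """Stand-in for a head that predicts a de-occluded view from features,
    playing the role of the 3D-generation supervision signal."""
    def __init__(self, dim=256, out_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Linear(dim, out_dim)

    def forward(self, z):
        return self.net(z)

def test_time_adapt(encoder, recon_head, image, target_view, steps=5, lr=1e-4):
    """Adapt the encoder on a single occluded test image by minimizing a
    self-supervised reconstruction loss; return the adapted encoder."""
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = recon_head(encoder(image))
        loss = F.mse_loss(recon, target_view.flatten(1))
        loss.backward()
        opt.step()
    return encoder

# Usage: one occluded RGB image and a synthesized de-occluded view acting
# as the self-supervision target (random tensors here for illustration).
image = torch.randn(1, 3, 64, 64)
target_view = torch.randn(1, 3, 64, 64)
encoder = test_time_adapt(VisualEncoder(), Reconstruct3DHead(), image, target_view)
```

In a sketch of this kind, updating only the visual encoder (or a lightweight adapter) keeps the per-sample test-time cost small; the number of gradient steps and the learning rate trade adaptation quality against inference latency.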
