Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
Paper: arXiv:2402.12195
Models for the accompanying GitHub repository, which addresses two modality isolation issues: image-text isolation and inter-image isolation.
Detailed instructions are coming soon.