August 19, 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
title: xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
publish date:
2024-08-16
authors:
Le Xue et al.
paper id:
2408.08872v1
abstract:
This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, expands the Salesforce xGen initiative on foundation AI models. Our models undergo rigorous evaluation across a range of tasks, including both single and multi-image benchmarks. Our pre-trained base model exhibits strong in-context learning capabilities and the instruction-tuned model demonstrates competitive performance among open-source LMMs with similar model sizes. In addition, we introduce a safety-tuned model with DPO, aiming to mitigate harmful behaviors such as hallucinations and improve safety. We open-source our models, curated large-scale datasets, and our fine-tuning codebase to facilitate further advancements in LMM research. Associated resources will be available on our project page above.
Compiled by: wanghaisheng. Last updated: August 19, 2024