Abstract: Current Multi-Modal Large Language Models (MMLMs) primarily rely on instance-level feature statistics for cross-modal alignment. However, they commonly suffer from three inherent limitations ...