Abstract: Multimodal information extraction has garnered increasing attention in the domain of natural language processing. However, current approaches largely overestimate the role of visual ...