Needle In A Multimodal Haystack

Today's paper introduces the Needle In A Multimodal Haystack (MM-NIAH) benchmark, designed to evaluate the ability of multimodal large language models to comprehend long multimodal documents.

Method Overview

The method involves creating long "multimodal haystack" documents by concatenating interleaved image-text sequences from the OBELICS dataset. Then, "needles" containing key information are inserted into either the text or the images of these documents.
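To make the haystack construction more concrete, here is a minimal Python sketch that rests on several simplifying assumptions. Documents are lists of interleaved text and image segments. Images are placeholder objects with no pixels. Haystack length is capped by a segment count rather than a token count. A text needle is inserted as its own segment at a chosen relative depth. An image needle is only recorded as an "overlay" on a randomly chosen image, which stands in for pasting a small picture into it. The names `Image`, `build_haystack`, `insert_text_needle` and `insert_image_needle` are hypothetical and do not come from the MM-NIAH code, and OBELICS loading is not shown.

```python
# Hypothetical sketch of multimodal haystack construction; not the MM-NIAH implementation.
import random
from dataclasses import dataclass, replace
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Image:
    """Placeholder for an image; `overlays` lists needles pasted into it (hypothetical)."""
    source: str
    overlays: Tuple[str, ...] = ()


# A document is an ordered list of interleaved text and image segments.
Segment = Union[str, Image]


def build_haystack(samples: List[List[Segment]], max_segments: int) -> List[Segment]:
    """Concatenate whole interleaved samples until adding the next one would exceed the budget."""
    haystack: List[Segment] = []
    for sample in samples:
        if len(haystack) + len(sample) > max_segments:
            break
        haystack.extend(sample)
    return haystack


def insert_text_needle(haystack: List[Segment], needle: str, depth: float) -> List[Segment]:
    """Return a copy with `needle` inserted as a text segment at relative depth in [0, 1]."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    pos = int(depth * len(haystack))
    return haystack[:pos] + [needle] + haystack[pos:]


def insert_image_needle(
    haystack: List[Segment], needle: str, rng: random.Random
) -> Tuple[List[Segment], int]:
    """Return a copy where one randomly chosen image carries `needle`, plus that image's index."""
    image_positions = [i for i, seg in enumerate(haystack) if isinstance(seg, Image)]
    if not image_positions:
        raise ValueError("haystack contains no images")
    idx = rng.choice(image_positions)
    target = haystack[idx]
    assert isinstance(target, Image)
    out = list(haystack)
    # Build a new Image instead of mutating the shared original.
    out[idx] = replace(target, overlays=target.overlays + (needle,))
    return out, idx


# Usage with toy interleaved samples standing in for OBELICS documents.
samples = [
    ["A cat sits on a mat.", Image("img_0.jpg"), "It is sunny."],
    [Image("img_1.jpg"), "A dog runs in a park."],
    ["Rivers flow to the sea.", Image("img_2.jpg")],
]
haystack = build_haystack(samples, max_segments=5)  # keeps the first two samples
haystack = insert_text_needle(haystack, "The secret fruit is a mango.", depth=0.5)
haystack, image_idx = insert_image_needle(haystack, "cartoon apple", random.Random(0))
```

Capping by whole samples keeps each interleaved sequence intact. A real pipeline would presumably measure length in tokens, including image tokens, rather than in segments.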