1 Comment
Jun 25 · Liked by Vlad Bogolin

Prism is an innovative framework designed to address the intertwined challenges of perception and reasoning in solving visual problems. By separating perception and reasoning into two distinct stages, Prism enables a systematic comparison and evaluation of proprietary and open-source Vision Language Models (VLMs) in terms of their perception and reasoning capabilities. Combining a streamlined VLM focused on perception with a powerful Large Language Model (LLM) designed for reasoning, Prism achieves outstanding results in general visual language tasks while significantly reducing training and operational costs.
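To make the two-stage idea concrete, here is a minimal sketch of a decoupled perception-then-reasoning pipeline. It is not Prism's actual code or API: the class `TwoStagePipeline`, the instruction text, the prompt template, and the stub models are all assumptions made up for illustration. In practice the two callables would wrap a real VLM and a real LLM.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces (not from Prism): any callables with these shapes work.
PerceptionFn = Callable[[bytes, str], str]  # (image bytes, instruction) -> text description
ReasoningFn = Callable[[str], str]          # text prompt -> answer


@dataclass
class TwoStagePipeline:
    """Illustrative pipeline: a VLM extracts visual information as text,
    then a text-only LLM reasons over that text to answer the question."""
    perceive: PerceptionFn
    reason: ReasoningFn
    # Assumed generic instruction; a real system might tailor it to the question.
    instruction: str = "Describe the image in detail."

    def answer(self, image: bytes, question: str) -> str:
        # Stage 1 (perception): the image is seen only by the VLM.
        description = self.perceive(image, self.instruction)
        # Stage 2 (reasoning): the LLM sees only text, never the image.
        prompt = (
            f"Image description: {description}\n"
            f"Question: {question}\n"
            "Answer:"
        )
        return self.reason(prompt)


# Stub models so the sketch runs without any model dependency.
def stub_vlm(image: bytes, instruction: str) -> str:
    return f"an image of {len(image)} bytes"


def stub_llm(prompt: str) -> str:
    # Echo the first prompt line to show what the reasoning stage received.
    return prompt.splitlines()[0]


pipeline = TwoStagePipeline(perceive=stub_vlm, reason=stub_llm)
result = pipeline.answer(b"abc", "What is shown?")
```

Because each stage is just a callable, swapping the VLM while holding the LLM fixed (or the reverse) isolates one capability at a time, which is the kind of comparison the comment describes.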
