Current large vision-language models excel in conversational interfaces for healthcare, but fail at quantitative measurements, precise localization, and objective analysis of medical images.
Existing models often hallucinate or provide unreliable disease metrics, even when presented confidently, risking patient safety and undermining trust in AI-powered diagnosis.
Without reliable, regulatory-approved algorithms for clinical measurements, vision-language systems cannot deliver actionable, trustworthy diagnosis or gain approval for real-life medical use.


