Advancing AI in Medicine: EurIPS 2025 Scientific Achievement
EurIPS 2025 Publication Highlights Cutting-Edge AI Research
We are proud to share that NSBproject is strengthening its role not only as a trusted consultancy, but also as an active research partner in European innovation.
Our researcher Michele Ferramola co-presented the paper “Are Large Vision–Language Models Truly Grounded in Medical Images?” at EurIPS 2025, where the team received the Best Presentation Award.
The co-authored study has been published on arXiv and was presented at the EurIPS 2025 Workshop on Multimodal Representation Learning for Healthcare in Copenhagen, a premier event in artificial intelligence research with strong relevance to both academic and applied AI communities.
The paper, titled “Are Large Vision Language Models Truly Grounded in Medical Images? Evidence from Italian Clinical Visual Question Answering”, investigates a critical question for AI in medicine: whether state‑of‑the‑art vision‑language models genuinely interpret clinical images or rely on textual cues and shortcuts to answer medical questions.
Abstract
Large vision–language models (VLMs) have demonstrated impressive performance on medical visual question answering benchmarks, but it remains unclear if this success reflects true visual understanding. This study tests four state‑of‑the‑art models — Claude Sonnet 4.5, GPT‑4o, GPT‑5‑mini, and Gemini 2.0 flash exp — on 60 clinical questions from the EuropeMedQA Italian dataset that explicitly require image interpretation. By replacing the original clinical images with blank placeholders, the research reveals substantial variability in how much models depend on visual information versus textual shortcuts. For example, GPT‑4o’s performance dropped significantly when images were removed, while other models maintained high accuracy, suggesting different levels of visual grounding and reliance on non‑visual features. These findings highlight the importance of rigorous evaluation for deploying multimodal AI in real clinical settings.
You can read the full paper here:
📄 https://arxiv.org/abs/2511.19220
Why This Matters for AI in Healthcare
The growing adoption of AI in clinical workflows depends not only on high benchmark scores, but on models’ real ability to understand and reason with medical images and language. This research sheds light on robustness, safety, and real‑world applicability — key considerations for reliable clinical AI systems.
Unlike traditional evaluation approaches, our study specifically measures how much visual grounding contributes to correct answers, by observing model behavior when image content is unavailable. These insights are crucial for developers, clinicians, and partners who are investing in multimodal AI for healthcare and aiming to move beyond superficial performance metrics toward trustworthy, interpretable systems.
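To make the idea concrete, here is a minimal sketch of how such a comparison could be scored: accuracy with the original images, accuracy with blank placeholders, and the difference between the two. It is an illustration only, not the paper's evaluation code. The `Trial` record, the `grounding_report` function, and the toy data are all hypothetical and do not reproduce any result from the study.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question answered twice (hypothetical record layout)."""
    question_id: str
    correct_with_image: bool   # answer was correct with the original image
    correct_with_blank: bool   # answer was correct with a blank placeholder


def accuracy(flags):
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    if not flags:
        raise ValueError("at least one trial is required")
    return sum(flags) / len(flags)


def grounding_report(trials):
    """Compare accuracy with images against accuracy with blank placeholders.

    A large drop suggests the model relies on the image; a small drop
    suggests it can answer from the text alone.
    """
    acc_image = accuracy(t.correct_with_image for t in trials)
    acc_blank = accuracy(t.correct_with_blank for t in trials)
    return {
        "accuracy_with_image": acc_image,
        "accuracy_with_blank": acc_blank,
        "drop": acc_image - acc_blank,
    }


# Toy data for illustration only; these are NOT results from the paper.
toy_trials = [
    Trial("q1", True, True),
    Trial("q2", True, False),
    Trial("q3", True, False),
    Trial("q4", False, False),
]

report = grounding_report(toy_trials)
print(report)
```

In this toy example the model answers three of four questions correctly with images and one of four without them, so the drop is 0.5. A model whose accuracy barely changed under the same test would be answering largely from textual cues.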
NSBproject’s Role and Strategic Impact
NSBproject’s contribution to this publication underlines our capability to operate not just as a project implementer, but as a technical and scientific partner within large European research initiatives. By collaborating with academic institutions and research centers, we help bridge cutting‑edge AI research with real project needs, especially in domains where high standards of validation and clinical relevance are essential. This reinforces our value to future European and international consortia in AI and health technologies.
Congratulations to Michele Ferramola and the full research team for this outstanding achievement, and our thanks to the EurIPS organizers for hosting a stimulating workshop that catalyzes new directions in multimodal AI research.
