Authors: Soham Gadgil, Mahtab Bigverdi

Dermatology AI is advancing rapidly, but the scarcity of data with ground-truth concept-level labels (semantically meaningful meta-labels for humans) remains a significant obstacle to training trustworthy classifiers. Foundation models like CLIP (Contrastive Language-Image Pre-training) offer a potential solution through their zero-shot capabilities and pre-training on vast numbers of image-caption pairs from the internet. Fine-tuning CLIP on domain-specific image-caption pairs can further improve classification performance. However, a challenge arises from the misalignment between CLIP's pre-training data and the medical jargon clinicians use for diagnosis.

Recent advancements in large language models (LLMs) have opened up the possibility of harnessing their expressive nature to generate rich text. This study utilizes these models to generate caption text that aligns with both the clinical lexicon and the natural human language found in CLIP's pre-training data. The approach starts with captions of images from PubMed articles and extends them by passing the raw captions through an LLM fine-tuned on several dermatology textbooks.
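The caption-expansion step described above can be sketched as prompt construction around a raw PubMed caption. The prompt wording, the example caption, and the `build_expansion_prompt` helper are illustrative assumptions, not the authors' exact implementation:

```python
# Hypothetical sketch of the caption-expansion step: a raw PubMed figure
# caption is wrapped in a prompt for a dermatology fine-tuned LLM. The
# completion would then replace the raw caption for CLIP fine-tuning.

PROMPT_TEMPLATE = (
    "You are a dermatology expert. Rewrite the following figure caption "
    "so that it describes the visible skin findings using both clinical "
    "terminology and plain natural language.\n\n"
    "Caption: {caption}\n\n"
    "Expanded caption:"
)

def build_expansion_prompt(raw_caption: str) -> str:
    """Construct the LLM prompt for a single raw caption."""
    return PROMPT_TEMPLATE.format(caption=raw_caption.strip())

# Example (hypothetical) raw caption from a PubMed figure.
raw = "Erythematous plaques on the extensor forearm."
prompt = build_expansion_prompt(raw)
# `prompt` would be sent to the fine-tuned LLM (e.g. GPT-3.5), and the
# generated expansion used as the new caption for that image.
```

In practice, batching these prompts through the fine-tuned LLM yields a parallel corpus of images and enriched captions for contrastive fine-tuning.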

The findings of this study demonstrate that using captions generated by an expressive fine-tuned LLM, such as GPT-3.5, leads to improved downstream zero-shot concept classification performance. By aligning the caption text with the clinical lexicon and natural human language, the generated captions enhance the ability of foundation models like CLIP to accurately classify dermatological concepts in a zero-shot manner.
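Zero-shot concept classification with a CLIP-style model reduces to comparing an image embedding against text embeddings of concept prompts. A minimal sketch of that scoring logic follows; the concept names, prompt format, and embeddings are assumptions (real embeddings would come from CLIP's image and text encoders, and are mocked here with random vectors):

```python
import numpy as np

# Minimal sketch of zero-shot concept scoring with CLIP-style embeddings.
# In the real pipeline, `image_emb` would come from CLIP's image encoder
# and each row of `concept_embs` from its text encoder applied to a
# concept prompt (e.g. "a photo of skin with erythema"). Random vectors
# stand in for both here, purely to illustrate the scoring.

rng = np.random.default_rng(0)
concepts = ["erythema", "scale", "ulcer"]  # hypothetical concept labels

def normalize(x):
    # L2-normalize so dot products equal cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

image_emb = normalize(rng.standard_normal(512))
concept_embs = normalize(rng.standard_normal((len(concepts), 512)))

# Zero-shot scores: cosine similarity between the image and each concept
# prompt; the top-scoring concept is the prediction. No concept-labeled
# training data is needed, only the text prompts themselves.
scores = concept_embs @ image_emb
predicted = concepts[int(np.argmax(scores))]
```

Better-aligned captions during fine-tuning move clinically relevant phrases closer to the corresponding images in this shared embedding space, which is what improves the zero-shot scores.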

This research highlights the potential of combining foundation models, fine-tuned LLMs, and data alignment techniques to overcome the limitations posed by the scarcity of labeled data in dermatology AI. The proposed approach paves the way for the development of more trustworthy and accurate AI systems in the field of dermatology, ultimately benefiting both clinicians and patients.