International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 08 | Aug 2025 www.irjet.net p-ISSN: 2395-0072

Automated Skin Disease Diagnosis Using DINO Vision Transformers and OpenCV

¹B.Tech Student, Department of Computer Science and Engineering (AI & ML), Pragati Engineering College, East Godavari, India

²B.Tech Student, Department of Computer Science and Engineering (AI & ML), Pragati Engineering College, East Godavari, India

Abstract - This research presents an efficient and affordable solution for automated skin disease diagnosis using image processing and deep learning. Our objective is to overcome the cost and scarcity of dermatological diagnosis, especially for underserved communities. We implemented a web-based application that uses OpenCV to extract color, texture, and shape features from skin images and applies a self-supervised Vision Transformer (ViT) model trained with DINO for classification. The system provides an easy-to-use interface for uploading skin images, which are then processed and analyzed automatically in real time. On a publicly accessible, annotated dermatoscopic image dataset, our model achieved about 92% accuracy in predicting multiple skin ailments. The main contribution of this research is the combination of computer vision algorithms with a scalable, easy-to-use web application, forming an effective tool for early diagnosis and screening of skin diseases that can be deployed in real-world clinical and remote healthcare environments.

Key Words: Skin disease diagnosis, computer vision, OpenCV, Vision Transformer (ViT), DINO, deep learning, web application, medical image analysis, automated diagnosis, healthcare accessibility

1. INTRODUCTION

Skin conditions affect millions worldwide, with over 900 million individuals experiencing some form of skin disorder at least once in their lifetime. Timely and precise diagnosis is essential for effective treatment, but conventional methods often demand specialized knowledge and expensive equipment, limiting accessibility in many regions.

Recent advancements in computer vision and artificial intelligence offer scalable and objective diagnostic solutions. Utilizing OpenCV for image feature extraction and a Vision Transformer (ViT) model trained with the DINO self-supervised learning approach, we propose an automated system for classifying skin diseases. This platform facilitates easy user registration, secure image uploads, and fast diagnosis, providing an affordable and efficient tool for early detection of skin conditions, particularly in areas with limited medical resources.

1.1 Objectives

The objective of this project is to design and develop a web-based application for automated skin disease diagnosis that is both accessible and user-friendly. The system leverages advanced image processing techniques using OpenCV to extract relevant features from skin images, including color, texture, and shape. For accurate and robust classification of skin diseases, a deep learning model based on the Vision Transformer (ViT) architecture is implemented, pretrained using the DINO self-supervised paradigm. The performance of the system is rigorously evaluated on publicly available, annotated dermatoscopic image datasets using standard metrics such as accuracy, precision, recall, and F1-score. Ultimately, this solution aims to be scalable and cost-effective, enabling its use in real-world clinical and remote healthcare environments for early screening and diagnosis of skin conditions.

1.2 Literature Survey

With an ever-growing incidence of skin diseases around the world, considerable research interest lies in automated dermatologic diagnosis. Current methods are mostly based on handcrafted feature extraction and traditional machine learning algorithms, yet these are frequently hindered by subjective human interpretation and issues of scalability.

With recent advancements in deep learning, particularly in convolutional neural networks (CNNs), remarkable performance in medical image analysis, such as skin lesion classification, has emerged. CNN-based networks are capable of automatically learning discriminative features from raw pixel information without handcrafted features, which promotes better

More recently, ViT architectures have become strong competitors to CNNs for image classification. ViTs employ self-attention to extract high-order patterns and long-range context in images, achieving state-of-the-art performance across several benchmarks. Self-supervised learning paradigms like DINO extend model generalization further by learning robust representations from unlabeled data.

By utilizing publicly available datasets like ISIC and DermNet, automated diagnosis methods can be constructed and tested. However, present solutions do not adequately address concerns regarding accessibility, cost, and feasibility in practical scenarios. Integrating image processing libraries like OpenCV for feature extraction with recent transformer-based networks is a promising route to better diagnostic capability and feasibility.

2. Methodology

This section describes the design, development, and integration of the automatic skin disease diagnosis system. The approach integrates OpenCV-based image processing with a transformer-based deep learning model, all served in a user-friendly web application.

2.1 System Architecture

The application is designed as a multi-page web platform, with a smooth user experience from registration to diagnosis. The main pages are:

Home Page: Introduces the platform's functionality, highlighting automatic skin disease detection and guiding the user.

Login Page: Enables returning users to securely authenticate using their credentials.

Sign Up Page: Allows new users to sign up by providing a username and password, giving safe access to diagnostic features.

Profile Page: Acts as the primary interface for uploading skin images, previewing the selected images, and viewing diagnostic results.

2.2 Workflow

User Signup and Verification:

Users can create a new account on the sign-up page or log in with an existing account. Database connections ensure the safe handling of user credentials and sessions.

Image Upload and Preview:

On the user profile page, users upload pictures of the affected skin regions. Previews of the uploaded pictures are displayed right away, giving the user immediate visual confirmation.

Image Storage:

Images are managed using FileSystemStorage, which safely stores them on the server and returns dynamic URLs for viewing and subsequent processing.
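The storage step can be illustrated without any web framework. The following is a minimal sketch, assuming a local media directory and a `/media/` URL prefix. The names `save_upload` and `MEDIA_URL`, and the list of accepted extensions, are hypothetical and chosen for illustration; this is not the storage class the system actually uses.

```python
import tempfile
import uuid
from pathlib import Path

MEDIA_URL = "/media/"  # assumed URL prefix under which stored files are served
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed list of accepted types


def save_upload(data: bytes, original_name: str, media_root: Path) -> str:
    """Store uploaded bytes under a collision-free name and return its URL."""
    # Keep only the extension of the client-supplied name, never its path part.
    suffix = Path(original_name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported image type: {suffix!r}")
    media_root.mkdir(parents=True, exist_ok=True)
    stored_name = f"{uuid.uuid4().hex}{suffix}"
    (media_root / stored_name).write_bytes(data)
    return MEDIA_URL + stored_name


# Usage: store a fake upload in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_url = save_upload(b"fake image bytes", "../lesion.PNG", Path(tmp))
```

Generating the stored name on the server avoids collisions between users and keeps client-supplied path components out of the file system.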

Fig -1: Sequence diagram illustrating the flow of user interactions and backend processing.

Fig -2: Flowchart representing the skin disease diagnosis process.


2.3 Image Processing and Preprocessing

OpenCV Feature Extractor:

OpenCV is used to process uploaded images, extracting features such as color, texture, and shape attributes critical for distinguishing between different skin conditions.
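The paper names the three feature families but not the specific descriptors. As a rough sketch, assuming a per-channel histogram for color, gradient-magnitude statistics for texture, and a darker-than-average foreground mask for shape, the extraction could look like the following. NumPy stands in here for the corresponding OpenCV routines (for example `cv2.calcHist`, `cv2.Sobel`, and `cv2.findContours`); the function name and all descriptor choices are assumptions for illustration.

```python
import numpy as np


def extract_features(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Return a 1-D vector of color, texture, and shape descriptors.

    `image` is an H x W x 3 uint8 array. Descriptor choices are illustrative.
    """
    # Color: normalized per-channel histogram (3 * bins values).
    color = np.concatenate([
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(np.float64)
    color /= image.shape[0] * image.shape[1]

    # Texture: mean and standard deviation of the gradient magnitude.
    gray = image.astype(np.float64).mean(axis=2)
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    texture = np.array([magnitude.mean(), magnitude.std()])

    # Shape: treat darker-than-average pixels as the lesion (crude assumption).
    mask = gray < gray.mean()
    rows, cols = np.nonzero(mask)
    if rows.size:
        height = rows.max() - rows.min() + 1
        width = cols.max() - cols.min() + 1
        aspect_ratio = width / height
    else:
        aspect_ratio = 0.0
    shape = np.array([mask.mean(), aspect_ratio])

    return np.concatenate([color, texture, shape])
```

With the default `bins=8`, the vector has 24 color values, 2 texture values, and 2 shape values (lesion area fraction and bounding-box aspect ratio).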

Normalization:

Images undergo normalization to standardize pixel values, improving model robustness and accuracy.
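The normalization scheme is not specified in the paper. One common choice for pretrained ViT models, shown here as a sketch, is to scale pixels to [0, 1] and then standardize each channel with the ImageNet mean and standard deviation. The use of ImageNet statistics and the RGB channel order are assumptions; images loaded with `cv2.imread` arrive in BGR order and would need converting first.

```python
import numpy as np

# ImageNet per-channel statistics in RGB order (assumed, not stated in the paper).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def normalize(image: np.ndarray) -> np.ndarray:
    """Map an H x W x 3 uint8 RGB image to standardized float32 values."""
    scaled = image.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    return ((scaled - IMAGENET_MEAN) / IMAGENET_STD).astype(np.float32)
```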

Dimension Extraction:

The system computes and reports image dimensions and other useful measures to facilitate subsequent analysis and debugging.

Augmentation:

Images can be augmented (flips, rotations, etc.) to diversify the dataset and prevent overfitting of the trained model.
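A minimal sketch of the flip and rotation augmentations follows. The flip probabilities and the restriction to 90-degree rotations are assumptions, since the paper does not give its augmentation parameters, and `augment` is a hypothetical helper name.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random horizontal flip, vertical flip, and quarter-turn rotation."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]  # vertical flip
    k = int(rng.integers(0, 4))  # 0, 1, 2, or 3 quarter turns
    return np.ascontiguousarray(np.rot90(image, k, axes=(0, 1)))
```

These transforms only rearrange pixels, so they change the orientation of a lesion without altering its color or texture. Passing a seeded `np.random.default_rng` makes the augmentation sequence reproducible.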

2.4 Model Development and Integration

Model Selection:

We chose the Vision Transformer (ViT) architecture trained under the DINO self-supervised learning framework because of its effective generalization and feature representation capabilities.
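For readers unfamiliar with DINO, its training objective as described in [2] is a cross-entropy between a centered, sharpened teacher distribution and a student distribution, with the teacher weights and the center both maintained as exponential moving averages. The sketch below illustrates that objective on raw network outputs only. It is not the training code used in this work, and the temperature defaults are illustrative assumptions.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dino_loss(student_out, teacher_out, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between centered teacher and student distributions."""
    # The teacher output is centered and sharpened; it acts as a fixed target.
    p_teacher = softmax(teacher_out - center, teacher_temp)
    p_student = softmax(student_out, student_temp)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean())


def ema_update(old, new, momentum):
    """Exponential moving average, used for teacher weights and the center."""
    return momentum * old + (1.0 - momentum) * new
```

In the full method the student and teacher see different augmented views of the same unlabeled image, which is what lets the model learn without class labels.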

Training:

The network was trained on a publicly available, annotated dataset of dermatoscopic images. Data normalization, augmentation, and careful validation were used in the training process to ensure reliable performance.

Accuracy:

The trained DINO-ViT achieved about 92% accuracy on the test dataset, showing strong reliability in classifying various skin conditions.

3. Results and Discussion

Software/Libraries: Python 3.9, OpenCV, TensorFlow, Flask

Evaluation Metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix
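The listed metrics can all be derived from the confusion matrix. The following sketch computes them in plain Python. Macro averaging across classes is an assumption, since the averaging scheme is not stated, and the class names in the usage example are made up for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def classification_metrics(y_true, y_pred, labels):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    cm = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": sum(cm[i][i] for i in range(n)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Usage with hypothetical labels.
labels = ["acne", "eczema"]
y_true = ["acne", "acne", "eczema", "eczema"]
y_pred = ["acne", "eczema", "eczema", "eczema"]
metrics = classification_metrics(y_true, y_pred, labels)
```

Macro averaging weights every class equally, so weak performance on a rare class lowers the score even when overall accuracy is high.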

Why Is DINO-ViT Better?

The DINO-ViT model outperforms the baseline CNN across all metrics (see Figure). Its advantages stem from:

1. Transformer Architecture – Self-attention captures long-range dependencies and fine details in skin images.

2. Self-Supervised Pretraining (DINO) – Learns robust, transferable features from unlabeled data, improving performance with limited labeled samples.

3. Enhanced Feature Representation – OpenCV preprocessing with ViT yields more discriminative, noise-tolerant features.

Together, these factors make DINO-ViT a more accurate and reliable tool for automated skin disease diagnosis, with strong potential for early screening and clinical use.
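To make the first point concrete, the sketch below shows how a ViT turns an image into patch tokens [3] and how single-head scaled dot-product self-attention lets every patch attend to every other patch. The random projection matrices stand in for learned weights, and the function names are hypothetical. A real ViT adds a learned patch embedding, position embeddings, multiple heads, and MLP layers.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an H x W x C image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (tokens, tokens)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

A 224 x 224 RGB image with 16 x 16 patches yields 196 tokens of length 768. Each row of the returned `weights` matrix sums to 1 and spans all tokens, which is how distant regions of a lesion can influence each other within a single layer.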

3.1 Results

Fig -3: Comparison of Uploaded Skin Images Before and After Normalization and Their Feature Dimensions

Fig -4: Performance Comparison: DINO ViT vs Baseline CNN


3.2 Discussion

Strengths:

The DINO-ViT model demonstrated high accuracy, confirming its effectiveness for skin disease classification. Preprocessing with OpenCV improved image quality and consistency, which positively impacted the model's performance.

Limitations:

Performance declined for underrepresented disease categories, highlighting dataset imbalance. Real-world images with poor lighting or occlusions reduced prediction accuracy.

Potential Improvements:

Expand and diversify the dataset to strengthen model generalization.

Apply advanced augmentation and preprocessing strategies for more robust performance.

Incorporate dermatologist insights to refine predictions and ensure clinical relevance.

4. Conclusion

This research demonstrates the feasibility and effectiveness of an automated, web-based skin disease diagnosis system that combines state-of-the-art image processing and deep learning techniques. By integrating OpenCV-based feature extraction with a DINO-pretrained Vision Transformer model, the system achieves high accuracy and robust performance in classifying multiple skin conditions. The user-friendly web interface and real-time analysis make the solution accessible to both clinicians and the general public, especially in resource-limited settings. The proposed system not only addresses the scarcity and cost barriers of traditional dermatological diagnosis but also paves the way for early detection and better patient outcomes. Future work may focus on expanding the dataset, incorporating additional skin conditions, enhancing model performance with more advanced architectures, and integrating expert feedback to further validate and improve the system.

REFERENCES

[1] Kaggle, "Skin Diseases Image dataset", Available at: https://www.kaggle.com/datasets/ismailpromus/skindiseases-image-dataset

[2] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, "Emerging Properties in Self-Supervised Vision Transformers," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9630–9640, 2021.

[3] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," in International Conference on Learning Representations (ICLR), 2021.

[4] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.

Fig -5: Confusion Matrix
