CrashLens: Machine Learning Insights into Statewise Traffic Safety

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 12 | Dec 2025 www.irjet.net p-ISSN: 2395-0072

"CrashLens: Machine Learning Insights into Statewise Traffic Safety"

1 Professor, Master of Computer Application, VTU, Kalaburagi, Karnataka, India

2 Student, Master of Computer Application, VTU, Kalaburagi, Karnataka, India

ABSTRACT- Road traffic accidents represent one of the most pressing challenges in public health, economics, and transportation management worldwide. Every year, approximately 1.3 million people die globally due to road crashes, and tens of millions more are injured or disabled. Developing nations such as India bear a disproportionate share of this burden, with the Ministry of Road Transport and Highways (MoRTH) reporting nearly 1.55 lakh deaths and more than 4 lakh accidents annually. Despite policy measures, stricter enforcement of traffic rules, and awareness campaigns, accident figures remain consistently high, creating an urgent need for data-driven analysis. Traditional approaches to accident analysis rely heavily on descriptive statistics or regression-based models, which summarize trends but often fail to uncover hidden structures in the data. Machine learning, particularly unsupervised techniques, offers new opportunities to discover clusters and patterns that may not be visible through conventional methods.

Keywords: The results indicate that clustering can successfully separate high-fatality states such as Maharashtra, Tamil Nadu, and Uttar Pradesh from lower-risk states, while DBSCAN effectively detects anomalies.

1. INTRODUCTION

India’s road safety crisis is one of the most pressing challenges in its transportation sector. The issue is not limited to the quality of road infrastructure; it is also strongly influenced by human behavior, weak enforcement of traffic regulations, and socio-economic disparities. The Ministry of Road Transport and Highways (MoRTH) consistently reports alarming statistics that highlight the seriousness of the problem. In 2021 alone, India recorded more than 4.1 lakh accidents, which claimed around 1.55 lakh lives and injured over 3.7 lakh people. On average, this translates to 47 accidents and 18 deaths every single hour, making India one of the worst-affected countries globally in terms of road safety.

The consequences of such a large number of accidents extend beyond the immediate loss of life and injuries. According to estimates from the World Bank, road traffic accidents cost India nearly 3% of its Gross Domestic Product (GDP) every year. For a developing economy, this is an enormous economic burden that diverts resources away from infrastructure growth, education, and healthcare.

Families of accident victims also face long-term hardships due to loss of income, medical expenses, and emotional trauma. Thus, the problem of road safety in India is not merely a transportation issue but a national economic and social concern.

Several contributing factors are repeatedly cited in accident reports. Speeding, non-compliance with helmet and seatbelt regulations, distracted driving, and driving under the influence of alcohol remain leading behavioral causes. The lack of strict enforcement of traffic laws worsens these issues, especially in rural and semi-urban areas where road monitoring is minimal. Additionally, inadequate emergency response systems and delays in providing timely medical care significantly increase the fatality rate. Poor road design, insufficient signage, and lack of pedestrian-friendly infrastructure also add to the growing list of causes. This multi-dimensional nature of road accidents indicates that no single solution can fully address the crisis.

Traditional approaches to road safety management in India have largely relied on descriptive statistical analysis of accident data published annually by MoRTH. These reports provide valuable insights into the number of accidents, fatalities, and injuries but remain limited in scope. They do not adequately capture underlying relationships or reveal hidden patterns in the data.

2. PROBLEM STATEMENT

Road safety in India has become a national concern, with an average of 47 accidents and 18 deaths reported every hour. Although statistical reports highlight the scale of the problem, they do not provide sufficient insights into patterns and relationships between accident factors. Policymakers and stakeholders face significant challenges in identifying accident-prone states, prioritizing interventions, and allocating resources effectively. The absence of automated, data-driven classification systems leads to inefficient strategies, continued loss of lives, and growing economic burden. Therefore, the problem addressed by this project is the lack of advanced analytical tools to cluster Indian states into risk categories and visualize accident hotspots for informed decision-making.

3. OBJECTIVES

Data Preparation: Collect and preprocess multi-year statewise accident data (2018–2022) to ensure quality and uniformity.

Clustering Implementation: Apply K-Means and DBSCAN algorithms to classify Indian states into natural accident risk clusters.

Evaluation of Clusters: Use performance metrics such as Silhouette Score and Davies–Bouldin Index to validate cluster quality.

Visualization: Generate scatter plots, heatmaps, and geospatial maps for intuitive interpretation of accident-prone states.

Policy Support: Provide actionable insights to policymakers, transport authorities, and researchers for reducing accidents and fatalities.

4. METHODOLOGY USED

The methodology adopted in this project is designed to provide a structured and systematic framework for analyzing Indian road accident data. Since accident statistics are complex and influenced by multiple factors such as population, road length, and vehicle growth, it becomes necessary to organize the workflow into well-defined stages. By following a step-by-step approach, the project ensures that the analysis remains accurate, reproducible, and interpretable for both technical experts and policymakers.

The first stage involves data collection, where multi-year accident datasets are gathered from reliable government and open data sources. This raw data is often inconsistent and requires significant preprocessing before it can be used for analysis. The preprocessing stage includes removing missing values, eliminating duplicates, and standardizing numerical columns. Feature engineering is also applied to derive additional indicators such as accidents per 100,000 population and fatalities per 100 kilometers of road length, which provide deeper insights than raw counts alone.
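The preprocessing and feature engineering steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the field names (`accidents`, `fatalities`, `population`, `road_km`), the state labels, and every number in `rows` are hypothetical placeholders.

```python
from statistics import mean, pstdev

# Hypothetical sample rows; field names and all numbers are illustrative only.
rows = [
    {"state": "A", "accidents": 30000, "fatalities": 12000, "population": 100_000_000, "road_km": 600_000},
    {"state": "B", "accidents": 10000, "fatalities": 3000, "population": 40_000_000, "road_km": 200_000},
    {"state": "B", "accidents": 10000, "fatalities": 3000, "population": 40_000_000, "road_km": 200_000},  # duplicate
    {"state": "C", "accidents": 5000, "fatalities": None, "population": 20_000_000, "road_km": 100_000},  # missing value
    {"state": "D", "accidents": 2000, "fatalities": 500, "population": 5_000_000, "road_km": 50_000},
]

def clean(records):
    """Drop rows with missing values, then drop exact duplicates (first one wins)."""
    seen, out = set(), []
    for r in records:
        if any(v is None for v in r.values()):
            continue
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def add_rates(r):
    """Derive the two normalized indicators named in the text."""
    r = dict(r)
    r["acc_per_100k_pop"] = r["accidents"] * 100_000 / r["population"]
    r["fat_per_100km"] = r["fatalities"] * 100 / r["road_km"]
    return r

def standardize(records, cols):
    """Add a z-scored copy of each column so no feature dominates distances."""
    out = [dict(r) for r in records]
    for c in cols:
        values = [r[c] for r in records]
        mu = mean(values)
        sd = pstdev(values) or 1.0  # guard against a constant column
        for r in out:
            r[c + "_z"] = (r[c] - mu) / sd
    return out

features = standardize(
    [add_rates(r) for r in clean(rows)],
    ["acc_per_100k_pop", "fat_per_100km"],
)
for r in features:
    print(r["state"], r["acc_per_100k_pop"], r["fat_per_100km"])
```

Normalizing by population and road length matters here because raw counts mostly track the size of a state; the derived rates make large and small states comparable before clustering.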

The next phase is the application of clustering algorithms. Two unsupervised learning techniques are used: K-Means, which groups states into clusters based on similarity in accident attributes, and DBSCAN, which is capable of identifying irregular patterns and outliers that may not fit traditional spherical clusters. These algorithms help uncover hidden patterns in the data, allowing states to be categorized as high-risk, medium-risk, or low-risk accident regions.
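To illustrate how K-Means can sort states into risk tiers, here is a minimal pure-Python sketch of Lloyd's algorithm. Several things are assumptions rather than details from this paper: the deterministic farthest-point seeding (chosen only so the sketch is reproducible; random or k-means++ seeding is more common), the two-dimensional feature vectors in `states`, and the rule that ranks clusters by the sum of their centroid coordinates to name them low, medium, or high risk.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical standardized features per state (e.g. accident rate, fatality rate).
states = {
    "S1": (0.1, 0.2), "S2": (0.2, 0.1),
    "S3": (1.0, 1.1), "S4": (1.1, 1.0),
    "S5": (2.0, 2.1), "S6": (2.1, 2.0),
}
points = list(states.values())
labels, centroids = kmeans(points, k=3)

# Illustrative naming rule (an assumption): rank clusters by centroid magnitude.
order = sorted(range(3), key=lambda j: sum(centroids[j]))
tier = {order[0]: "low", order[1]: "medium", order[2]: "high"}
risk = [tier[lab] for lab in labels]
print(dict(zip(states, risk)))
```

K-Means assigns every state to some cluster, so it cannot flag an outlier on its own; that is the gap DBSCAN is meant to cover.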

Following clustering, the system performs evaluation of cluster quality. Metrics such as the Silhouette Score and Davies–Bouldin Index are computed to assess the compactness and separation of the clusters. This ensures that the results are not arbitrary but statistically validated. Evaluation plays a critical role in confirming that the chosen number of clusters or density parameters leads to meaningful groupings that can be trusted for decision-making.
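The two metrics can be written out directly from their standard definitions. The sketch below is a plain-Python illustration using Euclidean distance, not the project's evaluation code; the four sample points and both labelings are hypothetical. A higher Silhouette Score (maximum 1) and a lower Davies–Bouldin Index indicate tighter, better-separated clusters. Both functions assume at least two clusters.

```python
import math
from statistics import mean

def group(points, labels):
    """Collect points by cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b), where a is the mean distance to the point's
    own cluster and b is the mean distance to the nearest other cluster."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # convention: a singleton cluster scores 0
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            mean([math.dist(p, q) for q in other])
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = group(points, labels)
    cents = {l: tuple(sum(ax) / len(ps) for ax in zip(*ps)) for l, ps in clusters.items()}
    scatter = {l: mean([math.dist(p, cents[l]) for p in ps]) for l, ps in clusters.items()}
    return mean([
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters if j != i)
        for i in clusters
    ])

# Hypothetical data: two tight, well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # matches the visible structure
bad = [0, 1, 0, 1]   # deliberately wrong grouping

print(silhouette_score(pts, good), davies_bouldin(pts, good))
print(silhouette_score(pts, bad), davies_bouldin(pts, bad))
```

Comparing these scores across candidate values of k (for K-Means) or ε and MinPts (for DBSCAN) is one way to pick parameters rather than fixing them by hand.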

Finally, the project emphasizes visualization of results, which is crucial for effective interpretation. Scatter plots, heatmaps, and geospatial maps are generated to present the clustering outcomes in a clear and intuitive manner. These visual outputs allow non-technical stakeholders, such as policymakers and road safety authorities, to quickly identify accident hotspots and trends. By combining systematic data processing, robust clustering techniques, and accessible visualizations, the methodology ensures a comprehensive approach to understanding and reducing road accidents in India.

5. LITERATURE SURVEY

The analysis of road accidents has long been an important research area in transportation safety, urban planning, and public policy. Traditional descriptive statistics are being replaced by advanced data-driven approaches such as machine learning and clustering, which allow researchers to extract hidden patterns from accident datasets. This section reviews past studies related to accident prediction, clustering, and geospatial visualization, with emphasis on methodologies, findings, and limitations.

Zhang et al. (2019) employed K-Means clustering on Chinese traffic accident data to identify urban accident hotspots. Their study demonstrated that clustering could effectively categorize regions into high, medium, and low-risk groups, allowing transport authorities to prioritize safety measures. However, their dataset was limited to metropolitan cities, leaving rural areas unexplored.

Similarly, Li and Wang (2020) applied DBSCAN to large-scale traffic accident datasets. Unlike K-Means, DBSCAN detected irregular accident clusters that did not follow spherical patterns. This approach proved effective in identifying high-density accident zones along highways. The limitation was that DBSCAN required careful tuning of parameters such as epsilon (ε) and MinPts, which varied across regions.
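To make the roles of ε and MinPts concrete, the following is a minimal pure-Python sketch of the DBSCAN procedure. It is not the implementation used by Li and Wang or by this project; the sample coordinates and the values `eps=1.5` and `min_pts=3` are hypothetical. The sketch follows the common convention that a point counts as its own neighbor when compared against MinPts.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reach)
    return labels

# Hypothetical data: two dense groups and one isolated point.
pts = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
print(dbscan(pts, eps=1.5, min_pts=3))
```

With a much smaller ε every point in this sample would fall below MinPts and be labeled noise, which shows why the tuning noted above is sensitive to the scale and density of each region's data.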

Chakraborty et al. (2021) used regression-based models to predict accident severity in Indian states. Their findings highlighted that variables such as population growth and vehicle density had strong correlations with fatality rates. However, regression models could not capture non-linear relationships in the data.

In a related work, Kumar and Singh (2021) applied Decision Trees to accident datasets for severity classification. Their model achieved an accuracy of 85%, showing that supervised learning could predict accident outcomes. Nonetheless, decision trees required labeled datasets, which are not always available.

Ahmed et al. (2020) integrated geospatial mapping with clustering techniques to identify blackspots in the Middle East. Using GIS and DBSCAN, they successfully detected accident hotspots along major highways. Their study concluded that visualization was critical for communicating results to policymakers.

In India, Rao et al. (2021) applied heatmap-based visualization of road accident datasets for Telangana state. Their approach improved interpretability but did not incorporate machine learning techniques, limiting the predictive and clustering capabilities of the study.

Several studies compared the effectiveness of clustering algorithms. Chen et al. (2018) compared K-Means, DBSCAN, and Hierarchical Clustering for traffic datasets in China. They concluded that K-Means performed better for well-separated clusters, while DBSCAN was superior in detecting noise and irregular clusters.

6. SYSTEM DESIGN

The System Perspective Diagram provides a high-level architectural view of the proposed India Road Accident Clustering System. It demonstrates how the different components interact with each other to transform raw datasets into meaningful insights for end-users. At the starting point, the user or policymaker plays a critical role by supplying multi-year accident datasets, typically in CSV or Excel formats. These datasets include accident counts, fatalities, injuries, road length, vehicle registration data, and population figures. The system accepts these files as input and validates them to ensure that the required schema and attributes are available before further processing.

Once the data is successfully ingested, it flows into the clustering and analysis modules. The system uses unsupervised learning algorithms such as K-Means and DBSCAN to group Indian states or regions based on accident characteristics. This clustering process highlights hidden patterns, identifies high-risk and low-risk zones, and isolates outliers that may require further investigation. The clustering module ensures that raw statistics are transformed into structured categories that can be easily interpreted by stakeholders.
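The input validation step mentioned in this section could look roughly like the sketch below. The paper lists the attributes a dataset should contain but not the exact column headers, so every name in `REQUIRED_COLUMNS` is an assumption, as are the function name `validate_schema` and the sample rows. Only CSV text is handled here; Excel input would need an additional reader.

```python
import csv
import io

# Assumed header names; the paper names the attributes but not the exact columns.
REQUIRED_COLUMNS = {
    "state", "year", "accidents", "fatalities", "injuries",
    "road_length_km", "registered_vehicles", "population",
}

def validate_schema(csv_text):
    """Return (ok, missing_columns) for CSV content supplied as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = {name.strip().lower() for name in (reader.fieldnames or [])}
    missing = sorted(REQUIRED_COLUMNS - header)
    return (not missing, missing)

# Hypothetical uploads: one complete file, one lacking the population column.
complete = (
    "State,Year,Accidents,Fatalities,Injuries,Road_Length_km,Registered_Vehicles,Population\n"
    "X,2021,100,10,90,5000,200000,1000000\n"
)
incomplete = (
    "State,Year,Accidents,Fatalities,Injuries,Road_Length_km,Registered_Vehicles\n"
    "X,2021,100,10,90,5000,200000\n"
)

print(validate_schema(complete))
print(validate_schema(incomplete))
```

Rejecting a file at this stage, with the list of missing columns, is cheaper than letting a malformed dataset fail later inside the clustering module.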

7. SCREENSHOTS

8. CONCLUSION & FUTURE SCOPE

The project “India Road Accident Clustering Using Machine Learning Techniques” was developed to address the pressing issue of road safety in India by analyzing multi-year accident data through advanced clustering algorithms. Traditional descriptive methods provide only numerical summaries of accidents, but they fail to uncover deeper patterns and correlations. This project successfully demonstrates how unsupervised learning techniques such as K-Means and DBSCAN can categorize Indian states into meaningful accident risk groups, thereby supporting data-driven decision-making.

The present system lays a strong foundation for the clustering-based analysis of road accidents in India, but there is significant scope for extending its functionality. Future enhancements can focus on integrating additional data sources, expanding the scope of analysis, and improving usability for policymakers and stakeholders. These improvements will not only enhance the accuracy of results but also increase the system’s relevance in real-world decision-making.

One of the most impactful enhancements would be the integration of real-time accident and traffic data. Currently, the system relies on historical datasets that are updated annually by government authorities. By incorporating real-time feeds from traffic monitoring systems, GPS-enabled vehicles, and connected road sensors, the system can provide dynamic accident risk predictions. This would make the analysis more responsive and suitable for immediate interventions such as deploying ambulances or alerting traffic police in high-risk areas.

Figure 1: System Perspective DFD Of India Road
Figure 2: Multiyear Sample Of Road Accident Clustering
Figure 3: Fatalities
Figure 4: The Clustering Map Using Correlation Matrix
Figure 5: Map of Clustered items

9. REFERENCES

[1] World Health Organization, Global Status Report on Road Safety 2018, Geneva: WHO, 2018.

[2] Ministry of Road Transport and Highways (MoRTH), Road Accidents in India – 2022, Government of India, New Delhi, 2023.

[3] A. Mohan, R. Tiwari, and S. K. Sharma, "Analysis of traffic accident data using data mining techniques," International Journal of Data Analysis and Information Systems, vol. 10, no. 2, pp. 45–54, 2021.

[4] Y. Zhang and J. Zhao, "Identification of urban traffic accident hotspots using clustering methods," Journal of Transportation Safety & Security, vol. 11, no. 4, pp. 321–338, 2019.

[5] L. Li and Y. Wang, "Application of DBSCAN in road accident hotspot detection," IEEE Access, vol. 8, pp. 90123–90132, 2020.

[6] V. Bhatia and A. Sharma, "GIS-based road accident black spot analysis in India," International Journal of Geographic Information Science, vol. 34, no. 5, pp. 765–779, 2022.

[7] S. Gupta and R. Mehta, "Comparative analysis of K-Means and DBSCAN clustering for accident datasets," International Journal of Computer Applications, vol. 176, no. 3, pp. 15–22, 2020.

[8] H. Park and S. Lee, "Hierarchical clustering for traffic accident analysis in South Korea," Journal of Advanced Transportation, vol. 2019, Article ID 8456231, pp. 1–11, 2019.

[9] A. Ahmed and M. Khan, "Geospatial visualization of accident hotspots using machine learning and GIS," Arabian Journal for Science and Engineering, vol. 45, pp. 7651–7664, 2020.

[10] S. Rao, D. Patel, and P. Reddy, "Road accident analysis and visualization using heatmaps: A case study of Telangana," International Journal of Transportation Research, vol. 9, no. 2, pp. 112–121, 2021.
