Autonomous Driving Scene Parsing
1Assistant Professor of Computer Science and Engineering, MIT College, Uttar Pradesh, India
2Student of Computer Science and Engineering, MIT College, Uttar Pradesh, India
3Student of Computer Science and Engineering, MIT College, Uttar Pradesh, India
4Student of Computer Science and Engineering, MIT College, Uttar Pradesh, India
5Student of Computer Science and Engineering, MIT College, Uttar Pradesh, India
Abstract - In this paper, we present semantic segmentation of driving scenes in unconstrained conditions. The focus of previous approaches has been on constrained environments, but we focus on the unconstrained environment of Indian roads. We have used the Indian Driving Dataset (IDD) [5], which consists of 182 drive sequences on Indian roads. To perform semantic segmentation, we have used the U-Net [6] model, a fully convolutional neural network, with slight adjustments to the architecture for our purposes.
Key Words: Deep learning, classification, convolutional neural networks, semantic segmentation, OpenCV.
1. INTRODUCTION
Autonomous driving depends on information processed from several sensors positioned on the car. These sensors enable the vehicle to sense its surroundings, comprehend traffic scenes, and manage its motions, acting as its senses of hearing and sight. High-resolution cameras, radar, and Light Detection and Ranging (LiDAR) are the main types of sensors used to classify objects via feature extraction and to measure their distance from other objects using radio waves and light, in order to create a three-dimensional (3D) image of the environment. Several types of extra sensors, such as infrared, sonar, micro-radar, ultrasonic, and short-range sensors, have been fitted to autonomous vehicles to prevent collisions with road obstacles. Similarly, autonomous vehicles employ vision sensors to comprehend the visual components of their environment. Lane detection, traffic-light and road-sign analysis, and pedestrian and vehicle detection, among other visual understanding problems, are all part of autonomous driving. By gathering this data, automated vehicle behaviours like lane changes, braking, and turning manoeuvres can be instructed better and more safely.
(a) Instance Segmentation (b) Semantic Segmentation
Examples of segmented images for an autonomous vehicle to aid in scene analysis: (a) an instance segmentation example, where various objects from nearly identical classes are segmented into different colours with their own boundary pixels, and (b) a semantic segmentation example, where objects of a class are illustrated with a single colour without any instance distinction.
Vision sensor data is arguably the most reliable source of information among those gathered for vehicle decision-making. As a result, this field of study has received much research and application in Intelligent Transportation Systems (ITS), primarily from the perspectives of machine learning and deep convolutional neural networks (CNNs). Deep CNNs are neural networks with a variety of functional layers for image processing that iteratively extract features from the input image and optimize them into better representations. Similar techniques are utilised for scene analysis from vision data, where a deep CNN is applied to real-time photos, for example, to determine where a pedestrian is and how far away it is from an autonomous vehicle. In contrast to this streamlined overview, the complicated models now being proposed are capable of generating multiple labels, such as pedestrians and vehicles, as well as their localization.
2. LITERATURE REVIEW
To understand the development of autonomous driving research in recent years, it is necessary to organise a literature review covering the various application areas through which autonomous driving has developed, as well as to recognise research gaps. Thus, the research process, approaches, and findings of the literature review are introduced in the next sections.
2.1. Xinyu Huang, Xinjing Cheng, "The ApolloScape Dataset for Autonomous Driving"
They offer a large, detailed street-view dataset in this project. The dataset includes video sequences with instance-level annotations, 2D/3D annotations and location information, various annotated lane markings, and scenarios that are more sophisticated than those in previous datasets. In future work, the authors plan to grow the dataset to one million labelled video frames with a wider range of weather, such as snow, rain, and fog, and to attach stereo cameras and a panoramic camera system in order to produce depth maps and panoramic photos. The present version still lacks detailed information for moving objects; they would like to produce full depth information for both stationary backgrounds and moving objects.
2.2. Juan Rosenzweig, Michael Bartl, "A Review and Analysis of Literature on Autonomous Driving"
Autonomous vehicles are already within reach; for instance, Audi and Mercedes (Bartl, 2014) declared that their highly automated features were virtually ready for production. As a mirror of the news reports, we can gradually see how this innovation manages to enter our daily lives, with examples such as a car taking a blind man for tacos in 2013, coast-to-coast excursions in 2016 (CNN), an Italy-to-China trip in 2014 (Broggi et al.), and 6 million miles previously covered by Google (Urmson, 2015). However, despite being largely ready, the major competitors in the race, including Audi, Bentley, Chevrolet, Ford, Volvo, Lexus, Lincoln, Lamborghini, Nissan, and Tesla, are attempting to integrate it gradually into their vehicles.
2.3. Markus Hofmarcher & Thomas Unterthiner, "Visual Scene Understanding for Autonomous Driving Using Semantic Segmentation"
They rated their network favourably in comparison to the ENet segmentation network, a popular architecture for efficient semantic segmentation on embedded devices. Comparable, more recent architectures exist, but none have been compared.
As a supplement to full deep learning, which lacks explainability, they presented a tiered approach in which only specific components are trained in an end-to-end manner. A human specialist with a thorough understanding of the system may decipher the meaning of intermediate nodes, which are outputs of one layer and inputs to another. Semantic segmentation is a potent method that offers a reduced-dimensional abstraction of incoming video signals.
2.4. Manikandan Thiyagarajan, “Self-Driving Car”
This project focuses on improving pedestrian safety and commuting, drastically reducing accidents and human mistakes through continuous system learning, because self-driving cars are the primary advancement in the automobile industry of the future. The initiative could revolutionise how people with disabilities and blind people are transported by letting them travel on their own. Mobile applications can be created using this solution as the foundation, allowing users to summon a vehicle through an app and, once the law is passed, to manufacture a completely autonomous vehicle.
2.5. Girish Varma, Anbumani Subramanian, "IDD: A Dataset for Exploring Problems of Autonomous Navigation in Unconstrained Environment"
They provide a fresh dataset for researching issues with autonomous navigation in unstructured driving situations. They point out a number of flaws in the current datasets, including the inability to discriminate between safe and harmful roadside regions, the need for more labels for vehicles, and the lack of a label hierarchy that lessens ambiguity. They examine the dataset's class imbalance and label statistics, and they study its domain discrepancy properties relative to previous semantic segmentation datasets. Because it was collected in India, where background categories and road-user appearances differ, the dataset is more diverse than other semantic segmentation datasets. This not only presents intriguing challenges to the state of the art in semantic segmentation, but also represents, to their knowledge, the first attempt to address these problems for autonomous driving.
2.6. Olaf Ronneberger, Philipp Fischer, Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation"
The U-Net [6] architecture performs exceptionally well in a variety of biomedical segmentation applications. Thanks to data augmentation using elastic deformations, it requires only a small number of annotated images and has a very manageable training time of only eight hours on an NVidia Titan GPU (5 GB). The authors offer a full implementation based on Caffe together with the trained networks, and they are confident that many such tasks can be completed with ease using the U-Net [6] design.
Table 1: Summary of the reviewed literature (Author | Contribution | Findings / Future Work)

Xinyu Huang, Xinjing Cheng | Present a large, comprehensive street-view dataset. It includes scenarios more complicated than those in existing datasets, 2D/3D annotation and location data, a variety of annotated lane markings, and video frames labelled at the instance level. | Future work: raise the total size of the dataset to one million annotated video frames with a wider range of weather, such as snow, rain, and fog.

Juan Rosenzweig, Michael Bartl | A review and analysis of the literature on autonomous driving. | Examples include a car taking a blind man for tacos as early as 2013 (Google, 2013), coast-to-coast road trips in 2016 (CNN), driving from Italy to China in 2014, and the 6 million miles previously covered by Google (Urmson, 2015).

Markus Hofmarcher & Thomas Unterthiner | Evaluated their network, which compares favourably against the ENet segmentation network. | ENet (Efficient Neural Network) offers real-time, pixel-by-pixel semantic segmentation capabilities.

Manikandan Thiyagarajan | A prototype self-driving car combining several technologies: lane detection, disparity maps for measuring distances between the car and other vehicles, and support-vector-machine classification for anomaly detection, achieving very good accuracy on their data collection. The system, a prototype vehicle fitted with cameras and sensors to understand its surroundings, aims to use innovative methods to increase the reliability of self-driving technology. | A self-driving car that navigates traffic and other challenges with little or no human involvement will benefit those with disabilities and the blind; it can also serve as the foundation for mobile applications that let the owner summon the vehicle through an app.

Girish Varma et al. | Present the IDD dataset for exploring problems of autonomous navigation in unconstrained environments. | Limitations of existing datasets include the inability to distinguish between safe and harmful roadside regions, the need for extra labels for vehicles, and the lack of a label hierarchy that reduces ambiguity.
3. PROBLEM STATEMENT
Autonomous driving is a challenging robotic task that requires vision, planning, and execution in an environment that is always moving ahead, and this process demands precision for safety reasons. Semantic segmentation provides pixel-accurate detection of objects in a driving scene. Most prior work has addressed structured conditions, since many datasets are available for them. The environment of Indian roads, however, is unconstrained and unstructured; most available datasets are not of Indian roads and are far more structured than the scenarios found there. So, in order to handle these unconstrained conditions, we have utilized the Indian Driving Dataset. Past approaches have not completely utilized this dataset.
The IDD [5] dataset consists of 11,000 photos with 33 classes, gathered from 182 drive sequences on Indian roadways. It offers the information needed for an unconstrained environment and can be utilised for semantic segmentation as well as object recognition. The collection includes road scenes recorded in Bangalore, Hyderabad, and the surrounding areas of these cities. While some of the photographs are 720p, the majority are 1080p.
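To illustrate how such image/mask pairs can be consumed for training, the following is a minimal PyTorch Dataset sketch. The parallel images/ and masks/ folder layout with matching file names is an assumption made for illustration, not the official IDD directory structure.

```python
# A minimal sketch of a segmentation data loader. The parallel
# "images"/"masks" layout with matching file names is an assumed
# convention for illustration, not the official IDD structure.
import os
import numpy as np
import torch
from torch.utils.data import Dataset
from PIL import Image

class DrivingSegDataset(Dataset):
    def __init__(self, image_dir, mask_dir, size=(512, 512)):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.files = sorted(os.listdir(image_dir))
        self.size = size

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        name = self.files[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name))  # per-pixel class ids
        # Bilinear resizing for the image; nearest-neighbour for the mask,
        # so class ids are never blended into invalid intermediate values.
        image = image.resize(self.size, Image.BILINEAR)
        mask = mask.resize(self.size, Image.NEAREST)
        image = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy(np.array(mask)).long()
        return image, mask
```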
4. PRESENTLY AVAILABLE SOLUTIONS
Modern developments in machine learning and computer vision have permitted advances in autonomous navigation. However, most existing research addresses European driving situations, while far less progress has been made for Indian driving situations. Our contribution aims to achieve a complete spatial understanding of Indian driving situations, with semantic segmentation as our central focus.
The topic of automated vehicles has advanced significantly in recent years, attracting a great deal of interest and branching out into numerous sub-fields that address every facet of a self-driving vehicle: vehicle-to-vehicle communication, energy storage systems, sensors, safety equipment, and more. Among these is the difficult Computer Vision (CV) task of "scene understanding," which deals with processing raw environmental data to produce a representation of the scene in front of the car that permits later interaction with the surrounding environment. Scene understanding involves interpreting the scene viewed through a network of sensors in order to perceive, analyse, and model it. It includes a range of difficult tasks, from image classification to more complex ones like object detection and semantic segmentation. The first task entails giving the input image a global label; however, because diverse objects in the environment must be located, it has limited application in an autonomous-vehicle scenario. The second task gives a more thorough description, finding all objects that can be identified and classifying them. The final task, which is the most difficult, assigns each pixel in the input image a class. Because this task provides a detailed semantic description, it is the primary objective of the scene-understanding pre-processor, and it calls for sophisticated machine-learning architectures.
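To make the per-pixel assignment concrete: a segmentation network emits one score per class for every pixel, and taking the argmax over the class dimension yields a label map the size of the input. A minimal illustration with random scores (the class count of 33 matches IDD):

```python
# Per-pixel classification: take the argmax over the class dimension
# of the network's output to get one class id per pixel.
import torch

num_classes = 33                                  # e.g. the IDD class count
logits = torch.randn(1, num_classes, 256, 256)    # (batch, classes, H, W)
label_map = logits.argmax(dim=1)                  # (batch, H, W)
print(label_map.shape)                            # torch.Size([1, 256, 256])
```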
5. CONCLUSION
We were not satisfied with previous approaches, which tend to focus on constrained conditions; this drove us to focus on unconstrained conditions using the IDD [5] dataset.
For semantic segmentation, we have used U-Net [6], introduced in 2015 by Olaf Ronneberger, Philipp Fischer, and Thomas Brox. It is a fully convolutional architecture comprising an encoder and a decoder. We have implemented the architecture in PyTorch with slight modifications: after every two convolutional layers we have added a Batch Normalization layer followed by ReLU activation; instead of the up-conv layer, we have used transpose convolutional layers; and we have used padding to ensure that the output of every encoder layer has the same size as the corresponding input of the decoder layer. A sketch of these building blocks follows.
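Below is a minimal PyTorch sketch of the two building blocks described above; it is not our exact implementation, and the channel counts and module names are illustrative.

```python
# Sketch of the modified U-Net building blocks: Conv -> BatchNorm -> ReLU
# twice per stage, transpose convolutions for upsampling, and padding so
# encoder outputs match the corresponding decoder inputs.
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by BatchNorm and ReLU.
    padding=1 keeps the spatial size unchanged, so skip connections
    line up without cropping."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Up(nn.Module):
    """Decoder stage: transpose convolution for upsampling, then
    concatenation with the skip connection and a DoubleConv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2)
        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        # Thanks to the padding above, x and skip have identical sizes.
        return self.conv(torch.cat([skip, x], dim=1))
```

A full model stacks several such stages, with max pooling between encoder DoubleConv blocks and a final 1x1 convolution mapping the last decoder output to the number of classes.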
ACKNOWLEDGEMENT
Our mentor, Assistant Professor Mrs. Neha Gupta, has our warmest gratitude for her encouragement and assistance throughout this paper; we owe her a debt of gratitude. We learned about recent developments in this subject, and after completing this paper we have gained a great deal of confidence.
REFERENCES
[1] Xinyu Huang, Xinjing Cheng, Qichuan Geng, "The ApolloScape Dataset for Autonomous Driving".
[2] Juan Rosenzweig, Michael Bartl, "A Review and Analysis of Literature on Autonomous Driving", in THE MAKING OF INNOVATION, E-JOURNAL, makingofinnovation.com, October 2015.
[3] Markus Hofmarcher & Thomas Unterthiner, "Visual Scene Understanding for Autonomous Driving Using Semantic Segmentation", LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, A-4040 Linz, Austria.
[4] Manikandan Thiyagarajan, "Self-Driving Car", International Journal of Psychosocial Rehabilitation, April 2020.
[5] Girish Varma, Anbumani Subramanian, Anoop Namboodiri, Manmohan Chandraker, "IDD: A Dataset for Exploring Problems of Autonomous Navigation in Unconstrained Environment", http://idd.insaan.iiit.ac.in/.
[6] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation".