Analysis Of Tweets Using Machine Learning to Examine Women's Safety in Indian Cities
Ashwini Y¹, Kavya M², Deekshitha A³, Akshitha⁴, Bhavya Balakrishnan⁵
¹,²,³,⁴ Students, Department of Computer Science & Engineering, T John Institute of Technology, Bengaluru, India
⁵ Asst. Professor, Department of Computer Science & Engineering, T John Institute of Technology, Bengaluru, India
Abstract - In many cities, violence and harassment against women and girls in public spaces have increased, starting with stalking and progressing to sexual harassment or sexual assault. This paper focuses on how social media, specifically Twitter, Facebook, and Instagram, plays a part in enhancing the safety of women in Indian cities. It also considers how society may instill in the average Indian citizen a sense of responsibility for the safety of the women around them. Tweets, or Twitter posts, can be used to spread awareness among Indian youth and to encourage people to take firm action against those who harass women. Tweets typically include photographs, text, and written statements focused on the safety of women in Indian cities. Twitter, and Twitter handles whose hashtag messages are shared widely across the globe, give women a platform to express how they feel when they go out for work or travel on public transportation. These women can discuss how they feel when they are surrounded by unknown men and whether or not they feel safe.
Key Words: Women Safety, Sexual Assault, Hashtags, Sentiment Analysis, Tweets on Twitter.
1. INTRODUCTION
Several studies have been conducted in cities across India, and women report similar types of sexual harassment and passing comments from unidentified people. Some forms of harassment and violence are very aggressive, including staring and passing comments, and these unacceptable practices are usually seen as a normal part of urban life. According to research conducted in the most populated Indian cities, including Delhi, Mumbai, and Pune, 60% of women report feeling unsafe. Twitter, and Twitter accounts whose hashtag messages are shared widely throughout the world, give women the ability to express how they feel when they go to work or ride in a public vehicle. These women can talk about how they feel and whether they feel comfortable when they are around men they do not know. Women have the right to the city, which gives them the freedom to go wherever they like, including to places of learning and other places.
There are many places in the country where women are still not aware of some of the most basic rights that they can take advantage of in order to empower themselves. This brings us to the next issue that needs the attention of people living in our country. Many women living in socially and economically backward areas are victims of domestic violence. Unaware of what they could do to prevent it or how to take a stand for themselves afterwards, they keep enduring this horrible behavior. However, the biggest reason why women feel unsafe in public places like malls is the harassment of girls. They may also be preoccupied by work-related issues or safety worries. Sometimes girls are bothered by others from the neighborhood as they walk to school, or there is inadequate safety, which causes young girls to be afraid.
1.1 Motivation
India is experiencing a daily rise in crime. The most concerning sort of crime is crime against women. Women travelling from other nations are likewise hesitant when considering visiting India. Their fear, however, cannot prevent them from participating in any form of social engagement. While there are regulations, there also have to be sufficient safety measures that we adhere to in order to safeguard women from abuse. A nation cannot advance if women must endure hostility from the populace, since women also contribute to the advancement of the country.
1.2 Objective
Strategies, policies, and laws aimed at reducing gender-based violence, including women's fear of crime, are part of women's safety. Safe spaces are necessary for women's safety. Space is not impartial. Fear-inducing spaces limit movement and the community's use of the area.
2. LITERATURE SURVEY
[1] Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams:
The authors offer a classifier that predicts the contextual polarity of subjective phrases in a sentence. The great majority of the words in the input can be scored automatically, without manual labelling, thanks to lexical scoring derived from the Dictionary of Affect in Language (DAL) and extended through WordNet. To account for the impact of context, they add n-gram analysis to the lexical scoring process. They combine the DAL scores with the syntactic constituents of each sentence and then extract n-grams of these constituents as features. The findings indicate a significant improvement over both the majority-class baseline and a baseline of lexical n-grams.
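The combination of lexical scoring and n-gram features described above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the lexicon `TOY_AFFECT_SCORES` and the helper names are hypothetical, the scores are made up rather than taken from the DAL, and plain word n-grams stand in for the syntactic n-grams used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical affect scores for illustration only; these are NOT values
# from the Dictionary of Affect in Language (DAL).
TOY_AFFECT_SCORES: Dict[str, float] = {
    "safe": 1.0,
    "happy": 1.0,
    "unsafe": -1.0,
    "afraid": -1.0,
}


def extract_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_score(tokens: List[str]) -> float:
    """Average the scores of the tokens that appear in the toy lexicon."""
    scores = [TOY_AFFECT_SCORES[t] for t in tokens if t in TOY_AFFECT_SCORES]
    return sum(scores) / len(scores) if scores else 0.0


tokens = "i feel unsafe on the bus".split()
bigrams = extract_ngrams(tokens, 2)   # context features
score = lexical_score(tokens)         # lexical feature
```

In a classifier, the n-grams and the lexical score would be passed together as features, so that context can modify the polarity suggested by individual words.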
[2] Determining the sentiment of opinions:
Identifying sentiments (the affective parts of opinions) is a challenging problem. The authors present a system that, given a topic, automatically finds the people who hold opinions about that topic and the sentiment of each opinion. The system contains a module for determining word sentiment and another for combining sentiments within a sentence. They experiment with models for classifying and combining sentiment at the word and sentence levels, with promising results.
[3] Accurate Unlexicalized Parsing:
The authors demonstrate that the parsing performance an unlexicalized PCFG can attain is substantially greater than previously reported and, in fact, far higher than conventional wisdom had considered possible. They outline a number of straightforward, linguistically justified annotations that significantly close the gap between a standard PCFG and cutting-edge lexicalized models.
[4] Study of Twitter sentiment analysis using machine learning algorithms on Python:
People frequently use the social media site Twitter to share their thoughts and emotions on various occasions. Sentiment analysis is a method for analyzing data and identifying the sentiments it contains. Twitter sentiment analysis is the application of sentiment analysis to tweet data in order to derive user sentiments. The authors review a number of studies on Twitter sentiment analysis, outlining the methodologies and models used, and describe a generalized Python-based approach.
[5] Twitter Sentiment Analysis:
Twitter sentiment analysis was created to examine customer perceptions of the essential elements of market success. The application combines natural language processing methods with a machine learning approach that is more accurate for sentiment analysis.
3. METHODOLOGY
This project has been divided into 2 phases.
First, a literature study is conducted, followed by system development. The literature study involves reviewing the various sentiment analysis techniques and methods currently in use.
In phase 2, application requirements and functionalities are defined prior to development. The architecture and interface design of the program, and how its parts will interact, are also identified. In developing the Twitter sentiment analysis application, several tools are utilized, such as the Python shell and Notepad.
Fig-1: General Methodology for Sentiment Analysis
3.1 SYSTEM ARCHITECTURE
Fig-2: System Architecture
The system architecture consists of the user and the Twitter server.
A few steps involve two-way connections, such as obtaining Twitter API access and collecting the tweets. After creating the Twitter API request, the user sends it to the Twitter server, which generates the keys and sends them back. Once the tweets are collected, they are sent back to the server, which verifies them and returns them to the user.
3.2 USE CASE DIAGRAM
In the use case diagram, there are many connections between the user and the Twitter server, and many of them are two-way. The system undergoes several processes, such as requesting Twitter to generate the keys. After the keys are generated, they are verified.
The user collects the verified keys and continues with further processing, namely data preprocessing, sentiment analysis, and segregation.
In sentiment analysis, the tweets are classified into three groups: positive, negative, and neutral. In segregation, the classified tweets are reported for each city as the percentage of positive, negative, and neutral tweets.
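The classification and segregation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual classifier: the word lists, the function names `classify_tweet` and `city_percentages`, and the sample tweets are all made up for the example, and a simple word count stands in for a trained model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical word lists for illustration; a real system would use a
# trained classifier or an established sentiment lexicon.
POSITIVE_WORDS = {"safe", "secure", "comfortable", "helpful"}
NEGATIVE_WORDS = {"unsafe", "harassed", "afraid", "stalked"}
LABELS = ("positive", "negative", "neutral")


def classify_tweet(text: str) -> str:
    """Label a tweet by comparing counts of positive and negative words."""
    words = text.lower().split()
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def city_percentages(
    tweets: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Segregate (city, text) pairs into per-city label percentages."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for city, text in tweets:
        counts[city][classify_tweet(text)] += 1
    result = {}
    for city, counter in counts.items():
        total = sum(counter.values())
        result[city] = {
            label: 100.0 * counter[label] / total for label in LABELS
        }
    return result


# Made-up sample tweets, used only to exercise the functions.
sample_tweets = [
    ("Delhi", "felt unsafe and afraid on the night bus"),
    ("Delhi", "the metro was safe and comfortable today"),
    ("Mumbai", "took the local train to work"),
    ("Mumbai", "station staff were helpful"),
]
percentages = city_percentages(sample_tweets)
```

Keeping classification and aggregation in separate functions means the word-count classifier could later be swapped for a trained model without changing the per-city reporting.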
3.3 SEQUENCE DIAGRAM
The sequence diagram has the same systematic representation as the system architecture. It begins with data preprocessing, in which all the input tweets are collected and then cleaned, and the dataset is rescaled through NLP. At the next level after data preprocessing, sentiment analysis and segregation are performed.
4. IMPLEMENTATION
Natural language processing (NLP), a subfield of data science, involves systematic procedures for intelligently and effectively analyzing, understanding, and extracting information from text data. Large amounts of text data can be organized using NLP and its components to address a wide range of tasks, including automatic summarization, machine translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.
Tokenization is the process of converting a text into tokens.
Tokens are words or other items that appear in the text.
Text objects include sentences, phrases, words, and articles.
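As a minimal sketch of tokenization for tweets, the following uses only Python's standard `re` module. The pattern `TOKEN_PATTERN` and the choice to keep hashtags and @mentions as single tokens are assumptions made for this example, not something the project specifies.

```python
import re
from typing import List

# Illustrative pattern: an optional leading '#' or '@', a run of word
# characters, and an optional apostrophe suffix (as in "don't").
TOKEN_PATTERN = re.compile(r"[#@]?\w+(?:'\w+)?")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word-like tokens."""
    return TOKEN_PATTERN.findall(text.lower())


tokens = tokenize("Women don't feel safe here! #WomenSafety @CityPolice")
# tokens == ['women', "don't", 'feel', 'safe', 'here',
#            '#womensafety', '@citypolice']
```

Here the text object is a single tweet, and each element of `tokens` is one token in the sense defined above.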
4.1 Text Preprocessing:
As text is the least structured of all the data kinds, it contains a variety of noise and cannot be easily analyzed without pre-processing. Text pre-processing refers to the full procedure of standardizing and cleaning text to remove noise and prepare it for analysis.
It typically consists of three steps:
1. Lexicon normalization
2. Noise reduction
3. Object standardization
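The three steps listed above can be sketched with small lookup tables. Everything named here (`STOPWORDS`, `LEMMA_LOOKUP`, `SLANG_LOOKUP`, and the function names) is a hypothetical example, and real pipelines would use much larger resources. The sketch applies noise reduction first so that the later lookups see clean tokens; that ordering is a choice of this example.

```python
import re
from typing import List

# Small hypothetical tables for illustration only.
STOPWORDS = {"is", "a", "the", "to", "in", "rt"}
LEMMA_LOOKUP = {"feeling": "feel", "feels": "feel", "cities": "city"}
SLANG_LOOKUP = {"u": "you", "pls": "please", "ppl": "people"}


def remove_noise(text: str) -> List[str]:
    """Noise reduction: drop URLs, punctuation, and stopwords."""
    text = re.sub(r"https?://\S+", " ", text)   # strip URLs first
    text = re.sub(r"[^\w\s#@]", " ", text)      # strip punctuation
    return [w for w in text.lower().split() if w not in STOPWORDS]


def normalize_lexicon(tokens: List[str]) -> List[str]:
    """Lexicon normalization: map inflected forms to a base form."""
    return [LEMMA_LOOKUP.get(t, t) for t in tokens]


def standardize_objects(tokens: List[str]) -> List[str]:
    """Object standardization: expand slang and acronyms."""
    return [SLANG_LOOKUP.get(t, t) for t in tokens]


def preprocess(text: str) -> List[str]:
    """Run the three steps in sequence on one tweet."""
    return standardize_objects(normalize_lexicon(remove_noise(text)))


cleaned = preprocess(
    "RT ppl in the cities feeling unsafe, pls help!! https://example.com/x"
)
# cleaned == ['people', 'city', 'feel', 'unsafe', 'please', 'help']
```

Object standardization is the step most relevant to tweets, since short posts often rely on abbreviations that a dictionary-based lexicon would not recognize.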
5. PROBLEM STATEMENT
Many incidents of violence and harassment against women and girls have occurred in public locations in different cities, starting with stalking and progressing to sexual harassment or sexual assault. Harassment of girls is most often attributed to safety issues or a lack of tangible consequences. Instead of placing limits on women, society should understand the need to protect them and recognize that women and girls have the same right to safety in the city as men have.
5.1 EXISTING SYSTEM
On social media, people frequently express themselves openly about how they feel about Indian society and the politicians who assert that women are secure in Indian cities.
People can freely express their opinions on social media networks, and women can publish their stories of the sexual harassment they have encountered or of how they would have retaliated against it if it had been pushed upon them.
5.2 PROPOSED SYSTEM
Social media can be seen as the ideal medium to discover people's opinions and thoughts regarding various events, because people actively communicate and share their opinions on sites like Facebook and Twitter. There are numerous opinion-focused information collection and analytics platforms that try to ascertain people's opinions on various subjects. Twitter posts are brief, and users frequently use alternative terms and acronyms. Existing NLP algorithms find it challenging to extract the sentiment from these phrases.
6. CONCLUSIONS
The different machine learning techniques that can help us organize and analyze the enormous amount of Twitter data acquired, including the millions of tweets and text messages posted every day, have been discussed. The SPC method and linear algebraic Factor Model techniques, which help to further categorize the data into meaningful groupings, are two machine learning algorithms that are particularly successful and useful when it comes to evaluating enormous amounts of data. Another machine learning algorithm, the support vector machine, is highly popular for extracting useful data from Twitter and gaining insight into the status of women's safety in Indian cities.
7. FUTURE SCOPE
Since only Twitter is taken into consideration in our experiment, the work can be expanded to apply these machine learning algorithms to other social media sites such as Facebook and Instagram. The proposed approach can be incorporated into the Twitter application interface to reach a wider audience and to perform sentiment analysis on millions of tweets to increase safety.
ACKNOWLEDGEMENT
We extend our gratitude to Dr. Thomas P John (Chairman), Dr. Suresh Venugopal P (Principal), Dr. Srinivasa H P (Vice-Principal), Ms. Suma R (HOD - CSE Department), Bhavya N J (Associate Professor & Project Coordinator), Ms. Bhavya Balakrishnan (Assistant Professor & Project Guide), and the Teaching & Non-Teaching Staff of T. John Institute of Technology, Bengaluru - 560083.
REFERENCES
[1] Agarwal, Apoorv, Fadi Biadsy, and Kathleen R. McKeown. “Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams.” Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2009.
[2] Barbosa, Luciano, and Junlan Feng. “Robust sentiment detection on Twitter from biased and noisy data.” Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics, 2010.
[3] Bermingham, Adam, and Alan F. Smeaton. “Classifying sentiment in microblogs: is brevity an advantage?” Proceedings of the 19th ACM International Conference on Information and Knowledge Management. ACM, 2010.
[4] Gamon, Michael. “Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis.” Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics, 2004.
[5] Kim, Soo-Min, and Eduard Hovy. “Determining the sentiment of opinions.” Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics, 2004.
[6] Klein, Dan, and Christopher D. Manning. “Accurate unlexicalized parsing.” Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1. Association for Computational Linguistics, 2003.
[7] Charniak, Eugene, and Mark Johnson. “Coarse-to-fine n-best parsing and MaxEnt discriminative reranking.” Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2005.
[8] Gupta, B., Negi, M., Vishwakarma, Rawat, G., & Badhani, P. (2017). “Study of Twitter sentiment analysis using machine learning algorithms on Python.” International Journal of Computer Applications, 165(9), 0975-8887.
[9] Sahayak, V., Shete, V., & Pathan, A. (2015). Sentiment analysis on Twitter data. International Journal of Innovative Research in Advanced Engineering (IJIRAE), 2(1), 178-183.
[10] Mamgain, N., Mehta, E., Mittal, A., & Bhatt, G. (2016, March). Sentiment analysis of top colleges in India using Twitter data. In Computational Techniques in Information and Communication Technologies (ICCTICT), 2016 International Conference on (pp. 525-530). IEEE.