Automation of Profile Reporting System for Misogyny Identification
Somavarapu Jahnavi Marru Manogna
Computer Science & Engineering
VNRVJIET
Hyderabad, India
Computer
Abstract: Women are often unnecessarily placed at the center of critical situations in digital media. Responsible citizens can counter this by reporting such content on social media. On many social media platforms, the higher authority takes action based on the public voice (the number of reports). This paper summarizes how misogyny can be prevented by automating the reporting process from the user end, rather than waiting for the higher authority to act after the damage has occurred. Our main task is to identify misogynistic memes: each meme should be classified as misogynous or not, and misogynous memes should be further divided into types such as stereotype, shaming, violence, and objectification.
Keywords: quantizer, malicious, abnormalities, Bi-GRU, tinker, OpenAI, CLIP, corpus, paradigm, misogyny, Roberta, PerceiverIO, URLVoid, TrendMicro, linguistics, fusion.
I. INTRODUCTION
Online, women are prominent, especially on image-based platforms like Instagram and Twitter. The internet has opened up opportunities for women, but the same prejudice and discrimination that exist outside also exist online in the form of offensive material directed at them. Image macros, sometimes known as "memes," are a common communication technique on social networking sites. An internet meme is often a picture with superimposed text that was added later by the meme creator with the primary intention of being humorous and/or sarcastic.
While many memes are created only for laughs, some memes also possess pernicious objectives. Few people who are acquainted with the format would be startled to discover that memes may be used as a tool to spread violence and sexism online, furthering gender inequality and sexual stereotypes offline.
Monitoring and managing user profiles regularly is the key to reducing the circumstances of sexism in social media.
Sayini Anirudh
Computer Science & Engineering
VNRVJIET
Hyderabad, India
Computer
The accounts of users who attempt to upload anti-women content frequently and with justification will be permanently deleted.
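The account-level policy described above can be sketched as a simple strike counter. This is only an illustration, not the system proposed in this paper: the threshold `MAX_STRIKES`, the class `ProfileMonitor`, and the boolean `is_misogynous` flag (assumed to come from some upstream meme classifier) are all hypothetical.

```python
from collections import Counter

MAX_STRIKES = 3  # hypothetical threshold; the paper does not specify one


class ProfileMonitor:
    """Counts flagged uploads per user and marks repeat offenders for reporting."""

    def __init__(self, max_strikes: int = MAX_STRIKES) -> None:
        self.max_strikes = max_strikes
        self.strikes = Counter()   # user_id -> number of flagged uploads
        self.reported = set()      # user_ids that reached the threshold

    def record_upload(self, user_id: str, is_misogynous: bool) -> bool:
        """Register one upload; return True once the profile has been reported."""
        if is_misogynous:
            self.strikes[user_id] += 1
            if self.strikes[user_id] >= self.max_strikes:
                self.reported.add(user_id)
        return user_id in self.reported


monitor = ProfileMonitor()
for flagged in (True, False, True, True):
    status = monitor.record_upload("user_42", flagged)
```

In this sketch a profile is reported automatically on its third flagged upload, so the decision does not depend on how many other users file manual reports.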
II. RELATED WORKS
Aditya Vailaya., [1] Under the restriction that the test picture belongs to one of the classes, binary Bayesian classifiers are used to extract high-level concepts from low-level visual attributes. The work considers the hierarchical categorization of vacation photos: at the top level, photos are categorized as indoor or outdoor; outdoor photos are further divided into city or landscape; and lastly, a portion of the landscape photos are divided into classes for sunsets, forests, and mountains. It was shown that a compact vector quantizer can adequately express the class-conditional densities of the features needed by the Bayesian approach (its preferred size is determined by a modified MDL criterion).
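The hierarchical scheme can be illustrated with a small sketch. It assumes that each image has already been mapped to one vector-quantizer codeword per stage and that the class-conditional probabilities P(codeword | class) are known; the table `LIKELIHOODS`, its numbers, and the function names are invented for illustration and are not taken from [1].

```python
# Toy class-conditional probabilities P(codeword | class) for a 3-word codebook.
# All numbers are made up for illustration.
LIKELIHOODS = {
    "indoor_outdoor": {"indoor": [0.7, 0.2, 0.1], "outdoor": [0.1, 0.3, 0.6]},
    "city_landscape": {"city": [0.2, 0.6, 0.2], "landscape": [0.1, 0.2, 0.7]},
    "landscape_type": {
        "sunset": [0.6, 0.3, 0.1],
        "forest": [0.1, 0.7, 0.2],
        "mountain": [0.2, 0.2, 0.6],
    },
}


def bayes_decide(likelihoods, codeword, priors=None):
    """Return the class with the highest posterior P(class | codeword)."""
    classes = list(likelihoods)
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}  # uniform priors
    scores = {c: likelihoods[c][codeword] * priors[c] for c in classes}
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


def classify_hierarchically(codewords):
    """Walk the hierarchy: indoor/outdoor, then city/landscape, then landscape type."""
    path = []
    label, _ = bayes_decide(LIKELIHOODS["indoor_outdoor"], codewords["indoor_outdoor"])
    path.append(label)
    if label != "outdoor":
        return path
    label, _ = bayes_decide(LIKELIHOODS["city_landscape"], codewords["city_landscape"])
    path.append(label)
    if label != "landscape":
        return path
    label, _ = bayes_decide(LIKELIHOODS["landscape_type"], codewords["landscape_type"])
    path.append(label)
    return path


path = classify_hierarchically(
    {"indoor_outdoor": 2, "city_landscape": 2, "landscape_type": 1}
)
```

Each stage is only evaluated when the previous stage selects the branch it refines, which mirrors the top-down categorization described above.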
Rima Masri., [2] In this research, a technique for automatically identifying fraudulent advertisements is proposed and put into practice. To detect dangerous adverts, it uses three separate online malware domain detection systems (VirusTotal, URLVoid, and TrendMicro) and provides the number of advertisements found using each system.
Elena Shushkevich., [3] In this article, a method based on a combination of Naive Bayes, Support Vector Machines, and Logistic Regression models was presented to identify sexism in messages taken from the Twitter platform.
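One common way to combine three base classifiers is hard majority voting, sketched below. The cited work may combine its models differently; the three `stub_*` functions are hypothetical stand-ins for trained Naive Bayes, SVM, and Logistic Regression models, and the placeholder tokens are arbitrary.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier maps a message to 1 (sexist) or 0 (not sexist).
Classifier = Callable[[str], int]


def stub_naive_bayes(text: str) -> int:
    """Stand-in for a trained Naive Bayes model (hypothetical rule)."""
    return int("token_a" in text.lower())


def stub_svm(text: str) -> int:
    """Stand-in for a trained SVM (hypothetical rule)."""
    lowered = text.lower()
    return int("token_a" in lowered or "token_b" in lowered)


def stub_logreg(text: str) -> int:
    """Stand-in for a trained Logistic Regression model (hypothetical rule)."""
    return int(len(text) > 200)


def majority_vote(classifiers: Sequence[Classifier], text: str) -> int:
    """Return the label predicted by most of the base classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


MODELS = (stub_naive_bayes, stub_svm, stub_logreg)
decision = majority_vote(MODELS, "a post containing TOKEN_A")
```

With an odd number of binary classifiers a tie cannot occur, which is one reason three-model ensembles are convenient for this kind of voting.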
Alessandra Teresa Cignarella., [4] The suggested framework, based on Sentence Embeddings and Multi-Objective Bayesian Optimization, has been tested on an Italian dataset. Here, they concentrated on the advantages and disadvantages of using pre-trained language models, as well as the role that Bayesian optimization plays in the issue of biased predictions.
Goenaga., [5] The recurrent neural network (RNN) and the convolutional neural network (CNN) are two widely used models for these modelling tasks, and they use quite different approaches to interpreting natural languages. In this study, they mainly focus on the RNN technique, which makes use of a Bidirectional Long Short-Term Memory (Bi-LSTM) and Conditional Random Fields (CRF), and they assessed the suggested architecture on a task for identifying irregularities (text classification).
John Cardiff., [6] devised a method to identify sexism in tweets obtained from the Twitter website that incorporates Logistic Regression, Naive Bayes models, and Support Vector Machines.
Debbie Ging., [7] In order to find instances of sexism in the online slang lexicon Urban Dictionary, developed by the public, this research uses deep learning techniques. To identify sexist speech, the performance of two deep learning techniques (Bi-LSTM and Bi-GRU) has been evaluated against that of more traditional machine learning techniques, such as logistic regression, Naive Bayes classification, and Random Forest classification. They discovered that, in contrast to the other strategies investigated, both deep learning algorithms are more accurate at spotting sexism in the Urban Dictionary.
Arjun Roy., [8] This task's goal was to foresee how online text posts or comments would propagate violence. The datasets were made available in two languages: Hindi and English. One system was provided for each of these languages. Individual models in both systems were created using ensembles of Convolutional Neural Networks (CNN) and Support Vector Machines (SVM).
Lei Chen., [9] utilized recently developed Transformer models that were pre-trained on large datasets (mainly by self-supervised learning) to provide extremely effective visual (V) and linguistic (L) characteristics. Specifically, they obtained coherent V and L features using the OpenAI CLIP model before making binary predictions with a logistic regression model. Second, by adhering to the data-centric AI approach, they placed the emphasis on data rather than on model tinkering.
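The second stage of such a pipeline, a logistic regression over concatenated visual and linguistic features, can be sketched in plain Python. The vectors below are invented toy numbers standing in for precomputed embeddings; they are not real CLIP outputs, and the learning rate and epoch count are arbitrary assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(features, labels, lr=0.5, epochs=500):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log-loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x) -> int:
    """Return 1 (misogynous) if the predicted probability is at least 0.5."""
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


# Each row is a toy image embedding followed by a toy text embedding.
FEATURES = [
    [0.9, 0.8, 0.7, 0.9],
    [0.1, 0.2, 0.2, 0.1],
    [0.8, 0.9, 0.9, 0.6],
    [0.2, 0.1, 0.1, 0.3],
]
LABELS = [1, 0, 1, 0]

weights, bias = train_logreg(FEATURES, LABELS)
predictions = [predict(weights, bias, x) for x in FEATURES]
```

Keeping the classifier this simple is in line with the data-centric view summarized above: the feature extractor does most of the work, and effort goes into the data rather than into the model.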
José Antonio García-Díaz., [10] On the one hand, misogynistic tweets on Twitter were found using applied sentiment analysis and social computing tools. On the other hand, the authors created the Spanish MisoCorpus-2020, a well-proportioned collection of written texts about misogyny in Spanish, and partitioned it into three divisions based on violence against women, common properties of misogyny, and the pestering of females through messages in Spanish and Latin America.
Niloofar Safi Samghabadi., [11] The task's data is supplied in three languages: Bengali, Hindi, and English. Data instances are categorized into three aggressiveness classes (Overtly Aggressive, Covertly Aggressive, and Not Aggressive) and, separately, into two main misogyny classes: Non-Gendered and Gendered.
Endang Wahyu Pamungkas., [12] First, by developing a unique approach and carrying out a thorough assessment of this task, the authors explore the key characteristics for spotting misogyny and the problems that add to its difficulty. Second, they carry out several cross-domain classification studies to investigate the connection between sexism and other abusive language patterns. Finally, cross-lingual classification experiments were conducted to test the effectiveness of sexism detection in a bilingual environment.
Shardul Suryawanshi., [13] To determine whether a specific meme is offensive or not, the two modalities (image and text) are combined. Memes associated with the 2016 U.S. presidential election were used to construct the MultiOFF multimodal meme dataset for offensive content identification, as there was no publicly accessible dataset for such purposes. Using the MultiOFF dataset, a classifier was subsequently created for this purpose. The visual and text modes were combined using an early fusion method, and the result was compared to a baseline of only text and only images.
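Early fusion means joining the modality features into one vector before any classifier sees them. The sketch below illustrates this with a nearest-centroid classifier, which is a deliberately simple stand-in and not the model used in [13]; the feature vectors and labels are invented toy values.

```python
import math


def early_fusion(text_vec, image_vec):
    """Concatenate text and image features into one joint vector."""
    return list(text_vec) + list(image_vec)


def fit_centroids(vectors, labels):
    """Compute the mean vector of each class."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(col) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Toy memes: (text features, image features, label). Values are made up.
MEMES = [
    ([0.9, 0.1], [0.8, 0.2], "offensive"),
    ([0.8, 0.3], [0.7, 0.1], "offensive"),
    ([0.1, 0.9], [0.2, 0.8], "not_offensive"),
    ([0.2, 0.7], [0.1, 0.9], "not_offensive"),
]

fused = [early_fusion(text, image) for text, image, _ in MEMES]
labels = [label for _, _, label in MEMES]
centroids = fit_centroids(fused, labels)
prediction = nearest_centroid_predict(
    centroids, early_fusion([0.85, 0.2], [0.75, 0.15])
)
```

A text-only or image-only baseline can be obtained from the same functions by passing just one of the two feature vectors instead of the fused one.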
Abdullah Y. Murad., [14] In this study, the method of identification of an Arabic word for sexism detection in Arabic tweets is given. The Arabic Levantine Twitter Dataset for Misogynistic is used to assess the suggested method, which achieved recognition accuracy for multiclass and binary tasks of 90.0% and 89.0%, respectively. The suggested method appears to help offer workable, intelligent ways for identifying Arabic misogyny on social media.
S. Rajeskannan., [15] The classification engine is represented by an amalgamated model that combines a feature extraction engine and a social media engine consisting of datasets built from raw input texts. For cyberbullying (CB) identification, context, user comments, and psychological properties were extracted by the feature extraction engine. An artificial neural network (ANN) is used to classify the data, and the CB identification may receive rewards or penalties from an evaluation system to which the classification engine has access. Deep Reinforcement Learning (DRL), which boosts classification performance, is used for assessment.
Giuseppe Attanasio., [16] To solve the tasks, Perceiver IO is utilized to combine multimodal late streams with unimodal ones. Unimodal embeddings were created using RoBERTa (text transcript) and Vision Transformer (picture). Additionally, face and demographic identification, picture captioning, adult material identification, and web entities were utilized to improve the representation of the input. Perceiver IO is used for the first time in this investigation to combine visual modalities and text.
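In late fusion, each modality is processed separately and the results are merged only near the output. Perceiver IO learns this merge with attention; the sketch below swaps in a much simpler weighted average of per-modality probabilities purely to illustrate the idea. The function name, the modality weights, and the scores are hypothetical.

```python
def late_fusion(probabilities, weights=None):
    """Weighted average of per-modality probabilities for the positive class."""
    if weights is None:
        weights = {name: 1.0 for name in probabilities}  # equal weighting
    total = sum(weights[name] for name in probabilities)
    return sum(probabilities[name] * weights[name] for name in probabilities) / total


# Hypothetical outputs of a text model and an image model for one meme.
scores = {"text": 0.9, "image": 0.4}
fused_score = late_fusion(scores, weights={"text": 2.0, "image": 1.0})
label = int(fused_score >= 0.5)
```

Unlike early fusion, this arrangement lets each unimodal model be trained and replaced independently, at the cost of ignoring interactions between the text and the image.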
Table 1: Table showing different methodologies, pros, cons, and the results obtained in this literature survey

S.No: 1
Title: [1] Image Classification for Content-Based Indexing
Methodology: Bayesian Framework, Vector Quantization for Density Estimation
Pros: The accuracy rises if features are fixed since categorization is dependent on features.
Cons: The count of classes reduces as features grow, and they are based on both individual traits and combinations of features that are thought to be independent.

S.No: 2
Title: [2] Automated Malicious Advertisement Detection using VirusTotal, URLVoid, and TrendMicro
Methodology: Utilizing TrendMicro, VirusTotal, and URLVoid to Find Malvertisements
Pros: After extracting URLs, the URLVoid has the greatest accuracy rate for detection.
Cons: The total proportion of the genuine positive is impacted by TrendMicro (malicious and the system classified it as malicious).
Year: 2017

S.No: 3
Title: [3] Misogyny Detection and Classification in English Tweets: The Experience of the ITT Team
Methodology: SVM, model ensembles, and carrying out a number of tasks such as pre-processing, model construction, and embedding the created models in one ensemble.
Pros: Achieving a high degree of classification parameter estimate requires quick computations and little in the way of training data.
Cons: As the number of classes is lowered, the model's efficiency as applied to the Misogyny Identification categorization drops.

S.No: 4
Title: [4] Automatic Identification of Misogyny in English and Italian Tweets at EVALITA 2018 with a Multilingual Hate Lexicon
Methodology: Dialectal features from Linear and RBF kernel SVM, such as a multilingual hate dictionary, and structural features.
Pros: The target's gender has been attained. It is determined whether males or women are participating.
Cons: The disadvantages of RBF kernels are their high computational cost and worse performance in large and sparse feature matrices.
Year: 2018

S.No: 5
Title: [5] Automatic Misogyny Identification Using Neural Networks
Methodology: Pre-trained word embeddings in the Bi-LSTM
Pros: When it comes to feature selection, CRF is sufficiently versatile.
Cons: It won't function with CRF if the terms weren't known in the sample of training data.
Year: 2018

S.No: 6
Title: [6] Misogyny Detection and Classification in English Tweets: The Experience of the ITT Team
Methodology: Ensemble of logistic regression, SVM, and naive Bayes, with tf-idf
Pros: Highest accuracy was achieved with the least amount of preprocessing labour.
Cons: A lack of understanding. In other words, users have a hard time interpreting the knowledge.
Year: 2018

S.No: 7
Title: [7] A Comparison of Machine Learning Approaches for Detecting Misogynistic Speech in Urban Dictionary
Methodology: Random Forest, Logistic Regression, and Naive Bayes; Bi-GRU and Bi-LSTM
Pros: The greatest outcomes for accuracy and sensitivity came from Bi-GRU and Bi-LSTM, whereas the finest outcomes for specificity came from Random Forest.
Cons: Without the DL, the outcomes of the conventional ML approaches are quite poor.
Year: 2020

S.No: 8
Title: [8] An Ensemble Approach for Aggression Identification in English and Hindi Text
Methodology: To carry out the classification job, they used a neural architecture based on BERT with the word and distributional level embeddings.
Pros: By using the BERT model, they processed a more enormous amount of text and language.
Cons: NGEN classification is poor in three languages (Hindi, English, and Bengali).
Year: 2020

S.No: 9
Title: [9] Multimedia Misogyny Detection By Using Coherent Visual and Language Features from CLIP Model and Data-centric AI Principle.
Methodology: Transformer models that have already been trained, such as the fine-tuning BERT model, the universal sentence encoding (USE) embedding, and the SBERT and CLIP models
Pros: In fact, performance is better when sentence-level representations and visuals are used.
Cons: Better results were obtained with an LR model alone than with more complex models.
Year: 2020

S.No: 10
Title: [10] Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings
Methodology: Random forest, sequential minimal optimization (SMO), LSVM
Pros: The ability to recognize sexism and effectively use binary classification
Cons: When dealing with multi-class classification issues, LSVM was unsuccessful.
Year: 2020

S.No: 11
Title: [11] Aggression and Misogyny Detection using BERT: A Multi-Task Approach
Methodology: Utilised many layers, including the Classification layer, BERT layer, Attention layer, and Fully-connected layer. The output of two distinct classification layers, one for identifying sexism and another for predicting aggressiveness class.
Pros: The aggregate results demonstrate that sexism is easier to detect than antagonism in all available languages.
Cons: Across all the languages, the performance for CAG (Covertly Aggressive) is the lowest, indicating that it is the most difficult aggressiveness class to recognize.
Year: 2020

S.No: 12
Title: [12] Misogyny Detection in Twitter: a Multilingual and Cross-Domain Study.
Methodology: To fill the gap in Automatic Misogyny Identification in low-resource languages, a model was developed based on BERT and LSTM.
Pros: Overall results show that the BERT-based model is the most successful model in the cross-lingual setting experiment.
Cons: The algorithm does not operate optimally when tested on data from AMI and trained on data from other abusive events.
Year: 2020

S.No: 13
Title: [13] Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text
Methodology: An early method of fusing text and images was used and its efficiency was tested by contrasting it with a baseline that used solely text and images.
Pros: The only model that includes local embeddings is DNN. As a result, it outperformed other models by achieving better accuracy and an F1 score.
Cons: Should give textual characteristics more weight while integrating them with the meme's visual components.
Year: 2020

S.No: 14
Title: [14] AI-based Misogyny Detection from Arabic Levantine Twitter Tweets
Methodology: With word and word embedding methods, the Arabic text is rendered. To identify sexism in Arabic, the most recent deep learning BERT approach is employed.
Pros: Linear SVC model has achieved the highest accuracy among its peers. The results demonstrate the low performance of the Random Forest Classifier model.
Cons: The dataset lacked equilibrium. There are just 17 comments in the sexual harassment class, therefore there is very little opportunity to understand the pattern for these classes.
Year: 2021

S.No: 15
Title: [15] Nature-Inspired-Based Approach for Automated Cyberbullying Classification on Multimedia Social Networking
Methodology: Artificial neural networks, feature selection, and information gain are all part of the DRL Algorithm for Reward-Penalty Decisions.
Pros: The suggested ANN's accuracy has increased thanks to the DRL Algorithm.
Cons: The feature selection and model will not consider the latest CB trends.
Year: 2021

S.No: 16
Title: [16] Using Perceiver IO for Detecting Misogynous Memes with Text and Image Modalities
Methodology: Perceiver IO was employed as a multimodal late fusion layer for multi-task learning, and they constructed a multimodal late fusion to jointly learn from several modalities.
Pros: Adds semantic information to the meme, such as the image description, facial and demographic information, adult
Cons: The misogyny of target labels is balanced, while the other categories are unbalanced. Unbalance is a little more obvious than it was on the training set.
Year: 2022
III. CONCLUSION
According to the aforementioned literature review, several researchers have employed a variety of techniques to comprehend and automate the misogyny detection system. The integration of technology to make social media toxic-free is a theme that emerges in every study. Additionally, several techniques are described, including TFBertModel, TFViT, and ViTFeatureExtractor. Each study appears to concentrate on a different topic, such as toxicity identification, misogyny detection automation, meme classification, picture classification, etc. All of the techniques have demonstrated a respectable level of effectiveness when classifying memes. Better outcomes can be obtained by increasing financing and exercising control over a social media network.
IV. REFERENCES
[1] Vailaya, A., Figueiredo, M. A., Jain, A. K., & Zhang, H. J. (2001). Image classification for content-based indexing. IEEE Transactions on Image Processing, 10(1), 117-130.
[2] Masri, R., & Aldwairi, M. (2017, April). Automated malicious advertisement detection using VirusTotal, URLVoid, and TrendMicro. In 2017 8th International Conference on Information and Communication Systems (ICICS) (pp. 336-341). IEEE.
[3] Cardiff, J., & Shushkevich, E. (2018). Misogyny detection and classification in English tweets: the experience of the ITT team. In Proc. EVALITA (p. 182).
[4] Endang, W. P., Alessandra, T. C., Valerio, B., & Viviana, P. (2018). Automatic identification of misogyny in English and Italian tweets at EVALITA 2018 with a multilingual hate lexicon. In CEUR Workshop Proceedings (Vol. 2263, No. 1, pp. 1-6). CEUR-WS.
[5] Goenaga, I., Atutxa, A., Gojenola, K., Casillas, A., Ilarraza, A. D., Ezeiza, N., Oronoz, M., Pérez, A., & Perez-de-Viñaspre, O. (2018). Automatic Misogyny Identification Using Neural Networks. IberEval@SEPLN.
[6] Caselli, T., Novielli, N., Patti, V., & Rosso, P. (2018, December). Misogyny Detection and Classification in English Tweets: The Experience of the ITT Team. In Proceedings of the Final Workshop (Vol. 12, p. 13).
[7] Lynn, T., Endo, P. T., Rosati, P., Silva, I., Santos, G. L., & Ging, D. (2019, June). A comparison of machine learning approaches for detecting misogynistic speech in the urban dictionary. In 2019 International Conference on Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA) (pp. 1-8). IEEE.
[8] Roy, A., Kapil, P., Basak, K., & Ekbal, A. (2018, August). An ensemble approach for aggression identification in English and Hindi text. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018) (pp. 66-73).
[9] Fersini, E., Gasparini, F., Rizzi, G., Saibene, A., Chulvi, B., Rosso, P., ... & Sorensen, J. (2022, July). SemEval-2022 Task 5: Multimedia automatic misogyny identification. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) (pp. 533-549).
[10] García-Díaz, J. A., Cánovas-García, M., Colomo-Palacios, R., & Valencia-García, R. (2021). Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings. Future Generation Computer Systems, 114, 506-518.
[11] Samghabadi, N. S., Patwa, P., Pykl, S., Mukherjee, P., Das, A., & Solorio, T. (2020, May). Aggression and misogyny detection using BERT: A multi-task approach. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (pp. 126-131).
[12] Pamungkas, E. W., Basile, V., & Patti, V. (2020). Misogyny detection in Twitter: a multilingual and cross-domain study. Information Processing & Management, 57(6), 102360.
[13] Suryawanshi, S., Chakravarthi, B. R., Arcan, M., & Buitelaar, P. (2020, May). Multimodal meme dataset (MultiOFF) for identifying offensive content in images and text. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (pp. 32-41).
[14] Muaad, A. Y., Davanagere, H. J., Al-antari, M. A., Benifa, J. B., & Chola, C. (2021, September). AI-based misogyny detection from Arabic Levantine Twitter tweets. In Computer Sciences & Mathematics Forum (Vol. 2, No. 1, p. 15). MDPI.
[15] Yuvaraj, N., Srihari, K., Dhiman, G., Somasundaram, K., Sharma, A., Rajeskannan, S. M. G. S. M. A., ... & Masud, M. (2021). Nature-inspired-based approach for automated cyberbullying classification on multimedia social networking. Mathematical Problems in Engineering, 2021.
[16] Attanasio, G., Nozza, D., & Bianchi, F. (2022, July). MilaNLP at SemEval-2022 Task 5: Using Perceiver IO for detecting misogynous memes with text and image modalities. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) (pp. 654-662).