Farmers Protest - Stance Detection

International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 09 Issue: 06 | June 2022 | www.irjet.net

Prathamesh Badgujar1, Aditya Kamble2, Anuj Kadam3, Dhruv Shah4, Anilkumar Kadam5

1,2,3,4Student, Department of Computer Engineering, AISSMS College of Engineering, Pune, Maharashtra, India
5Professor, Department of Computer Engineering, AISSMS College of Engineering, Pune, Maharashtra, India

Abstract: Protests are an important part of democratic governance and are a vital tool for conveying public demands to the ruling government. As voters come to terms with new rules, there is an increasing number of protests across the world for various reasons. With the advancement of technology, there has also been an exponential rise in the use of social media for the exchange of data and ideas.

During this research, data was gathered from the website twitter.com regarding the farmers' protest, in order to understand the sentiments that the public shared at a global level. Since the repeal of the Farm Laws has now been carried out, we aim to use this data to understand whether the government's decision to repeal was influenced by public opinion on this topic.

This paper presents a stance prediction deep learning model obtained by fine-tuning the ULMFiT (Universal Language Model Fine-Tuning) model, which categorizes tweets into For (F), Against (A) and Neutral (N). The proposed model achieved an F1 score of 0.67 on our training and test data, which is essentially a labelled subset of the actual data.

Keywords: Dataset, ULMFiT, deep learning, text classification, Language Model (LM)

1. INTRODUCTION

A. Motivation

The Farm Laws announced by the Parliament of India in September 2020 were the cause of the 2020-2021 Indian farmers' protests. These laws were met with heavy resistance from many farmer unions, who deemed them a threat to their livelihood, and from opposition politicians who said they might leave farmers at the "mercy of corporates". The union government, however, maintains that the laws will make it easy for farmers to sell their produce to large-scale buyers, and remarked that the protests were merely the result of misinformation being spread online. Despite India being largely self-sufficient in food grain production and having welfare schemes, hunger and nutrition remain severe problems, with India ranking as one of the worst countries in the world on food security parameters.

After the announcement of the acts, farmer unions based around Punjab were the source of the initial protests. After many protests, farmer unions, especially from the states of Punjab and Haryana, commenced a movement named "Dilli Chalo" (transl. Let's march to Delhi), in which participants numbering in the thousands surrounded Delhi. The authorities ordered the police and law enforcement of several states to attack the protesters with water cannons and tear gas, to prevent the farmer unions from entering first Haryana and then Delhi.

The Supreme Court of India ordered a stay on the Farm Laws in January 2021. Farmer leaders welcomed the stay order, which remains in effect. The Kerala, Punjab, Chhattisgarh, Rajasthan, Delhi and West Bengal state governments passed resolutions in opposition to the farm acts, and Punjab, Chhattisgarh and Rajasthan have tabled counter-legislation in their respective state assemblies.

The main objective of this research is to understand the stance of the public on the farmers' protest as shared on the micro-blogging website "Twitter". Our research mainly aims at analyzing the factuality and polarity of Twitter data using a deep learning model called ULMFiT.

B. Inductive Transfer Learning

Fig. 1 Traditional Machine Learning vs. Transfer Learning

Many state-of-the-art NLP models need to learn from scratch and require big data to attain reasonable results; they not only consume large amounts of memory but are also quite time consuming. In text classification, sufficient labelled examples are hard to come by, hence we make use of inductive transfer learning to solve these challenges. This is the principal idea that ULMFiT is based upon.


Transfer learning aims to imitate the human capacity to acquire knowledge while learning one task and to make use of this knowledge to solve a related task. In the conventional approach, for instance, two models are trained separately, without maintaining or transferring knowledge from one to the other. In transfer learning, by contrast, the knowledge gained from training a first model is retained and then used to train another model. In this case, the first model is referred to as the source task and the second model as the target task.

C. Overview of ULMFiT Model

1) General-Domain Language Model Pretraining:

In an initial step, the Language Model is pretrained on a large general collection of texts (the WikiText-103 dataset). After this, the model is able to predict the next word in a sequence with some certainty. At this stage the model learns the general features of the language. Pretraining is most useful for datasets with few samples and allows generalization regardless of dataset size. Although this step is expensive, it only needs to be carried out once and improves the performance and convergence of downstream models.
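As a minimal sketch of this step, the fastai library (v2 text API assumed) ships AWD-LSTM weights pretrained on WikiText-103, so loading them yields a general-domain language model that can already continue a word sequence. The DataFrame `tweets_df` and its "renderedContent" column are assumptions carried over from the dataset described later.

```python
from fastai.text.all import *

# `tweets_df` is assumed: a DataFrame whose 'renderedContent' column holds tweet text
dls = TextDataLoaders.from_df(tweets_df, text_col='renderedContent',
                              is_lm=True, valid_pct=0.1)

# pretrained=True loads AWD-LSTM weights pretrained on WikiText-103
learn = language_model_learner(dls, AWD_LSTM, pretrained=True, metrics=accuracy)

# even before fine-tuning, the general-domain model can continue a sequence
print(learn.predict("The government announced", n_words=10))
```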

2) Target Task Language Model Fine-Tuning:

Following the transfer learning approach, the knowledge extracted in the initial step is to be applied to the target task. However, the target dataset (i.e. the Farmers Protest Tweets dataset) likely comes from a different distribution than the original dataset, regardless of how diverse the general-domain data used for pretraining is. To deal with this issue, the Language Model is therefore fine-tuned on the data of the target task. Just as after the first step, the model is at this point capable of predicting the next word in a sequence. Now, however, it has also learned task-specific features of the language, such as the way Twitter users write and their usage of slang and regional-language phrases. Given a pretrained general-domain Language Model, this stage converges faster because it only needs to adapt to the idiosyncrasies of the target data, and it allows us to train a robust Language Model even for small datasets.

3) Target Task Classifier: Fine-tuning the classifier is a vital part of the transfer learning approach. Overly aggressive fine-tuning can cause catastrophic forgetting, removing the benefit of the knowledge captured via language modeling; overly cautious fine-tuning leads to slow convergence and resultant overfitting. Howard and Ruder [10] proposed gradual unfreezing for fine-tuning the classifier: rather than fine-tuning all layers at once, the model is gradually unfrozen starting from the last layer, as this contains the least general knowledge. They first unfroze the last layer and fine-tuned all unfrozen layers for one epoch. They then unfroze the next lower frozen layer and repeated, until all layers had been fine-tuned, with convergence reached on the last iteration. Since, ultimately, in our case we do not want the model to predict the next word in a sequence but to provide a stance classification, in a third step the pretrained Language Model is extended with linear blocks so that the final output is a distribution over the stance labels (For (F), Against (A) and Neutral (N)).

a) Gradual unfreezing: Rather than fine-tuning all layers at once, which risks catastrophic forgetting, we gradually unfreeze the model starting from the last layer, as this contains the least general knowledge. We first unfreeze the last layer and fine-tune all unfrozen layers for one epoch. We then unfreeze the next lower frozen layer and repeat, until all layers have been fine-tuned, with convergence on the last iteration.
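A minimal sketch of gradual unfreezing with fastai's `freeze_to`, assuming `learn` is a text classifier learner built on the fine-tuned encoder; the learning rates and the 2.6^4 discriminative factor follow the values suggested by Howard and Ruder [10], not measurements from our runs.

```python
# `learn` is assumed to be a fastai text_classifier_learner built on the
# fine-tuned language-model encoder (frozen by default except the head).
learn.fit_one_cycle(1, 2e-2)                              # only the head is trainable

learn.freeze_to(-2)                                       # unfreeze the last frozen layer group
learn.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))

learn.freeze_to(-3)                                       # unfreeze the next lower group
learn.fit_one_cycle(1, slice(5e-3 / (2.6 ** 4), 5e-3))

learn.unfreeze()                                          # finally fine-tune all layers
learn.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))
```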

b) Backpropagation Through Time (BPTT) for Text Classification (BPT3C): Since the model architecture for training and fine-tuning is that of an LSTM, the paper [10] implements the backpropagation through time (BPTT) approach to be able to propagate gradients without them exploding or vanishing. To make fine-tuning a classifier on large documents feasible, Howard et al. [10] proposed BPTT for Text Classification (BPT3C): the document is divided into fixed-length batches. At the start of every batch, the model is initialized with the final state of the previous batch; the hidden states are tracked for mean and max pooling; and gradients are back-propagated to the batches whose hidden states contributed to the final prediction. In practice, variable-length backpropagation sequences are used.

Steps in BPT3C:

The document is split into fixed-length batches.

At the start of every batch, the model is initialized with the final state of the previous batch, while keeping track of the hidden states for mean and max pooling.


The gradients are back-propagated to the batches whose hidden states contributed to the final prediction.
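The following is a minimal PyTorch sketch of this idea, not the fastai implementation: the token sequence is processed in fixed-length chunks, the LSTM state is carried from one chunk to the next, the chunk outputs are kept for mean and max pooling, and a linear head maps the concat-pooled representation to the three stance logits. All sizes (vocabulary, embedding, hidden, chunk length) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BPT3CClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=400, hidden_dim=1150,
                 num_classes=3, chunk_len=70):
        super().__init__()
        self.chunk_len = chunk_len
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # concat pooling: [last output, mean pool, max pool]
        self.head = nn.Linear(hidden_dim * 3, num_classes)

    def forward(self, token_ids):                         # (batch, seq_len)
        hidden, outputs = None, []
        # process the document in fixed-length chunks, carrying the state over
        for start in range(0, token_ids.size(1), self.chunk_len):
            chunk = self.embedding(token_ids[:, start:start + self.chunk_len])
            out, hidden = self.lstm(chunk, hidden)
            outputs.append(out)                           # keep states for pooling
        out = torch.cat(outputs, dim=1)                   # (batch, seq_len, hidden)
        pooled = torch.cat([out[:, -1], out.mean(dim=1), out.max(dim=1).values], dim=1)
        return self.head(pooled)                          # logits over (F, A, N)

# usage: a batch of two "documents" of 200 token ids each
model = BPT3CClassifier()
logits = model(torch.randint(0, 30000, (2, 200)))
print(logits.shape)                                       # torch.Size([2, 3])
```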

2. RELATED WORK

For many years, operations such as stemming or lemmatization, as well as shallow models such as SVMs, were popular in NLP. Young et al. [2] claim that word embedding models like word2vec and GloVe ultimately paved the way for the success of deep learning in NLP. One of the main criticisms of pretrained word embeddings is that they only push previously learned knowledge into the first layer of a neural network, whereas its remaining layers still must be trained from scratch. Neelakantan et al. [6] experimented with training individual vectors for every word. These approaches overcome the problem of missing topic information, but they still have to train the actual model from scratch.

In their search for better approaches, several researchers looked for strategies that had previously proven successful in Computer Vision. Ruder [1] claims that language modelling is especially suited to capturing aspects of language that are relevant for target tasks. The OpenAI Transformer is similar to ELMo but needs some minor changes in the model design for transfer [12]. Both have been shown to produce excellent empirical results.

Apart from achieving state-of-the-art results in various tasks, ULMFiT includes several techniques for fine-tuning the model that could also boost the performance of alternative approaches, for example the OpenAI Transformer.

3. DATASET USED

D. For Language Model Training:

The dataset was acquired from Kaggle.com. The name of the dataset is "Farmers Protest Tweets Dataset", and it contains two files: the first contains the actual tweets extracted from twitter.com with the hashtag "#FarmersProtest", and the second contains data about the users who made those tweets. The tweet data was collected using the Twitter API through the snscrape Python library. The first (tweets) dataset has around 855,850 rows and 14 columns, and the second dataset has around 169,000 rows and 19 columns. We used only the tweets dataset for training the language model. We kept only the actual tweet text column, called "renderedContent", and discarded all other columns since they were not useful for our task.
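As a sketch of how the two files can be loaded, assuming CSV exports with the file names below (the actual Kaggle archive may use different names or a JSON format):

```python
import pandas as pd

# file names are assumptions; adjust to the actual Kaggle download
tweets_df = pd.read_csv('farmers_protest_tweets.csv')    # ~855,850 rows, 14 columns
users_df = pd.read_csv('farmers_protest_users.csv')      # ~169,000 rows, 19 columns

# only the tweet text is needed for language-model training
lm_df = tweets_df[['renderedContent']].dropna()
```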

E. For Stance Detection Classification:

For stance detection, we used a small subset of the tweets dataset previously mentioned. We manually labelled 12,000 tweets from the tweets dataset as For (F), Against (A) or Neutral (N). The distribution of tweets found is shown below.

Fig. 2 Overview of the dataset used for training the classification model

As indicated by the above figure, there were very few tweets against the protest as compared to supporting and neutral ones. This would lead to an imbalanced dataset and consequently a model biased towards the supporting and neutral stances.

To tackle this problem, we used a technique called artificial super-sampling. In this technique, we translated each tweet classified as "A" (against) into some other language of choice and then translated it back to English, until the number of samples classified as "A" was equal to the number classified as "F" and "N".
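A minimal sketch of this back-translation over-sampling; `translate_fn` is a placeholder for whatever translation API is used (we do not assume a specific library), and the DataFrame column names follow the labelling described above.

```python
import pandas as pd

def back_translate(text, translate_fn, pivot='fr'):
    """English -> pivot language -> English, yielding a paraphrased copy."""
    return translate_fn(translate_fn(text, src='en', dest=pivot), src=pivot, dest='en')

def oversample_against(labeled_df, translate_fn, target_count):
    """Back-translate 'A' (against) tweets until the class reaches target_count."""
    against = labeled_df[labeled_df['label'] == 'A'].reset_index(drop=True)
    new_rows, i = [], 0
    while len(against) + len(new_rows) < target_count:
        row = against.iloc[i % len(against)]
        new_rows.append({'renderedContent': back_translate(row['renderedContent'], translate_fn),
                         'label': 'A'})
        i += 1
    return pd.concat([labeled_df, pd.DataFrame(new_rows)], ignore_index=True)
```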

Finally, we chose 2,500 random samples from each category to train the stance detection (text classification) model and set aside 15% of them for testing.

4. SYSTEM ARCHITECTURE

Fig. 3 System architecture of the proposed system


As shown in Fig. 3, the proposed system mainly consists of 4 steps:

1) Processing the input data and feeding it to the ULMFiT model.

2) Training the model on our data and generating a vocabulary for text classification.

3) Training the classification model using the generated vocabulary and labelled data.

4) Performing multi-class classification using the BPT3C model trained in the previous step.

5. METHODOLOGY

F. Preprocessing the data

As mentioned in Section 3, we used a dataset from Twitter containing about 920,000 tweets. However, some of these tweets were duplicates, so as an initial step we dropped them, leaving us with about 855,850 unique tweets.

As the next step, we dropped all unnecessary columns. The main tweet text column, called "renderedContent", was used for language model training, so we cleaned the tweets next. As a preprocessing step on the actual text data, we removed all links to websites and similar content, because they would not add any value to our model. Next, we removed all unnecessary punctuation and whitespace, both within and at the end of the tweets. We decided, however, to keep the hashtags as well as emojis in the tweets, because they also contribute to the actual knowledge of our model. Please refer to the example given below.

Raw Tweet: They can’t be farmers. Looks like Gundas are having good time. They seem to be violence thirsty goons. #FarmersProtest twitter.com/IndiaToday/sta…

Cleaned Tweet: They can’t be farmers. Looks like Gundas are having good time. They seem to be violence thirsty goons. #FarmersProtest
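A minimal sketch of the cleaning described above: links are stripped and repeated whitespace collapsed, while hashtags and emojis are left untouched. The regular expression is an approximation of what we used, not an exact reproduction.

```python
import re

# matches http(s) links, www links and bare domain links like twitter.com/...
URL_RE = re.compile(r'(https?://\S+|www\.\S+|\b\w+\.(?:com|net|org)/\S+)')

def clean_tweet(text: str) -> str:
    text = URL_RE.sub('', text)          # drop links
    text = re.sub(r'\s+', ' ', text)     # collapse whitespace
    return text.strip()

raw = ("They can’t be farmers. Looks like Gundas are having good time. "
       "They seem to be violence thirsty goons. #FarmersProtest "
       "twitter.com/IndiaToday/sta…")
print(clean_tweet(raw))   # keeps the text and the hashtag, drops the link
```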

G. Performing exploratory data analysis

We further performed some basic data analysis on this data by carrying out dataset operations such as joining the tweets and users datasets, thus obtaining information about the tweets as well as the users who made them. Through our analysis, we were able to answer questions such as: what was the trend of tweets across the time frame of Nov 2020 to Jul 2021, at what time of day were the most tweets posted, who were the top 10 most-followed people who tweeted about this topic, and how many interactions did their tweets receive.
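An illustrative sketch of this analysis with pandas; the column names ('date', 'user', 'username', 'followersCount') are assumptions about the Kaggle schema and may differ in the actual files.

```python
import pandas as pd

# `tweets_df` and `users_df` are the DataFrames loaded earlier (assumed)
tweets_df['date'] = pd.to_datetime(tweets_df['date'])

# tweet volume per month from Nov 2020 to Jul 2021
monthly_trend = tweets_df.set_index('date').resample('M').size()

# hour of the day with the most tweets
tweets_per_hour = tweets_df['date'].dt.hour.value_counts().sort_index()

# top 10 most-followed users who tweeted about the topic
joined = tweets_df.merge(users_df, left_on='user', right_on='username', how='left')
top10 = (joined.drop_duplicates('username')
               .nlargest(10, 'followersCount')[['username', 'followersCount']])
```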

H. Training the ULMFiT language model

As mentioned in the previous section, the only column needed for the language model is the actual text column. Hence we dropped all columns from the tweets dataset except for "renderedContent". All these tweets are supplied to the ULMFiT model that has been pretrained on the huge WikiText-103 corpus of Wikipedia text. We divided the dataset into two parts of 90% and 10% for training and validation respectively. The fastai library converts the text into its own format for better processing and understanding of the data. As the next step, we found the optimal learning rate for training our language model, which came out to be 0.00209.

Fig. 4 Finding the optimal learning rate for the language model

After that, we used this learning rate to train our language model using the ULMFiT-recommended method of training one layer while the others are frozen. Finally, our language model achieved an accuracy of 0.45, which is the accuracy of predicting the next word given the current word sequence. We saved this model for use in the text classifier.
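A sketch of this stage with the fastai v2 API (assumed); `lm_df` is the cleaned tweets DataFrame from the preprocessing step, and the schedule of frozen/unfrozen epochs is illustrative rather than a record of our exact runs.

```python
from fastai.text.all import *

dls_lm = TextDataLoaders.from_df(lm_df, text_col='renderedContent',
                                 is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3,
                                  metrics=accuracy)

learn_lm.lr_find()                        # suggested roughly 0.00209 in our case
learn_lm.fit_one_cycle(1, 2.09e-3)        # train the new head while the rest is frozen
learn_lm.unfreeze()
learn_lm.fit_one_cycle(3, 2.09e-3 / 10)   # fine-tune the whole model

learn_lm.save_encoder('farmers_protest_encoder')   # reused by the classifier
```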

I. Training the stance prediction model (text classification model)


As a final step, we make use of the manually labelled dataset mentioned in Section 3, which contained about 7,300 manually labelled tweets. We used the vocabulary from the language model as features in our text classification model to predict 3 labels, namely F (For), A (Against) and N (Neutral), using 80% and 20% of the data for training and validation respectively. The classifier model was able to achieve a respectable F1 score of 0.67 and an accuracy of about 0.7. We exported this model as a .pkl file and used it for prediction.

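A sketch of the classifier stage with fastai (v2 API assumed); `labeled_df` holds the manually labelled tweets with a 'label' column in {F, A, N}, `dls_lm` is the language-model DataLoaders from the previous step, and the training schedule mirrors the gradual unfreezing described earlier.

```python
from fastai.text.all import *

dls_clf = TextDataLoaders.from_df(labeled_df, text_col='renderedContent',
                                  label_col='label', valid_pct=0.2,
                                  text_vocab=dls_lm.vocab)
learn_clf = text_classifier_learner(dls_clf, AWD_LSTM, drop_mult=0.5,
                                    metrics=[accuracy, F1Score(average='macro')])
learn_clf.load_encoder('farmers_protest_encoder')

learn_clf.fit_one_cycle(1, 2e-2)                            # head only
learn_clf.freeze_to(-2)
learn_clf.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))
learn_clf.unfreeze()
learn_clf.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))

learn_clf.export('stance_model.pkl')                        # for deployment
```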

J. Deploying the model

We used Flask, a popular lightweight web framework, for deploying our model on the web. We created a simple HTML form to take a tweet as input from the user, preprocessed it, and fed it to the model generated in the previous step to give the output as either F, N or A.
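A minimal deployment sketch with Flask; the form layout, field name and model path are assumptions, and the same cleaning step used during training would normally be applied to the input before prediction.

```python
from fastai.text.all import load_learner
from flask import Flask, request, render_template_string

app = Flask(__name__)
learn = load_learner('stance_model.pkl')   # the exported classifier

FORM = ('<form method="post"><textarea name="tweet"></textarea>'
        '<button type="submit">Predict</button></form><p>{{ result }}</p>')

@app.route('/', methods=['GET', 'POST'])
def predict_stance():
    result = ''
    if request.method == 'POST':
        pred, _, probs = learn.predict(request.form['tweet'])
        result = f'Stance: {pred}'          # one of F, A or N
    return render_template_string(FORM, result=result)

if __name__ == '__main__':
    app.run()
```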

6. RESULTS

K. Transfer Learning (Language Model Training)

We trained the original ULMFiT model by providing our dataset, for a better understanding of the language as well as of the topic under consideration. The tweets made by Indians are not entirely in English, and certain words and phrases are written in tweets for better impact or understanding. The performance of the model is measured by how accurately it can predict the next set of words by looking at the current set of words. The metric used for this task was accuracy. Our model achieved nearly 45% accuracy on this task, which we consider quite respectable.

The extended model achieved an accuracy of 67.76% and an F1 score of 68.18%. In order to determine whether the model actually learned anything from our dataset, we compared our new model with the base ULMFiT language model, which was not trained on our dataset, i.e. which never saw the vocabulary of the tweets, only the vocabulary it learned from its training on the WikiText-103 dataset. The text classification model trained on the base ULMFiT model achieved an accuracy of 64.7% and an F1 score of 64.3%.

Fig. 6 Classification model trained on the base ULMFiT language model

Fig. 5 Language model performance

L. Text Classification

We made use of the language model as a vocabulary, or feature extractor, for our main task, which was the stance detection of tweets. We call this the extended model. The performance metric used initially was accuracy. But accuracy generally tends to favour the most dominant class in the dataset, because it only considers how many predictions were correct out of all predictions made. Hence we decided to add another performance metric for comparison, namely the F1 score.

Fig. 7 Classification model trained on the extended ULMFiT language model

So we can say that the model actually learned the vocabulary from the dataset, and that helped it achieve better performance, albeit only slightly better. Overall, the roughly 3% improvement achieved by simply ingesting topic-relevant data and training on it with moderate hardware seems to be worth the effort.

Metrics          Accuracy   F1 score
Extended Model   67.76%     68.18%
Base Model       64.7%      64.3%

7. CONCLUSION AND FUTURE SCOPE

In this research, we derived an extension to an existing state-of-the-art model and compared it to the original model; the extension showed somewhat better performance. The topic relevancy of the data helps the model to understand the topic better.


Further, this model can also be used to classify tweets written in other languages such as Marathi, Punjabi, etc. by providing data in the respective language. The ULMFiT model can be used for any type of text classification, not only stance detection.

8. REFERENCES

[1] Sebastian Ruder, NLP's ImageNet moment has arrived, The Gradient, Jul. 8, 2018, https://thegradient.pub/nlp-imagenet/ [Blog]

[2] Tom Young, Devamanyu Hazarika, Soujanya Poria and Erik Cambria, Recent Trends in Deep Learning Based Natural Language Processing, IEEE Computational Intelligence Magazine, vol. 13, issue 3, pp. 55-75, Aug. 2018

[3] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean, Distributed Representations of Words and Phrases and their Compositionality, arXiv:1310.4546v1 [cs.CL], Oct. 16, 2013

[4] Jeffrey Pennington, Richard Socher and Christopher D. Manning, GloVe: Global Vectors for Word Representation, Computer Science Department, Stanford University, 2014

[5] Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula and Russell Power, Semi-supervised sequence tagging with bidirectional language models, arXiv:1705.00108v1 [cs.CL], Apr. 29, 2017

[6] Arvind Neelakantan, Jeevan Shankar, Alexandre Passos and Andrew McCallum, Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space, arXiv:1504.06654v1 [cs.CL], Apr. 24, 2015

[7] Bryan McCann, James Bradbury, Caiming Xiong and Richard Socher, Learned in Translation: Contextualized Word Vectors, arXiv:1708.00107v2 [cs.CL], Jun. 20, 2018

[8] Kaiming He, Georgia Gkioxari, Piotr Dollár and Ross Girshick, Mask R-CNN, arXiv:1703.06870v3 [cs.CV], Jan. 24, 2018

[9] Joao Carreira and Andrew Zisserman, Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, arXiv:1705.07750v3 [cs.CV], Feb. 12, 2018

[10] Jeremy Howard and Sebastian Ruder, Universal Language Model Fine-tuning for Text Classification, arXiv:1801.06146, Jan. 18, 2018

[11] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer, Deep contextualized word representations, arXiv:1802.05365v2 [cs.CL], Mar. 22, 2018

[12] Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever, Improving Language Understanding by Generative Pre-Training, OpenAI Blog, Jun. 11, 2018

[13] Sinno Jialin Pan and Qiang Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, vol. 22, issue 10, Oct. 2010

[14] Dipanjan Sarkar, A Comprehensive Hands-On Guide to Transfer Learning with Real-World Applications in Deep Learning, Towards Data Science, Nov. 14, 2018 [Blog]

[15] Jeremy Howard, Lesson 10: Deep Learning Part 2 2018, NLP Classification and Translation, https://www.youtube.com/watch?v=h5Tz7gZT9Fo&t=4191s, May 7, 2018 [Video]

[16] Ben Krause, Emmanuel Kahembwe, Iain Murray and Steve Renals, Dynamic Evaluation of Neural Sequence Models, arXiv:1709.07432, Sep. 21, 2017

[17] Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov and William W. Cohen, Breaking the Softmax Bottleneck: A High-Rank RNN Language Model, arXiv:1711.03953, Nov. 10, 2017

[18] Jason Brownlee, What Are Word Embeddings for Text?, Machine Learning Mastery, posted on Oct. 11, 2017, https://machinelearningmastery.com/what-are-word-embeddings/

[19] Stephen Merity, Nitish Shirish Keskar and Richard Socher, Regularizing and Optimizing LSTM Language Models, arXiv:1708.02182v1 [cs.CL], Aug. 8, 2017

[20] Diederik P. Kingma and Jimmy Ba, Adam: A Method for Stochastic Optimization, arXiv:1412.6980v9 [cs.LG], Jan. 20, 2017

[21] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, The Journal of Machine Learning Research, vol. 15, issue 1, pp. 1929-1958, Jan. 2014

[22] Michael Nielsen, Improving the way neural networks learn, Neural Networks and Deep Learning, posted in Oct. 2018, http://neuralnetworksanddeeplearning.com/chap3.html

[23] Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan and Sune Lehmann, Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm, arXiv:1708.00524v2 [stat.ML], Oct. 7, 2017


[24] Leslie N. Smith, Cyclical Learning Rates for Training Neural Networks, arXiv:1506.01186v6 [cs.CV], Apr. 4, 2017

[25] Jason Yosinski, Jeff Clune, Yoshua Bengio and Hod Lipson, How transferable are features in deep neural networks?, arXiv:1411.1792v1 [cs.LG], Nov. 6, 2014

[26] Sebastian Ruder, An overview of gradient descent optimization algorithms, arXiv:1609.04747v2 [cs.LG], Jun. 15, 2017

[27] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler and Sepp Hochreiter, GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, arXiv:1706.08500v6 [cs.LG], Jan. 12, 2018

[28] Xiang Jiang, Mohammad Havaei, Gabriel Chartrand, Hassan Chouaib, Thomas Vincent, Andrew Jesson, Nicolas Chapados and Stan Matwin, Attentive Task-Agnostic Meta-Learning for Few-Shot Text Classification, ICLR 2019 Conference Blind Submission, Sep. 28, 2018

[29] Arun Rajendran, Chiyu Zhang and Muhammad Abdul-Mageed, Happy Together: Learning and Understanding Appraisal From Natural Language, Natural Language Processing Lab, The University of British Columbia, 2019

[30] Roopal Garg, Deep Learning for Natural Language Processing: Word Embeddings, datascience.com, posted on Apr. 26, 2018 [Blog]
