International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 01 | Jan 2023 www.irjet.net p-ISSN: 2395-0072
1 Masters Student, Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi
2 Assistant Professor, Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi
Abstract – Predicting bitcoin market trends is an important task that needs dedicated attention, because predicting bitcoin prices successfully leads to attractive profits through proper decisions. The bitcoin market is challenging because its data are non-stationary, noisy and chaotic, which makes it hard for investors to decide how to invest their money for profit. Bitcoin price prediction based on historical data alone or on textual information alone has been shown to be unsatisfactory. That is why we have used both for predicting the bitcoin price and for a recommendation system that suggests buying or selling bitcoin according to the fluctuation in the market. As bitcoin price prediction is a time series problem, we have used different machine learning and deep learning techniques such as LSTM, ARIMA, and Linear Regression. Of these, ARIMA performed best, with an accuracy of 80%. Existing studies in sentiment analysis have shown that there is a correlation between the fluctuation of bitcoin prices and Twitter tweets. We have performed sentiment analysis on the latest 100 tweets about bitcoin, which provides a recommendation to buy or sell bitcoin.
Key Words: LSTM, ARIMA, Machine Learning, Sentiment Analysis, Bitcoin Trade Open, Bitcoin Trade Close
Bitcoin is a cryptocurrency that is used globally for digital payment, investment and trading. Bitcoin prices are difficult to predict due to their extreme volatility, which is influenced by a variety of political and economic issues, as well as changes in leadership, investor attitude, and a variety of other factors. A model that considers only one component may not be reliable, so combining both the tweets and the historical price data might improve the accuracy. There are primarily two approaches for predicting market trends: technical analysis and fundamental analysis. Technical analysis uses previous price and volume to forecast future trends, but fundamental analysis does not. Fundamental analysis of current bitcoin prices, on the other hand, entails evaluating financial data to get insights. The efficient market hypothesis, which holds that bitcoin market prices are basically unpredictable, casts doubt on the usefulness of both technical and fundamental analysis. The goal of this research work is to build a model which predicts the bitcoin market trends. Three models are used as part of this research work: ARIMA, LSTM, and Linear Regression. Sentiment analysis is performed on the latest tweets about bitcoin.
The LSTM model was first introduced by Hochreiter & Schmidhuber [1] and is capable of learning long-term dependencies. Later on, many researchers improved on this work [2][3][4].
The rest of the paper is organized as follows. Section 2 covers the research state of bitcoin price prediction. Section 3 covers data collection and preprocessing. Section 4 describes the methodologies used. Section 5 presents the experimental results. Section 6 concludes the paper.
Duong and Siliverstovs [5] investigated the relationship between equity prices and combined finances in key European countries such as the United Kingdom and Germany. Acceleration in European country investments is likely to result in a stronger link between European nation equity prices. The study also asks whether innovations in stock markets affect actual financial instruments such as investment and consumption. Liu [6] examines the technical indicators and procedures of trading.
Kunal Gaur [7] developed a stock market time series prediction and recommendation system using a parallel combination of LSTM, ARIMA and Linear Regression with Twitter sentiment analysis, which showed an improved accuracy of 83%.
R. Batra [8] experimented with sentiment analysis for better prediction of stock price movement, using Twitter sentiment analysis to develop a prediction tool on the basis of tweet polarity.
A. Mittal [9] proposed a cross-validation method for financial data and obtained 75.6% accuracy in price prediction using Self-Organizing Fuzzy Neural Networks on Twitter feeds.
A. Raheman [10] found that the "out-of-the-box" Aigents model had a correlation of ~0.33 and that, after fine-tuning, "aigents+" had a correlation of ~0.57. "ensemble(all)" corresponds to average metrics across all models, and "ensemble(top3)" corresponds to the average of the best three models (aigents+, aigents and finBERT).
Pour [11] performed cryptocurrency market prediction with deep learning tools such as LSTM, combined with Bayesian Optimization, which has been tested on stock market indices. The minimum batch size for the training epochs in the deep learning algorithms is set equal to 32.
Among all this research, we were not able to find any work that used more than one model and gave the user a buy/sell recommendation based on the polarity of tweets.
The data has been collected from three sources.
The historic Bitcoin price dataset, covering the years 2014 to 2022, has been downloaded using the Yahoo Finance API.
To validate the models on real-time data, I have used the cryptoCMD API. This library allows collecting the latest data from CoinMarketCap. I have used the predefined scraper object to get the data from the CoinMarketCap API.
To perform the sentiment analysis of the tweets related to Bitcoin prices, I have used the Twitter API, which is accessible from a developer account.
To make the data from the mode of entry appropriate for trustworthy analysis, it has to be pre-processed.
Data Pre-processing for ARIMA Model:
We pre-processed the historical data in the following manner:
1. Filled the null values using the backward fill method.
2. Took 80% of the data for training and 20% of the data for testing.
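The two ARIMA pre-processing steps above can be sketched in a few lines. This is a minimal, dependency-free illustration on made-up closing prices; the helper names `backward_fill` and `chronological_split` are hypothetical and are not taken from the paper's code, which most likely relied on a dataframe library instead.

```python
from typing import List, Optional, Tuple


def backward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the next observed value."""
    filled = list(values)
    next_valid = None
    # Walk from the end so that every gap can see the value that follows it.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_valid  # stays None if nothing valid follows
        else:
            next_valid = filled[i]
    return filled


def chronological_split(values: List[float],
                        train_fraction: float = 0.8) -> Tuple[List[float], List[float]]:
    """Split a time series without shuffling: oldest part trains, newest part tests."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


# Synthetic closing prices with gaps (assumed data, for illustration only).
closes = [100.0, None, 104.0, 103.0, None, None, 108.0, 110.0, 109.0, 111.0]
filled = backward_fill(closes)
train, test = chronological_split(filled, train_fraction=0.8)
```

The split is chronological on purpose: shuffling a price series before splitting would leak future information into the training set.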
Data Pre-processing for LSTM Model:
1. Scaled the values using the Min-Max Scaler.
2. Stored the bitcoin price trends from the 7 days before the current day to predict the next 1 output, and stored them in the training part.
3. Converted the training list into NumPy arrays.
4. Added a 3rd dimension to the training part.
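The four LSTM pre-processing steps above could look roughly like the following NumPy sketch. It uses synthetic prices, and the helpers `min_max_scale` and `make_windows` are hypothetical names introduced here; the scaling is written by hand to mirror what a Min-Max scaler computes, rather than calling the scaler the authors used.

```python
import numpy as np


def min_max_scale(values: np.ndarray):
    """Scale values linearly into [0, 1]; also return the bounds for inverting later."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return (values - lo) / (hi - lo), lo, hi


def make_windows(scaled: np.ndarray, lookback: int = 7):
    """Build (samples, lookback, 1) inputs and next-day targets from a 1-D series."""
    X, y = [], []
    for i in range(lookback, len(scaled)):
        X.append(scaled[i - lookback:i])  # the 7 days before day i
        y.append(scaled[i])               # the value to predict
    X = np.array(X)
    y = np.array(y)
    # LSTM layers expect 3-D input: (samples, time steps, features).
    X = X.reshape(X.shape[0], X.shape[1], 1)
    return X, y


prices = np.arange(100.0, 120.0)  # 20 synthetic closing prices (assumed data)
scaled, lo, hi = min_max_scale(prices)
X_train, y_train = make_windows(scaled, lookback=7)
```

With 20 observations and a 7-day lookback, the sketch yields 13 training samples, each holding 7 time steps of a single feature.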
Data Pre-processing for Linear Regression Algorithm:
1. Declaring the number of days (n) to be forecasted in the future.
2. Declaring a new dataframe with the relevant data.
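One plausible reading of these two steps is that each closing price is paired with the closing price n days later, so that the regression learns to map today's value to the value n days ahead. The sketch below illustrates that reading on synthetic data; the pairing scheme and the helper name `build_forecast_pairs` are assumptions, since the paper does not spell out the dataframe layout.

```python
from typing import List, Tuple


def build_forecast_pairs(closes: List[float],
                         n_days: int) -> Tuple[List[float], List[float], List[float]]:
    """Pair each close with the close n_days later.

    Returns (features, targets, to_forecast), where to_forecast holds the most
    recent n_days closes, whose targets lie in the future and are still unknown.
    """
    if not 0 < n_days < len(closes):
        raise ValueError("n_days must be between 1 and len(closes) - 1")
    features = closes[:-n_days]     # closes that already have a known target
    targets = closes[n_days:]       # the same series shifted n_days ahead
    to_forecast = closes[-n_days:]  # inputs for the actual forecast
    return features, targets, to_forecast


closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]  # assumed data
features, targets, to_forecast = build_forecast_pairs(closes, n_days=3)
```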
Data Pre-processing for Tweets:
1. Cleaning up the tweets.
2. Passing the tweets to TextBlob for calculating the polarity.
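The paper does not list its cleaning rules, so the following is only a sketch of what "cleaning up the tweets" commonly involves: removing URLs, mentions, retweet markers and stray symbols before polarity scoring. The function name `clean_tweet` and the exact rules are assumptions.

```python
import re


def clean_tweet(text: str) -> str:
    """Strip URLs, @mentions, retweet markers and unusual symbols from a tweet."""
    text = re.sub(r"https?://\S+", " ", text)            # drop URLs
    text = re.sub(r"@\w+", " ", text)                    # drop @mentions
    text = re.sub(r"\bRT\b", " ", text)                  # drop the retweet marker
    text = text.replace("#", "")                         # keep the hashtag word, drop '#'
    text = re.sub(r"[^A-Za-z0-9\s.,!?$%'-]", " ", text)  # drop emoji and other symbols
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


example = "RT @trader: #Bitcoin is up 5% today!! https://t.co/abc123 🚀"
cleaned = clean_tweet(example)
# The cleaned string would then be scored, e.g. with TextBlob(cleaned).sentiment.polarity.
```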
ARIMA stands for autoregressive integrated moving average. It is a statistical analysis model that uses time series data to better understand the dataset or to anticipate future trends. A statistical model is called autoregressive if it predicts future values based on previous values. For example, an ARIMA model may try to anticipate a company's earnings based on prior periods, or predict a stock's future pricing based on historical performance. The model's final goal is to forecast future time series movement by looking at the differences between values in the series rather than the actual values. ARIMA models are used when there is evidence of non-stationarity in the data. In time series analysis, non-stationary data are typically turned into stationary data before modelling.
We can break down the model into smaller components based on the name:
The AR, which stands for Autoregressive model, describes a random process. The output of the model depends linearly on its prior values, that is, on a number of lagged data points or previous observations.
Integrated (I): denotes the differencing of raw observations so that the time series can become stationary (i.e., data values are replaced by the difference between the data values and the previous values).
Moving Average (MA): takes into account the relationship between an observation and a residual error from a moving average model applied to lagged observations.
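To make the AR and I components concrete, the sketch below differences a short synthetic series once (d = 1), fits a single autoregressive coefficient on the differences by least squares, and then undoes the differencing to obtain a one-step price forecast. It deliberately omits the MA term and any intercept, so it is a toy illustration of the idea and not the fitting procedure used in the paper; a real experiment would normally rely on a statistics library.

```python
import numpy as np


def difference(series, d: int = 1) -> np.ndarray:
    """Apply d rounds of first-order differencing (the 'I' in ARIMA)."""
    out = np.asarray(series, dtype=float)
    for _ in range(d):
        out = np.diff(out)  # x_t - x_{t-1}
    return out


def fit_ar1(stationary: np.ndarray) -> float:
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (the 'AR' part)."""
    prev, curr = stationary[:-1], stationary[1:]
    return float(prev @ curr / (prev @ prev))


prices = np.array([100.0, 104.0, 102.0, 105.0, 103.5])  # assumed data
diffs = difference(prices, d=1)   # [4.0, -2.0, 3.0, -1.5]
phi = fit_ar1(diffs)              # AR(1) coefficient on the differenced series
next_diff = phi * diffs[-1]       # predicted next change
forecast = prices[-1] + next_diff # integrate back to the price level
```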
LSTM (Long Short-Term Memory) is a special type of recurrent neural network that can learn long-term data relationships. This is possible because the model's recurring module is made up of four layers that interact with one another. An LSTM module has a cell state and three gates, giving it the ability to selectively learn, unlearn, or retain information from each of the units. By permitting only a few linear interactions, the cell state in an LSTM allows information to travel across the units without being altered. Each unit contains an input gate, an output gate, and a forget gate, which add or remove data from the cell state. The forget gate uses a sigmoid function to determine which information from the previous cell state should be ignored. The input gate uses a point-wise multiplication of 'sigmoid' and 'tanh' outputs to control the information flow to the current cell state. Finally, the output gate determines which data should be transmitted on to the next hidden state.
LSTM can be used in many applications, such as weather forecasting, NLP, speech recognition, handwriting recognition, and time-series prediction. The cell state is represented by the horizontal line that runs across the top of the figure. The cell state is similar to a conveyor belt: it flows straight down the chain with just minimal linear interactions. The ability of the LSTM to add or delete information from the cell state is controlled by gates. Gates are used to allow information to pass through if desired.
A gate is made up of a sigmoid neural net layer plus a point-wise multiplication operation. The sigmoid layer produces values ranging from 0 to 1, indicating how much of each component should be allowed to pass. A value of 0 means "let nothing through", while a value of 1 means "let everything through". To safeguard and govern the cell state, an LSTM contains three of these gates.
The prior hidden state (h_{t-1}), the previous cell state (C_{t-1}) and the present input (X_t) are the inputs to the current cell state (C_t), as illustrated in Figure 3.4.1. The forget gate, input gate and output gate are the three gates that make up the cell.
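The gate mechanics described above can be written out as a single forward step of a standard LSTM cell. The sketch below uses NumPy with small random weights and a 7-value input sequence; the weight layout (all four layers stacked in one matrix `W`) and the sizes are illustrative assumptions, and no training is performed.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*H, H+D) and maps the concatenation [h_prev, x_t] to the
    stacked pre-activations of the forget, input, candidate and output layers.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])            # forget gate: what to drop from C_{t-1}
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to admit
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate: what to expose as h_t
    c_t = f * c_prev + i * g       # new cell state (the "conveyor belt")
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t


rng = np.random.default_rng(0)
D, H = 1, 4                                  # one input feature, four hidden units
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for price in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]:  # seven scaled prices (assumed data)
    h, c = lstm_step(np.array([price]), h, c, W, b)
```

Because the cell state is updated only by an element-wise multiplication and an addition, information can pass through many time steps largely unchanged when the forget gate stays close to 1.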
Linear regression is a supervised machine learning algorithm that carries out a regression task: it models a target prediction value based on independent variables. It is mostly used in forecasting and in determining the link between variables. Regression models differ in the type of relationship they evaluate between dependent and independent variables, as well as in the number of independent variables they employ. Linear regression is used to predict the value of a dependent variable (y) given an independent variable (x). As a result of this regression technique, a linear relationship between x (input) and y (output) is discovered. Linear regression gets its name from the equation
y = m*x + c
where m is the slope of the line and c is its intercept.
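For a single input variable, the slope m and intercept c that minimise the squared error have a closed form. The sketch below computes them on synthetic points that lie exactly on y = 2x + 1; the function name `fit_line` is hypothetical and the data are made up for illustration.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = m*x + c with one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # intercept
    return m, c


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # points on y = 2x + 1 (assumed data)
m, c = fit_line(xs, ys)
prediction = m * 6.0 + c         # extrapolate one step beyond the data
```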
Real-time data is retrieved through the CryptoCMD API, historic data is retrieved from the Yahoo Finance API, and sentiment analysis data is retrieved from the Twitter API. The ARIMA, LSTM, and Linear Regression models were developed using the functions ARIMA_ALGO(), LSTM_ALGO(), and LIN_REG_ALGO(). These functions accept the dataset variable as a parameter, and each returns the projected values and the root mean square error. The models are trained to predict the future price up to 7 days ahead.
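Since every model reports a root mean square error, it may help to state the metric explicitly: RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below is a plain-Python version on made-up numbers; the function name `rmse` is chosen here for illustration.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [100.0, 102.0, 104.0, 106.0]     # assumed data
predicted = [101.0, 101.0, 107.0, 105.0]  # assumed data
error = rmse(actual, predicted)           # sqrt((1 + 1 + 9 + 1) / 4)
```

RMSE is expressed in the same unit as the price, so it is only comparable between models evaluated on the same data and price scale.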
The Twitter sentiment analysis model was developed using the function retrieving_tweets_polarity.
This function accepts a string variable, i.e. "btc", a short form for the Bitcoin currency. The tweets were fetched from the Twitter API after authenticating with it.
The TextBlob library for Natural Language Processing has been used to analyse the polarity and subjectivity of each sentence. The polarity returned by TextBlob lies in the range [-1, 1]. I have counted the positive and negative polarities of the latest tweets.
This function returns the polarity, the tweets list, the tweets pool used to generate the overall polarity of the sentences, the count of positive polarity, the count of negative polarity and the count of neutral polarity.
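The counting step can be illustrated independently of Twitter and TextBlob by starting from a list of polarity scores in [-1, 1]. In the sketch below, the overall polarity is taken to be the sign of the mean score; that rule, the function name `summarize_polarity` and the example scores are assumptions, as the paper does not state how the overall polarity is derived.

```python
from typing import Dict, List, Union


def summarize_polarity(polarities: List[float]) -> Dict[str, Union[int, float, str]]:
    """Count positive, negative and neutral scores and label the overall mood."""
    positive = sum(1 for p in polarities if p > 0)
    negative = sum(1 for p in polarities if p < 0)
    neutral = len(polarities) - positive - negative
    mean = sum(polarities) / len(polarities) if polarities else 0.0
    # Assumed rule: the sign of the mean decides the overall label.
    if mean > 0:
        overall = "Positive"
    elif mean < 0:
        overall = "Negative"
    else:
        overall = "Neutral"
    return {
        "positive": positive,
        "negative": negative,
        "neutral": neutral,
        "global_polarity": mean,
        "overall": overall,
    }


scores = [0.5, -0.2, 0.0, 0.8, 0.1]  # made-up polarity scores for five tweets
summary = summarize_polarity(scores)
```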
These functions are called individually and their values are stored in global variables.
These global values are passed into the recommending function, which accepts the historic dataset, global_polarity, the real-time dataset and the mean value, which is the prediction value of the most accurate model, i.e. ARIMA.
This function returns the idea and the decision, where the idea describes the next expected pattern in the market, i.e. Rise or Fall, and the decision gives the user a suggestion to buy or sell bitcoin according to the fluctuation of the market.
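The exact decision rule is not given in the text, so the following is only one plausible sketch of such a recommending function: it suggests buying when the mean predicted price exceeds the latest observed close and the tweet polarity is positive, and selling otherwise. The simplified signature (a single latest close instead of the full datasets), the rule itself and the name `recommend` are assumptions.

```python
from typing import Tuple


def recommend(latest_close: float,
              predicted_mean: float,
              global_polarity: float) -> Tuple[str, str]:
    """Return (idea, decision) from a price forecast and tweet sentiment.

    Assumed rule: a RISE/BUY signal needs both a forecast above the latest
    close and a positive overall polarity; anything else yields FALL/SELL.
    """
    if predicted_mean > latest_close and global_polarity > 0:
        return "RISE", "BUY"
    return "FALL", "SELL"


# Illustrative calls with made-up numbers.
bullish = recommend(latest_close=100.0, predicted_mean=105.0, global_polarity=0.2)
mixed = recommend(latest_close=100.0, predicted_mean=105.0, global_polarity=-0.1)
bearish = recommend(latest_close=100.0, predicted_mean=95.0, global_polarity=0.3)
```

Requiring both signals to agree makes the sketch conservative: a positive forecast alone does not trigger a buy suggestion when the sentiment is negative.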
The models were run on recent bitcoin prices. Some illustrations are given below. We report the RMSE (Root Mean Squared Error) for these bitcoin prices.
ARIMA: The ARIMA model was run on the latest bitcoin prices. We achieved an RMSE value of 19.17.
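Fig 5.1: ARIMA Model Predictions
LSTM: The LSTM model was run on the latest bitcoin prices for various numbers of epochs, as depicted in the table below. After 30 epochs we obtained the lowest RMSE in the table (78.27).

| Epochs | RMSE |
|--------|---------|
| 5 | 150.150 |
| 10 | 302.187 |
| 15 | 390.529 |
| 30 | 78.27 |

Table 5.1 (LSTM on Various Epochs)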
Linear Regression: The Linear Regression model was run and gave an RMSE of 89.53. The graph plotted for the Linear Regression model follows.
Sentiment Analysis: We took the latest 100 tweets about bitcoin, calculated the polarity using TextBlob, and obtained an overall polarity of Positive.
In recent years, many people have been investing in the bitcoin market in order to make quick money. At the same time, an investor stands a good chance of losing all of his or her money. To comprehend future market trends, the user needs an effective model.
Many prediction models exist that can anticipate whether the market is going up or down, but they are inaccurate. In this work, we have attempted a model for predicting the bitcoin market movement for the next day. The model has been constructed and evaluated using diverse, openly available bitcoin market data, taking into account numerous patterns such as continuous up/down movement and the volume traded each day, and it also includes corporate sentiment.
On the considered dataset, the LSTM and ARIMA models perform best.
We have also performed sentiment analysis on Twitter data to detect the polarity of each tweet. The recommendation system runs well with the help of the polarity of each tweet.
In the future, this research can be broadened by predicting the prices of other cryptocurrencies and by including different advanced models for sentiment analysis.
On this auspicious occasion of the accomplishment of our internship project on "Bitcoin Price Prediction and Recommendation System using Deep learning techniques and twitter sentiment analysis", I would like to thank Prof. Indra Thanaya, Professor, CSE Department, IGDTUW, who has supported us through all the ups and downs in the completion of this project. He not only guided us throughout the project but also taught life values such as a never-giving-up attitude and an eagerness to learn and explore more in the field of ML.
We are thankful to our honourable Vice Chancellor, Dr. Amita Dev, Indira Gandhi Delhi Technical University, for including the internship program and providing an opportunity to gain practical experience in research work.
We are grateful to our honourable HOD, Prof. Seeja K.R., Computer Science and Engineering Department, IGDTUW, for showing her trust, choosing us for this internship project work and encouraging us further.
Finally, the support and coordination received from all the team members was great, and their contribution was vital for the completion of this internship project. I hope we will achieve more in our future endeavors.
[1] Hochreiter, Sepp, and Jürgen Schmidhuber. "LSTM can solve hard long time lag problems." Advances in Neural Information Processing Systems 9 (1996).
[2] Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE Transactions on Neural Networks 5.2 (1994): 157-166.
[3] Hochreiter, Sepp, and Jürgen Schmidhuber. "LSTM can solve hard long time lag problems." Advances in Neural Information Processing Systems 9 (1996).
[4] Hochreiter, Sepp. "The vanishing gradient problem during learning recurrent neural nets and problem solutions." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6.02 (1998): 107-116.
[5] Duong, Manh Ha, and Boriss Siliverstovs. "The stock market and investment." Unpublished manuscript, available at http://www.finprop.de/Paper5_The_Stock_Market.pdf (accessed 2 June 2010) (2006).
[6] Liu, Li. "Are Bitcoin returns predictable?: Evidence from technical indicators." Physica A: Statistical Mechanics and its Applications 533 (2019): 121950.
[7] Kunal Gaur. "STOCK PRICE PREDICTION AND RECOMMENDATION USING MACHINE LEARNING TECHNIQUES AND TWITTER SENTIMENT ANALYSIS." IRJET, 2020.
[8] R. Batra and S. M. Daudpota, "Integrating StockTwits with sentiment analysis for better prediction of stock price movement," 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), 2018, pp. 1-5, doi: 10.1109/ICOMET.2018.8346382.
[9] Mittal, Anshul, and Arpit Goel. "Stock prediction using twitter sentiment analysis." Stanford University, CS229 (2011, http://cs229.stanford.edu/proj2011/GoelMittalStockMarketPredictionUsingTwitterSentimentAnalysis.pdf) 15 (2012): 2352.
[10] Raheman, Ali, et al. "Social Media Sentiment Analysis for Cryptocurrency Market Prediction." arXiv preprint arXiv:2204.10185 (2022).
[11] Pour, Ehsan Sadeghi, et al. "Cryptocurrency Price Prediction with Neural Networks of LSTM and Bayesian Optimization." European Journal of Business and Management Research 7.2 (2022): 20-27.
[12] Abraham, Jethin, et al. "Cryptocurrency price prediction using tweet volumes and sentiment analysis." SMU Data Science Review 1.3 (2018): 1.
[13] Rather, Akhter Mohiuddin. "LSTM-based Deep Learning Model for Stock Prediction and Predictive Optimization Model." EURO Journal on Decision Processes 9 (2021): 100001.
[14] Li, Xinyi, et al. "DP-LSTM: Differential privacy-inspired LSTM for stock prediction using financial news." arXiv preprint arXiv:1912.10806 (2019).
[15] Gandhmal, Dattatray P., and K. Kumar. "Wrapper-Enabled feature selection and CPLM-based NARX model for stock market prediction." The Computer Journal 64.2 (2021): 169-184.
[16] Kumar, K., and Dattatray P. Gandhmal. "An intelligent Indian stock market forecasting system using LSTM deep learning." Indones J Electr Eng Comput Sci 21.2 (2021): 1082-1089.
Name: Gargi Singh, Masters Student, Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi
Name: B. Indra Thanaya, Assistant Professor, Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi
© 2023, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 67