Study of Data Analysis Model Based on Big Data Technology


Abstract - Traditional data analysis is grounded in cause-and-effect relationships and has formed a thinking mode of micro-level sample analysis, qualitative and quantitative analysis, and trend extrapolation. Big data has had a fundamental impact on traditional data analysis. Big data analysis is grounded in correlation and has formed a new thinking mode of global and macro analysis, data and technical analysis, and correlation analysis: namely, from causal analysis to correlation analysis and knowledge discovery, from model fitting to data mining, and from logical reasoning to association rules. Data analysis in the era of big data has changed greatly in its objects of analysis, its mode of data processing, its analytical methods and tools, and its analytical thinking.

Keywords - big data; data analysis; qualitative and quantitative analysis

1. INTRODUCTION

Big data is one of the world's hottest terms after the Internet of Things and cloud computing. Big data has had a great impact on thinking modes, education models, business operation models, scientific research models, and individual medical models; it has had a fundamental impact on all fields. Traditional data analysis has developed from the sample analysis of "seeing only one spot" into the era of overall analysis of "the overall situation". The small-data thinking mode and mathematical models of traditional data analysis have found it difficult to adapt to the data processing requirements of the big data era. Discovering knowledge, mining value, and looking for associations are the real needs of data analysis in the era of big data.

If traditional data analysis is digging nuggets from a mine, then data analysis in the age of big data is panning gold from a beach: discarding the dross and selecting the essential, discarding the false and retaining the true. "Only after the sand is blown away does the gold show" and "discovering order from chaos" can be said to be the truest depiction of data analysis in the era of big data.

2. DATA ANALYSIS

A. Summary of Data Analysis

Connotation of data analysis. Data analysis has a broad sense and a narrow sense. In the broad sense, data analysis refers to the whole process of discovering new knowledge: sorting, organizing, storing, retrieving, analyzing, and studying the data on the basis of collecting and gathering it. In the narrow sense, data analysis refers to the various links of that process, such as sorting, screening, association, storage, processing, analysis, and research. Data analysis identifies the original data and the data gathered through collection, and mines the rules, intelligence, and knowledge hidden in the data in order to give a predictive, scientific, comprehensive, and valid conclusion or plan that serves management and decision-making.

Different disciplines understand data analysis differently, but the essence is the same. In statistics, data analysis is generally interpreted as data analysis or statistical analysis; in information science and data management, it is generally understood as information analysis or information research; in computer science, it is generally interpreted as data mining or knowledge discovery.

Elements of data analysis. From the view of its concept, data analysis is an organic whole composed of a series of elements, such as origin, essence, method, process, result, and purpose. From the view of essence, data analysis is the discovery of nature, characteristics, attributes, rules, and associations from data and phenomena. From the view of origin, data analysis comes from society's demand for data. From the view of process, data analysis needs a series of links and procedures, namely collecting, sorting, selecting, organizing, storing, processing, analyzing, and researching, in order to draw a scientific and reliable conclusion. From the view of method, data analysis methods can be divided into qualitative and quantitative analysis methods, which are composed of scientific thinking methods, statistical methods, sociological methods, and information science methods. From the view of results, the data analysis process produces new value-added products, namely knowledge, intelligence, schemes, reports, and so on. From the view of purpose, data analysis mainly serves scientific management and scientific decision-making.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 03 | Mar 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 498
Shraddha Sanjay Paralkar1, Shubham Sadashiv Dhage2, Arshad Shaukat Mulani3, Asst. Prof. S.V. Thorat4
1 Shraddha Sanjay Paralkar, MCA YTC, Satara
2 Shubham Sadashiv Dhage, MCA YTC, Satara
3 Arshad Shaukat Mulani, 4 Prof. S.V. Thorat, Dept. of MCA, Yashoda Technical Campus, Satara-415003

Objects of data analysis. There are two main types of objects of data analysis. One is numerical data, which mainly refers to original and derived data; the purpose is to discover knowledge, intelligence, wisdom, and laws from the data through quantitative analysis methods. The other is non-numerical data, which mainly refers to things and their phenomena; the purpose is to find the essence, attributes, characteristics, rules, and relations of things from the phenomena through qualitative analysis methods.

Functions of data analysis. Data analysis performs data collation, objective evaluation, trend prediction, data feedback, and other basic functions in scientific management and scientific decision-making, and plays an important part in identification and selection, arrangement and sequencing, monitoring and early warning, and advising and navigation.

B. Data Analysis Model

Principle of data analysis. Data analysis is grounded in the attributes, characteristics, nature, laws, and correlations of data, and applies qualitative and quantitative analysis in order to discover new knowledge. Thus, data analysis is grounded in the causal relationships or correlations among things, phenomena, and data. A relationship refers to the connection between things arising from time, order, structure, movement, and so on, in the senses of time, space, condition, and development. The relationships among things, phenomena, and data are very complex and diverse, but they can be classified into two kinds: uncertainty relations and certainty relations. An uncertainty relation is mainly a correlation relationship, which is the basis of qualitative analysis, while a certainty relation is mainly a quantitative relation, which is the basis of quantitative analysis.

Dialectical materialism tells us that the world is universally connected and that no phenomenon or thing exists independently. The small-world phenomenon (the six degrees of separation theory) and social network analysis tell us that people are generally linked through a variety of connections that form social networks. Meanwhile, everything happens and develops in a certain time and space, shows obvious inheritance and development, and exhibits a logical relationship. The universal connection among things, phenomena, and data is the basis of data analysis. Some connections are direct, significant, and easy to find, while others are indirect and implicit and are difficult to find. Because they unfold over time, these connections may be cause-and-effect relationships.

The thinking mode of data analysis. For a long time, data analysis has mainly followed three basic ideas, namely sample and population, qualitative and quantitative, and trend extrapolation, which formed a thinking mode that played an important role in the analysis of the "small data" era.

a) Micro-level sample analysis. Data analysis takes data and phenomena as its objects. Samples are generally selected from the whole or from part of the population for analysis, which is called sample analysis or slice analysis.

b) Qualitative and quantitative analysis. Qualitative analysis is grounded in correlation: the nature, laws, characteristics, attributes, and relations of the sample are analyzed by qualitative methods. Quantitative analysis is grounded in cause and effect: the characteristics, laws, and relations of the sample are quantitatively described or fitted by mathematical and statistical models. Quantitative relationships between samples are generally not strictly functional but only approximately so, and a functional relation is used to describe them roughly.

c) Trend extrapolation analysis. On the basis of the qualitative and quantitative analysis, the nature, rules, characteristics, attributes, and relations of the sample are obtained, the tendency is extended to the whole or the population, and an overall prediction or estimation is carried out.
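As an illustration of trend extrapolation, the following minimal Python sketch fits a straight line to a small sample by ordinary least squares and extends it one period ahead. The sample values and the function names fit_line and extrapolate are hypothetical and are not taken from this paper; a straight line is only one of many possible approximate functional relationships.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate(xs, ys, x_new):
    """Extend the fitted trend to a point outside the sample."""
    intercept, slope = fit_line(xs, ys)
    return intercept + slope * x_new


# Hypothetical sample: one observation per period.
periods = [1, 2, 3, 4]
values = [2.0, 4.0, 6.0, 8.0]
forecast = extrapolate(periods, values, 5)  # trend extended to period 5
```

The estimate is only as good as the assumption that the sample trend also holds for the whole population, which is the limitation the text attributes to sample-based analysis.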

Methods and tools for data analysis. Data analysis methods are mainly derived from logic, systems analysis, quantitative methods, sociology, statistics, and mathematics, and are generally divided into three levels: philosophical methods, general methods, and specific methods. Concrete analysis methods are also generally divided into three types: qualitative, quantitative, and semi-quantitative. The qualitative methods mainly comprise logical thinking and scientific thinking methods, including classification and comparison, analysis and synthesis, induction and deduction, and analogy and imagination. The quantitative methods mainly comprise multivariate analysis (such as correlation analysis, regression analysis, and cluster analysis), time series analysis (such as moving average, exponential smoothing, linear trend, and seasonal index), bibliometric methods, and so on. Semi-quantitative methods mainly include content analysis, the analytic hierarchy process, the Delphi method, and so on. There are four main types of tools for data analysis: first, social survey and expert survey tools; second, logical thinking tools; third, mathematical and statistical models; fourth, databases and computer data mining tools.
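Two of the time series methods named above, the moving average and exponential smoothing, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the window size, and the smoothing factor alpha are hypothetical choices and do not come from this paper.

```python
def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # the first smoothed value is the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical usage on small made-up series.
averaged = moving_average([1, 2, 3, 4, 5], window=3)
smoothed = exponential_smoothing([10, 20, 30], alpha=0.5)
```

A larger window or a smaller alpha smooths more strongly but reacts more slowly to recent changes.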


These methods and tools can analyze data, information, and phenomena from different perspectives and levels, and give the necessary qualitative and quantitative basis for scientific management and scientific decision-making.

3. BIG DATA ANALYSIS

A. Big Data Overview

Generation and development of big data. The generation and development of big data has gone through three stages. From the 1980s to the middle of the 1990s was the embryonic stage of big data. In 1980, the American futurist Alvin Toffler, in "The Third Wave", praised big data as "the cadenza of the third wave".

From the middle of the 1990s to the first ten years of the twenty-first century was the stage in which big data attracted wide attention. Big data became a hot topic in various industries and disciplines. The term "big data" can be traced back to Nutch, an open-source project of the Apache organization. At that time, big data was used to describe large data sets that needed to be batch-processed or analyzed at the same time to update web search. In September 2008, Nature published the special series "Big Data: Science in the Petabyte Era", and the concept of "big data" was put forward. Big data then became a popular word in the IT industry, and academia, industry, and government gave it a high degree of attention. The science and technology advisory committee to the President of the United States gave President Obama and Congress a report entitled "The future of digital planning". In 2011, Science also launched a special issue, "Dealing with Data", which discussed the significance of data in scientific research and of data management. In June of the same year, McKinsey & Company released a detailed report about big data, "Big data: The next frontier for innovation, competition, and productivity", which analyzed in detail the impact, key technologies, and application fields of big data. IT giants such as IBM, Microsoft, and Apple have implemented big data plans and projects in an attempt to seize the commanding heights in the field of big data.

After 2012, big data entered a stage of rapid development. The United States, Japan, the European Union, and other countries have put forward response measures for the development of big data, and China is also actively involved. In February 2012, the Obama administration published the "Big Data Research and Development Initiative", which planned to use big data in biology, technology, medicine, and other fields. In March, the Davos World Economic Forum released "Big Data, Big Impact"; in May, the office of the United Nations Secretary-General issued "Big Data for Development: Challenges and Opportunities"; in June, the ninth session of the OECD Statistics Committee issued a research report on using big data for decision-making; in July, the Japanese Ministry of Internal Affairs put forward a new comprehensive ICT strategy, "The use of ICT in Japan", with a focus on big data applications. In January 2013, the British government announced that it would invest 1.89 billion pounds in big data for earth observation, medical and health work, and energy-saving computing technology.

The development and research of big data reached a climax, and it is also a hot topic in China. The year 2011 was China's first year of big data, and 2012 was an important year for big data in China: various big data forums were held constantly, and a variety of big data projects, plans, reports, and strategies emerged one after another. The year 2013 has been called the first year of China's big data statistics. In November 2013, the National Bureau of Statistics signed a big data strategic cooperation framework agreement with Ali, Baidu, and other companies, 11 in all, which pushed big data to a peak. At the beginning of 2013, the Ministry of Science and Technology of China announced the 2014 "National Key Basic Research and Development Plan" (namely the 973 Plan, including major scientific research projects), in which "research on the foundations of big data computing" became an important direction to support.

Characteristics of big data. Professor Sam Madden of the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology first summarized the "3V" characteristics of big data, namely volume, variety, and velocity. IDC holds that the characteristics should also include value. IBM considers that big data should also include veracity. Forrester analysts Brian Hopkins and WeiErSong summarize the characteristics of big data as mass, diversity, high speed, and variability. Overall, big data has the characteristics of "6V1C", namely the large volume of data (Volume), the variety of types (Variety), the fast processing speed (Velocity), the large application value (Value), carrying and transferring freely and flexibly (Vender), the veracity (Veracity), and the great difficulty of processing and analysis (Complexity). At present, various industries interpret the characteristics of big data differently. The "4V" characteristics of big data, namely volume (large capacity), variety (various types), velocity (high speed), and, most important, value (low value density), are widely recognized.

B. The Model of Big Data Analysis

The arrival of the era of big data has changed the thinking mode of traditional data analysis. In the era of big data, we need not only traditional, micro-level data analysis grounded in a sample but also modern, macro-level data analysis grounded in the whole.

1) The theory of big data analysis. Data analysis in the era of big data can be called big data analysis, and it mainly follows three basic concepts.

First, focus on the whole, not the slice. Big data analysis is macro data analysis, which needs to observe completely the essence, characteristics, attributes, laws, and connections of the whole, rather than using a sample to infer the connections between data or phenomena.

Second, focus on correlation, not causation. In the era of big data, facing the challenge of huge quantities of data, knowing "what" is more important than knowing "why". With stock data, for example, it is easy to learn from big data analysis whether a stock rises or falls, but hard to know why it does so. The typical task of big data analysis is to realize pattern mining and predictive analysis through correlation. Big data analysis emphasizes finding new patterns that are not known in advance and unknown correlations.

Third, focus on efficiency, not precision. In the era of big data, time and cost are more meaningful than exact results. Because big data analysis takes all the data, or the whole, as its object, it is nearly impossible to find a suitable statistical or mathematical model to describe the characteristics, regularities, and connections of the whole; even if such a model could be found, the time and cost would be enormous. At the same time, it is difficult to establish the essence, properties, characteristics, regularities, and connections of the whole directly or exactly.
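The emphasis on correlation can be made concrete with the Pearson correlation coefficient, one common way to quantify how strongly two variables move together without saying anything about why. The sketch below is a minimal pure-Python illustration; the function name pearson and the sample numbers are hypothetical and are not drawn from this paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical series: a value near +1 or -1 signals a strong linear
# association, but it does not explain the cause of that association.
r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson([1, 2, 3], [3, 2, 1])
```

A coefficient close to zero means only that no linear association was found; other kinds of dependence may still exist.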

2) The thinking mode of big data analysis. Big data analysis focuses on data, relies on multi-source data fusion, and takes correlation analysis as its core, and it has formed a new mode of thinking: from causal analysis to correlation analysis and knowledge discovery, from model fitting to data mining, and from logical reasoning to association rules.

a) Whole and macroscopic analysis. Big data takes all the data, or the whole, as the object of analysis, and the data is the core and the key. The nature, attributes, characteristics, rules, and relations of big data should be observed as a whole and at the macro level.

b) Data and technical analysis. Big data takes data and technology (computer technology and network technology) as the core and takes databases, data mining, and knowledge discovery algorithms as tools. The emphasis is association discovery (18).

c) Correlation analysis and knowledge discovery. Big data is grounded in correlation rather than causation, and focuses on the hidden rules, links, and values of the data.

3) Key technologies of big data analysis. The core of big data analysis is big data technology, which extracts valuable data from various types of massive data. The key technologies of big data analysis mainly include data acquisition, data access, infrastructure, data processing, statistical analysis, data mining, model prediction, and result presentation technologies.
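Association rules, which this section names as the target of the new thinking mode, are usually judged by two standard measures: support (how often an itemset occurs) and confidence (how often the consequent occurs when the antecedent does). The following is a minimal sketch of those two measures only, not a full mining algorithm; the basket data and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

A rule with high support and confidence records that items occur together; in keeping with the text, it does not establish why they do.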

4. NEW TREND OF DATA ANALYSIS MODEL AND DEVELOPMENT BASED ON BIG DATA

A. Data Analysis Model Based on Big Data

Big data analysis differs from traditional data analysis in its objects, foundation, patterns, results, and other aspects. Thus, the era of big data needs to rebuild a big data analysis model. The big data analysis model includes large-scale data acquisition and collection, processing, analysis with distribution and sharing, and service and application. The data sources are the traces left by human activities, computers, networks, and the physical world. At present, these large data are mainly acquired and collected through search engines, data stream engines, database engines or middleware, or ETL engines to form a target data set. The model then uses a big data platform to carry out real-time processing (including batch processing of static data and dynamic online data, and batch processing of structured, semi-structured, and unstructured data). Finally, it presents the results visually in order to provide services and applications. The data analysis model is built on a big data platform and uses data correlation and association mining algorithms to process comprehensive data and display it visually, in order to support management and decision-making, as shown in Figure 1.
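The stages of the model described above (acquisition and collection, processing, analysis, presentation) can be pictured as a chain of small functions. The sketch below is only a toy illustration in plain Python: the stage names, record fields, and data are hypothetical, and a real system would rely on a big data platform rather than in-memory lists.

```python
from collections import Counter


def acquire(*sources):
    """Acquisition: merge records from several sources into one target data set."""
    return [record for source in sources for record in source]


def process(records):
    """Processing: drop records that lack a 'user' or an 'item' value."""
    return [r for r in records if r.get("user") and r.get("item")]


def analyze(records):
    """Analysis: count how often each item appears."""
    return Counter(r["item"] for r in records)


def present(counts):
    """Presentation: render the counts as text lines, most frequent first."""
    return [f"{item}: {n}" for item, n in counts.most_common()]


# Hypothetical sources: a web log and a database extract.
web_log = [{"user": "u1", "item": "book"}, {"user": "u2", "item": None}]
db_rows = [{"user": "u3", "item": "book"}, {"user": "u1", "item": "pen"}]

report = present(analyze(process(acquire(web_log, db_rows))))
```

Keeping each stage as a separate function mirrors the layered structure of the model, so that one stage can be replaced without touching the others.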

B. New Trends in the Development of Large Data Analysis

Data analysis is the basic direction of development in the era of big data. With the continuous expansion of big data capacity, the big data analysis process, analysis technologies, analysis methods, and analysis models are showing some new trends.


First, big data acquisition, including the accurate selection of data sources, high-quality raw data acquisition methods, multi-source data processing methods, and data format and automatic correction methods.

Second, big data processing, including analysis and mining methods for larger quantities of data, real-time processing of big data, and the improvement of big data analysis and mining algorithms.

Third, big data visualization, including image analysis, human-computer interaction, scalability and multi-level issues, the combination of visualization with automatic data mining, and visualization tools for the general public.

Fourth, big data security, including APT attacks, social network privacy protection, risk-adaptive access control, and the three independent processes of data acquisition, storage, and analysis.

Other topics include effective high-speed transmission methods for big data, research on big data virtual machines, supercomputer interconnection, big data talent training, and so on.

5. CONCLUSION

Data analysis is an important process of research or simply of discovering information related to any work. Data derived from observation, experiment, and other primary and secondary data collection methods is large and cannot be taken as it is. Not all data is relevant, nor can it directly signify any trends, relations, facts, and associations within the data. To find the required trends and relations, the data needs to be reconstructed in the relevant form and modified. This process is called data analysis. Data analysis and conclusions take the research forward.

6. REFERENCES

1. Huang Yihua. Deep understanding of big data: big data processing and programming practice. Machinery Industry Press, 2014.

2. Li Baodong, Song Hantao. Research status and development of data mining language. Computer Engineering and Application, pp. 78-81, 2003.

3. Davenport series, Wu Junshen translation. Big data analysis: data driven enterprise performance optimization, process management and operation decision. Machinery Industry Press, pp. 67-68, 2015.

4. Luan Wenpeng, Yu Yixin, Wang Bing. AMI data analysis method. Proceedings of the Chinese Society of Electrical Engineering, pp. 178-189, 2015.

5. Zhang Xiaoyu, Zou Kai. The research progress of the big data in the field of Library and Information Science in China. Library Science Research, pp. 45-51, 2015.

6. Agnes Wa. Subversion big data analysis: Based on Storm, Hadoop and other Spark alternative technology in real time applications. The Electronics Industry Press, pp. 76-80, 2015.

7. Sun Qiunian, Rao Yuan. Research on network data visualization technology based on association analysis. Computer Science, pp. 67-86, 2015.

8. Party Qian Na, Luo Tianyu. Multidimensional data evolution in the field of technological innovation, frontier and characteristics. Science Science and Management of Science and Technology, pp. 3440, 2015.

9. Guo Chong. Based on large data analysis of online shopping customer loyalty modeling simulation. Computer Simulation, pp. 56-67, 2015.

10. Yu Xiaoji. Research on the construction of personalized teaching information service platform based on big data application. Information Science, pp. 66-71, 2015.

11. panfan. Big data concept is not no solution rambling data of three [EB/OL]. [2016-01-06]. China information news network version http://www.zgxxb.com.cn/ppsd/201409020016.shtml any of the ten major enterprises in the practice of big data. Internet Weekly, pp. 56-59, 2014.

12. Nature. Big Data [EB/OL]. [2016-01-06]. http://www.nature.com/.

13. Wikipedia [EB/OL]. [2016-01-06]. http://zh.wikipedia.org/wiki/bigdatabigdata.

14. Wu Fatih, Mu Zhijia. E book package based on the data of students' individual analysis model construction and realization path. The Chinese Audio-Visual Education, pp. 62-65, 2014.

15. Gao Zhipeng, Niu Kun. Analysis of big data oriented technology. Journal of Beijing University of Posts and Telecommunications, pp. 2-9, 2015.
