International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056
Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p ISSN: 2395 0072
Rahul Thakur1, Kewal Krishan2
1Rahul Thakur, rahulthakurhm@gmail.com; 2Kewal Krishan, kewal.krishan@lpu.co.in
Abstract - Data integration is the task of combining data from multiple sources and giving the user a uniform, consistent view of that data. Combining multiple heterogeneous sources into a single source of truth is one of the biggest challenges in modern data management, because every consumer wants to see and work with the data in one consistent way. Several methods exist, but each still has open challenges. This paper gives an overview of the main data integration techniques and their challenges, and compares them so that an appropriate method can be chosen for a given system. Particular attention is paid to the comparative analysis of these techniques and to techniques for dealing with unreliable data.
Key Words: Big Data, Big Data Integration, Extract Transform Load.
Data integration is a collection of techniques for retrieving and integrating data from multiple data sources to produce meaningful information. Nowadays, a large amount of data is gathered from a range of heterogeneous data sources in real time, resulting in data of varying quality. This is referred to as "Big Data." Big data integration is highly challenging, particularly where conventional data integration solutions have failed. It differs from traditional data integration in many ways, including volume, velocity, variety, and veracity, which are the primary features of big data.
1.1 Different types of big data integration processes.
Big data integration (BDI) involves combining data from various sources and displaying it in a single interface. Building an enterprise data warehouse, transferring data between databases, and keeping data synchronised across platforms all require BDI. To present an overall view, BDI integrates data from both internal and external sources.
Extract, Transform, Load (ETL) is a method of integrating data: information is extracted from one system, transformed, and then loaded into another.
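The three ETL stages can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a method taken from a specific tool: the table names `raw_orders` and `orders`, the sample rows, and the cleaning rules are all hypothetical, and in-memory SQLite databases stand in for the source and target systems.

```python
import sqlite3


def extract(conn):
    """Extract: read raw rows from the (hypothetical) source table."""
    return conn.execute("SELECT id, name, amount FROM raw_orders").fetchall()


def transform(rows):
    """Transform: normalise names, convert amounts to float, drop empty amounts."""
    cleaned = []
    for row_id, name, amount in rows:
        if amount is None or amount.strip() == "":
            continue  # assumed rule: rows without an amount are discarded
        cleaned.append((row_id, name.strip().title(), float(amount)))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    # In-memory databases stand in for separate source and target systems.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE raw_orders (id INTEGER, name TEXT, amount TEXT)")
    source.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, "  alice ", "10.50"), (2, "BOB", ""), (3, "carol", "7")],
    )
    target = sqlite3.connect(":memory:")
    load(target, transform(extract(source)))
    return target.execute("SELECT id, name, amount FROM orders ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_etl())
```

Keeping the three stages as separate functions means the transformation rules can be changed without touching the code that reads from the source or writes to the target.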
Enterprise information integration (EII) is used to deliver compiled datasets on demand. EII allows programmers and enterprise clients to combine data from multiple sources into one database.
EDR stands for enterprise data replication.
Data migration from one storage platform to another is part of the EDR process. In its most basic form, EDR moves data from one repository to another while maintaining the data's structure.
Data consolidation is the procedure of combining data from several sources into a single storage area. Data consolidation uses networking servers as data sources to reduce the number of data storage locations an organisation requires. It uses ETL (Extract, Transform, Load) to gather data from main databases: data is collected from various sources, cleaned, normalised, and stored for processing. In most cases, ETL consolidations are processed in batches of 24 hours or less. A "holding area" is used for batch consolidation prior to delivery to an integrated data storage facility. Data latency is the amount of time it takes for data to move from one location to another. Unlike other data integration solutions, data consolidation relies on this delay, so its high latency makes it a poor fit for large datasets.
Data propagation is an integration technique that copies data from one location to another. Propagation rules govern how data from source data warehouses is delivered to local access databases.
Data propagation rules fall into three categories:
In bulk extract, FTP (File Transfer Protocol) is used to retrieve and transmit data from a source. The extracted data may need to be reformatted to fit into the destination data storage. Bulk extraction is well suited to small source files or to large changes.
This method does not distinguish between modified and unaffected entries.
Unlike bulk extract, file comparison produces an incremental change record. Small files with few changes may benefit from this strategy to track changes over time.
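The idea of an incremental change record can be sketched as a comparison of two snapshots of the same file. This is a minimal illustration, assuming each snapshot has already been parsed into a dictionary keyed by a record identifier; the function name `diff_snapshots` and the tuple layout of the change record are hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two snapshots (dicts keyed by record id) and
    return an incremental change record as a list of tuples."""
    changes = []
    # Records present only in the new snapshot were inserted.
    for key in sorted(new.keys() - old.keys()):
        changes.append(("insert", key, new[key]))
    # Records present in both snapshots but with different values were updated.
    for key in sorted(old.keys() & new.keys()):
        if old[key] != new[key]:
            changes.append(("update", key, new[key]))
    # Records present only in the old snapshot were deleted.
    for key in sorted(old.keys() - new.keys()):
        changes.append(("delete", key, None))
    return changes


if __name__ == "__main__":
    yesterday = {1: "a", 2: "b", 3: "c"}
    today = {1: "a", 2: "B", 4: "d"}
    for change in diff_snapshots(yesterday, today):
        print(change)
```

Only the changed records are emitted, so the cost of applying the result downstream depends on the number of changes rather than on the size of the file. Producing the record still requires reading both snapshots in full.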
Change data capture (CDC) is a real-time approach for identifying and replicating changes in the source store. CDC propagation keeps the data stores up to date quickly (within seconds or minutes). Minor changes or new data can be detected quickly without having to update the entire data warehouse. Trigger-based and log-based CDC are examples.
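Trigger-based CDC can be sketched with SQLite triggers that record every change to a source table in a change log, which a consumer then replays against a replica. This is a minimal illustration under stated assumptions: the tables `customers` and `change_log`, the operation codes, and the dictionary used as the replica are all hypothetical, and a production system would write to a real target database.

```python
import sqlite3


def setup_source():
    """Create a source table plus triggers that log every change."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE change_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, id INTEGER, city TEXT
        );
        CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
            INSERT INTO change_log (op, id, city) VALUES ('I', NEW.id, NEW.city);
        END;
        CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
            INSERT INTO change_log (op, id, city) VALUES ('U', NEW.id, NEW.city);
        END;
        CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
            INSERT INTO change_log (op, id, city) VALUES ('D', OLD.id, NULL);
        END;
    """)
    return conn


def apply_changes(source, replica, last_seq):
    """Replay log entries newer than last_seq onto the replica dict."""
    rows = source.execute(
        "SELECT seq, op, id, city FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, row_id, city in rows:
        if op == "D":
            replica.pop(row_id, None)
        else:  # inserts and updates both set the latest value
            replica[row_id] = city
        last_seq = seq
    return last_seq


def demo():
    source = setup_source()
    replica = {}
    source.execute("INSERT INTO customers VALUES (1, 'Delhi')")
    source.execute("INSERT INTO customers VALUES (2, 'Pune')")
    last = apply_changes(source, replica, 0)
    source.execute("UPDATE customers SET city = 'Mumbai' WHERE id = 2")
    source.execute("DELETE FROM customers WHERE id = 1")
    last = apply_changes(source, replica, last)
    return replica, last


if __name__ == "__main__":
    print(demo())
```

The consumer remembers the last sequence number it applied, so each call transfers only the changes made since the previous call instead of the whole table.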
Consider data federation as "linking up" or "becoming one unit." It is a term for "middleware" technology that connects data from various sources and formats into a single picture. With a relational database management system (RDBMS), analysts can create tables with rows and columns of data. At the endpoint, the federation uses a data model to generate a single-view visualisation. Using the RDBMS's SQL (Structured Query Language) interface,
Integration is now the world's largest IT challenge, with 1 of every 6 IT dollars spent globally on integration. 83% of executives admit that data silos are present in their organizations, 97% say that silos harm overall decision making, and 57% of marketers recognize integrating disparate technologies as the most significant barrier to success. Data integration offers the potential to produce more timely, more disaggregated statistics at higher frequencies than traditional approaches alone. Data integration activities will therefore only increase: as ever more data sources become available and the capacities of IT and data infrastructure grow, so will the need to integrate different sources.
Data integration is the process of combining data from many sources. It must contend with issues such as duplicate data, inconsistent data, and old systems. Data integration can be performed manually or through the use of middleware and applications. Uniform access or data warehousing can also be used. Several tools are available on the market for performing data integration.
In this paper, we provided a high-level summary of the constraints and problems that data integration must overcome, together with a comparison. There is no single answer to any of these issues, and they are all connected in one way or another. Each data integration difficulty demands a distinctly different solution, which must be recognized in order to be successful in the long run. We have attempted to collect as many obstacles and concerns as feasible in this document so that additional work may be done in the future to solve these issues.