International Research Journal of Engineering and Technology (IRJET) | Volume: 04 Issue: 02 | Feb 2017
e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | www.irjet.net
Peer-to-Peer Data Sharing and Deduplication using Genetic Algorithm

Prof. J. R. Waykole**, Ms. S. P. Band*, Ms. V. D. Amritkar*, Ms. P. R. Adsul*, Ms. S. P. Agawane*

**Associate Professor, Department of Computer Engineering, Pune University
*UG Student, Pune University
ABSTRACT: To form a corporate network, organizations simply register their sites with the peer-to-peer (P2P) service provider and share their information among the participating organizations. This can effectively help organizations reduce their operational costs and increase their revenues. However, inter-organization data sharing and processing poses unique challenges to such a data management system, including scalability, performance, throughput, and security. This paper presents a system that delivers elastic data sharing services for corporate network applications in the cloud, based on a P2P data management platform. By integrating cloud computing, database, and P2P technologies, together with a genetic algorithm for deduplication, into one system, it provides an economical, flexible and scalable platform for corporate network applications and delivers data sharing services to participants under the widely accepted pay-as-you-go business model.

Keywords: Cloud computing, Deduplication, Genetic algorithm.
A new peer-to-peer system is proposed that delivers data sharing facilities by incorporating P2P technology [4]. To configure a corporate network, an organization simply registers its sites with the service provider, launches peer-to-peer instances in the network, and exports its data to those instances for sharing [3].
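The three configuration steps (register a site, launch an instance, export data) can be pictured with a minimal in-memory sketch. The class and method names (`ServiceProvider`, `PeerInstance`, `register_site`, `launch_instance`, `export_data`, `query`) are hypothetical illustrations, not the system's actual interface, and a real deployment would launch cloud instances rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class PeerInstance:
    """Hypothetical peer instance launched on behalf of one organization."""
    owner: str
    shared_tables: dict = field(default_factory=dict)


class ServiceProvider:
    """Toy stand-in for the P2P service provider (illustrative only)."""

    def __init__(self):
        self.sites = set()
        self.instances = []

    def register_site(self, org):
        # Step 1: the organization registers its site with the provider.
        self.sites.add(org)

    def launch_instance(self, org):
        # Step 2: only registered organizations may launch peer instances.
        if org not in self.sites:
            raise PermissionError(f"{org} has not registered a site")
        instance = PeerInstance(owner=org)
        self.instances.append(instance)
        return instance

    def export_data(self, instance, table, rows):
        # Step 3: export local data to the instance so participants can read it.
        instance.shared_tables[table] = list(rows)

    def query(self, table):
        # Any participant can read the rows that organizations have exported.
        results = []
        for inst in self.instances:
            results.extend(inst.shared_tables.get(table, []))
        return results


provider = ServiceProvider()
provider.register_site("SupplierA")
peer = provider.launch_instance("SupplierA")
provider.export_data(peer, "inventory", [("bolt", 500), ("nut", 1200)])
print(provider.query("inventory"))  # [('bolt', 500), ('nut', 1200)]
```

Refusing to launch instances for unregistered organizations mirrors the ordering of the steps: registration comes first, and sharing happens only through launched instances.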
1. INTRODUCTION

Companies that share a common interest in exchanging data are typically connected through a corporate network [1]. Cloud computing technology provides a variety of services that users need. It also serves as a platform on which other advanced technologies, such as big data and mobile computing, can build their services and provide quality of service (QoS) to customers [1]. The cloud has grown enormously over the years. The services provided to customers use the cloud as their backbone: it supplies vast amounts of resources and infrastructure, lets consumers act as vendors to small-scale businesses, and can serve fully fledged organizations at lower cost. The cloud also gives service providers room to extend their services and can provide infrastructure services to small-scale service vendors [2].
2. LITERATURE SURVEY

PeerDB: A P2P-based System for Distributed Data Sharing

Peer-to-peer (P2P) technology is an emerging paradigm that is now viewed as a potential technology that could re-architect distributed architectures (e.g., the Internet). In a P2P distributed system, a large number of nodes (e.g., PCs connected to the Internet) can potentially be pooled together to share their resources, information and services. These nodes, which can both consume and provide data and/or services, may join and leave the P2P network at any time, resulting in a truly dynamic and ad hoc environment. The distributed nature of such a design provides exciting opportunities for developing new killer applications [4].
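The dynamic membership described for PeerDB can be illustrated with a toy model in which every node both provides and consumes data and may join or leave at any time. This is a simplified sketch, not PeerDB's design: the `P2PNetwork` class is hypothetical, and a lookup simply asks every online peer instead of using any real overlay routing.

```python
class P2PNetwork:
    """Toy P2P network: each node can both provide and consume data."""

    def __init__(self):
        # node id -> {key: value} that the node is currently sharing
        self.online = {}

    def join(self, node_id, shared=None):
        # A node may join at any time and immediately offer its data.
        self.online[node_id] = dict(shared or {})

    def leave(self, node_id):
        # A node may leave at any time; its data is no longer reachable.
        self.online.pop(node_id, None)

    def lookup(self, key):
        # Ask every online peer; return the ids of nodes that can serve the key.
        return sorted(n for n, data in self.online.items() if key in data)


net = P2PNetwork()
net.join("pc1", {"report.csv": b"..."})
net.join("pc2", {"report.csv": b"...", "logo.png": b"..."})
print(net.lookup("report.csv"))  # ['pc1', 'pc2']
net.leave("pc1")
print(net.lookup("report.csv"))  # ['pc2']
```

The second lookup shows why such an environment is dynamic: the set of nodes able to answer a request changes as soon as a peer departs.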
Deduplication is a key operation in integrating data from heterogeneous sources. The main challenge in this task is designing a function that can determine whether a pair of records refers to the same entity in spite of various data inconsistencies. Deduplication reduces the amount of stored data by eliminating redundant copies. Problems in sharing and processing data in a corporate network and
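A record-pair matching function of the kind described above can be sketched as follows, assuming that records are dictionaries of string fields and that duplicates are decided by averaging per-field string similarity against a threshold. The similarity measure (`difflib.SequenceMatcher.ratio`), the normalization step, the equal field weights and the threshold of 0.85 are all illustrative assumptions, not the paper's method; the helper names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(value):
    # Smooth over simple inconsistencies: case and extra whitespace.
    return " ".join(str(value).lower().split())


def field_similarity(a, b):
    # Similarity in [0, 1] between two normalized field values.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_duplicate(rec1, rec2, fields, threshold=0.85):
    # Average the per-field similarities (equal weights) and compare to a threshold.
    score = sum(field_similarity(rec1[f], rec2[f]) for f in fields) / len(fields)
    return score >= threshold


def deduplicate(records, fields, threshold=0.85):
    # Keep a record only if it does not match any record already kept.
    kept = []
    for rec in records:
        if not any(is_duplicate(rec, k, fields, threshold) for k in kept):
            kept.append(rec)
    return kept


records = [
    {"name": "Acme Corp", "city": "Pune"},
    {"name": "ACME  corp", "city": "pune"},
    {"name": "Globex Ltd", "city": "Mumbai"},
]
print(len(deduplicate(records, ["name", "city"])))  # 2
```

Dropping the second record is what reduces storage: only one copy of each real-world entity remains. The fixed equal weights and threshold are exactly the parameters that a search procedure such as a genetic algorithm could tune.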
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 301

Detection of Duplicate Records using Genetic Algorithm: Genetic algorithms are well suited to this type of problem, where the search space is large and the number of feasible solutions is small. To apply a genetic algorithm to a scheduling problem we must first represent it as a
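A genetic algorithm for duplicate detection can be sketched as a search over the parameters of a record-pair matching function. The sketch below assumes a chromosome that holds one weight per record field followed by a similarity threshold, and uses the fraction of correctly classified labeled pairs as fitness. The training pairs are made-up per-field similarity scores, and the operators (keep the better half, one-point crossover, clipped Gaussian mutation) are illustrative choices rather than the paper's configuration.

```python
import random

# Hypothetical labeled data: per-field similarities (name, city, phone)
# for record pairs; label 1 = duplicate, 0 = distinct entities.
TRAINING_PAIRS = [
    ((0.95, 1.00, 0.90), 1),
    ((0.90, 0.80, 1.00), 1),
    ((0.85, 1.00, 0.20), 1),
    ((0.30, 1.00, 0.10), 0),
    ((0.20, 0.10, 0.00), 0),
    ((0.60, 0.20, 0.30), 0),
]
N_FIELDS = 3


def classify(chrom, sims):
    # Chromosome = per-field weights followed by a decision threshold.
    weights, threshold = chrom[:N_FIELDS], chrom[N_FIELDS]
    total = sum(weights) or 1e-9
    score = sum(w * s for w, s in zip(weights, sims)) / total
    return 1 if score >= threshold else 0


def fitness(chrom):
    # Fraction of labeled pairs the chromosome classifies correctly.
    correct = sum(classify(chrom, sims) == label for sims, label in TRAINING_PAIRS)
    return correct / len(TRAINING_PAIRS)


def crossover(a, b, rng):
    # One-point crossover between two parent chromosomes.
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(chrom, rng, rate=0.2):
    # Gaussian mutation, clipped so every gene stays in [0, 1].
    return [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < rate else g
            for g in chrom]


def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(N_FIELDS + 1)] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elite = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    best = max(population, key=fitness)
    return best, history


best, history = evolve()
print(f"best fitness: {fitness(best):.2f}")
print("weights and threshold:", [round(g, 2) for g in best])
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next; crossover and mutation only explore new weight and threshold combinations around the surviving solutions.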