International Research Journal of Engineering and Technology (IRJET)    e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 09 Issue: 05 | May 2022    www.irjet.net
Analysis on Deduplication Techniques for Storage of Data in Cloud

Ujjwal Rajput1, Sanket Shinde2, Pratap Thakur3, Gaurav Patil4, Prof. Poonam Deokar5

1,2,3,4Student, Dept. of Information Technology, Dr. D. Y. Patil Institute of Technology, Pimpri, Pune, Maharashtra, India
5Professor, Dept. of Information Technology, Dr. D. Y. Patil Institute of Technology, Pimpri, Pune, Maharashtra, India
Abstract - Cloud storage service providers address the needs of organizations and individuals by allowing them to store, transfer, and back up their ever-increasing volumes of data at low cost, while also providing access to other cloud services. To provide efficient data storage, cloud service providers widely use the deduplication method, as it keeps a single copy of each piece of data and removes duplicate copies, thereby reducing storage overhead and saving upload bandwidth. A customer who uploads his or her data to the cloud is greatly concerned about the security, integrity, privacy, and confidentiality of that data. The deduplication method is used to manage duplicated data in the cloud. Although several deduplication methods are used to avoid redundant data storage, they still lack efficiency. The main purpose of this paper is to provide adequate information and a clear understanding of deduplication strategies by examining existing methods, so that this work can assist researchers and practitioners in their future research on developing effective cloud storage management strategies.

Key Words: Big data, Cloud computing, Cloud storage, Data deduplication, Data management.
1. INTRODUCTION

With the continuous development of the internet and the growing usage of Internet of Things and social networking environments, data size is increasing exponentially, which leads to the requirement of a huge amount of storage space. According to the International Data Corporation (IDC), the Global DataSphere is the combination of all data generated, captured, or replicated as digital content all over the world. IDC predicts that the Global DataSphere will grow from 33 zettabytes (ZB) (1 ZB = 10²¹ bytes, approximately 2⁷⁰ bytes) in 2018 to 175 ZB by 2025.

Cloud computing offers many resources and services, especially the huge volume of storage needed to back up big data [2]. It is an optimal paradigm for providing storage, computing, and management of the big data of the Internet of Things (IoT) or of organizations [3] [4]. Cloud service providers (CSPs) offer many services with the features of elasticity, scalability, and pay-per-use [5]. To maintain data privacy and security, data owners store only encrypted data on clouds. With many users storing their own data, there is a chance of data duplication in clouds: different users may upload the same data but encrypted with different techniques. Even though CSPs provide a large amount of storage, data redundancy requires extra storage space and higher bandwidth, and it is a tedious task for service providers to manage a large amount of storage space and duplicate copies.

Deduplication is an optimal technique to manage data duplication [6]. It compresses the data by eliminating duplicate copies. Deduplication can reduce the required storage space by 90 to 95 percent, lower the bandwidth rate, and provide good storage management [7]. Most cloud service providers implement a deduplication mechanism to achieve efficient storage management of big data [9] [10]. Data deduplication approaches thus address the storage management issues posed by ever-growing big data in clouds.
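To make the core idea concrete, the following is a minimal Python sketch of chunk-level deduplication, not taken from any particular CSP's implementation: each unique chunk is stored once, keyed by its SHA-256 fingerprint, and a file is represented only by the list of its fingerprints. The names (DedupStore, put, get) are hypothetical.

    import hashlib

    class DedupStore:
        def __init__(self, chunk_size: int = 4096):
            self.chunk_size = chunk_size
            self.chunks: dict[str, bytes] = {}   # fingerprint -> chunk data

        def put(self, data: bytes) -> list[str]:
            """Split data into fixed-size chunks, store only unseen ones,
            and return the fingerprint list needed to reconstruct the data."""
            recipe = []
            for i in range(0, len(data), self.chunk_size):
                chunk = data[i:i + self.chunk_size]
                fp = hashlib.sha256(chunk).hexdigest()
                if fp not in self.chunks:          # duplicate chunks cost nothing
                    self.chunks[fp] = chunk
                recipe.append(fp)
            return recipe

        def get(self, recipe: list[str]) -> bytes:
            """Reassemble the original data from its chunk fingerprints."""
            return b"".join(self.chunks[fp] for fp in recipe)

    store = DedupStore()
    first = store.put(b"cloud backup data " * 1000)
    second = store.put(b"cloud backup data " * 1000)  # adds no new chunks
    print(len(store.chunks))   # one set of chunks stored, not two

In real systems the fingerprint index is persisted and chunk boundaries are often content-defined rather than fixed-size, but the single-copy principle is the same.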
Despite the various choices available for data storage, including cloud data storage, one of the major challenges faced by users and organizations is data duplication. It has been observed that even for a single user or a single transaction, a lot of data duplication results from the use of different sources of information. Data deduplication is one of the most effective techniques for data reduction. This technique ensures the storage of only a single copy of each piece of data, which is achieved by comparing data fingerprints against the already stored data to identify duplicates. The replication factor is the minimum number of copies of the same data, and the cloud storage system maintains a replication factor for all data. If the number of copies of any data exceeds the replication factor, the deduplication technique eliminates the excess copies to reduce the storage requirement, cost, and bandwidth rate. Existing deduplication techniques still lack efficiency because of the demerits of their data comparing and matching algorithms and because of security issues. All data are stored in memory locations, each identified by a pointer or address; only one copy of the data is maintained, and storing a pointer in place of each duplicate frees those locations in the storage space, as sketched below.

The main goal of this article is to study the efficiency and inefficiency of existing deduplication techniques, since the deduplication process is essential for cloud service providers to reduce the huge storage space requirement, cost, and high network transfer rate. Another intention of this paper is to bring together researchers and practitioners and give them a clear view of various deduplication approaches. This paper summarizes the merits and demerits of existing deduplication techniques.
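The following toy sketch illustrates the pointer-and-replication-factor mechanism described above. It assumes a fixed replication factor of three; the names (ReplicatedStore, upload, read) and the slot-based storage model are illustrative, not a specific system's API.

    import hashlib

    REPLICATION_FACTOR = 3   # assumed minimum number of physical copies

    class ReplicatedStore:
        def __init__(self):
            self.slots: list[bytes] = []            # simulated physical storage
            self.index: dict[str, list[int]] = {}   # fingerprint -> slot numbers

        def upload(self, data: bytes) -> int:
            """Store data, keeping at most REPLICATION_FACTOR physical copies,
            and return a pointer (slot number) for reading it back."""
            fp = hashlib.sha256(data).hexdigest()
            copies = self.index.setdefault(fp, [])
            if len(copies) < REPLICATION_FACTOR:
                self.slots.append(data)              # keep a real copy
                copies.append(len(self.slots) - 1)
                return copies[-1]
            return copies[0]   # duplicate: pointer to an existing copy

        def read(self, pointer: int) -> bytes:
            return self.slots[pointer]

    store = ReplicatedStore()
    pointers = [store.upload(b"backup.tar contents") for _ in range(10)]
    print(len(store.slots))   # only 3 physical copies despite 10 uploads

Every upload beyond the replication factor costs only an index entry, which is how deduplication frees the storage locations that duplicates would otherwise occupy.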