Privacy Preserving Multi-keyword Top-K Search based on Cosine Similarity Clustering

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 04 Issue: 01 | Jan-2017

p-ISSN: 2395-0072

www.irjet.net

Sushant Y. Kamble¹, R. P. Mirajkar²

¹ Computer Science and Engineering Dept., Bharati Vidyapeeth’s College of Engineering, Kolhapur (District), Maharashtra-416012, INDIA, sushantykamble@gmail.com

² Asst. Prof. at Computer Science and Engineering Dept., Bharati Vidyapeeth’s College of Engineering, Kolhapur (District), Maharashtra-416012, INDIA, rahulmirajkar982@gmail.com

with other documents. Therefore, maintaining the relationship between documents is important to fully express a document.

Abstract - Cloud computing provides the facility to store and manage data remotely. The volume of information is increasing every day, and data owners choose to store sensitive data on cloud storage. To protect the data from unauthorized access, it must be uploaded in encrypted form. Because a large amount of information is stored on the cloud storage, the association between the documents is hidden when the documents are encrypted. It is therefore necessary to design a search technique that returns results on the basis of the similarity values of the encrypted documents. In this paper, a cosine similarity clustering method is proposed that forms clusters of similar documents based on the cosine values of the document vectors. We also propose an MRSE-CSI model for searching documents that are in encrypted form. The proposed search technique searches only the cluster of documents with the highest similarity value instead of the whole dataset. Processing the dataset with two algorithms shows that the proposed method needs less time to form the clusters. As the number of documents in the dataset increases, the time needed to form clusters also increases. The search results show that increasing the number of documents also increases the search time of the proposed method.

Keywords: Cloud computing, multi-keyword search, cosine similarity clustering, encrypted data

The search results returned to users may contain damaged information due to hardware failure or storage corruption. Users should therefore be given a mechanism to check the accuracy of the search results. The proposed search architecture is based on cosine similarity clustering, which maintains the association between plain text and encrypted text to improve search efficiency.
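The idea of clustering documents by cosine similarity and then searching only the best-matching cluster can be sketched as follows. This is a minimal illustration on plaintext vectors, not the MRSE-CSI scheme itself: the greedy single-pass clustering rule, the `threshold` value, the function names, and the toy document vectors are all assumptions made for the example, and no encryption is involved.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_documents(doc_vectors, threshold=0.5):
    """Assumed greedy rule: a document joins the most similar existing cluster
    if the cosine to its center is at least `threshold`, else it starts a new one."""
    clusters = []  # each cluster: {"ids": [...], "center": [...]}
    for doc_id, vec in doc_vectors.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["center"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"ids": [doc_id], "center": list(vec)})
        else:
            best["ids"].append(doc_id)
            best["center"] = centroid([doc_vectors[i] for i in best["ids"]])
    return clusters


def top_k_search(query, doc_vectors, clusters, k=2):
    """Select the cluster whose center is most similar to the query,
    then rank only the documents inside that cluster."""
    best = max(clusters, key=lambda c: cosine(query, c["center"]))
    ranked = sorted(best["ids"],
                    key=lambda i: cosine(query, doc_vectors[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical term-frequency vectors over a four-keyword dictionary.
docs = {
    "d1": [1, 1, 0, 0],
    "d2": [1, 0, 0, 0],
    "d3": [0, 0, 1, 1],
    "d4": [0, 0, 0, 1],
}
clusters = cluster_documents(docs)
result = top_k_search([1, 0, 0, 0], docs, clusters, k=2)
print([c["ids"] for c in clusters])  # [['d1', 'd2'], ['d3', 'd4']]
print(result)                        # ['d2', 'd1']
```

Because the query is compared against one center per cluster and then only against the members of the selected cluster, the number of similarity computations depends on the cluster sizes rather than on the size of the whole dataset.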

2. LITERATURE REVIEW

Chi Chen and Xiaojie Zhu [7] used a hierarchical clustering method to maintain the close relationship between plain documents and encrypted documents in order to increase search efficiency in a big data environment. They also used a coordinate matching technique [8] to measure the relevance score between a query and a document. They built a model for efficient multi-keyword ranked search that maintains the privacy of documents, rank security, and the relevance between retrieved documents.
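The coordinate matching measure mentioned above counts how many query keywords appear in a document, which equals the inner product of two binary keyword vectors. The sketch below shows this on plaintext data only; the dictionary, the documents, and the function names are hypothetical and chosen for illustration, and the encryption layer used in [7] and [8] is not reproduced.

```python
def to_binary_vector(keywords, dictionary):
    """Return a 0/1 vector marking which dictionary words occur in `keywords`."""
    present = set(keywords)
    return [1 if word in present else 0 for word in dictionary]


def coordinate_match_score(doc_vec, query_vec):
    """Inner product of two binary vectors = number of shared keywords."""
    return sum(d * q for d, q in zip(doc_vec, query_vec))


# Hypothetical keyword dictionary and document keyword sets.
dictionary = ["cloud", "encryption", "search", "privacy", "cluster"]
documents = {
    "doc_a": ["cloud", "search", "privacy"],
    "doc_b": ["encryption", "cluster"],
    "doc_c": ["cloud", "encryption", "search", "cluster"],
}

query_vec = to_binary_vector(["cloud", "search", "cluster"], dictionary)
scores = {
    name: coordinate_match_score(to_binary_vector(words, dictionary), query_vec)
    for name, words in documents.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'doc_a': 2, 'doc_b': 1, 'doc_c': 3}
print(ranking)  # ['doc_c', 'doc_a', 'doc_b']
```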

1. INTRODUCTION

Jiadi Yu and Peng Lu [9] focused on the problems of ciphertext search using Searchable Symmetric Encryption (SSE) [10], [11]. The SSE technique lets data users retrieve documents by searching over the encrypted documents. In Two Round Searchable Encryption (TRSE), they used the similarity relevance concept to solve the privacy issues in searchable encryption. They also demonstrated server-side ranking based on order-preserving encryption (OPE).

Cloud computing has become popular because it provides huge storage space and high-quality services. A large amount of data is created every day, and it is difficult for data owners to store and manage it. To overcome this difficulty, data owners can store their data on a cloud server and use on-demand applications and services from shared resources [1]. Cloud service providers assert that their services are equipped with strong security measures, yet security and privacy remain major hindrances to the adoption of cloud computing services [2]. To protect sensitive data on the cloud server from unauthorized users, data owners may encrypt the documents before uploading them to the cloud server [3]. Earlier, various strong cryptographic methods were used to design search techniques over ciphertext [4], [5], [6]. These techniques need many operations and require a large amount of time, so they are not suitable for big data, where the volume of information is huge. The property of a document depends on its association

Impact Factor value: 5.181

N. Cao, C. Wang and M. Li [12] used the “inner product similarity” concept, which measures the similarity between the stored information and the search keywords. Ruksana Akter and Yoojin Chung [13] defined an evolutionary approach based on cosine similarity clustering. A document vector is used to create the index of every document, and the cosine values between the document vectors are calculated. Clusters of the most relevant documents are then formed on the basis of these cosine values. Another good feature of their work is that they do not require

ISO 9001:2008 Certified Journal

Page 926
