International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 04 Issue: 05 | May-2017
p-ISSN: 2395-0072
www.irjet.net
Elimination of Redundant Files using Feature Selection Algorithm
Ch Sundeep1, B Vamshi2, P Sampath3, V Ethirajulu, B. Tech, M. Tech4
1Student, Dept. of Computer Science Engineering, SRM University, Tamilnadu, India
2Student, Dept. of Computer Science Engineering, SRM University, Tamilnadu, India
3Student, Dept. of Computer Science Engineering, SRM University, Tamilnadu, India
4Assistant Professor, Dept. of Computer Science Engineering, SRM University, Tamilnadu, India
------------------------------------------------------------***----------------------------------------------------------------
Abstract – Applying recent advances in machine learning to state-of-the-art discrete decision models, we develop an approach to infer the distinctive and complex underlying decision process of a decision maker (DM), which is characterized by, among other things, the DM's priorities and attitudinal character and the properties involved. This work presents a different method for learning proximity relations.
Fig – 1: ETL Diagram
Key Words: Data Mining
1. INTRODUCTION
Data mining is the process of assessing data from several perspectives and summarizing it into relevant information so that it can be understood completely. Data mining techniques are the result of sustained research. The idea started when usable data was first stored on computers, continued with significant advances in data access, and more recently has produced technologies that let users navigate their data in real time. Data mining consists of five major parts:
Extract, transform, and load (ETL) the data onto the database system.
Store and manage the data in a multidimensional environment.
Provide complete access to the data to authenticated persons.
Examine the data with a software application.
Present the data in a widely used form that is easily understandable.
2. METHODOLOGY
In this work, we propose a simple yet robust mechanism to eliminate redundant files that are uploaded to the database every day. We use a feature selection algorithm to sort the files according to whether or not they are redundant. The algorithm also reduces the dimensionality of the data to a manageable level. The most important step is deduplication, which does the actual work of eliminating the redundant files.
The process involves several steps: creating a file, uploading the file, deduplication, storing the file, and removing the file, as shown in Fig – 2.
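The upload, deduplication, store, and remove steps can be illustrated with a minimal sketch. The text does not spell out how the feature selection algorithm decides that a file is redundant, so the sketch swaps in a simpler stand-in: a SHA-256 hash of the file content. It is not the authors' method. The class DedupStore and its method names are hypothetical and exist only for this example.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store that keeps one copy of each distinct content."""

    def __init__(self):
        self._blobs = {}  # content digest -> file bytes (stored once)
        self._names = {}  # file name -> content digest

    def upload(self, name, data):
        """Register `name` for `data`; return True if the content was not stored before."""
        if name in self._names:
            # Re-uploading under an existing name replaces the old entry.
            self.remove(name)
        # Assumption: identical SHA-256 digests are treated as redundant content.
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = data
        self._names[name] = digest
        return is_new

    def remove(self, name):
        """Drop `name`; free the content once no other name refers to it."""
        digest = self._names.pop(name)
        if digest not in self._names.values():
            del self._blobs[digest]

    def stored_copies(self):
        """Number of distinct contents physically kept."""
        return len(self._blobs)


store = DedupStore()
print(store.upload("report_v1.txt", b"quarterly figures"))    # True: new content
print(store.upload("report_copy.txt", b"quarterly figures"))  # False: redundant
print(store.stored_copies())                                  # 1
```

An exact-match hash only catches byte-identical files. A feature-based comparison, as the methodology proposes, would presumably also need a notion of similarity between files that differ slightly.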
Fig – 2: Block Diagram
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 263