International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 04 Issue: 03 | Mar-2017
p-ISSN: 2395-0072
www.irjet.net
SVM CLASSIFIER ALGORITHM FOR DATA STREAM MINING USING HIVE AND R
Mrs. Pranamita Nanda1, B. Sandhiya2, R. Sandhiya3, A. S. Vanaja4
1Assistant Professor, 2,3,4Students
Department of Computer Science and Engineering, Velammal Institute of Technology, Ponneri, Tiruvallur.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract: Big data is challenging because of the large volume of data in IT deployments across different dimensions. To make the analysis process more efficient, we use the Hive tool for query processing and produce a statistical report using RStudio. The processing load in data stream mining is reduced by the technique known as Feature Selection. However, when mining high-dimensional data, the search space from which an optimal feature subset is derived grows exponentially in size, leading to an intractable demand in computation. To reduce the complexity of using accelerated particle swarm optimization (APSO), we connect the data using Hadoop technology. Hadoop makes it easier to store and retrieve data in a big data environment. The dataset is analysed and the statistical report is produced using the SVM algorithm in the R software, where the R language is used. The R software environment provides statistical computing and graphics. The statistical report compares the accuracy of the linear and non-linear grids; the one with the higher accuracy is taken as the more efficient. The final graph combines the linear and non-linear results with respect to cost and sigma, which are user-defined values. PSO with the SVM algorithm increases the performance of analysing the data.

INTRODUCTION:
Handling, storing, and retrieving large volumes of data is challenging. Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that, in many applications of data stream mining, can be read only once or a small number of times using limited computing and storage capabilities. We therefore use a data stream mining technique for data retrieval. To make retrieval efficient, we use the Hadoop-Hive tool for query processing, which takes less time to process. Processing includes converting unstructured data into structured data by creating a schema. The Hadoop environment provides a data storage layer known as the Hadoop Distributed File System (HDFS). Our database is imported into HDFS from an external or internal device, such as a server or the system we are working on, using the Hive functionality for analyzing queries. The keyword inpath or externalpath is used for importing data from internal and external devices. The data is then extracted from the database as test data and training data. The training data is existing data for which the class has already been predicted. Testing is carried out against the training data for analysis. Both the test data and the training data are used by the classification algorithm known as the Support Vector Machine (SVM). For a dataset consisting of a feature set and a label set, a classifier builds a model to predict classes. The parameter used to evaluate this process is accuracy. The SVM classifier evaluates the predicted data and reports the accuracy, and the configuration with the best accuracy is taken into consideration.

EXISTING SYSTEM:
The lightweight feature selection technique known as swarm search is used for classifying the dataset. There are many feature selection techniques, such as CCV and Improved PSO. The amount of data fed in is potentially infinite, and data delivery is continuous, like a high-speed train of information. Processing is therefore expected to be real-time and instantly responsive. Retrieving data from a large volume of data and maintaining it is difficult, and the accuracy is somewhat lower, which is overcome by using the best classifier algorithm. On top of quantitatively computing the non-linear relations between the feature values and the target classes, a further complication is the temporal nature of such a data stream: one must process the data stream long enough to accurately model seasonal cycles or regular patterns, if they exist at all. There are no straightforward relations that can easily map the attribute data to a specific class without long-term observation. This has a considerable impact on the design of the data mining algorithm, which should be capable of just reading and forgetting the data stream.
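The read-and-forget requirement described above can be illustrated with a small sketch. It is not the method used in this paper, which trains a batch SVM in R; it swaps in a Pegasos-style stochastic subgradient update for a linear SVM, assuming labels in {-1, +1} and no bias term. The function names, the regularization value lam, and the synthetic two-cluster stream are hypothetical and chosen only for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def synthetic_stream(n, seed):
    """Hypothetical stream: yields n labelled instances one at a time.

    Class +1 is centred at (2, 2) and class -1 at (-2, -2), with bounded noise.
    """
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = [2.0 * y + rng.uniform(-1.0, 1.0) for _ in range(2)]
        yield x, y


def train_online_svm(stream, n_features, lam=0.01):
    """Single pass over the stream; each instance is used once, then dropped.

    Pegasos-style update for the hinge loss with L2 regularization lam
    (an assumed value, not taken from the paper).
    """
    w = [0.0] * n_features
    for t, (x, y) in enumerate(stream, start=1):
        violated = y * dot(w, x) < 1.0      # margin check with current weights
        shrink = 1.0 - 1.0 / t              # regularization shrinkage
        w = [shrink * wi for wi in w]
        if violated:
            step = 1.0 / (lam * t)          # decaying learning rate
            w = [wi + step * y * xi for wi, xi in zip(w, x)]
    return w


def accuracy(w, stream):
    """Fraction of instances whose predicted sign matches the label."""
    correct = total = 0
    for x, y in stream:
        pred = 1 if dot(w, x) >= 0.0 else -1
        correct += int(pred == y)
        total += 1
    return correct / total


# Train on one pass of the stream, then score on a fresh stream.
w = train_online_svm(synthetic_stream(500, seed=0), n_features=2)
print("held-out accuracy:", accuracy(w, synthetic_stream(200, seed=1)))
```

Only the weight vector is kept in memory between instances, so the storage cost does not grow with the length of the stream. Accuracy is used as the evaluation measure, as in the text.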
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1341