International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 04 Issue: 03 | Mar-2017
p-ISSN: 2395-0072
www.irjet.net
Touch with Industry
Priyanka Thakur¹, Karuna Sharma², Priyanka Chaudhari³, Sayali Borade⁴
¹ ² ³ ⁴ Student, Computer Department, MET’s BKC IOE Nashik, Maharashtra, India
Abstract - Web content mining is the extraction and integration of relevant data, information and knowledge from different forms of web page content. Extracting data from web pages is a challenging task because web data is mainly semi-structured or unstructured, while most mining techniques work best on structured data. We propose a system titled "Touch with Industry" that gathers information from various trusted sites based on the company names and fields selected by the user. The supported fields include, but are not limited to, Company CEO, Address, Contact No., Year of Establishment, Employee Strength, Images of the Company, and People on LinkedIn. The proposed system uses web crawling, which makes it easier for the application to return the most relevant results to users. It is useful to consultants, HR representatives, and employed and unemployed people who need a company overview in a short time, because it reduces search time and gathers relevant information from different web pages in one go.
Key Words: web mining, web crawling, pattern matching, filtering.
1. INTRODUCTION
The World Wide Web hosts large online databases of companies, which are accessed by millions of people. According to a recent survey, some websites provide fake information to people. It has also been observed that people need to search extensively for relevant information, which consumes a large amount of time.
1.1 Project Idea
The idea behind this project is to solve the problem people face while searching for company information. The system is mainly proposed to promote efficient communication between job seekers and consultants, making the overall search process more effective.
1.2 Motivation
The motivation came from a survey of people who had faced problems while searching for company details. Users need to crawl through a number of pages to find the desired information about a company. To overcome this problem, the proposed system is a desktop application through which users can search for a company in less time and get relevant details.
2. LITERATURE SURVEY
The literature survey that forms the conceptual base of this project is briefly outlined here.
1. Web Data Extraction: Extracting structured data from deep Web pages is a challenging problem because of the intricate underlying structure of such pages. This motivated the authors to seek a different approach to deep Web data extraction that overcomes the limitations of previous work by using visual features common to deep Web pages. The paper proposes a novel vision-based approach that is independent of the Web page programming language. The approach primarily uses the visual features of deep Web pages to perform deep Web data extraction, including data record extraction and data item extraction. [1]
2. Data Restructuring: This work synthesizes ideas from query languages for the Web, for semi-structured data and for website restructuring. It makes several contributions, most notably the idea of querying documents by manipulating their abstract syntax trees and support for the concept of the Web as a data type. [2]
3. Hoovers: An official website for searching company profiles, Hoovers searches what it describes as the world’s largest database of company and industry information. It provides only limited company details to non-subscribers. It maintains a database of about 85 million companies and 100 million people. [3]
4. Web Information Extraction Systems: The Internet presents a huge amount of useful information that is usually formatted for human readers, which makes it difficult to extract relevant data from various sources. The surveyed paper reviews the major Web data extraction approaches and compares them in three dimensions: the task domain, the degree of automation, and the techniques used. [4]
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 2631
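The abstract and key words name web crawling, pattern matching and filtering as the system's building blocks, but the paper does not say how they fit together. The following Python is a minimal sketch under stated assumptions, not the authors' implementation. The trusted-domain allow-list (`TRUSTED_DOMAINS`), the regular expressions in `FIELD_PATTERNS`, and the injected `fetch(url)` callable are all hypothetical. The sketch performs a breadth-first crawl restricted to trusted hosts (filtering), collects text and links with the standard-library `html.parser`, and pulls a few of the listed fields out of the page text with regular expressions (pattern matching).

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical allow-list standing in for the paper's "trusted sites".
TRUSTED_DOMAINS = {"directory.example.com", "company.example.org"}

# Hypothetical patterns for some of the fields named in the abstract.
FIELD_PATTERNS = {
    "ceo": re.compile(r"CEO\s*[:\-]\s*([A-Z][\w.]*(?:[ \t]+[A-Z][\w.]*)*)"),
    "established": re.compile(r"(?:Founded|Established)\s*(?:in)?\s*[:\-]?\s*(\d{4})"),
    "employees": re.compile(r"Employees?\s*[:\-]\s*([\d,]+)"),
    "contact": re.compile(r"(?:Phone|Contact(?: No\.?)?)\s*[:\-]\s*(\+?[\d \-()]{7,})"),
}


class PageParser(HTMLParser):
    """Collects text fragments and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def is_trusted(url):
    """Filtering step: accept only URLs whose host is on the allow-list."""
    return (urlparse(url).hostname or "") in TRUSTED_DOMAINS


def extract_fields(text):
    """Pattern-matching step: return the first match for each field."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[field] = match.group(1).strip()
    return found


def crawl(seed_urls, fetch, max_pages=20):
    """Breadth-first crawl of trusted pages; fetch(url) returns HTML or None."""
    queue, seen, profile = deque(), set(), {}
    for url in seed_urls:
        if is_trusted(url) and url not in seen:
            seen.add(url)
            queue.append(url)
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered at the end of the page
        # Newline-join so a captured value stops at the end of its element.
        text = "\n".join(parser.text_parts)
        for field, value in extract_fields(text).items():
            profile.setdefault(field, value)  # keep the first value found
        for href in parser.links:
            link = urljoin(url, href)
            if is_trusted(link) and link not in seen:
                seen.add(link)
                queue.append(link)
    return profile
```

Passing `fetch` in as a parameter keeps the sketch testable offline, for example with a dictionary of canned pages (`fetch=pages.get`); a desktop client would plug an HTTP library in at that point. Keeping the first value found for each field is an arbitrary choice; a fuller system might instead rank conflicting values by how reliable their source is.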