International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 04 Issue: 07 | July-2017
p-ISSN: 2395-0072
www.irjet.net
MULTI DOCUMENT TEXT SUMMARIZATION USING BACKPROPAGATION NETWORK
Ashlesha Giradkar1, S.D. Sawarkar2, Archana Gulati3
1 PG student, Datta Meghe College of Engineering
2 Professor, Dept. of Computer Engineering, DMCE, India
3 Principal and Professor, Dept. of Computer Engineering, DMCE, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - For the English language, a lot of research work has been carried out in the field of text summarization, but not for the Hindi language. The idea of the proposed system is to summarize multiple Hindi documents. The summarization is based on features extracted from the documents, such as sentence length, sentence position, sentence similarity and subject similarity. Thus, the proposed system can be used for Hindi text summarization of multiple documents based on a backpropagation network.
Key Words: Text summarization, Stemming, Hindi text summarization, Backpropagation Network, Sentence feature etc.
1. INTRODUCTION
Before discussing text summarization, we must first be clear about what a summary is. A summary is a text produced from one or more text documents that covers the important information in the original text and is shorter than the original. The main aim of automatic text summarization is to transform the source text into a shorter version, which reduces the user's reading time. Text summarization methods can be classified into extractive and abstractive summarization. An extractive summarization method selects important sentences from the original document. An abstractive summarization method identifies the main ideas and concepts in a document and then restates them in new words.
With the exponential growth in the quantity and complexity of information sources on the internet, it has become increasingly important to give users better mechanisms for finding exact information in the available documents. The system works in three phases: pre-processing the text document, sentence scoring and summary generation. The summarization is based on features extracted from the documents, such as sentence length, sentence position and sentence similarity. The system is thus partially implemented for Hindi text summarization of multiple documents based on a backpropagation network.
2. EXPERIMENT
We use the URLs of Hindi news articles as input to the summarization system. The text portion of each Hindi news article fetched from a URL is saved in a text document, and these documents act as the input to the summarizer. We used more than one news document on the same topic. The Hindi news documents were collected from different news channels such as Aajtak and dainikbhaskar.
3. APPROACH OF SUMMARIZATION
The proposed method finds the most relevant sentences from multiple Hindi documents using a statistical and linguistic approach. The summarization process has three major steps: pre-processing, feature extraction and implementation of the backpropagation network.
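To make the idea of sentence features concrete, the sketch below computes two of the features named in this paper, sentence length and sentence position. This section does not give their formulas, so both normalisations are assumptions chosen only for illustration, and the function names `length_feature` and `position_feature` are hypothetical.

```python
def length_feature(token_counts):
    """Assumed definition: sentence length divided by the longest sentence length."""
    if not token_counts:
        return []
    longest = max(token_counts)
    if longest == 0:
        return [0.0 for _ in token_counts]
    return [n / longest for n in token_counts]


def position_feature(num_sentences):
    """Assumed definition: earlier sentences score higher, from 1.0 down to 1/N."""
    return [(num_sentences - i) / num_sentences for i in range(num_sentences)]


# Hypothetical document with three sentences of 2, 4 and 1 tokens.
lengths = length_feature([2, 4, 1])
positions = position_feature(4)
```

Each sentence then yields one value per feature, and such values could form the input vector of a network that scores sentences.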
3.1. Preprocessing
Preprocessing prepares the source document for analysis [8]. It is performed in four steps: sentence segmentation, sentence tokenization, stop word removal and stemming.
3.1.1. Segmentation
In the segmentation step, the given document is divided into sentences.
Ex. 1. इ औ 2. इ
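A minimal sketch of the segmentation step follows. The paper does not state which delimiters it uses, so the set of sentence terminators here (the Devanagari danda "।", "?" and "!") is an assumption, as are the function name `segment_sentences` and the sample text, which is not the paper's own example.

```python
import re

# Assumed sentence terminators: Devanagari danda, question mark, exclamation mark.
SENTENCE_END = re.compile(r"[।?!]+")


def segment_sentences(text):
    """Split a document into sentences at the assumed terminators."""
    parts = SENTENCE_END.split(text)
    # Drop the empty pieces left after the final terminator and trim spaces.
    return [part.strip() for part in parts if part.strip()]


# Made-up sample input for illustration only.
sample = "भारत एक देश है। दिल्ली इसकी राजधानी है।"
sentences = segment_sentences(sample)
```

For the sample above, `sentences` holds two entries, one per Hindi sentence, with the danda removed.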
Impact Factor value: 5.181
3.1.2. Tokenization
In tokenization, each sentence is split into words.
Ex. , , , , , , , ,इ , , , , ,
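The tokenization step can be sketched as below. The paper does not specify its tokenization rules, so splitting on whitespace and a small set of punctuation marks is an assumption, and the name `tokenize` and the sample sentence are hypothetical.

```python
import re

# Assumed token pattern: any run of characters that is neither whitespace
# nor one of a few common punctuation marks (including the danda).
TOKEN = re.compile(r"[^\s,;:।?!()]+")


def tokenize(sentence):
    """Split one sentence into word tokens under the assumed pattern."""
    return TOKEN.findall(sentence)


# Made-up sample input for illustration only.
tokens = tokenize("दिल्ली, भारत की राजधानी है")
```

A negated character class is used here instead of `\w+` because Python's `re` may not treat Devanagari vowel signs as word characters, which would break Hindi words apart.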
© 2017, IRJET | ISO 9001:2008 Certified Journal | Page 3512