International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 10 Issue: 05 | May 2023
p-ISSN: 2395-0072
www.irjet.net
QUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODEL
L. Sumathi 1, A. Selvapriya 2
1 Assistant Professor, Government College of Technology, Tamil Nadu, India
2 PG Scholar, Government College of Technology, Tamil Nadu, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - It is an era of fast movement, where people want to know the essence of any document (technical or non-technical) at first glance. This motivates the proposed GUI-based unsupervised extractive summarization model, Quick Glance, which summarizes the user input. The proposed approach uses BERT-based pretrained sentence embedding vectors and an attention model to derive an attention score for each sentence in the user input. Next, principal component analysis is applied to the attention scores to automatically evaluate the importance of each sentence and to determine the number of sentences to be extracted. Experimental results show that the proposed model is user friendly, exhibits acceptable performance, and is highly recommendable for non-critical applications such as reviews, news glances, and so on.
Key Words: Text Summarizer, BERT, PCA, Extractive Summarization, Linear Programming
1. INTRODUCTION
In the modern era, where a vast amount of information is accessible online, it is crucial to offer an enhanced system for extracting information swiftly and effectively. It is quite challenging for humans to manually summarize a lengthy written document. Finding appropriate documents among the many that are accessible, and learning the essential information from them, presents a further challenge. Automatic text summarization is crucial for resolving these two issues. Text summarization is the technique of extracting the most significant information from a document or a group of related texts and condensing it into a briefer version of the original while maintaining its overall meaning. Automatic text summarization aims to deliver the source text in a concise, semantically rich form. The main benefit of a summary is that it shortens the reading process.
Choosing to read an article mainly depends on its size and on the time needed to read it. If an article carries little critical information but contains a large amount of text, people tend to skip it because of its low importance and the time required to consume it. Such articles often contain repetitive content, which could be condensed. However, doing so requires a deep understanding of the semantics of the document in order to extract the most informative pieces of data, which in turn reduces the time spent on reading an article. As a result, many different pieces of information can be absorbed in the time otherwise spent reading a single article.
2. LITERATURE SURVEY
McDonald et al. [2] proposed the first ILP method for extractive summarization. It generates summaries by maximizing the relevance (i.e., importance) of the selected sentences and minimizing their redundancy (i.e., similarity). McDonald et al. [2] represented each sentence as a bag-of-words vector with TF-IDF values. The importance scores are computed from the positional information of the sentences and the similarity between each sentence vector and the document vector. Cosine similarity is used to compute the similarity between sentence vectors.
Berg-Kirkpatrick et al. [3] constructed an ILP summarization model based on the notion of a concept, which is a set of bi-grams. The distinctive characteristic of this model is that it extracts and compresses sentences simultaneously. The model not only selects bi-grams with high importance but also chooses whether to cut (delete) individual subtrees from each sentence's parsing tree. The objective function of this model is the following:
$$\max_{b,\,c} \; \sum_{i} w_i b_i + \sum_{i} u_i c_i$$
where $b_i$ and $c_i$ are binary variables that indicate the selection of the i-th bi-gram for the summary and the deletion of the i-th candidate subtree from the parsing tree, respectively, and $w_i$ and $u_i$ are the weights of the bi-grams and of the possible subtree cuts, respectively. Additionally, the model has a constraint on the maximum allowed summary length, which is determined by the user. The weights are estimated by soft-margin support vector machine optimization with a bi-gram recall loss function. Therefore, the model is trained in a supervised manner, which requires gold-standard summaries.
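The relevance-versus-redundancy idea attributed to McDonald et al. [2] in the literature survey can be illustrated with a small sketch. This is not the ILP formulation from [2]: it swaps in a greedy selection as a simple stand-in for the exact optimization, omits the positional information, and uses a smoothed IDF. The function names (`tfidf_vectors`, `cosine`, `greedy_summary`) and the parameters `k` and `redundancy_weight` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF bag-of-words vector (a dict) per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # Smoothed IDF (an assumption), so very common words keep a small weight.
    idf = {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_summary(sentences, k=2, redundancy_weight=1.0):
    """Greedily pick up to k sentence indices: high relevance, low redundancy."""
    vecs = tfidf_vectors(sentences)
    # Document vector: the sum of all sentence vectors.
    doc = {}
    for vec in vecs:
        for w, x in vec.items():
            doc[w] = doc.get(w, 0.0) + x
    # Relevance: similarity of each sentence to the whole document.
    relevance = [cosine(vec, doc) for vec in vecs]

    selected = []
    while len(selected) < min(k, len(sentences)):
        def gain(i):
            # Redundancy: highest similarity to any already selected sentence.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected),
                             default=0.0)
            return relevance[i] - redundancy_weight * redundancy
        candidates = [i for i in range(len(sentences)) if i not in selected]
        selected.append(max(candidates, key=gain))
    return sorted(selected)
```

For example, `greedy_summary(["the cat sat on the mat", "the cat sat on the mat", "dogs chase the ball in the park"], k=2)` keeps only one of the two identical sentences, because the second copy is fully redundant with the first.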
Galanis [4] also presented a supervised extractive summarization model that extracts sentences and concepts by maximizing sentence importance and diversity (i.e., minimizing redundancy). To represent sentences in a structured form, they leveraged various features, such as sentence position, named entities, word overlap, content word frequency, and document frequency. The model has a
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 464