International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 09 Issue: 05 | May 2022
p-ISSN: 2395-0072
www.irjet.net
Comparative Study on News Summarization using various Transformer Based Models
Tejas Karkera1, Nileema Pathak2
1Student, Dept. of Information Technology, Atharva College of Engineering
2Professor, Dept. of Information Technology, Atharva College of Engineering
---------------------------------------------------------------------***--------------------------------------------------------------------
Abstract - As the world moves at a very fast pace, a great deal of information accumulates daily from events around the world, and keeping abreast of all of it can be very time consuming. A summarized form that paraphrases long texts into shorter versions is therefore paramount. News summarization is one such methodology: it tries to capture the majority of the information in a news article either by extracting the major points (extractive summarization) [7] or by rephrasing them (abstractive summarization) [8]. Building on this thought, this paper presents a comparative study of transformer based models, namely T5, BART and Pegasus, on three major news datasets, CNN/Daily Mail, Multi-News and XSum, and examines their performance individually.
Key Words: Summarization, NLP, Transformer, T5, BART, PEGASUS
1. INTRODUCTION
Every day, hundreds of articles are published in newspapers and on websites. As the world changes rapidly, it is becoming extremely difficult for people to stay aware of everything that happens around them. Hence people often rely on headlines to at least keep up with world affairs. Headlines, however, do not always give the reader everything required from the news, which creates demand for a concise version of news articles, that is, for news summarization. News summarization, as the name suggests, is an approach for obtaining the most relevant information from a news article without accumulating excess detail. In the past this was considered a tedious job, as summaries had to be written manually. Manual summary writing has several drawbacks, some of which are as follows:
1. Manually writing each summary is a very time consuming process.
2. The number of summaries that can be produced in a day is very small.
3. It is not scalable, and the labor used here could be employed elsewhere.
Hence it is very important to build an efficient automated news summarizer that overcomes the shortcomings of manual news summarization. It is also important to examine the performance of these automated models, so that they recognize the context precisely and generate the best possible summary.
2. LITERATURE SURVEY
Language models have undergone considerable transformation and improvement in the past few years. With the advent of transformer based models, many language modelling tasks have achieved better performance. News summarization has also shown good results with transformer models, as shown in the paper “Automated News Summarization Using Transformers” [9]. That paper uses transformer models such as BART, T5 and Pegasus to summarize news articles from the BBC news dataset and compares the outputs with human generated summaries. The BBC news dataset contained around 2225 news articles, and all the models were judged on ROUGE score values, which were around 0.40 for all three models. Another work proposing a transformer based pipeline for this task is “News Summarization Application Based on Deep NLP Transformers for SARS-CoV-2” [10], which used the Covid-19 Public Media Dataset containing news about Covid and its worldwide effects. The authors carried out a comparative study of five models, namely BERT, XLNet, GPT-2, BART and T5, and also evaluated their results on ROUGE score values. Their results showed that BERT, an auto-encoder based model, outperformed all the other models.
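The ROUGE scores on which the summarization models discussed in this paper are compared measure n-gram overlap between a generated summary and a human-written reference. As an illustration only, a minimal sketch of ROUGE-1 (unigram precision, recall and F1) could look like the following. The function name `rouge1`, the lowercased whitespace tokenization and the example sentences are assumptions made for this sketch; it is not the evaluation code of the cited works, which would typically rely on an established ROUGE package with options such as stemming.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 between a candidate and a reference summary.

    Assumption: lowercased whitespace tokenization, with no stemming or
    stop-word removal.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as it occurs
    # in both texts (Counter & Counter keeps the minimum count).
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical sentences, not taken from any of the datasets.
    print(rouge1("the cat sat on the mat", "the cat lay on the mat"))
```

Under this definition, a score of 0.40 would mean that roughly 40% of the unigrams are shared between the generated and reference summaries, although published results may use other ROUGE variants (for example ROUGE-2 or ROUGE-L) and different preprocessing.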
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1225