Presentation Summarizer: A Full-Fledged NLP Service

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 09 Issue: 07 | July 2022

p-ISSN: 2395-0072

www.irjet.net

Ayush Bhosle1, Mohd.Raza Deshpande2, Manas Bokilwar3, AnasMustafa Dhakwala4, Prof. Rahul Sonkamble5

Dept. of Computer Science & Engineering, MIT School of Engineering, MIT Art, Design and Technology University, Pune - 412201, India

Abstract - With the latest advancements in Computational Linguistics, complex Natural Language Processing tasks such as Text Summarization, Language Translation, Text Classification, Question Answering, and Grammar Error Correction have become feasible. Automatic text summarization of various types of transcripts has proved to be a useful way of describing their content. However, traditional methods rely on simpler extractive summarization techniques. In recent years, Transformers have achieved groundbreaking results, especially on the abstractive summarization task, which is typically considered intricate. Another application in which Computational Linguistics has advanced significantly is Automatic Speech Recognition (ASR), which allows computers to understand human speech.

Key Words: Sequence to Sequence, Speech Recognition, Grammar Correction, Natural Language Processing.

1. INTRODUCTION

Our Web Service comprises three main components: the ASR Module, followed by a Grammar Correction Module and an Abstractive Summarizer Module. Users can start transcription (ASR), during which speech is transcribed continuously and the output is displayed in real time. The Audio Input Stream can be any monologue speech such as a lecture, speech, or conversation.

Once the speech is finished, the transcription is processed by a Grammar Error Correction model, since the Kaldi model detects only words and not the punctuation that conveys sentence structure. The GEC model thus addresses the resulting readability problem.

In addition, a Named Entity Recognition model identifies entity keywords and highlights them in the text field for easier reading; the user can export the result to view it later.

Lastly, the corrected transcript is fed to the summarization model to produce a summary.

1.1 Speech Recognition

We have used the VOSK model based on Kaldi ASR. Kaldi ASR is an offline, open-source speech recognition toolkit used for the speech-to-text task. It supports 18 language models and dialects, including English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, and Ukrainian. Kaldi models come as base models (smaller in size) and large models, and both offer continuous large-vocabulary transcription, low-latency response through a streaming API, and a reconfigurable vocabulary with speaker identification support.

The VOSK model achieves a Word-Error-Rate of ~13 for Indian English.

Fig 1: Transcription in Real-Time

1.2 Grammar Error Correction (GEC)

Present speech recognition models are trained only to identify spoken words; hence the punctuation and sentence structure of the transcript will not be proper. The result is a long, unbroken run of words that is difficult to read.

To tackle this, we incorporate a Grammar Error Correction (GEC) model. This GEC model is a t5-small model originally trained on Wav2Vec2 outputs; its encoder reads the incorrect sequence and its decoder generates a grammatically correct sequence.

The Grammar Error Correction approach takes the entire transcript at once and produces the refined text. Since an incomplete transcript would not carry the full meaning, the model is applied after the speech utterance has finished, and corrections are made wherever necessary.
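
For readers unfamiliar with the Word-Error-Rate quoted for the VOSK model, the sketch below shows how the metric is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. This is an illustration only. The function name `word_error_rate`, the lower-casing and whitespace tokenization, and the example sentences are assumptions of this sketch; the paper does not state how its figure was obtained.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting after lower-casing, which is an assumption of this sketch.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the model transcribes speech in real time"
    hypothesis = "the model transcribe speech real time"
    # One substitution and one deletion against seven reference words.
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

The function returns a fraction, so a return value of 0.13 would correspond to 13 errors per 100 reference words. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.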

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1171

