
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 11 Issue: 05 | May 2024

www.irjet.net

p-ISSN: 2395-0072

SnapSummaries

Sakshi Bohra1, Sneh Sinha2, Riddhi Bora3, Savitri Chougule4

1 Student, School of Engineering, MIT ADT University, Pune, Maharashtra, India
2 Student, School of Engineering, MIT ADT University, Pune, Maharashtra, India
3 Student, School of Engineering, MIT ADT University, Pune, Maharashtra, India
4 Professor, School of Engineering, MIT ADT University, Pune, Maharashtra, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Efficient multi-modal summarization (MMS) is crucial in the era of huge digital data due to the increase in multimedia content. This paper offers an extractive multi-modal summarization approach that automatically condenses several content forms (text, images, audio, and video) related to a particular topic. The main goal is to close the semantic gaps between modalities. The approach aims to improve salience, non-redundancy, readability, and coverage in the generated textual summaries by selectively transcribing audio, employing neural networks for joint text-image representation, and optimizing submodular functions. The review covers the progress made in Automatic Text Summarization (ATS), with a focus on extractive and abstractive techniques and the role of the TextRank algorithm. It also investigates video summarization using a viewer-centered computational attention model, which offers an alternative to intricate video semantic analysis when summarizing multimodal content. Notably, one area of ongoing work is the translation of the summary text into other languages.

Key Words: multimedia model, summarization, text, audio, language translation, neural network

1. INTRODUCTION

Snap Summaries presents an inventive way to speed up information retrieval by offering concise summaries of multiple documents. Our approach uses sophisticated algorithms to compress a variety of content into clear, in-depth summaries. By using state-of-the-art multimodal summarization techniques, Snap Summaries seeks to maximize efficiency and accessibility, giving users a concise yet comprehensive summary of various documents.

Anubhav Jangra et al. [11]: The current explosion of multimedia content has made it difficult to extract meaningful information, so multi-modal summarization is required to extract important data while eliminating redundant information. A multi-modal summary gives a more thorough portrayal by offering a variety of viewpoints and providing more tangible reinforcement for concepts. Whereas previous research largely focused on uni-modal summarization (text or images), this work focuses on text-image-video summary generation (TIVS) through a differential-evolution-based multi-modal summarization model (DE-MMS-MOO). Through multi-objective optimization, the model maximizes consistency between modalities and cohesion within them, providing a general framework that may be tailored to different optimization strategies. The model takes multimodal input and produces variable-size multimodal output summaries that include text, images, and videos, and it addresses the problem of asynchronous data without alignment among the modalities.

Haoran Li et al. [9]: The exponential rise of multimedia data challenges efficient information retrieval. Multi-Modal Summarization (MMS) provides text summaries that allow users to quickly understand the main points of multimedia content without having to go through lengthy documents or videos. This work presents a novel method for creating text summaries from asynchronous text, images, audio, and video on a given subject. However, because multimedia data is heterogeneous, bridging the semantic gap between modalities is a substantial difficulty for MMS.

Aman Khullar et al. [10] present MAST, a new model for Multimodal Abstractive Text Summarization that uses the text, audio, and video modalities of a multimodal video. Anubhav Jangra et al. [11] propose an innovative extractive, multi-objective-optimization-based methodology to generate a multimodal summary that includes text, images, and videos.
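
To make the idea of extractive selection by submodular optimization more concrete, the following minimal Python sketch greedily picks sentences that maximize a coverage objective. It is an illustration only and not the implementation of this paper or of the cited works. It assumes a bag-of-words cosine similarity and a facility-location objective as stand-ins for the learned representations and objectives described in the literature, and all function names (bag_of_words, cosine, coverage, greedy_summary) are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts for one sentence (assumed, simplistic representation)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coverage(selected, sim):
    """Facility-location objective: each sentence is credited with its
    similarity to the closest selected sentence (monotone and submodular)."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_summary(sentences, budget):
    """Greedily add the sentence with the largest marginal coverage gain."""
    vectors = [bag_of_words(s) for s in sentences]
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    selected = []
    while len(selected) < min(budget, len(sentences)):
        current = coverage(selected, sim)
        gains = {
            j: coverage(selected + [j], sim) - current
            for j in range(len(sentences))
            if j not in selected
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # nothing left that adds coverage (e.g. only duplicates remain)
        selected.append(best)
    # Return the chosen sentences in their original order for readability.
    return [sentences[j] for j in sorted(selected)]
```

Because the objective has diminishing returns, a sentence that repeats already selected content adds little or no gain, so redundancy is discouraged without a separate penalty term. A multi-modal system would replace the word-count similarity with similarities computed over joint text-image representations or audio transcripts.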

Impact Factor value: 8.226

ISO 9001:2008 Certified Journal

Page 422