
Neuro Pulse: An AI-Based Multimodal Emotion Detection System


International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 13 Issue: 01 | Jan 2026

p-ISSN: 2395-0072

www.irjet.net

Sneha Bhargava¹, Shubhranshu Das², Rajguru Singh³, Rohit Soni⁴

¹,²,³,⁴ Department of Artificial Intelligence & Machine Learning, Oriental Institute of Science and Technology, Bhopal, India

Abstract – Neuro Pulse is an AI-based multimodal emotion detection system designed to analyze human emotions using text, speech audio, and facial expressions to support emotion-aware human–computer interaction. Traditional emotion recognition systems often rely on unimodal inputs, limiting their ability to interpret complex emotional behavior. The proposed system independently processes textual, audio, and visual inputs using transformer-based Natural Language Processing models for text emotion detection, pretrained deep learning models for speech emotion recognition, and convolutional neural networks with computer vision techniques for facial emotion analysis. Neuro Pulse follows a modular client–server architecture with a React-based frontend and a Flask-based backend communicating through RESTful APIs. Experimental evaluation under controlled conditions demonstrates reliable qualitative emotion detection across all modalities, highlighting the system's effectiveness and suitability for academic and experimental applications.

Key Words: Multimodal Emotion Detection, Affective Computing, Natural Language Processing, Speech Emotion Recognition, Facial Emotion Analysis, Artificial Intelligence.

I. INTRODUCTION

One of the key factors contributing to successful human–computer interaction is the ability to recognize emotions, as human emotions significantly influence decision-making, communication, and behavior. With the rapid growth of digital platforms such as online learning systems, virtual assistants, and remote communication tools, there is a greater demand for smart systems that are capable of understanding and responding to human emotions.

Most traditional emotion recognition systems are based on unimodal approaches such as text sentiment analysis or facial expression recognition. These approaches have their advantages; however, they tend to overlook the fact that emotional expression in real life is multimodal and multifaceted: people usually combine language with voice tone and facial expressions to convey their emotions.

Neuro Pulse addresses these limitations by combining the three major sources of emotional information, namely text, speech, and facial expressions, into one integrated and coherent emotion model. The system consists of several modules, each handling emotion detection for a separate modality; thus one can, for example, experiment with the facial detection module in isolation or extend the system easily in the future.

II. LITERATURE REVIEW

A. Text-Based Emotion Detection

Text-based emotion detection has been widely studied in the field of affective computing and Natural Language Processing. Early approaches relied on rule-based and keyword-driven sentiment analysis techniques, which were limited in handling semantic and contextual variations in language. Recent advancements introduced machine learning and transformer-based models that significantly improved emotion classification by capturing contextual relationships within textual data.

B. Speech Emotion Recognition

Speech emotion recognition focuses on identifying emotional states from vocal characteristics such as pitch, tone, and intensity. Traditional methods used handcrafted acoustic features, which required extensive feature engineering. Modern approaches utilize pretrained deep learning models that automatically learn emotional patterns from speech signals, resulting in improved robustness and performance under controlled conditions.

C. Facial Emotion Detection

Facial emotion detection employs computer vision techniques to analyze facial expressions and identify emotional states. Convolutional neural networks have become the dominant approach due to their ability to automatically extract facial features and classify basic emotions such as happiness, sadness, anger, and surprise.

D. Multimodal Emotion Detection

Recent studies highlight that multimodal emotion detection systems outperform unimodal approaches by combining emotional cues from multiple sources. By integrating text, speech, and facial modalities, multimodal systems provide a more comprehensive understanding of emotional states. The Neuro Pulse system adopts this multimodal approach with a modular architecture suitable for academic applications.
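As an illustration only, the modular idea of giving text, speech, and facial input each its own independent analyzer could be sketched as below. This is not the authors' implementation: the function names, the `ANALYZERS` registry, the label set, and the keyword lookup are all hypothetical stand-ins, and the speech and face analyzers are placeholders where pretrained models would be called.

```python
from typing import Callable, Dict

# Hypothetical type: each analyzer maps a raw payload to an emotion label.
Analyzer = Callable[[bytes], str]


def analyze_text(payload: bytes) -> str:
    """Stand-in for a transformer-based text model: naive keyword lookup."""
    text = payload.decode("utf-8").lower()
    keywords = {"happy": "happiness", "sad": "sadness", "angry": "anger"}
    for word, emotion in keywords.items():
        if word in text:
            return emotion
    return "neutral"


def analyze_speech(payload: bytes) -> str:
    """Placeholder: a pretrained speech emotion model would be called here."""
    return "neutral"


def analyze_face(payload: bytes) -> str:
    """Placeholder: a CNN-based facial emotion model would be called here."""
    return "neutral"


# Registry of modalities (assumed names). Adding or swapping a modality
# only touches this mapping, not the dispatch logic below.
ANALYZERS: Dict[str, Analyzer] = {
    "text": analyze_text,
    "speech": analyze_speech,
    "face": analyze_face,
}


def detect_emotions(inputs: Dict[str, bytes]) -> Dict[str, str]:
    """Run each supplied modality through its own analyzer, independently."""
    results: Dict[str, str] = {}
    for modality, payload in inputs.items():
        if modality not in ANALYZERS:
            raise ValueError(f"unsupported modality: {modality}")
        results[modality] = ANALYZERS[modality](payload)
    return results
```

Because each analyzer sits behind the same signature, one module can be replaced or tested in isolation, which is the kind of extensibility the modular design aims for. In a client–server setup, a function like `detect_emotions` could sit behind a backend endpoint, though the routes and payload format are not specified here.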

© 2026, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 301

