International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 12 Issue: 11 | Nov 2025

p-ISSN: 2395-0072

www.irjet.net

A Lightweight Transfer Learning Approach for Environmental Sound Classification on Edge Devices

Hasan Al-Qadhi 1

1 Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, Saudi Arabia

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Environmental Sound Classification (ESC) is a critical task in intelligent perception for smart environments and environmental monitoring. This paper presents a lightweight ESC model built by transfer learning on a pre-trained VGGish model and suited to real-time inference on CPU-based, resource-constrained platforms. Raw environmental audio is converted to log-Mel spectrograms, and only a small convolutional head is fine-tuned while the rest of the backbone stays frozen. The model is trained with data augmentation, namely low-amplitude additive noise and random volume scaling, to improve robustness and reduce overfitting. All scripts were run in MATLAB Online R2025b on the ESC-10 subset, and the model reached an overall accuracy of 75.0% and a macro-F1 score of 74.34% on the validation set. These results indicate that a transfer-learning-based CNN can strike a practical balance between efficiency and accuracy, so that such a model can run in real time without a GPU on edge or embedded platforms.

Key Words: Environmental Sound Classification (ESC), Transfer Learning, VGGish, Lightweight CNN, Edge AI, MATLAB Online, Audio Feature Extraction.

1. INTRODUCTION

Hearing is one of the most important human senses for perceiving and making sense of the world. Environmental sounds, such as rain, car horns, footsteps, or birdsong, are rich sources of contextual information that allow a person to understand and engage with their environment. Beyond everyday life, sound perception is essential for safety, judgement, and understanding context [1]. Because sound perception is critical for intelligent behaviour, researchers have long sought to develop frameworks that enable machines to automatically detect and label sounds, similar to human auditory perception. The field therefore has a long history, but it has gained new momentum from recent progress in Artificial Intelligence and Machine Learning [2]. Environmental sound classification has many applications, including wildlife monitoring, traffic and general city management, smart-city infrastructure, and public security systems, and it can be used in real time during emergencies to identify alarms, sirens, or other unusual sounds [3]. It can also be applied in real time to monitor ecosystems, for example by detecting acoustic patterns such as animal calls or birdsong [4].

Most recent progress has been made possible by deep learning, particularly Convolutional Neural Networks (CNNs), which enable models to learn features automatically from raw audio signals. Because CNNs can capture both spectral and temporal characteristics of log-Mel spectrograms, such architectures have become the dominant approach to ESC tasks [8]. The VGGish model, pre-trained on a large collection of audio recordings, is a good starting point for transfer learning in low-resource settings, allowing lightweight yet accurate models for edge and embedded deployment to be built readily.

1.2 Problem Statement and Research Gap

Although Environmental Sound Classification (ESC) has made substantial advances, it continues to face specific challenges that set it apart from speech and music classification tasks. Environmental sounds are often irregular in pattern, vary in both duration and intensity, and are frequently embedded in background noise [13].

Impact Factor value: 8.315

Earlier methods relied on manually engineered features, such as Mel-Frequency Cepstral Coefficients (MFCCs), chroma features, and spectrogram descriptors, combined with traditional classifiers like Support Vector Machines (SVMs), Gaussian Mixture Models (GMMs), and Hidden Markov Models (HMMs). These methods performed adequately under controlled conditions but generally failed to generalize to noisy or real-world environments [13]. The adoption of deep learning has helped overcome several of these limitations. Convolutional Neural Networks (CNNs), in particular, have shown strong performance in automatically extracting spatial and spectral features from spectrogram representations, especially when used in conjunction with transfer learning [9].
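To make the spectrogram representation concrete, the following is a minimal NumPy sketch of a log-Mel front end, together with the two augmentations named in the abstract (additive low-amplitude noise and random volume scaling). It is illustrative only and is not the MATLAB implementation used in this work. The function names and every parameter value (16 kHz sampling rate, 25 ms window, 10 ms hop, 64 Mel bands, noise level, gain range) are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Hz-to-mel mapping.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 points equally spaced on the mel scale: band edges and centres.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, win=400, hop=160,
                        n_mels=64, eps=1e-6):
    """Return a (n_frames, n_mels) log-Mel spectrogram of a mono signal."""
    x = np.asarray(x, dtype=float)
    if x.size < win:
        x = np.pad(x, (0, win - x.size))  # zero-pad very short clips
    n_frames = 1 + (x.size - win) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(win)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + eps)  # eps avoids log(0) on silent frames

def augment(x, rng, noise_std=0.005, gain_range=(0.8, 1.2)):
    """Random volume scaling plus low-amplitude Gaussian noise (assumed values)."""
    gain = rng.uniform(*gain_range)
    noise = rng.normal(0.0, noise_std, size=np.shape(x))
    return gain * np.asarray(x, dtype=float) + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal(16000)          # stand-in for 1 s of audio
    features = log_mel_spectrogram(augment(clip, rng))
    print(features.shape)                      # (98, 64)
```

In a transfer-learning setup of the kind described here, features like these would be fed to the frozen pre-trained backbone, and only the small classification head would be updated during training.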

ISO 9001:2008 Certified Journal

Page 27