A DEEP LEARNING FRAMEWORK COMBINING DCNN AND LSTM FOR AQI TIME-SERIES CLASSIFICATION



International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072


1PG Scholar, Dept. of CSE, Government College of Technology, Coimbatore, India

2Assistant Professor, Dept. of CSE, Government College of Technology, Coimbatore, India

Abstract - Air pollution is an increasingly serious environmental and public health concern, especially in the urban regions of developing nations like India. Harmful pollutants such as particulate matter and carbon monoxide pose significant health risks. This research presents a hybrid deep learning model that integrates Deep Convolutional Neural Networks (DCNN) with Long Short-Term Memory (LSTM) networks to forecast the Air Quality Index (AQI) using time-series data. The model incorporates meteorological variables such as temperature, humidity, and atmospheric pressure alongside AQI data for both training and evaluation. The proposed architecture outperforms individual DCNN models and traditional machine learning techniques. While the DCNN component is responsible for learning spatial patterns, the LSTM component effectively captures temporal trends. The hybrid model achieves a classification accuracy of 97.45% and an AUC-ROC score of 0.97, surpassing the performance of existing approaches. This study highlights the effectiveness of the combined model in AQI forecasting, offering valuable insights for early warning systems and public health interventions against air pollution.

Key Words: Air Pollution, Air Quality Index, Deep Learning, Hybrid Model, Deep Convolutional Neural Network, Long Short-Term Memory, Time-Series Prediction, Meteorological Parameters.

1. INTRODUCTION

Air pollution has become a major issue for both environmental sustainability and public health, especially in the urban regions of developing countries such as India. Rapid industrialization, vehicular emissions, and biomass burning are major contributors to the degradation of air quality in Indian cities. This presents serious health risks, such as cardiovascular and respiratory conditions, so monitoring and forecasting air quality is crucial to urban management. The Air Quality Index is a standardized measure used to represent the concentration of air pollution and its potential effects on public health.

Accurate AQI prediction is crucial for enabling timely interventions, raising public awareness, and supporting evidence-based policy making. However, the complexity of air pollution, driven by non-linear interactions between various pollutants and meteorological factors, makes accurate forecasting a challenging task.

A hybrid model is created by combining two conventional models so that the result gains its own characteristics, workflow, and advantages, ultimately yielding a more effective, adaptable, and efficient model.

A DCNN is an advanced artificial neural network (ANN) tailored to analyze structured data, such as time-series, videos, and images. Its primary advantage is its capacity to automatically learn hierarchical features directly from raw input data, which makes it very good at pattern recognition tasks. DCNNs were first created for computer vision, but they have since been effectively used in a variety of other domains, including audio analysis, environmental modeling, and natural language processing.

The LSTM is a specialized form of recurrent neural network (RNN) optimized for processing sequential and time-series data. LSTMs are especially good at capturing long-term dependencies and at mitigating the vanishing or exploding gradient problems that frequently impede conventional RNN training. Because of this, LSTMs are ideal for applications involving temporal patterns, like natural language processing, audio identification, and time-series forecasting. An LSTM network is made up of memory cells with mechanisms for handling and storing information over long periods of time. In the architecture, the movement of data is governed by core parts such as the input, forget, and output gates, as well as the internal cell state. The workflow of an LSTM cell consists of removing unnecessary information from the cell state, incorporating fresh data via the input gate, combining the new and retained data to update the cell state, and producing an output for the current step that also affects the subsequent step.

2. RELATED WORK

Van, a populous province in eastern Turkey, has seen a serious environmental and public health problem with air pollution. Seasonal and climatic factors make this issue especially acute during the winter months. Increased use of inferior fuels for home heating is one of the main causes of the declining air quality during this time. Increases in particulate matter and atmospheric sulfur dioxide are caused by households using cheap but extremely polluting fuel sources when evening temperatures drop sharply. Weather patterns have a significant impact on how pollutants spread and build up. A detailed study conducted in Van City Center over a five-year period (2015–2020) showed a strong link between unfavorable weather conditions and higher pollution levels. Specifically, lower wind speeds, which hinder the dispersal of airborne pollutants, combined with high atmospheric pressure and humidity, were found to intensify pollutant concentrations. These conditions create stagnant air, trapping emissions close to the surface, which leads to prolonged exposure for local residents. To forecast air quality, the study used a Multiple Non-Linear Regression (MNLR) model, which delivered accurate predictions by incorporating key meteorological factors such as minimum temperature, average wind speed, peak atmospheric pressure, and relative humidity [3]. The effectiveness of this model emphasizes the importance of integrating weather data into pollution forecasting systems to improve environmental planning and public health measures.

In an effort to improve air pollution prediction techniques, a recent study proposed an innovative optimization method that combines bio-inspired algorithms with deep learning. The model, named BChOA-LSTM, utilizes the Binary Chimp Optimization Algorithm to improve the efficiency of LSTM networks. This algorithm, inspired by the coordinated hunting behaviors of chimpanzees, dynamically fine-tunes the hyperparameters of the LSTM model, optimizing its learning efficiency and predictive power. To validate this approach, the research evaluated eight different hierarchical machine learning models using rigorous cross-validation [12]. The evaluation used key performance indicators such as validation accuracy, Root Mean Squared Error (RMSE), and the coefficient of determination (R²). The BChOA-LSTM model consistently outperformed all others, achieving a validation accuracy of 96.41%, significantly better than its counterparts. Additionally, it demonstrated lower RMSE values, signifying reduced prediction errors, and higher R² values, which suggest a strong capacity to explain data variance [4]. The convergence curve showed that BChOA-LSTM trained faster and more stably, making it well suited to real-time air quality monitoring and forecasting. These results underline the model's potential for managing the impacts of fine particulate matter and other air pollutants, showcasing its promise for improving environmental analytics and public health risk management by combining evolutionary intelligence with time-series modeling.

Air pollution continues to be a significant global environmental concern, posing risks to public health, ecological balance, and overall quality of life, particularly in urbanized and industrialized regions. Several factors contribute to declining air quality, including emissions from motor vehicles, the widespread use of inefficient and pollutant-heavy fuels, and the dense infrastructure found in large cities. Particulate Matter 10 (PM10), which consists of small particles with diameters of 10 micrometres or below, is a major pollutant of concern due to its severe health effects and prevalence in urban areas. To better understand the factors affecting air pollution, a study was conducted to examine the relationship between PM10 concentrations and key meteorological variables, such as air temperature, relative humidity, and wind speed, in two different monitoring zones within İzmir Province, Turkey. The study covered a five-year period from January 2017 to December 2021, providing a comprehensive temporal analysis. Statistical analysis using a one-tailed t-test revealed that while air temperature had no significant effect on PM10 concentrations, relative humidity was found to be the most influential factor in both zones [5][8]. Wind speed also showed a connection, though its impact varied by location and season. These findings highlight the complex relationship between meteorological conditions and pollutant behavior, emphasizing the need for localized assessments when developing air quality control and forecasting strategies. In particular, the significant role of relative humidity in PM10 dispersion suggests that air quality interventions in İzmir and potentially similar urban areas should consider atmospheric moisture dynamics as an important factor.

Human Activity Recognition (HAR) refers to the process of identifying everyday actions through the use of sensor technologies and machine learning techniques. However, implementing real-time HAR remains a significant challenge, largely because many existing studies rely on data collected in controlled, simulated environments rather than in real-world conditions. To address this gap, the study proposes a robust recognition model that integrates Convolutional Neural Networks and Long Short-Term Memory networks, enhanced by an adaptive batch size mechanism. This adaptive training approach dynamically adjusts the batch size within a range of 128 to 1024 during model training and validation, enabling better handling of data imbalances and variations in feature normalization [13]. A novel HAR dataset, curated in a public and uncontrolled setting, was used to evaluate the model. The system demonstrated exceptional performance, achieving a lowest average loss of 0.08 ± 0.136% and peaking at a classification accuracy of 99.29%. Additionally, the model outperformed existing methods, reaching accuracy rates of 99.5% and 99.8% in comparative benchmarks. These results highlight the effectiveness of adaptive deep learning frameworks in enhancing real-time activity recognition under real-world conditions.

This research investigates the short-term prediction of air pollution levels in Xi’an and Lanzhou, two heavily urbanized cities in China that regularly face air quality issues. The study utilizes a blend of correlation analysis and neural network modeling to examine both direct and complex associations between the Air Pollution Index (API) and various meteorological variables. The objective was to improve forecasting reliability by pinpointing the most relevant predictors and applying a data-driven model for enhanced accuracy. After analyzing extensive datasets, the researchers found that API values from the three preceding days, combined with sixteen meteorological indicators, including average, minimum, and maximum temperatures as well as water vapor pressure, were the most influential for model input. These historical API values, especially those from one to three days earlier, showed strong predictive power for estimating near-future pollution levels. To effectively capture these patterns, a Wavelet-based Neural Network model was employed. This type of architecture is particularly well-suited for interpreting nonlinear characteristics within time-series data. The training process was refined using a method that helps reduce overfitting, thus improving the model’s ability to perform accurately on new, unseen data. When evaluated against other forecasting models, the wavelet neural network demonstrated consistently higher accuracy, reflecting its strength in recognizing both temporal trends and intricate variable interactions [6]. The model's performance offers practical insights for improving short-term pollution forecasts, which could significantly aid city authorities and public health officials in crafting timely responses to deteriorating air quality conditions [4].

Particulate matter, especially PM2.5, represents a critical risk to both environmental quality and public health due to its small size, which enables it to deeply infiltrate the lungs and remain airborne for long periods. Accurately predicting PM2.5 concentrations remains a considerable challenge, primarily because the data exhibit nonlinear and non-stationary behaviors. These fluctuations stem from a range of dynamic factors, including vehicular emissions, weather variations, industrial processes, and seasonal changes, which complicate the modeling of long-term patterns using traditional forecasting techniques. To tackle this issue, a novel hybrid model has been introduced that combines the strengths of Gated Recurrent Units (GRUs) with Empirical Mode Decomposition (EMD). The forecasting framework begins by applying EMD to the original PM2.5 time series, which separates the data into multiple intrinsic mode functions (IMFs). These IMFs represent different frequency bands, enabling the model to disentangle short-term variations from broader trends. Once decomposed, the IMFs are categorized based on their frequency characteristics, and each group is processed through a dedicated GRU network [6][8]. GRUs are a streamlined type of recurrent neural network capable of capturing time-dependent patterns effectively, while offering lower computational demands than more complex architectures such as LSTMs. After processing, the outputs from each GRU are combined to form a comprehensive prediction. The model was trained and validated using actual PM2.5 measurements collected in Beijing, a city known for its recurring air quality issues. Results from the study indicate that the EMD-GRU approach provides significantly improved prediction accuracy compared to conventional models [10]. Its capacity to handle the complexities of nonlinear air pollution data highlights its potential as a valuable forecasting tool for environmental regulators and urban planners focused on mitigating the effects of fine particulate matter.

This research presents a customized Support Vector Machine (SVM) framework designed for forecasting the Air Pollution Index, offering a solution to the limitations of conventional prediction methods, which are often computationally intensive and time-consuming. SVMs are particularly effective in managing complex, high-dimensional datasets, an essential feature when dealing with environmental and air quality data that often exhibit nonlinear characteristics. The effectiveness of the SVM model depends heavily on three main parameters: the type of kernel function applied, the epsilon (ε) used for error tolerance, and the penalty parameter (C) that controls the trade-off between model complexity and error margin. Among the various kernel functions tested, the Radial Basis Function (RBF) proved most efficient, thanks to its ability to model nonlinear patterns with high precision. To evaluate the model's performance, the study utilized key metrics such as the Sum of Squares Error (SSE), Mean Squared Standard Error (MSSE), and the coefficient of determination (R²) [7]. The results were notably strong, with an SSE of 3.14440, an MSSE of 0.9843, and an R² score that matched the SSE, highlighting the model’s accuracy in identifying trends and relationships in air quality data [4]. The selection of the RBF kernel played a vital role in not only enhancing the prediction quality but also improving the model's computational efficiency. This makes the approach well-suited for real-time air quality forecasting applications in urban areas.

The use of budget-friendly air quality sensors represents a valuable advancement in enhancing air pollution monitoring systems and raising public awareness about individual pollutant exposure. These cost-effective devices are particularly useful in regions where traditional monitoring infrastructure is either insufficient or absent due to financial constraints. Despite their advantages, one significant drawback of low-cost sensors is their vulnerability to environmental factors and the tendency for pollutants to interfere with one another, issues that standard lab-based calibration methods often fail to address. To explore this issue, a recent investigation assessed the accuracy of various calibration methods applied to a real-time, low-cost sensor system capable of tracking multiple pollutants, including carbon monoxide, nitrogen dioxide, ozone, and carbon dioxide. Among the three methods compared, the random forest model consistently achieved higher or equivalent levels of precision and reliability over time [8][9]. The strength of the random forest approach lies in its capacity to handle complex, nonlinear interactions and to adjust for fluctuating environmental conditions and interferences between pollutant signals, capabilities that traditional regression models lack. This adaptability makes it particularly well-suited for enhancing the reliability of sensor outputs. The study highlights the critical role machine learning can play in refining the calibration process for affordable air quality monitors [12]. By incorporating intelligent algorithms into sensor calibration, it becomes possible to extend the reach and utility of air monitoring efforts, allowing for real-time, localized pollution tracking. This integration not only improves data accuracy but also supports more informed public health policies and community-level environmental action.

Motor imagery-based brain-computer interface (BCI) systems utilize electroencephalogram (EEG) signals to decode neural activity into actionable digital commands, allowing users to interact with or control external devices solely through imagined motor actions. These systems have shown significant promise across diverse domains, particularly in neurorehabilitation for stroke and spinal cord injury patients, in the control of prosthetic limbs, and in the development of assistive technologies for individuals with severe motor impairments. Despite this potential, one of the critical challenges limiting the widespread adoption of EEG-based BCIs is the inconsistent and often suboptimal performance of signal classification algorithms. This is primarily attributed to the noisy, non-linear, and non-stationary nature of EEG data, which complicates the task of reliably extracting discriminative features. To address these challenges, a novel hybrid deep learning model has been developed, integrating Two-Dimensional Convolutional Neural Networks with Long Short-Term Memory networks [2]. The CNN component is designed to extract robust spatial features by processing EEG signals represented as two-dimensional matrices capturing the spatial distribution and relationships among electrode channels. This representation leverages the topographical structure of EEG caps to model the brain’s functional layout more effectively. In parallel, the LSTM component is adept at modeling temporal dynamics by capturing long-term dependencies and sequential patterns within the EEG signal. This dual-stream architecture enables the model to jointly learn both spatial and temporal features, a significant advantage in decoding complex motor imagery patterns. A particularly innovative feature of this architecture is the inclusion of connectivity-based metrics, such as phase-locking value, coherence, or mutual information, in the LSTM training process. These features, extracted during the preprocessing stage, quantify the functional connectivity between different cortical regions, reflecting how various parts of the brain coordinate during motor imagery tasks. Their integration not only enhances the physiological relevance of the model but also contributes to increased interpretability and robustness against inter-subject variability and session-to-session fluctuations. Experimental evaluations conducted using benchmark EEG datasets demonstrate that the hybrid CNN-LSTM model significantly outperforms traditional machine learning approaches and standalone deep learning models. It achieved an average validation accuracy of 98.5% and a test accuracy of 93.3%, marking a notable 3.1% increase in classification accuracy over conventional techniques [9]. Furthermore, it yielded an improvement of 0.04 in the F1-score, indicating better balance between precision and recall. These performance gains highlight the model’s strong generalization capabilities and its resilience to the inherent variability of EEG data.

To enhance the precision and robustness of Air Quality Index forecasting, this study introduces a novel hybrid deep learning architecture known as VMD-SE-LSTM. This model is meticulously designed to address the inherent complexity, nonlinearity, and non-stationarity of AQI time-series data, challenges that arise due to the dynamic interplay of environmental variables, anthropogenic emissions, meteorological conditions, and other unpredictable pollution sources, especially in densely populated urban regions. The framework initiates the forecasting pipeline with Variational Mode Decomposition (VMD), a powerful signal processing technique that decomposes the AQI time-series into a set of Intrinsic Mode Functions (IMFs). Each IMF encapsulates distinct frequency components of the original signal, enabling the model to isolate and analyze various underlying temporal patterns such as noise, cyclical behaviors, and long-term trends [10]. This decomposition step not only enhances signal clarity but also allows for a more granular understanding of pollution dynamics by separating transient anomalies from persistent environmental signals. Following the decomposition, Sample Entropy (SE) is applied to each IMF to quantify its complexity. This metric assesses the regularity and predictability of time-series data, providing an objective basis for selecting the most informative and structurally meaningful components. By discarding low-entropy IMFs and emphasizing the more complex and informative sub-series, the model effectively reduces computational overhead while enhancing focus on signal elements that contribute most significantly to forecast accuracy. Each selected IMF is then modeled using Long Short-Term Memory networks, which are well-suited for capturing the temporal dependencies and sequential dynamics inherent in time-series data. LSTM networks excel at learning long-term relationships and mitigating the vanishing gradient problem, making them particularly effective for AQI forecasting, where pollutant levels may be influenced by events and conditions from prior days or even weeks. The independent LSTM models learn to predict AQI values based on their respective input sub-series, each contributing a specialized perspective to the overall prediction. The individual predictions are then fused into the final forecast. This fusion mechanism ensures that the prediction integrates both high-frequency variations, such as rapid pollutant spikes caused by traffic congestion or industrial activity, and low-frequency components, such as seasonal or climatological patterns [6][8]. By combining these temporal insights, the VMD-SE-LSTM model achieves a more comprehensive understanding of AQI evolution, leading to more accurate and timely forecasts.

Experimental evaluations on benchmark urban datasets reveal that the VMD-SE-LSTM framework outperforms conventional and state-of-the-art hybrid forecasting models. In comparative studies, it demonstrated substantial improvements in standard evaluation metrics such as Root Mean Square Error, Mean Absolute Error, and classification accuracy across multiple AQI categories. Notably, its enhanced generalization ability enables consistent performance across cities with diverse environmental and industrial profiles, underscoring the model's adaptability and scalability.

Table – 1: Methodology and Key findings

| S.No | STUDY | METHODOLOGY | KEY FINDINGS |
|---|---|---|---|
| 1 | The Influence of Meteorological Factors on Air Quality in Van, Turkey | MNLR Model | Meteorological factors significantly impact PM10 and SO2 levels. |
| 2 | Optimizing LSTM Network Using Binary Chimp Optimization | BChOA-LSTM | Achieved 96.41% accuracy; outperformed other deep learning models for PM2.5 prediction. |
| 3 | Influence of Meteorological Parameters on PM10 in Izmir, Turkey | Statistical Analysis (t-test) | Relative humidity had the most significant impact on PM10. |
| 4 | Adaptive Batch Size-Based CNN-LSTM Framework | CNN-LSTM with Adaptive Batch Sizes | Achieved 99.29% accuracy for human activity recognition, effectively handling imbalanced data. |
| 5 | Air Pollution Forecasting Using WANN with Meteorological Conditions | Wavelet Artificial Neural Networks (WANN) | Improved short-term API prediction; Bayesian regularization enhanced accuracy. |
| 6 | Deep Hybrid Model Using EMD for Air Quality Prediction | EMD + GRU Hybrid Model | Improved PM2.5 prediction accuracy by decomposing data into frequency-based components. |
| 7 | Prediction of Air Pollution Index (API) Using SVM | Support Vector Machine (SVM) with RBF Kernel | Achieved high accuracy and low error rates for complex API prediction tasks. |
| 8 | Random Forest Calibration for Low-Cost Air Quality Sensors | Random Forest (RF) Calibration Model | Enhanced sensor performance by addressing pollutant cross-sensitivities and environmental variations. |
| 9 | 2D CNN-LSTM Hybrid Algorithm for EEG Data | CNN-LSTM Hybrid Model | Achieved 98.5% accuracy; applicable for time-series data and sequential prediction tasks. |
| 10 | Daily AQI Forecasting using VMD, SE, and LSTM | VMD + SE + LSTM Hybrid Model | Improved daily AQI predictions by capturing frequency-specific AQI series trends. |

3. PROPOSED SYSTEM

The objective is to predict the air quality index in advance using a hybrid model that integrates both DCNN and LSTM.

3.1 System Model

The dataset used for training and testing is a time-series air quality dataset that includes pollutant and meteorological parameters. After preprocessing, the model is trained on the extracted features. Once the model is properly trained, it is assessed on the test data. Several metrics, including accuracy, precision, recall, F1 score, and AUC score, are used to assess performance. Figure 1 provides the system architecture.

Fig – 1: System architecture


3.2 Data Collection

The dataset for this study consists of time-series air quality data from India, collected from Central Pollution Control Board (CPCB) monitoring stations at 14 sites in Delhi. The source provides air quality data for the years 2022 and 2023, covering a range of Indian cities. The data used here spans from January 1, 2022, to December 31, 2023, and includes measurements of six air pollutants: PM2.5, PM10, Nitrogen Dioxide, Sulfur Dioxide, and additional key pollutants. It also includes environmental data, such as wind speed, temperature, humidity, and wind direction. The dataset is split into two sets: a training set and a testing set.

Table – 2: Dataset details

TRAINING DAYS    584

TESTING DAYS     146
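The day counts in Table 2 are consistent with a chronological split of the 730 days between January 1, 2022 and December 31, 2023. The sketch below assumes an 80/20 split with no shuffling; both the fraction and the ordering are inferred from the counts and are not stated in the text, and `chronological_split` is a hypothetical helper name.

```python
from datetime import date, timedelta


def chronological_split(records, train_fraction=0.8):
    """Split time-ordered records into train/test parts without shuffling."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


# One placeholder record per calendar day of the study period (730 days).
start = date(2022, 1, 1)
days = [start + timedelta(days=i) for i in range(730)]

train_days, test_days = chronological_split(days)
print(len(train_days), len(test_days))
```

Keeping the test days strictly after the training days avoids leaking future observations into training, which matters for forecasting tasks.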

3.3 Data Preparation

The data is pre-processed using cleaning techniques, which improves the learning efficiency of deep learning models. Data preprocessing transforms raw data into a format that can be processed more quickly and efficiently in deep learning applications. To achieve reliable results, these steps are typically applied at the very beginning of the deep learning pipeline.
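The text does not name the cleaning techniques used. As an illustration only, the sketch below assumes two common steps for sensor time-series: linear interpolation of missing readings and min-max scaling to [0, 1]. The function names and the sample readings are hypothetical.

```python
def fill_missing(values):
    """Linearly interpolate None gaps; gaps at the edges copy the nearest reading."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def min_max_scale(values):
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


pm25 = [40.0, None, 60.0, 80.0, None]  # hypothetical readings with gaps
clean = fill_missing(pm25)
scaled = min_max_scale(clean)
print(clean, scaled)
```

In practice the scaling bounds would be computed on the training set only and reused for the test set.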

3.4 Air Quality Index Computation

The Air Quality Index is calculated using a collection of pollutants and the breakpoints that correspond to them. Using the following formula, a sub-index is determined for every pollutant:

$$I_p = \frac{I_{Hi} - I_{Lo}}{BP_{Hi} - BP_{Lo}} \, (C_p - BP_{Lo}) + I_{Lo}$$

$I_p$ = Sub-index for pollutant $p$

$C_p$ = Measured concentration of pollutant $p$

$BP_{Lo}$, $BP_{Hi}$ = Lower and upper concentration breakpoints that bracket $C_p$

$I_{Lo}$, $I_{Hi}$ = AQI breakpoints corresponding to $BP_{Lo}$ and $BP_{Hi}$

The aggregate AQI is calculated as the maximum of the sub-indices of all pollutants taken into consideration:

$$AQI = \max(I_{p_1}, I_{p_2}, \ldots, I_{p_n})$$
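The sub-index interpolation and the maximum rule can be sketched as follows. The breakpoint table here is hypothetical and chosen only to make the arithmetic easy to follow; real breakpoints come from the relevant regulatory standard and are not given in the text.

```python
# Hypothetical breakpoints per pollutant: (BP_Lo, BP_Hi, I_Lo, I_Hi).
BREAKPOINTS = {
    "PM2.5": [(0.0, 30.0, 0, 50), (30.0, 60.0, 51, 100), (60.0, 90.0, 101, 150)],
    "PM10": [(0.0, 50.0, 0, 50), (50.0, 100.0, 51, 100), (100.0, 250.0, 101, 150)],
}


def sub_index(pollutant, concentration):
    """Linear interpolation of the AQI sub-index inside the bracketing segment."""
    for bp_lo, bp_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if bp_lo <= concentration <= bp_hi:
            return (i_hi - i_lo) / (bp_hi - bp_lo) * (concentration - bp_lo) + i_lo
    raise ValueError(f"{pollutant} concentration {concentration} is outside the table")


def aggregate_aqi(concentrations):
    """The overall AQI is the largest sub-index among the measured pollutants."""
    return max(sub_index(name, value) for name, value in concentrations.items())


reading = {"PM2.5": 45.0, "PM10": 120.0}  # hypothetical concentrations
print(sub_index("PM2.5", 45.0), aggregate_aqi(reading))
```

For the PM2.5 reading of 45, the bracketing segment is (30, 60) mapped to (51, 100), so the sub-index is 49/30 × 15 + 51 = 75.5.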

Once the AQI is computed, it is categorized into classes based on predefined ranges:

Table – 3: AQI Categorization

| AQI RANGE | CLASSIFICATION | PHYSICAL IMPACT |
|---|---|---|
| 0–50 | Good | Air is safe for breathing. |
| 51–100 | Moderate | Acceptable air condition with potential minor health concerns for susceptible groups. |
| 101–150 | Unhealthy for Sensitive Groups | Causes major effects in vulnerable individuals. |
| 151–200 | Unhealthy | Increased health risks for the general population. |
| >200 | Very Unhealthy | Severe health impacts. |
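The categorization in Table 3 maps directly to a threshold lookup, which also yields the class labels for the classification task. The sketch assumes that fractional AQI values falling between two listed ranges (for example 50.5) belong to the higher class; the table itself does not say how such values are treated.

```python
def aqi_category(aqi):
    """Return the Table 3 class label for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    if aqi <= 50:
        return "Good"
    if aqi <= 100:
        return "Moderate"
    if aqi <= 150:
        return "Unhealthy for Sensitive Groups"
    if aqi <= 200:
        return "Unhealthy"
    return "Very Unhealthy"


labels = [aqi_category(v) for v in (42, 75.5, 150, 151, 320)]
print(labels)
```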

3.5 Windowing

Windowing is a technique used in time-series analysis to create fixed-length sequences from data, typically for use in sequential models like LSTMs. It involves defining a window size and sliding it across the data to create input-output pairs, where each window represents a sequence of past observations, and the target is usually the next value in the series. This method helps capture temporal dependencies, making it ideal for tasks like air quality prediction, where past pollutant concentrations are used to predict future AQI levels.
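A minimal sketch of the sliding-window procedure described above, assuming a univariate series and a one-step-ahead target. The window size of 3 and the AQI values are illustrative, and `make_windows` is a hypothetical helper name.

```python
def make_windows(series, window_size):
    """Build (input window, next value) pairs by sliding over the series."""
    if window_size < 1 or window_size >= len(series):
        raise ValueError("window_size must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


aqi = [80, 95, 110, 120, 105, 90]  # hypothetical daily AQI values
X, y = make_windows(aqi, window_size=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A series of length n yields n − window_size pairs, so six observations with a window of three give three training examples.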

4. MODELS

4.1 Deep Convolutional Neural Network

A DCNN comprises three kinds of layers: the input layer, the hidden layers, and the output layer. The convolution layers employ different sets of learnable filters that process the input. These filters generate feature maps that draw attention to particular characteristics and patterns in the data by sliding across the input and performing element-wise multiplication and summation. Following the convolution layers, pooling layers are applied to downsample the feature maps and lower computational complexity. The Rectified Linear Unit (ReLU) activation function is used after every convolution and pooling layer; it gives the model non-linearity, which facilitates the recognition of intricate patterns and representations. The fully connected layers combine the extracted features to provide predictions. The dropout layer is a regularisation technique whose objectives are to make neural networks more robust, reduce overfitting, and promote better model generalisation.

Fig – 2: DCNN architecture (Convolution layer → Max pooling layer → Dropout layer → Convolution layer → Max pooling layer → Dropout layer)

4.2 Long Short Term Memory

The Long Short-Term Memory model uses its special architecture, which consists of input gates, forget gates, memory cells, and output gates, to efficiently process sequential data. As a result, the model can capture both the long-term and short-term dependencies present in the data. The information flow through the network is managed by gates. The gating equations given below use an update gate $z_t$, a reset gate $r_t$, and a candidate hidden state $h_t'$; this is the simplified form commonly known as the Gated Recurrent Unit (GRU) formulation.

The update gate formula is,

$$z_t = \sigma(w_z \cdot [h_{t-1}, x_t] + b_z)$$

The reset gate formula is,

$$r_t = \sigma(w_r \cdot [h_{t-1}, x_t] + b_r)$$

The reset gate and the present input $x_t$ are used to compute the candidate hidden state $h_t'$:

$$h_t' = \tanh(w \cdot [r_t * h_{t-1}, x_t] + b)$$

Lastly, the update gate $z_t$ weights the candidate hidden state $h_t'$ and the previous hidden state $h_{t-1}$ to generate the final hidden state $h_t$:

$$h_t = (1 - z_t) * h_{t-1} + z_t * h_t'$$

Here $\sigma$ = sigmoid function, $\cdot$ = dot product, $*$ = element-wise multiplication, $h_{t-1}$ = previous hidden state, $x_t$ = current input, $w$ = weight parameters, $b$ = bias parameters.

4.3 Hybrid Model

Fig – 3: Hybrid model architecture (diagram blocks include Dataset and Data preprocessing)

By fusing Deep Convolutional Neural Networks with Long Short-Term Memory networks, the hybrid model forecasts the Air Quality Index. Nitrogen Dioxide, Sulphur Dioxide, Carbon Monoxide, Ozone, Particulate Matter 2.5, Particulate Matter 10, and other pollutants are processed by the DCNN for spatial feature extraction, while the LSTM learns from the sequential nature of the air quality measurements to capture time-related dependencies in the time-series data. This hybrid approach produces a more accurate and reliable AQI prediction model by enabling the model to handle both the spatial relationships in the data and the time-dependent patterns.
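The update-gate, reset-gate, and candidate-state equations of Section 4.2, which describe the recurrent half of the hybrid model, can be sketched as one recurrent step in plain Python. The parameter names (`w_z`, `w_r`, `w`, `b_z`, `b_r`, `b`) follow the symbols in those equations; the sizes and the all-zero weights in the example are placeholders chosen so the result is easy to check by hand, not trained values.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, vector, bias):
    """Return W·vector + b, with the weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def gated_step(h_prev, x_t, params):
    """One recurrent step using update gate z_t, reset gate r_t, candidate h_t'."""
    concat = h_prev + x_t                                    # [h_{t-1}, x_t]
    z = [sigmoid(v) for v in affine(params["w_z"], concat, params["b_z"])]
    r = [sigmoid(v) for v in affine(params["w_r"], concat, params["b_r"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t     # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(v) for v in affine(params["w"], gated, params["b"])]
    # h_t = (1 - z_t) * h_{t-1} + z_t * h_t'
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


hidden, inputs = 2, 1
zeros = [[0.0] * (hidden + inputs) for _ in range(hidden)]
params = {"w_z": zeros, "w_r": zeros, "w": zeros,
          "b_z": [0.0] * hidden, "b_r": [0.0] * hidden, "b": [0.0] * hidden}

h = gated_step([0.4, -0.2], [1.0], params)
print(h)
```

With all-zero parameters both gates evaluate to 0.5 and the candidate state to 0, so the step simply halves the previous hidden state.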

4.4 Convolution layer

DCNNs are made up of many convolution layers. Each layer processes its input using a different set of learnable filters, or kernels. These filters create feature maps that capture specific details and patterns in the data by sliding across the input and performing element-wise multiplication and summation.
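The sliding multiply-and-sum operation can be illustrated with a one-dimensional example. This is a minimal sketch, assuming a single input channel, no padding, and a stride of one. The kernel values are fixed by hand for illustration, whereas in a trained DCNN they would be learned, and the readings are made up.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal; at each position multiply
    element-wise and sum (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Made-up PM2.5 readings and a hypothetical kernel that responds to rising values.
pm25 = [35.0, 40.0, 38.0, 60.0, 85.0, 80.0]
rise_kernel = [-1.0, 0.0, 1.0]

feature_map = conv1d(pm25, rise_kernel)
print(feature_map)  # [3.0, 20.0, 47.0, 20.0]
```

The largest value in the feature map marks the window where the readings climb fastest, which is the kind of local pattern a learned filter can pick out.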

4.5 Max pooling layer

Pooling layers are commonly applied after the convolutional layers to reduce the spatial size of the feature maps. Pooling makes the network more resilient to small changes in the input and lowers the computational complexity.
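A small one-dimensional example shows both effects. This sketch assumes a pool size and stride of 2, which the paper does not specify, and the feature-map values are made up.

```python
def max_pool1d(feature_map, pool_size=2, stride=2):
    """Keep only the largest value in each window of the feature map."""
    return [
        max(feature_map[i:i + pool_size])
        for i in range(0, len(feature_map) - pool_size + 1, stride)
    ]


original = [3.0, 20.0, 47.0, 20.0, 5.0, 1.0]
perturbed = [4.0, 20.0, 47.0, 18.0, 5.0, 2.0]  # small changes to the non-maximal values

print(max_pool1d(original))   # [20.0, 47.0, 5.0]
print(max_pool1d(perturbed))  # [20.0, 47.0, 5.0]
```

The output is half the length of the input, and the small perturbations leave it unchanged because only the maximum of each window is passed on.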

4.6 Fully connected layer

DCNNs usually include one or more fully connected layers that combine the learned features to make predictions. These layers are typically placed in the network's final stages.
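As a rough illustration, a fully connected layer computes one weighted sum of all input features per output unit. The sketch below follows it with a softmax to turn the scores into class probabilities. The softmax output, the three-class setup, and all weights are assumptions made for the example, since the paper does not describe its output layer.

```python
import math


def dense(features, weights, biases):
    """Fully connected layer: one weighted sum of all features per output unit."""
    return [
        sum(f * w for f, w in zip(features, row)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical learned features and weights for three output classes.
features = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = [0.0, 0.0, 0.0]

scores = dense(features, weights, biases)  # [1.0, 2.0, 3.0]
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(predicted_class)  # 2
```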

4.7 Activation function

The Rectified Linear Unit (ReLU) activation function is applied after every convolution and pooling layer. It introduces non-linearity into the model, which helps it learn complex patterns and representations.
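ReLU itself is a very small operation: it passes positive values through unchanged and replaces negative values with zero. A minimal sketch on made-up values:

```python
def relu(values):
    """Rectified Linear Unit applied element-wise: max(0, v)."""
    return [max(0.0, v) for v in values]


print(relu([-2.0, -0.5, 0.0, 1.5]))  # [0.0, 0.0, 0.0, 1.5]
```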

4.8 Dropout layer

The dropout layer is a regularisation method that randomly deactivates a fraction of units during training. It aims to increase the resilience of neural networks, lessen overfitting, and improve model generalisation.
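The mechanism can be sketched in a few lines. The version below assumes the common "inverted dropout" convention, in which surviving activations are rescaled during training so that nothing needs to change at inference time. The paper does not state its dropout rate, so the rate of 0.5 and the activation values are illustrative only.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` during
    training and scale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)  # inference: pass activations through unchanged
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0, 2.0, 3.0, 4.0]

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
print(dropped)
print(unchanged)
```

Each training pass silences a different random subset of units, so the network cannot rely on any single unit, which is the source of the regularising effect.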

5. PERFORMANCE AND RESULT ANALYSIS

5.1 Accuracy

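Accuracy refers to the proportion of correct predictions compared to the total number of predictions. It shows how well the model recognizes relationships and patterns in the dataset. The accuracy formula is:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$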
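As a small worked example, accuracy can be computed directly from lists of true and predicted class labels. The category names and predictions below are made up for illustration and are not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical AQI category labels.
y_true = ["Good", "Moderate", "Poor", "Poor", "Good"]
y_pred = ["Good", "Moderate", "Poor", "Moderate", "Good"]

print(accuracy(y_true, y_pred))  # 0.8
```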

5.2 Precision

Precision is the ratio of correctly identified positive instances to the total number of instances that were predicted as positive. A precision of 1 means that every instance classified as positive is in fact positive. Note that precision is unaffected by the number of positive instances that are labelled negative (false negatives).

5.3 Recall

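Recall, also known as sensitivity, is the proportion of actual positive instances that the model correctly classifies as positive. It is defined as follows,

$$\text{Recall} = \frac{TP}{TP + FN}$$

where $TP$ is the number of true positives and $FN$ is the number of false negatives.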

5.4 F1 Score

Neither precision nor recall alone gives a complete picture of a classifier's performance. The F1 score is more informative because it incorporates both and provides a single value between 0 and 1. It is calculated as the harmonic mean of precision and recall, providing a balanced measure of both.
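The three measures in Sections 5.2 to 5.4 can be computed together from true and predicted labels. This is a minimal sketch for a single positive class. The labels are made up, and a multi-class evaluation would need these values computed per class and then averaged, which is not shown here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here precision is 2/3 and recall is 1/2, and the F1 score of 4/7 sits slightly below their arithmetic mean because the harmonic mean is pulled towards the weaker of the two values.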

5.5 AUC-ROC

The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) measures how well the model can distinguish between different classes. Improved categorization ability is indicated by a higher AUC score.
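For a binary problem, the AUC equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as half. The sketch below uses that pairwise interpretation on made-up scores. The paper does not state how its AUC was computed, so this is an illustrative assumption rather than the authors' procedure.

```python
def auc_roc(y_true, scores):
    """AUC via pairwise ranking: share of (positive, negative) pairs where
    the positive instance is scored higher (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted scores for the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(auc_roc(y_true, scores))  # 0.75
```

A value of 1.0 means every positive instance is ranked above every negative one, while 0.5 corresponds to random ranking.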


Chart-1: Performance analysis

5.6 Result Analysis

The performance of the DCNN, LSTM, and the hybrid DCNN-LSTM model was evaluated using the metrics described. The hybrid model consistently outperformed the individual models, as shown in the table below:

Table-4: Evaluation metrics

6. CONCLUSIONS

The proposed hybrid DCNN-LSTM model predicts the Air Quality Index in urban settings. Convolutional layers extract spatial features, while LSTM layers capture temporal dependencies. Metrics such as precision, recall, F1, and AUC score are used to assess the robustness and effectiveness of the proposed system. The approach uses the capacity of deep learning to handle complicated, high-dimensional data to overcome the drawbacks of traditional machine learning approaches. The model is well suited to real-time air quality forecasting and monitoring in urban environmental management. The study demonstrates how deep learning models can enhance systems for predicting air quality, giving urban planners and policymakers a potentially helpful tool to lessen the health hazards related to air pollution. Future research could concentrate on integrating the model with real-time monitoring systems for continuous air quality prediction and refining it for deployment in various cities.

REFERENCES

[1] Qin, S., Liu, F., Wang, J., & Sun, B. (2014). Analysis and forecasting of the particulate matter (PM) concentration levelsoverfourmajorcitiesofChinausinghybridmodels. Atmospheric Environment, 98,665-675.

[2] Gu, K., Qiao, J., & Lin, W. (2018). Recurrent air quality predictor based on meteorology-and pollution-related factors. IEEE Transactions on Industrial Informatics, 14(9), 3946-3955

[3] Cekim, H. O. (2020). Forecasting PM10 concentrations usingtimeseriesmodels:acaseofthemostpollutedcities in Turkey. Environmental Science and Pollution Research, 27(20),25612-25624.

[4] Gu, K., Zhou, Y., Sun, H., Zhao, L., & Liu, S. (2020). Prediction of air quality in Shenzhen based on neural network algorithm. Neural ComputingandApplications, 32, 1879-1892.

[5]Koo,J.W.,Wong,S.W.,Selvachandran,G.,Long,H.V.,& Son,L.H.(2020).PredictionofAirPollutionIndexinKuala Lumpur using fuzzy time series and statistical models. Air Quality,Atmosphere&Health, 13,77-88.

[6] Navares, R., & Aznarte, J. L. (2020). Predicting air qualitywithdeeplearningLSTM:Towardscomprehensive models. EcologicalInformatics, 55,101019.

[7] Photphanloet, C., & Lipikorn, R. (2020). PM10 concentration forecast using modified depth-first search and supervised learning neural network. Science of the totalenvironment, 727,138507.

[8] Rajak, R., & Chattopadhyay, A. (2020). Short and long term exposure to ambient air pollution and impact on health in India: a systematic review. International journal ofenvironmentalhealthresearch, 30(6),593-617.

[9] Schibuola, L., & Tambani, C. (2020). Indoor environmental quality classification of school environments by monitoring PM and CO2 concentration levels. Atmospheric PollutionResearch, 11(2),332-342.

[10] Wang, Z., Yue, S., & Song, C. (2021). Video-based air qualitymeasurement with dual-channel 3-Dconvolutional network. IEEE Internet of Things Journal, 8(18), 1437214384.

[11] Wang, Z., Yang, Y., & Yue, S. (2022). Air quality classification and measurement based on double output vision transformer. IEEE Internet of Things Journal, 9(21), 20975-20984.

[12] Baniasadi, S., Salehi, R., Soltani, S., Martín, D., Pourmand, P., & Ghafourian, E. (2023). Optimizing long short-term memory network for air pollution prediction using a novel binary chimp optimization algorithm. Electronics, 12(18),3985.

[13]Choudhury,N.A.,&Soni,B.(2023).Anadaptivebatch size-based-CNN-LSTM framework for human activity recognition in uncontrolled environment. IEEE

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

Transactions on Industrial Informatics, 19(10), 1037910387.

[14] Li, Q., Guan, X., & Liu, J. (2023). A CNN-LSTM framework for flight delay prediction. Expert Systems with Applications, 227,120287.

[15] Barthwal, A., & Goel, A. K. (2024). Advancing air quality prediction models in urban India: a deep learning approach integrating DCNN and LSTM architectures for AQI time-series classification. Modeling Earth Systems and Environment, 10(2),2935-2955.
