International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 11 Issue: 12 | Dec 2024
p-ISSN: 2395-0072
www.irjet.net
Optimizing Machine Learning Algorithms for Enhanced Data Quality and Integrity in Real-Time Processing Environments
Purvaja Biche¹, Aditya Utpat²
¹Department of Computer Science, SP College, Pune, India
²Department of Computer Science, JSPM’s Rajarshi Shahu College Of Engineering, Pune, India
---------------------------------------------------------------------------***---------------------------------------------------------------------------

Abstract—Continual advances in real-time data processing have highlighted the demand for high-quality, high-integrity data to support decision-making in dynamic environments. This study aims to improve data quality and integrity by optimizing machine learning models, using JPMorgan transactional data as the context. This environment combines high-frequency trading with very large volumes of real-time transactions. The study evaluates the impact of hyperparameter tuning on three predictive models: CNN, SVM, and Random Forest. With careful data pre-processing and measured hyperparameter optimization, the study finds that model performance improved notably. The SVM and Random Forest models demonstrated refined predictive capability, as measured by a substantial reduction in RMSE and a notable improvement in accuracy. By contrast, while its performance remained stable, the CNN model showed a trade-off between RMSE and persistence, suggesting adaptable output in dynamic settings. This finding demonstrates the fine balance between precision and adaptiveness that is critical to real-time use.
The outcomes indicate that the optimized models have transformative potential in real-world use cases, including fraud detection, prediction of stock market shifts, and image identification. The study adds to the existing literature on algorithmic harmony and outlines a course of action for making the most of machine learning models in real-life, fast-paced data ecosystems. It also promotes data integrity as a crucial factor underpinning efficacy, consistency, and decision-making in contemporary finance.

Index Terms—Machine Learning, Real-Time Processing, Data Quality, Data Integrity, Hyperparameter Tuning, Financial Data Analytics, Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest Models

I. INTRODUCTION
In contemporary data-driven landscapes, the optimization of machine learning algorithms emerges as a decisive factor in the pursuit of high data quality and integrity (Allioui et al., 2023). Our focus narrows to a specific arena: using JPMorgan’s transaction data to examine the challenges embedded in real-time data processing environments.

A. Contextualizing the Challenge
Institutions like JPMorgan navigate the complexities of the financial world by managing vast volumes of transactional data in real time, in stark contrast to traditional batch processing systems (George, 2024). This real-time data ecosystem, especially evident in high-frequency trading, demands instantaneous decision-making and introduces a set of unique challenges. For example, during a particularly volatile trading day, JPMorgan’s systems must accurately process over 100,000 transactions per second, each needing validation and execution within milliseconds to capitalise on fleeting market opportunities. This scenario underscores the critical need for ultra-low latency and high reliability in the trading infrastructure to ensure competitive advantage and operational integrity in a landscape where every fraction of a second counts (Bi et al., 2024). Additionally, maintaining data accuracy and security amid this high-speed transactional flow poses significant challenges, requiring sophisticated algorithms and robust cybersecurity measures to navigate the dynamic, high-stakes environment of real-time financial markets.

B. Implications of Inaccurate Data
The implications of a misstep in handling real-time data are profound, especially given the intricate web of JPMorgan’s operations (Hoffman, 2022). Imagine a momentary glitch distorting transactional records: a ripple effect emerges. From inaccurate financial reporting to opera-
© 2024, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 707