International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 11 Issue: 11 | Nov 2024
p-ISSN: 2395-0072
www.irjet.net
Recognition of Mathematical Symbols from Images and Test Papers
Viraj Shah1, Vaibhav Shah2, Rishikesh Sharma3, Soham Kulkarni4, Durgam Devani5, Arjun Jaiswal6
1,2,3,4 Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
5 St Francis Institute of Technology, Mumbai, India
6 Professor, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This work describes the automatic generation of LaTeX code for mathematical equations from images. Coding equations in LaTeX by hand is tedious and error-prone, particularly where speed and accessibility matter. For this task, we apply self-attention in deep encoder-decoder neural networks to convert images of mathematical expressions into floating-point vectors and then generate the corresponding LaTeX code. The architecture is designed to handle the complexity and variation of mathematical notation, so that the system can recognize handwritten, printed, and scanned equations. From a thorough review of the literature, we identify the key challenges in mathematical symbol recognition as data collection, preprocessing, feature engineering, and model selection. We address these challenges by applying multiple machine learning models to a rich dataset that underwent extensive data cleaning. The results indicate that the proposed models perform well across different datasets and surpass existing techniques. The proposed system shows promise for improving the efficiency and usability of mathematical symbols in digital form.
Key Words: Mathematical Symbol Recognition, LaTeX Code Generation, Encoder-Decoder Model, Deep Learning, Self-Attention Mechanisms.
1. INTRODUCTION
In the twenty-first century, the world has shifted on a large scale into the digital space as gadgets and automated systems have been integrated into everyday life. With the maturity and rapid expansion of information technology, many activities are now performed far more efficiently. Nonetheless, some problems remain, especially in the handling of mathematical language. Despite the significance of mathematics in so many fields, complex equations continue to be presented as images rather than as text. This poses serious problems for people who wish to type or revise mathematical expressions, especially when such work calls for functionality and ease of use. The need to digitize academic resources, especially mathematical equations, was heightened further by the disruption brought about by the COVID-19 pandemic.
Many techniques, with varying degrees of effectiveness, have been attempted to recover mathematical content; the first such attempt was made as early as 1967 [6]. Today, thanks in large part to technical advances, optical character recognition (OCR) algorithms are accurate enough to make recognizing text in electronic documents practical. However, understanding and reconstructing graphical information such as mathematical formulas is harder, because the spatial relationships between symbols must be captured in addition to the conventional meaning of each symbol. Earlier research examined syntax analyzers and rule-based structural analysis strategies for converting an algebraic formula into a markup language. INFTY is one working project of this kind; the core of its functioning is converting the mathematical notation printed in papers into structured, LaTeX-like representations. Infty Reader is a commercial software solution for processing digital images. Deep learning has since been shown to replace manually created features and rules with learnable feature representations.
The system under consideration may bring the greatest benefit to a diverse group of users, including teachers, students, researchers, and professionals in various fields who frequently use mathematical expressions. By reducing the effort of creating LaTeX code for equations to a minimum, it supports a more integrated and productive environment for education and research. This paper discusses the main issues in mathematical symbol identification: data collection, data preprocessing, feature extraction, and model selection. The study included a comprehensive literature review aimed at identifying best practices. The authors also surveyed the latest state-of-the-art systems to understand their respective strengths and weaknesses.
We also describe the system modules and features to define the scope of our project, and detail the assumptions and limitations involved in the development process. We then provide details of the dataset used for training, validation, and testing. Lastly, we discuss the architecture of our proposed system where we describe the
© 2024, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 248