
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 11 Issue: 04 | Apr 2024

p-ISSN: 2395-0072

www.irjet.net

OPTICAL CHARACTER RECOGNITION (OCR) TEXT DETECTION USING TESSERACT

Gaytri Sirskar1, Mrunali Wande2, Bhagyashree Bobde3, Dhanraj Morkhade4, Nandan Pokale5, Prof. S. R. Gudadhe6

1,2,3,4,5 Students of Final Year, Sipna College of Engineering and Technology, Amravati, Maharashtra, India

6 Assistant Professor, Department of Computer Science and Engineering, Sipna College of Engineering and Technology, Amravati, Maharashtra, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract – Digitalized data is required in the modern day to enable speedier task completion and processing. Extracting the text from images is the most effective technique to digitalize them. Many text identification tasks, including image text recognition and optical character recognition, can be used to process text. Optical character recognition (OCR) technology transforms printed text into editable text. OCR is a helpful and popular approach in a variety of applications. Text preparation and segmentation techniques can influence OCR accuracy. OCR is a technology that recognizes text within a digital image.

These days, there is a huge need for information that is found on paper, such as books or newspapers. Information can currently be stored by scanning the desired text, but this method just saves the data as an image that cannot be processed further. For example, text recorded in scanned images cannot be read line by line or word by word; we would have to completely retype the text contained in these images before we could use it again. Detecting text in documents is an extremely difficult problem when the text is embedded in intricately colored document images. Many potential users would like to extract text from documents, archived documents, and other images. This is why users need optical character recognition (OCR). Its goal is to identify textual areas in the document and distinguish them from the graphical sections, obtaining data straight from sources such as application forms and greatly reducing time. This study explains the basic ideas behind OCR, including feature extraction strategies, image preprocessing approaches, and recognition algorithms. It highlights significant turning points and innovations as it examines the development of OCR technology from early character recognition systems to contemporary deep learning-based techniques.

Key Words: Optical Character Recognition, Tesseract, Python, Django, Image Preprocessing, OpenCV

1. INTRODUCTION

The need for effective and quick document processing and digitization has increased in an increasingly digital environment. Manual data entry and document handling techniques are time-consuming and error-prone, which reduces productivity and obstructs the smooth flow of information. This environment has completely changed with the introduction of optical character recognition (OCR) technology, which makes it possible to automatically extract text from photographs and expedite the digitalization of documents. To meet contemporary information processing requirements, the speed and efficiency of OCR systems remain vital.

OCR is commonly utilized in banking, where it can process demand drafts and checks without the need for human intervention. With a smartphone camera, one can instantly scan the writing on a demand draft or check and transfer the exact amount of money. The method is almost perfected for printed demand drafts and checks; it is also fairly accurate for handwritten ones, although signature verification is still necessary. A notable trend toward digitizing paper documents has also emerged in the legal sector. Documents are being digitized to reduce storage requirements and do away with the need to go through bins of paper files. By enabling text search, OCR streamlines the process even further, making documents simpler to find and manipulate within the database. Legal practitioners can now quickly search a vast electronic document repository by entering only a few keywords. Numerous other sectors, such as education, banking, and government organizations, rely heavily on OCR. Our technology is ready to enable not just faster transitions to paperless settings but also higher levels of data accessibility and accuracy across the legal, financial, healthcare, and government sectors.

An eye can recognize, view, and extract text from images, but a person's brain must analyze any text that the eye detects or extracts. Naturally, OCR technology is still not as sophisticated as human ability. In humans, the quality of the input that the eye reads directly affects how well the brain performs text recognition. Numerous issues and difficulties may arise during the planning and execution of a computerized optical character recognition system. For instance, some figures and letters differ just enough from one another for computers to accurately identify

Impact Factor value: 8.226

ISO 9001:2008 Certified Journal

Page 1569