International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 04 Issue: 02 | Feb-2017
p-ISSN: 2395-0072
www.irjet.net
Design and Description of Feature Extraction Algorithm for Old English Font
Sreesha Bhaskar1, Dr Saravanan K N2
1 Dept. of Computer Science, Christ University, Bangalore, India
2 Associate Professor, Dept. of Computer Science, Christ University, Bangalore, India
Abstract - Character recognition remains a challenging problem, even though text recognition of alphabets has advanced considerably. This paper proposes the design and description of a feature extraction algorithm for recognizing Old English font. The proposed method consists of four stages: data collection, preprocessing, feature extraction, and recognition (minimum distance classifier). The Old English font characters must first be preprocessed to remove noise. The binarized image is then used for feature extraction, which is performed by the proposed Old English Font algorithm. A minimum distance classifier is used to recognize the characters. The method achieves a recognition rate of 79%.
Key Words: Old English font characters, Pre-processing, Feature extraction, Minimum distance classifier.
1. INTRODUCTION
Character recognition, or optical character recognition (OCR), is the conversion of scanned handwritten or printed text into machine-editable form. Research on OCR systems is improving day by day. Such a system uses an OCR engine, a computer program that recognizes the character an image represents. OCR was proposed by Tauschek and Handel. The proposed system focuses on Old English font characters. Old English font is difficult for a human to read, and a few letters are very difficult to recognize. The first stage is scanning the printed document into an image. The scanned image is processed in several stages and converted to character codes so that it can be edited and manipulated. Preprocessing is necessary after scanning because the scanned document contains a lot of noise. Methods such as binarization, noise removal, feature extraction, and classification are then applied. The choice of preprocessing algorithms depends on the age of the document, the quality of the paper, and the quality of the scanned image. The accuracy of an OCR system depends on the algorithms used for feature extraction and classification, although this does not mean that the other stages are unimportant. The use of multilingual documents is increasing, which requires a more novel feature list and a more precise classification process to achieve a higher accuracy rate and a lower error rate in recognition.
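As an illustration of the stages named above, the following minimal Python sketch chains binarization, a simple feature vector, and a minimum distance classifier. It is not the feature extraction algorithm proposed in this paper: the fixed threshold of 128, the 2 x 2 zone-density features, the function names, and the toy 4 x 4 images are all assumptions chosen only to keep the example small.

```python
import math


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 for ink, 0 for background.

    The fixed threshold is an assumption; dark pixels are treated as ink.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def zone_densities(binary, zones=2):
    """Hypothetical features: fraction of ink pixels in each cell of a zones x zones grid.

    Assumes the image has at least `zones` rows and `zones` columns.
    """
    h, w = len(binary), len(binary[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            ink = sum(binary[r][c] for r in range(r0, r1) for c in range(c0, c1))
            feats.append(ink / ((r1 - r0) * (c1 - c0)))
    return feats


def classify(features, prototypes):
    """Minimum distance classifier: return the label of the nearest prototype (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))


# Toy prototypes (assumed): ink in the left half vs. ink in the top half.
prototypes = {"left": [1.0, 0.0, 1.0, 0.0], "top": [1.0, 1.0, 0.0, 0.0]}

# A noisy 4 x 4 scan of a left-half stroke, with one stray dark pixel (value 90).
sample = [
    [10, 40, 200, 250],
    [20, 30, 220, 90],
    [0, 50, 240, 230],
    [15, 25, 210, 255],
]

features = zone_densities(binarize(sample))
print(classify(features, prototypes))  # prints "left"
```

In this sketch each class is represented by a single prototype vector. A real system would typically derive the prototypes from training samples, for example as the per-class mean of their feature vectors.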
© 2017, IRJET | Impact Factor value: 5.181
1.1 PROBLEM DEFINITION
Old English, or blackletter, characters were used a thousand years ago and are an important part of European cultural heritage. Libraries, as the model preservers of printed archives, can no longer avoid the worldwide trend of digitization. Colleges and libraries wishing to digitize their old document collections face a great challenge. Recognition of Old English font characters is an area in which little research has been done.
1.2 LITERATURE REVIEW OF EXISTING TECHNIQUES
Ankit et al. [1] explain how to identify the engine number engraved on two-wheelers and four-wheelers using optical character recognition techniques. The paper reports an accuracy of 99.9%, and all the preprocessing steps are implemented in Java. This accuracy is obtained by storing all the images in a database in a particular format. Using the Java program, vehicle registration gives the correct result when the input matches the images in the database. An engine number written in any language and any font is reported to be identified with the highest accuracy. The technique has a limitation: as the number of images in the database increases, greater care must be taken. Identifying engraved numbers while the vehicle is moving is left as future work.
Adityaraj [2] explains how feature extraction is carried out by detecting a vertical line in a character and detecting open space in the lower zone of a character. Classification is done by a combination of a binary tree and a naive Bayesian classifier. Image enhancement is done using spatial filtering. Binarization is done using Otsu's thresholding, and each character is segmented using the bounding box method. The features used until then were moment-based and structural features; the proposed OCR system uses two structural features. Detection of a vertical line is used to divide Oriya script characters into those that have vertical lines and those that do not. Detection of open space in the lower zone identifies the characters that have open space in that zone. The characters, categorized into four classes, are passed to the Bayesian classifier for recognition. The system gives an average accuracy of 99.25%.
Shuwair et al. [3] explain the proposed OCR framework for Urdu, which is built on MATLAB and Microsoft C#.Net.
ISO 9001:2008 Certified Journal | Page 722