ISSN 2348-1196 (print) International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online) Vol. 8, Issue 1, pp: (12-23), Month: January - March 2020, Available at: www.researchpublish.com
Ge’ez POS Tagger Using Hybrid Approach
HAGOS GEBREMEDHIN GEBREMESKEL
Nankai University, College of Software Engineering, December 2019
Email: hagosgebrem@gmail.com or hagosgebrem@yahoo.com
Abstract: This paper proposes a part-of-speech (POS) tagger for Ge’ez based on a hybrid approach. A trigram-based TnT (Trigrams'n'Tags) tagger is combined with human-written rules, regular expressions, and morphological pattern analysis. Ge’ez literature on syntax, morphology, and grammar was reviewed to understand the nature of the language and to identify a possible tag set. Experiments were conducted to evaluate the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, from both the efficiency and the accuracy points of view, with specific attention to biases. As a result, 26 broad tags were identified, and 15,154 words from around 1,305 sentences were collected from a single genre, the Holy Bible. These words were then manually tagged by Ge’ez language professionals for training and testing purposes. The hybrid of TnT with human-written rules, regular expressions, and morphological pattern analysis of Ge’ez was expected to perform better than the TnT tagger taken alone. Experiments were conducted for three taggers: the TnT tagger, the TnT tagger with regex, and the hybrid tagger. Their performance was 77.87%, 82.23%, and 94.32%, respectively. The hybrid approach therefore outperforms the individual taggers. Finally, the paper concludes that the hybrid approach gives promising results for Semitic languages.
Keywords: Ge’ez, POS tagger, NLP, TnT, Regex, Hybrid POS tagger.
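As an illustration only, the back-off idea behind combining a statistical tagger with regular-expression rules can be sketched in Python. This is not the implementation evaluated in the paper: a most-frequent-tag lexicon stands in for the TnT trigram model, and the tag names (N, V, NUM, PUNC), the default tag, the two patterns, and all function names are assumptions chosen for the example. The patterns rely only on the Unicode ranges for Ethiopic numerals and punctuation.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tag names; the paper's 26-tag set is not listed in this section.
REGEX_RULES = [
    (re.compile(r"^[\u1369-\u137C]+$"), "NUM"),   # Ethiopic numerals
    (re.compile(r"^[\u1361-\u1368]+$"), "PUNC"),  # Ethiopic punctuation marks
]
DEFAULT_TAG = "N"  # assumed fallback when no pattern matches


def train_lexicon(tagged_sentences):
    """Map each word to its most frequent tag (a simple stand-in for TnT)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_sentence(words, lexicon):
    """Tag known words from the lexicon; back off to regex rules otherwise."""
    tagged = []
    for word in words:
        tag = lexicon.get(word)
        if tag is None:
            # Unknown word: first matching pattern wins, else the default tag.
            tag = next((t for p, t in REGEX_RULES if p.match(word)), DEFAULT_TAG)
        tagged.append((word, tag))
    return tagged


def accuracy(gold_sentences, lexicon):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in gold_sentences:
        predicted = tag_sentence([w for w, _ in sentence], lexicon)
        for (_, gold), (_, pred) in zip(sentence, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0
```

In such a cascade the statistical model handles words seen in training, while the patterns cover unseen tokens whose form alone suggests a tag. Additional hand-written or morphological rules could be appended to the rule list in the same way.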
1. INTRODUCTION
Language is one of the fundamental features of human behavior and a crucial component of our lives. In its written form, it serves as a means of recording information and knowledge on a long-term basis and transmitting it from one generation to the next. In its spoken form, it serves as a means of communication and of coordinating our day-to-day life with others (Allen_James, 1995). According to Noam Chomsky (Anon., n.d.), a language is a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements. Language is a key aspect of human intelligence and can be categorized as natural or artificial. A natural language is an ordinary language that has evolved as a normal means of communication among people; examples include English, Ge’ez, Amharic, Afaan-Oromo, and Tigrigna.
Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input; others involve natural language generation. NLP is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications (Liddy_Elizabeth, 2001). Additionally, NLP is the means for accomplishing different types of tasks and/or applications.
Such tasks include part-of-speech (POS) tagging, named entity recognition (NER), information retrieval (IR), speech recognition, machine translation, and question answering, among others (Liddy_Elizabeth, 2001). POS tagging is the process of assigning a part of speech, such as noun, verb, preposition, pronoun, adverb, or adjective, or another lexical class marker to each word in a sentence or text. POS tagging is the first step toward understanding a natural language, and most other tasks and applications depend heavily on it (Binyam_Gebrekidan, 2009). The significance of POS