Free Writing - Grammatical Error Correction System: Sequence Tagging


International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072

Volume: 09 Issue: 09 | Sep 2022 | www.irjet.net

Shashank1, Shetty Shreyas Udaya2, Sumukha N Shilge3, Mohammed Yasir4, Mrs. Sreevidya B S5

1,2,3,4,5 Department of Information Science and Engineering, Dayananda Sagar College of Engineering

Abstract - This paper presents a grammatical error correction (GEC) system that provides suggestions to help users turn incorrect sentences into correct ones. Sequence tagging is a core information extraction task in which words (or phrases) are classified using a predefined label set. The model is pretrained on synthetically generated grammatical errors and fine-tuned on the National University of Singapore Corpus of Learner English (NUCLE) (Dahlmeier et al., 2013), the Lang-8 Corpus of Learner English (Lang-8) (Tajiri et al., 2012), the FCE dataset (Yannakoudakis et al., 2011), the publicly available part of the Cambridge Learner Corpus (Nicholls, 2003), and the Write & Improve + LOCNESS corpus (Bryant et al., 2019). The system is evaluated on the CoNLL-2014 test set (Ng et al., 2014) with the official M2 scorer (Dahlmeier and Ng, 2012), and on the BEA-2019 dev and test sets with ERRANT.

Key Words: NLP, Sequence tagging, transformers, seq2seq

1. INTRODUCTION

A neural machine translation (NMT)-based approach has emerged as the recommended method for grammatical error correction (GEC) tasks. In this formulation, incorrect sentences correspond to the source language and error-free sentences correspond to the target language. Recently, Transformer-based (Vaswani et al., 2017) sequence-to-sequence (seq2seq) models have achieved state-of-the-art performance on standard GEC benchmarks (Bryant et al., 2019). Currently, the focus of research is shifting to the generation of synthetic data for pre-training Transformer NMT-based GEC systems. However, NMT-based GEC systems have several drawbacks that make them inconvenient for real-world deployment: (i) slow inference speed, (ii) a need for large amounts of training data, and (iii) limited interpretability and explainability; an NMT-based GEC system requires an additional component to explain the correction of a sentence, e.g., grammatical error type classification.

Our approach differs in three ways. First, instead of generating a complete corrected sentence from the incorrect one, we output edits such as copies, appends, deletes, replacements, and case changes, which reduces the task to a small vocabulary, unlike an NMT-based GEC system, which needs a large vocabulary to generate a complete sentence. Suppose the GEC input sentence is "i have dinner yesterday". Existing seq2seq learning approaches would need to output the four tokens "I had dinner yesterday" from a word vocabulary, whereas we would predict the edits {Capitalize token 1, Replace token 2 with "had", Copy token 3, Copy token 4}. Second, we construct the corrected sentence from the predicted tags: each token receives a prediction tag, which is then applied to that token to form the corrected sentence. In the example above, "i" changes to upper-case "I" and "have" changes to "had". Third, we improve the inference quality of the parallel model by repeatedly feeding the model's own output back to it for further refinement.

2. THE GEC SYSTEM

In this section, we present Free Writing, a web-based system where users can write their essays and receive suggestions for correcting them. The user can apply or ignore each suggestion. The correction process is divided into four steps. First, we use BertTokenizer to tokenize the input sentences. Second, the tokenized sentences are fed into the BERT model for inference. Third, the predicted tags are converted to suggestion tokens. Finally, to give easy-to-read feedback, we convert the result into an informative visual presentation instead of raw prediction tokens.
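The edit-tagging formulation above can be illustrated with a minimal sketch. The tag names ($KEEP, $CAPITALIZE, $REPLACE_x, etc.) and the apply function are illustrative assumptions, not the paper's actual label inventory:

```python
# Minimal sketch of applying per-token edit tags to an input sentence.
# The tag inventory here is illustrative, not the system's real label set.

def apply_edits(tokens, tags):
    """Apply one edit tag per token and return the corrected tokens."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":                      # copy the token unchanged
            out.append(token)
        elif tag == "$CAPITALIZE":              # case-change edit
            out.append(token.capitalize())
        elif tag == "$DELETE":                  # drop the token
            continue
        elif tag.startswith("$REPLACE_"):       # substitute a new word
            out.append(tag[len("$REPLACE_"):])
        elif tag.startswith("$APPEND_"):        # keep token, then insert a word
            out.append(token)
            out.append(tag[len("$APPEND_"):])
    return out

tokens = ["i", "have", "dinner", "yesterday"]
tags = ["$CAPITALIZE", "$REPLACE_had", "$KEEP", "$KEEP"]
print(" ".join(apply_edits(tokens, tags)))  # I had dinner yesterday
```

Because every tag is drawn from a small closed set, the classifier's output space stays far smaller than the full word vocabulary a seq2seq decoder must generate from.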

Fig. 1: Free Writing web interface for suggestions

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 415
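The four correction steps of Section 2, combined with the iterative refinement described in the introduction, can be sketched as a simple loop. The real system tokenizes with BertTokenizer and predicts tags with a fine-tuned BERT; in this hedged sketch a toy rule-based predict_tags() stands in so the control flow is self-contained:

```python
# Sketch of the four-step correction loop, with iterative refinement:
# the corrected output is fed back until the model predicts no more edits.
# predict_tags() is a toy stand-in for BERT inference, for illustration only.

def predict_tags(tokens):
    """Toy stand-in for BERT tag inference: one edit tag per token."""
    fixes = {"i": "$CAPITALIZE", "have": "$REPLACE_had"}
    return [fixes.get(t, "$KEEP") for t in tokens]

def apply_tags(tokens, tags):
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$CAPITALIZE":
            out.append(token.capitalize())
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])
        else:
            out.append(token)
    return out

def correct(sentence, max_rounds=3):
    """Tokenize, infer tags, apply suggestions, and re-feed the output
    until the prediction stabilizes (all tags are $KEEP)."""
    tokens = sentence.split()              # step 1 (BertTokenizer in the paper)
    for _ in range(max_rounds):
        tags = predict_tags(tokens)        # step 2 (BERT inference in the paper)
        if all(t == "$KEEP" for t in tags):
            break                          # nothing left to suggest
        tokens = apply_tags(tokens, tags)  # step 3: tags -> suggestion tokens
    return " ".join(tokens)                # step 4 would render this visually

print(correct("i have dinner yesterday"))  # I had dinner yesterday
```

Capping the number of refinement rounds (max_rounds) keeps inference latency bounded while still letting later passes fix errors that only become visible after earlier edits are applied.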

