International Research Journal of Engineering and Technology (IRJET) Volume: 04 Issue: 02 | Feb 2017
www.irjet.net
e-ISSN: 2395-0056 p-ISSN: 2395-0072
Dictionary Based Approaches in Protein Name Recognition
Annalakshmi V¹, Bhuvaneswari V², Aruna L³
¹Assistant Professor, Dept. of Computer Science, Jayaraj Annapackiam College for Women (Autonomous), Periyakulam-625 601, Tamil Nadu, India
²Assistant Professor, School of Computer Science and Engineering, Bharathiar University, Coimbatore-641 046, Tamil Nadu, India
³Assistant Professor, Dept. of Computer Science, Jayaraj Annapackiam College for Women (Autonomous), Periyakulam-625 601, Tamil Nadu, India
¹annalakshmivmca@gmail.com ²bhuvanesh_v@yahoo.com ³arunaswarni@gmail.com
Abstract—Bioinformatics is the science of organizing and analyzing biological data. Identifying protein/gene names in MEDLINE abstracts is an important task in mining the biomedical literature, and constructing a protein/gene name dictionary is a major part of that task. Proteins are mentioned in the form of gene symbols, protein names, synonyms, gene names and typographical variants. Dictionary-based approaches normalize gene and protein names by mapping the many synonyms and phrases that represent the same concept to a single identifier for that protein/gene. Protein names are identified in the dataset using capital letters, Arabic numerals, letters of the Roman alphabet, Roman numerals and words that appear frequently in protein names. In this work, we propose a method that identifies protein/gene names using regular expressions in order to construct a dictionary.
Keywords— Bioinformatics, Text Mining, Gene, Protein, MEDLINE Abstract.
1. INTRODUCTION
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses [1]. Data mining techniques are the result of a long process of research and product development. Data mining is a component of a wider process called knowledge discovery in databases (KDD). Bioinformatics is the science of organizing and analyzing biological data that involves collecting, manipulating, analyzing, and transmitting huge quantities of data. Bioinformatics and data mining offer exciting and challenging research problems in several application areas, especially in computer science. Bioinformatics is the science of managing, mining and interpreting information from biological sequences and structures [2].
Text mining is the process of searching, collecting and deriving high-quality, useful information from text sources. It involves finding patterns in text files, deriving rules from those patterns, applying the rules to the text, and producing meaningful information as output. Most information on the Web is not numeric data but text, so text mining is a very useful technique for discovering information, such as customer information, in unstructured text. Text mining describes the automated process of analyzing natural language text with the goal of discovering information and knowledge. A number of terms describe specific aspects of automatic text analysis:
Bioinformatics is part of the larger science of computational biology. Computational biology is the application of quantitative analytical techniques to modeling and solving problems in biological systems; bioinformatics is a broad term covering the use of computer algorithms to analyze biological data. A gene is the basic unit of heredity in a living organism. It is normally a stretch of DNA (deoxyribonucleic acid) that codes for a protein or for an RNA (ribonucleic acid) chain that has a function in the organism. All proteins and functional RNA chains are specified by genes. A protein is a long-chain molecule made up of amino acids joined by peptide bonds. Protein forms the structural material of
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 94