Design & Implementation of a DNA Compression Algorithm


International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 04 Issue: 07 | July -2017

p-ISSN: 2395-0072

www.irjet.net

Manju Rani¹, Pawan Kumar Mishra²

¹M.Tech Scholar, Uttarakhand Technical University, Dehradun, Uttarakhand, India
²Assistant Professor, Uttarakhand Technical University, Dehradun, Uttarakhand, India
---------------------------------------------------------------------***--------------------------------------------------------------------

Abstract - DNA sequences use only four symbols, representing the four nucleotide bases {A, C, T, G}. These four symbols can be represented as {00, 01, 10, 11} respectively, so that each nucleotide base, normally stored in 8 bits, occupies only 2 bits when encoded in this binary form. This would be one of the most efficient encoding schemes if the sequence contained no symbols other than the base characters A, G, T and C. When other symbols are present, encoding is still possible, but a fundamental problem arises during decompression, because the binary code of an unexpected symbol such as N or S coincides with the binary code of A, G, T or C. Another algorithm used for DNA compression is the Differential Direct (2D) Coding Algorithm, which overcomes this problem by distinguishing between the base characters and the unexpected symbols. The 2D coding algorithm replaces each group of three characters (a triplet) with a single other character [28].
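The 2-bit encoding described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `pack` and `unpack`, the four-bases-per-byte layout, and the left-aligned padding of the final byte are all assumptions made for the example. Only the code assignment {A, C, T, G} -> {00, 01, 10, 11} comes from the text.

```python
# Code assignment taken from the abstract: {A, C, T, G} -> {00, 01, 10, 11}.
# Everything else (names, byte layout, padding) is an illustrative assumption.
CODES = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}
BASES = "ACTG"  # BASES[i] is the base whose 2-bit code is i


def pack(seq: str) -> bytes:
    """Pack a string of A/C/T/G into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            # Raises KeyError for any symbol outside A/C/T/G (e.g. N or S).
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Recover n_bases characters from bytes produced by pack()."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


packed = pack("ACTGAC")        # 6 bases stored in 2 bytes instead of 6
restored = unpack(packed, 6)   # the base count must be stored separately
```

All four 2-bit patterns are already assigned to bases, so there is no spare pattern for a symbol such as N. This sketch rejects such symbols outright; an encoder that forced them into two bits would produce output indistinguishable from a real base, which is the decompression problem the abstract describes.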

With the growth of databases containing nucleotide sequences, which are used in search applications to find sequences homologous to a query sequence, the need for compression techniques has arisen. Data must be stored compactly so that it can be transferred easily. Furthermore, sequences can be accessed independently. Moreover, disk costs are often a bottleneck in searching as well [10, 11].

Compression rate is the measure of the reduction in size of the original file. There are four main methods of measuring the compression rate. The first, known as bits per byte (bpb), refers to the replacement of one byte (a group of 8 bits) by fewer than 8 bits. It is calculated as (compressed length / original length) * 8. If a file of 800 bytes is compressed to a file of 200 bytes, the compression is (200/800)*8 = 2 bpb. The second method expresses compression as a percentage, calculated as (compressed length / original length) * 100. If a file of 800 bytes is compressed to a file of 200 bytes, the result is (200/800)*100 = 25% of the original file. The third method is the ratio form (original size : compressed size), for example 4:1 or 3:1. This general representation is widely used but has low precision. The fourth method is bits per character. It coincides with bpb only in some cases, and it cannot be applied to binary files.
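The first three measures can be checked with a short calculation. The sketch below uses the formulas exactly as stated in the text; the function names are hypothetical, and reducing the ratio to lowest terms with `gcd` is an assumption about how the ratio form is normally written.

```python
from math import gcd


def bits_per_byte(original_len: int, compressed_len: int) -> float:
    """(compressed length / original length) * 8"""
    return compressed_len / original_len * 8


def percent_of_original(original_len: int, compressed_len: int) -> float:
    """(compressed length / original length) * 100"""
    return compressed_len / original_len * 100


def size_ratio(original_len: int, compressed_len: int) -> str:
    """original size : compressed size, reduced to lowest terms."""
    g = gcd(original_len, compressed_len)
    return f"{original_len // g}:{compressed_len // g}"


# Worked example from the text: 800 bytes compressed to 200 bytes.
bpb = bits_per_byte(800, 200)            # 2.0
percent = percent_of_original(800, 200)  # 25.0
ratio = size_ratio(800, 200)             # "4:1"
```

The ratio form loses precision when sizes are rounded to small integers, which is why the bpb and percentage forms are usually preferred for comparing algorithms.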

Key Words: DNA Compression, Nucleotide Sequence Compression, Look Up Table, Compression Algorithm, Lossless Compression

1. INTRODUCTION

Compression is a technique for reducing the size of data by lowering the number of bits used to represent it. In other words, compression converts data to a format that requires fewer bits than the original format [1].

2. Literature survey

DNA is an abbreviation for deoxyribonucleic acid, which carries genetic information. Most of the DNA in humans is located in the cell nucleus, except for a small amount found in the cell's mitochondria. The former is known as nuclear DNA, while the latter is mitochondrial DNA (mtDNA). Four different types of nucleotides are found in DNA, differing only in the nitrogenous base: A, G, T and C. Human DNA contains about 3 billion bases, of which 99% are the same in all people [9]. The sequence of these bases is an essential defining feature of an organism. In DNA, A pairs with T and C pairs with G, forming base pairs. The double helix structure of DNA is built from nucleotides (each a combination of a phosphate molecule, a sugar molecule and a base). The double helix resembles a ladder, with the base pairs forming its rungs and the sugar and phosphate molecules forming its vertical sidepieces. DNA has the unique property of replicating itself.

Chen, X. et al. (2000) described Differential Direct Coding (2D), which also divides the sequence into units of length three. It proposes that compression strategies must accommodate large data sets consisting of multiple sequences and auxiliary data. The set of expected symbols for the 2D model is {A, T, G, C, U}, which removes the burden of explicitly declaring the sequence type, such as DNA or RNA. Adjeroh, D. et al. (2002) studied offline dictionary-oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT).

© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 3073

