
Pharmaceutical Biotechnology

Concepts and Applications

Gary Walsh

University of Limerick, Republic of Ireland


Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): cs-books@wiley.co.uk

Visit our Home Page on www.wileyeurope.com or www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to (+44) 1243 770620.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The Publisher is not associated with any product or vendor mentioned in this book.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons Canada Ltd, 6045 Freemont Blvd, Mississauga, Ontario, L5R 4J3

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Anniversary Logo Design: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data

Walsh, Gary, Dr.

Pharmaceutical biotechnology : concepts and applications / Gary Walsh. p. ; cm.

Includes bibliographical references. ISBN 978-0-470-01244-4 (cloth)

1. Pharmaceutical biotechnology. I. Title. [DNLM: 1. Technology, Pharmaceutical. 2. Biotechnology. 3. Pharmaceutical Preparations. QV 778 W224p 2007]

RS380.W35 2007 615 .19–dc22 2007017884

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 978-0-470-01244-4 (HB)

ISBN 978-0-470-01245-1 (PB)

Typeset in 10.5/12.5 pt Times by Thomson Digital

Printed and bound in Great Britain by Antony Rowe Ltd., Chippenham, Wilts

This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.

I dedicate this book to my beautiful daughter Alice. To borrow a phrase: ‘without her help, it would have been written in half the time’!


5.2.3 Additional production systems

5.2.3.1 Yeast

5.2.3.2 Fungal production systems

5.2.3.3 Transgenic animals

5.2.3.4 Transgenic plants

5.2.3.5 Insect cell-based systems

5.3 Upstream processing

5.3.1 Cell banking systems

5.3.2 Microbial cell fermentation

5.3.3 Mammalian cell culture systems

6 Downstream processing

6.1 Introduction

6.2 Initial product recovery

6.3 Cell disruption

6.4 Removal of nucleic acid

6.5 Initial product concentration

6.5.2 Diafiltration

6.6 Chromatographic purification

6.6.1 Size-exclusion chromatography (gel filtration)

6.6.2 Ion-exchange chromatography

6.6.3 Hydrophobic interaction chromatography

6.6.4 Affinity chromatography

6.6.5 Immunoaffinity purifications

6.6.6 Protein A chromatography

6.6.7 Lectin affinity chromatography

6.6.8 Dye affinity chromatography

6.6.9 Metal chelate affinity chromatography

6.6.10 Chromatography on hydroxyapatite

6.6.11 Chromatofocusing

6.7 High-performance liquid chromatography of proteins

6.8 Purification of recombinant proteins

6.9 Final product formulation

6.9.1 Some influences that can alter the biological activity of proteins

6.9.1.1 Proteolytic degradation and alteration of sugar side-chains

6.9.1.2 Protein deamidation

6.9.1.3 Oxidation and disulfide exchange

6.9.2 Stabilizing excipients used in final product formulations

6.9.3 Final product fill

6.9.4 Freeze-drying

6.9.5 Labelling and packing

Further reading

7 Product analysis

7.1 Introduction

7.2 Protein-based contaminants

7.3 Removal of altered forms of the protein of interest from the product stream

7.3.1 Product potency

7.3.2 Determination of protein concentration

7.4 Detection of protein-based product impurities

7.4.1 Capillary electrophoresis

7.4.2 High-performance liquid chromatography

7.4.3 Mass spectrometry

7.5 Immunological approaches to detection of contaminants

7.5.1 Amino acid analysis

7.5.2 Peptide mapping

7.5.3 N-terminal sequencing

7.5.4 Analysis of secondary and tertiary structure

7.6 Endotoxin and other pyrogenic contaminants

7.6.1 Endotoxin, the molecule

7.6.2 Pyrogen detection

7.6.3 DNA

7.6.4 Microbial and viral contaminants

7.6.5 Viral assays

7.6.6 Miscellaneous contaminants


8 The cytokines: The interferon family

8.1 Cytokines

8.1.1 Cytokine receptors

8.1.2 Cytokines as biopharmaceuticals

8.2 The interferons

8.2.3 Interferon-γ

8.2.7 The interferon JAK–STAT pathway

8.2.9 The eIF-2α protein kinase system

8.3 Interferon biotechnology

8.3.1 Production and medical uses of interferon-α

8.3.2 Medical uses of interferon-β

8.3.3 Medical applications of interferon-γ

8.3.4 Interferon toxicity

8.4 Conclusion

9 Cytokines: Interleukins and tumour necrosis factor

9.1 Introduction

9.2 Interleukin-2

9.2.1 Interleukin-2 production

9.2.2 Interleukin-2 and cancer treatment

9.2.3 Interleukin-2 and infectious diseases


9.2.5 Inhibition of interleukin-2 activity

9.3 Interleukin-1

9.3.1 The biological activities of interleukin-1

9.3.2 Interleukin-1 biotechnology

9.4 Interleukin-11

9.5 Tumour necrosis factors

9.5.1 Tumour necrosis factor biochemistry

9.5.2 Biological activities of tumour necrosis factor-α

9.5.3 Immunity and inflammation

9.5.4 Tumour necrosis factor receptors

9.5.5 Tumour necrosis factor: therapeutic aspects

10 Growth factors

10.1 Introduction

Haematopoietic growth factors

The interleukins as haemopoietic growth factors

Granulocyte colony-stimulating factor

Macrophage colony-stimulating factor

10.2.4 Granulocyte macrophage colony-stimulating factor

10.2.5 Clinical application of colony-stimulating factors

Erythropoietin

10.2.6.1 Therapeutic applications of erythropoietin

Chronic disease and cancer chemotherapy

11 Therapeutic hormones

11.5 The gonadotrophins

11.5.1 Follicle-stimulating hormone, luteinizing hormone and human chorionic gonadotrophin

11.5.2 Pregnant mare serum gonadotrophin

11.5.3 The inhibins and activins

11.6 Medical and veterinary applications of gonadotrophins

11.6.1 Sources and medical uses of follicle-stimulating hormone, luteinizing hormone and human chorionic gonadotrophin

11.6.2 Recombinant gonadotrophins

Veterinary uses of gonadotrophins

11.7 Additional recombinant hormones now

12

blood products and therapeutic enzymes

12.5 Enzymes of

12.5.1 Asparaginase


12.5.3 Glucocerebrosidase

12.5.4 α-Galactosidase, urate oxidase and laronidase

12.5.5 Superoxide dismutase

12.5.6 Debriding agents

13 Antibodies, vaccines and adjuvants

13.3 Monoclonal

13.3.3 Tumour immunology

13.3.3.1 Antibody-based strategies for tumour detection/destruction

13.3.3.2 Drug-based tumour immunotherapy

13.3.3.3 First-generation anti-tumour antibodies: clinical disappointment

13.3.4 Tumour-associated antigens

13.3.5 Antigenicity of murine monoclonals

13.3.6 Chimaeric and humanized antibodies

13.3.7 Antibody fragments

13.3.8 Additional therapeutic applications of monoclonal antibodies

13.4 Vaccine technology

13.4.1 Traditional vaccine preparations

13.4.1.1 Attenuated, dead or inactivated bacteria

13.4.1.2 Attenuated and inactivated viral vaccines

13.4.1.3 Toxoids and antigen-based vaccines

13.4.2 The impact of genetic engineering on vaccine technology

13.4.3 Peptide vaccines

13.4.4 Vaccine vectors

13.4.5 Development of an AIDS vaccine


13.5.3 Oil-based emulsion adjuvants

13.5.4 Bacteria/bacterial


Preface

This book has been written as a sister publication to Biopharmaceuticals: Biochemistry and Biotechnology, a second edition of which was published by John Wiley and Sons in 2003. The latter textbook caters mainly for advanced undergraduate/postgraduate students undertaking degree programmes in biochemistry, biotechnology and related disciplines. Such students have invariably pursued courses/modules in basic protein science and molecular biology in the earlier parts of their degree programmes; hence, the basic principles of protein structure and molecular biology were not considered as part of that publication. This current publication is specifically tailored to meet the needs of a broader audience, particularly to include students undertaking programmes in pharmacy/pharmaceutical science, medicine and other branches of biomedical/clinical sciences. Although evolving from Biopharmaceuticals: Biochemistry and Biotechnology, its focus is somewhat different, reflecting its broader intended readership. This text, therefore, includes chapters detailing the basic principles of protein structure and molecular biology. It also increases/extends the focus upon topics such as formulation and delivery of biopharmaceuticals, and it contains numerous case studies in which both biotech and clinical aspects of a particular approved product of pharmaceutical biotechnology are overviewed. The book, of course, should also meet the needs of students undertaking programmes in core biochemistry, biotechnology or related scientific areas and be of use as a broad reference source to those already working within the pharmaceutical biotechnology sector.

As always, I owe a debt of gratitude to the various people who assisted in the completion of this textbook. Thanks to Sandy for her help in preparing various figures, usually at ridiculously short notice. To Gerard Wall, for all the laughs and for several useful discussions relating to molecular biology. Thank you to Nancy, my beautiful wife, for accepting my urge to write (rather than to change baby’s nappies) with good humour – most of the time anyway! I am also grateful to the staff of John Wiley and Sons for their continued professionalism and patience with me when I keep overrunning submission deadlines. Finally, I have a general word of appreciation to all my colleagues at the University of Limerick for making this such an enjoyable place to work.

Gary Walsh November 2006

Acronyms

ADCC antibody-dependent cell cytotoxicity

BAC bacterial artificial chromosome

BHK baby hamster kidney

cDNA complementary DNA

CHO Chinese hamster ovary

CNTF ciliary neurotrophic factor

CSF colony-stimulating factor

dsRNA double-stranded RNA

EDTA ethylenediaminetetraacetic acid

ELISA enzyme-linked immunosorbent assay

EPO erythropoietin

FGF fibroblast growth factor

FSH follicle-stimulating hormone

GDNF glial cell-derived neurotrophic factor

GH growth hormone

hCG human chorionic gonadotrophin

HIV human immunodeficiency virus

HPLC high-performance liquid chromatography

IGF insulin-like growth factor

ISRE interferon-stimulated response element

JAK Janus kinase

LAF lymphocyte activating factor

LIF leukaemia inhibitory factor

LPS lipopolysaccharide

MHC major histocompatibility complex

MPS mucopolysaccharidosis

mRNA messenger RNA

PDGF platelet-derived growth factor

PEG polyethylene glycol

PTK protein tyrosine kinase

PTM post-translational modification

rDNA recombinant DNA

RNAi RNA interference

rRNA ribosomal RNA

SDS sodium dodecyl sulfate

ssRNA single-stranded RNA

STATs signal transducers and activators of transcription

TNF tumour necrosis factor

tPA tissue plasminogen activator

tRNA transfer RNA

WAP whey acid protein

WFI water for injections

1 Pharmaceuticals, biologics and biopharmaceuticals

1.1 Introduction to pharmaceutical products

Pharmaceutical substances form the backbone of modern medicinal therapy. Most traditional pharmaceuticals are low molecular weight organic chemicals (Table 1.1). Although some (e.g. aspirin) were originally isolated from biological sources, most are now manufactured by direct chemical synthesis. Two types of manufacturing company thus comprise the ‘traditional’ pharmaceutical sector: the chemical synthesis plants, which manufacture the raw chemical ingredients in bulk quantities, and the finished product pharmaceutical facilities, which purchase these raw bulk ingredients, formulate them into final pharmaceutical products, and supply these products to the end user.

In addition to chemical-based drugs, a range of pharmaceutical substances (e.g. hormones and blood products) are produced by/extracted from biological sources. Such products, some major examples of which are listed in Table 1.2, may thus be described as products of biotechnology. In some instances, categorizing pharmaceuticals as products of biotechnology or chemical synthesis becomes somewhat artificial. For example, certain semi-synthetic antibiotics are produced by chemical modification of natural antibiotics produced by fermentation technology.

1.2 Biopharmaceuticals and pharmaceutical biotechnology

Terms such as ‘biologic’, ‘biopharmaceutical’, ‘products of pharmaceutical biotechnology’ and ‘biotechnology medicines’ have now become an accepted part of the pharmaceutical literature. However, these terms are sometimes used interchangeably and can mean different things to different people. Although it might be assumed that ‘biologic’ refers to any pharmaceutical product produced by biotechnological endeavour, its definition is more limited. In pharmaceutical circles, ‘biologic’ generally refers to medicinal products derived from blood, as well as vaccines, toxins and allergen products. ‘Biotechnology’ has a much broader and long-established meaning. Essentially, it refers to the use of biological systems (e.g. cells or tissues) or biological molecules (e.g. enzymes or antibodies) for/in the manufacture of commercial products.

Table 1.1 Some traditional pharmaceutical substances that are generally produced by direct chemical synthesis

Drug | Molecular formula | Molecular mass (Da) | Therapeutic indication

Acetaminophen (paracetamol) | C8H9NO2 | 151.16 | Analgesic

Ketamine | C13H16ClNO | 237.74 | Anaesthetic

Levamisole | C11H12N2S | 204.31 | Anthelmintic

Diazoxide | C8H7ClN2O2S | 230.7 | Antihypertensive

Acyclovir | C8H11N5O3 | 225.2 | Antiviral agent

Zidovudine | C10H13N5O4 | 267.2 | Antiviral agent

Dexamethasone | C22H29FO5 | 392.5 | Anti-inflammatory and immunosuppressive agent

Misoprostol | C22H38O5 | 382.5 | Anti-ulcer agent

Cimetidine | C10H16N6S | 252.3 | Anti-ulcer agent
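The molecular masses in Table 1.1 follow directly from the molecular formulas: each is the sum of the atomic weights of the constituent atoms. The Python sketch below recomputes them as a cross-check. It assumes rounded standard atomic weights (listed in the code) and handles only simple formulas without brackets; it is an illustration, not a validated cheminformatics tool. The computed values agree with the listed ones to within a few hundredths. Note that cimetidine contains one sulfur atom (C10H16N6S); without it the computed mass would fall about 32 units short of the listed 252.3.

```python
import re

# Rounded standard atomic weights (g/mol); only the elements that occur in Table 1.1.
ATOMIC_WEIGHTS = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45,
}

# (molecular formula, molecular mass listed in Table 1.1)
TABLE_1_1 = {
    "Acetaminophen": ("C8H9NO2", 151.16),
    "Ketamine": ("C13H16ClNO", 237.74),
    "Levamisole": ("C11H12N2S", 204.31),
    "Diazoxide": ("C8H7ClN2O2S", 230.7),
    "Acyclovir": ("C8H11N5O3", 225.2),
    "Zidovudine": ("C10H13N5O4", 267.2),
    "Dexamethasone": ("C22H29FO5", 392.5),
    "Misoprostol": ("C22H38O5", 382.5),
    "Cimetidine": ("C10H16N6S", 252.3),
}


def molecular_mass(formula: str) -> float:
    """Sum atomic weights for a bracket-free formula such as 'C8H9NO2'."""
    total = 0.0
    # Each match is an element symbol plus an optional count (a missing count means 1).
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


for drug, (formula, listed) in TABLE_1_1.items():
    computed = molecular_mass(formula)
    print(f"{drug:<14} {formula:<12} listed {listed:>7.2f}  computed {computed:>7.2f}")
```

The regular expression treats a capital letter followed by an optional lower-case letter as one element symbol, which is why "Cl" in ketamine and diazoxide is read as chlorine rather than carbon plus a stray character.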

The term ‘biopharmaceutical’ was first used in the 1980s and came to describe a class of therapeutic proteins produced by modern biotechnological techniques, specifically via genetic engineering (Chapter 3) or, in the case of monoclonal antibodies, by hybridoma technology (Chapter 13). Although the majority of biopharmaceuticals or biotechnology products now approved or in development are proteins produced via genetic engineering, these terms now also encompass nucleic-acid-based, i.e. deoxyribonucleic acid (DNA)- or ribonucleic acid (RNA)-based products, and whole-cell-based products.

1.3 History of the pharmaceutical industry

The pharmaceutical industry, as we now know it, is barely 60 years old. From very modest beginnings, it has grown rapidly, reaching an estimated value of US$100 billion by the mid 1980s. Its current value is likely double or more this figure. There are well in excess of 10 000 pharmaceutical companies in existence, although only about 100 of these can claim to be of true international significance. These companies manufacture in excess of 5000 individual pharmaceutical substances used routinely in medicine.

Table 1.2 Some pharmaceuticals that were traditionally obtained by direct extraction from biological source material. Many of the protein-based pharmaceuticals mentioned are now also produced by genetic engineering

Substance | Medical application

Blood products (e.g. coagulation factors) | Treatment of blood disorders such as haemophilia A or B

Vaccines | Vaccination against various diseases

Antibodies | Passive immunization against various diseases

Insulin | Treatment of diabetes mellitus

Enzymes | Thrombolytic agents, digestive aids, debriding agents (i.e. cleansing of wounds)

Antibiotics | Treatment of infections caused by various infectious agents

Plant extracts (e.g. alkaloids) | Various, including pain relief

The first stages of development of the modern pharmaceutical industry can be traced back to the turn of the twentieth century. At that time (apart from folk cures), the medical community had at its disposal only four drugs that were effective in treating specific diseases:

Digitalis (extracted from foxglove) was known to stimulate heart muscle and, hence, was used to treat various heart conditions.

Quinine, obtained from the bark/roots of plants of the genus Cinchona, was used to treat malaria.

Ipecacuanha (active ingredient is a mixture of alkaloids), used for treating dysentery, was obtained from the bark/roots of the plant genus Cephaelis.

Mercury, for the treatment of syphilis.

This lack of appropriate, safe and effective medicines contributed in no small way to the low life expectancy characteristic of those times.

Developments in biology (particularly the growing realization of the microbiological basis of many diseases), as well as a developing appreciation of the principles of organic chemistry, helped underpin future innovation in the fledgling pharmaceutical industry. The successful synthesis of various artificial dyes, which proved to be therapeutically useful, led to the formation of pharmaceutical/chemical companies such as Bayer and Hoechst in the late 1800s. Scientists at Bayer, for example, succeeded in synthesizing aspirin in 1897.

Despite these early advances, it was not until the 1930s that the pharmaceutical industry began to develop in earnest. The initial landmark of this era was probably the discovery, and chemical synthesis, of the sulfa drugs, a group of related molecules derived from the red dye prontosil rubrum. These drugs proved effective in the treatment of a wide variety of bacterial infections (Figure 1.1). Large-scale industrial production of insulin, which had first been used therapeutically in the early 1920s, also commenced in the 1930s.

The medical success of these drugs gave new emphasis to the pharmaceutical industry, which was boosted further by the commencement of industrial-scale penicillin manufacture in the early 1940s. Around this time, many of the current leading pharmaceutical companies (or their forerunners) were founded. Examples include Ciba-Geigy, Eli Lilly, Wellcome, Glaxo and Roche. Over the next two to three decades, these companies developed drugs such as tetracyclines, corticosteroids, oral contraceptives, antidepressants and many more. Most of these pharmaceutical substances are manufactured by direct chemical synthesis.

1.4 The age of biopharmaceuticals

Biomedical research continues to broaden our understanding of the molecular mechanisms underlying both health and disease. Research undertaken since the 1950s has pinpointed a host of proteins produced naturally in the body that have obvious therapeutic applications. Examples include the interferons and interleukins (which regulate the immune response), growth factors, such as erythropoietin (EPO; which stimulates red blood cell production), and neurotrophic factors (which regulate the development and maintenance of neural tissue).

Figure 1.1 Sulfa drugs and their mode of action. The first sulfa drug to be used medically was the red dye prontosil rubrum (a). In the early 1930s, experiments showed that administering this dye to mice infected with haemolytic streptococci prevented their death. Although effective in vivo, the drug was devoid of in vitro antibacterial activity. It was first used clinically in 1935 under the name Streptozon. It was subsequently shown that prontosil rubrum is enzymatically reduced in the liver, forming sulfanilamide (b), the actual active antimicrobial agent. Sulfanilamide exerts its effect by acting as an anti-metabolite of para-aminobenzoic acid (PABA) (c). PABA is an essential component of tetrahydrofolic acid (THF) (d), which is built from a pteridine derivative, PABA and glutamic acid. THF serves as an essential cofactor for several cellular enzymes. At sufficiently high concentrations, sulfanilamide inhibits THF synthesis by competing with PABA, effectively blocking essential THF-dependent enzyme reactions within the cell. Unlike humans, who can obtain folates from their diet, most bacteria cannot absorb folate intact from their surroundings and must synthesize it de novo
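Why sulfanilamide works only "at sufficiently high concentrations" can be illustrated with the textbook Michaelis–Menten rate law for a competitive inhibitor, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]), where [S] stands for PABA and [I] for the sulfa drug. The Python sketch below is an illustration under stated assumptions: the parameter values are hypothetical placeholders in arbitrary units, not measured constants for any bacterial enzyme, and the function name is our own.

```python
def competitive_rate(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Michaelis-Menten rate with a competitive inhibitor.

    The inhibitor raises the apparent Km by a factor (1 + [I]/Ki); Vmax is unchanged,
    so enough substrate can still drive the reaction towards Vmax.
    """
    apparent_km = km * (1.0 + i / ki)
    return vmax * s / (apparent_km + s)


# Hypothetical parameters in arbitrary units, NOT measured values for any bacterial enzyme.
VMAX, KM, KI = 1.0, 1.0, 1.0
PABA = 1.0  # substrate (PABA) concentration, held fixed at Km

uninhibited = competitive_rate(PABA, 0.0, VMAX, KM, KI)
for sulfa in (0.0, 1.0, 10.0, 100.0):
    # Because KI = 1.0 here, the inhibitor concentration equals its multiple of Ki.
    v = competitive_rate(PABA, sulfa, VMAX, KM, KI)
    print(f"[I] = {sulfa:5.0f} x Ki  relative rate = {v / uninhibited:.3f}")

# Raising the substrate concentration overcomes competitive inhibition.
print(f"[S] = 100 x Km, [I] = 10 x Ki  rate = {competitive_rate(100.0, 10.0, VMAX, KM, KI):.3f}")
```

With these placeholder values the relative rate falls to about 0.67, 0.17 and 0.02 as [I] rises to 1, 10 and 100 times Ki, whereas a 100-fold excess of substrate restores the rate to about 0.9 of Vmax even at [I] = 10 Ki. This is the sense in which the drug must reach high concentrations relative to PABA to be effective.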

Although the pharmaceutical potential of these regulatory molecules was generally appreciated, their widespread medical application was in most cases rendered impractical due to the tiny quantities in which they were naturally produced. The advent of recombinant DNA technology (genetic engineering) and monoclonal antibody technology (hybridoma technology) overcame many such difficulties, and marked the beginning of a new era of the pharmaceutical sciences.

Recombinant DNA technology has had a fourfold positive impact upon the production of pharmaceutically important proteins:


It overcomes the problem of source availability. Many proteins of therapeutic potential are produced naturally in the body in minute quantities. Examples include interferons (Chapter 8), interleukins (Chapter 9) and colony-stimulating factors (CSFs; Chapter 10). This rendered impractical their direct extraction from native source material in quantities sufficient to meet likely clinical demand. Recombinant production (Chapters 3 and 5) allows, in principle, the manufacture of any protein in whatever quantity is required.

It overcomes problems of product safety. Direct extraction of product from some native biological sources has, in the past, led to the unwitting transmission of disease. Examples include the transmission of blood-borne pathogens such as hepatitis B and C and human immunodeficiency virus (HIV) via infected blood products and the transmission of Creutzfeldt–Jakob disease to persons receiving human growth hormone (GH) preparations derived from human pituitaries.

It provides an alternative to direct extraction from inappropriate/dangerous source material. A number of therapeutic proteins have traditionally been extracted from human urine. Follicle-stimulating hormone (FSH), the fertility hormone, for example, is obtained from the urine of postmenopausal women, and a related hormone, human chorionic gonadotrophin (hCG), is extracted from the urine of pregnant women (Chapter 11). Urine is not considered a particularly desirable source of pharmaceutical products. Although several products obtained from this source remain on the market, recombinant forms have now also been approved. Other potential biopharmaceuticals are produced naturally by downright dangerous sources. Ancrod, for example, is a protein displaying anti-coagulant activity (Chapter 12) and, hence, is of potential clinical use. It is, however, produced naturally by the Malaysian pit viper. Although retrieval by milking snake venom is possible, and indeed may be quite an exciting procedure, recombinant production in less dangerous organisms, such as Escherichia coli or Saccharomyces cerevisiae, would be considered preferable by most.

It facilitates the generation of engineered therapeutic proteins displaying some clinical advantage over the native protein product. Techniques such as site-directed mutagenesis facilitate the logical introduction of predefined changes in a protein’s amino acid sequence. Such changes can be as minimal as the insertion, deletion or alteration of a single amino acid residue, or can be more substantial (e.g. the alteration/deletion of an entire domain, or the generation of a novel hybrid protein). Such changes can be made for a number of reasons, and several engineered products have now gained marketing approval. An overview summary of some engineered product types now on the market is provided in Table 1.3. These and other examples will be discussed in subsequent chapters.
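Site-directed mutagenesis is carried out on the gene, but its outcome can be pictured at the protein level as simple edits to a one-letter amino acid sequence. The Python sketch below applies the three kinds of change mentioned above (substitution, deletion and insertion) to the human insulin B chain. The helper function names are hypothetical. The lispro (Pro B28/Lys B29 exchange) and aspart (Pro B28 to Asp) substitutions are well-known faster-acting insulin analogues of the type listed in Table 1.3; the two Arg residues added after B30 correspond to the B-chain change in insulin glargine, which also carries an A-chain substitution not modelled here. The B30 deletion is included purely as an illustration.

```python
# Human insulin B chain in one-letter amino acid code, residues B1..B30.
INSULIN_B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"


def _check(seq: str, position: int, expected: str) -> None:
    """Guard against editing the wrong residue (positions are 1-based)."""
    found = seq[position - 1]
    if found != expected:
        raise ValueError(f"expected {expected} at position {position}, found {found}")


def substitute(seq: str, position: int, expected: str, new: str) -> str:
    """Replace one residue, as in a point mutation."""
    _check(seq, position, expected)
    return seq[: position - 1] + new + seq[position:]


def delete(seq: str, position: int, expected: str) -> str:
    """Remove one residue."""
    _check(seq, position, expected)
    return seq[: position - 1] + seq[position:]


def insert_after(seq: str, position: int, residues: str) -> str:
    """Insert residues immediately after a 1-based position (0 means the N-terminus)."""
    return seq[:position] + residues + seq[position:]


# Insulin lispro: Pro B28 and Lys B29 exchanged.
lispro = substitute(substitute(INSULIN_B_CHAIN, 28, "P", "K"), 29, "K", "P")
# Insulin aspart: Pro B28 replaced by Asp.
aspart = substitute(INSULIN_B_CHAIN, 28, "P", "D")
# Illustrative deletion of Thr B30.
des_b30 = delete(INSULIN_B_CHAIN, 30, "T")
# Two Arg residues appended after B30 (the B-chain change in insulin glargine).
b_chain_extended = insert_after(INSULIN_B_CHAIN, 30, "RR")

for name, seq in [("native", INSULIN_B_CHAIN), ("lispro", lispro), ("aspart", aspart),
                  ("desB30", des_b30), ("B31-B32 Arg", b_chain_extended)]:
    print(f"{name:<12} {seq} ({len(seq)} residues)")
```

Checking the expected residue before each edit mirrors the care needed in practice: a mutation specified against the wrong position or the wrong starting residue produces a different protein from the one intended.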

Despite the undoubted advantages of recombinant production, it remains the case that many protein-based products extracted directly from native source material remain on the market. In certain circumstances, direct extraction of native source material can prove equally/more attractive than recombinant production. This may be for an economic reason if, for example, the protein is produced in very large quantities by the native source and is easy to extract/purify, e.g. human serum albumin (HSA; Chapter 12). Also, some blood factor preparations purified from donor blood actually contain several different blood factors and, hence, can be used to treat several haemophilia patient types. Recombinant blood factor preparations, on the other hand, contain but a single blood factor and, hence, can be used to treat only one haemophilia type (Chapter 12).

The advent of genetic engineering and monoclonal antibody technology underpinned the establishment of literally hundreds of start-up biopharmaceutical (biotechnology) companies in the late 1970s and early 1980s. The bulk of these companies were founded in the USA, with smaller numbers of start-ups emanating from Europe and other world regions.

Table 1.3 Selected engineered biopharmaceutical types/products that have now gained marketing approval. These and additional such products will be discussed in detail in subsequent chapters

Product description/type | Alteration introduced | Rationale

Faster acting insulins (Chapter 11) | Modified amino acid sequence | Generation of faster acting insulin

Slow acting insulins (Chapter 11) | Modified amino acid sequence | Generation of slow acting insulin

Modified tissue plasminogen activator (tPA; Chapter 12) | Removal of three of the five native domains of tPA | Generation of a faster acting thrombolytic (clot degrading) agent

Modified blood factor VIII (Chapter 12) | Deletion of one domain of native factor VIII | Production of a lower molecular mass product

Chimaeric/humanized antibodies (Chapter 13) | Replacement of most/virtually all of the murine amino acid sequences with sequences found in human antibodies | Greatly reduced/eliminated immunogenicity; ability to activate human effector functions

‘Ontak’, a fusion protein (Chapter 9) | Fusion protein consisting of the diphtheria toxin linked to interleukin-2 (IL-2) | Targets toxin selectively to cells expressing an IL-2 receptor

Many of these fledgling companies were founded by academics/technical experts who sought to take commercial advantage of developments in the biotechnological arena. These companies were largely financed by speculative monies attracted by the hype associated with the establishment of the modern biotech era. Although most of these early companies displayed significant technical expertise, the vast majority lacked experience in the practicalities of the drug development process (Chapter 4). Most of the well-established large pharmaceutical companies, on the other hand, were slow to invest heavily in biotech research and development. However, as the actual and potential therapeutic significance of biopharmaceuticals became evident, many of these companies did diversify into this area. Most either purchased small, established biopharmaceutical concerns or formed strategic alliances with them. An example was the long-term alliance formed by Genentech (see later) and the well-established pharmaceutical company Eli Lilly. Genentech developed recombinant human insulin, which was then marketed by Eli Lilly under the trade name Humulin. The merger of biotech capability with pharmaceutical experience helped accelerate development of the biopharmaceutical sector. Many of the earlier biopharmaceutical companies no longer exist. The overall level of speculative finance available was not sufficient to sustain them all long term (it can take 6–10 years and US$800 million to develop a single drug; Chapter 4). Furthermore, the promise and hype of biotechnology sometimes exceeded its ability actually to deliver a final product. Some biopharmaceutical substances showed little efficacy in treating their target condition and/or exhibited unacceptable side effects. Mergers and acquisitions also led to the disappearance of several biopharmaceutical concerns. Table 1.4 lists many of the major pharmaceutical concerns that now manufacture/market biopharmaceuticals approved for general medical use. Box 1.1 provides a profile of three well-established dedicated biopharmaceutical companies.

Table 1.4 Pharmaceutical companies that manufacture and/or market biopharmaceutical products approved for general medical use in the USA and EU

Sanofi-Aventis, Hoechst AG, Bayer, Wyeth, Novo Nordisk, Genzyme, Isis Pharmaceuticals, Abbott, Genentech, Roche, Centocor, Novartis, Boehringer Mannheim, Serono, Galenus Mannheim, Organon, Eli Lilly, Amgen, Ortho Biotech, GlaxoSmithKline, Schering-Plough, Cytogen, Hoffmann-La Roche, Immunomedics, Chiron, Biogen

Box 1.1

Amgen, Biogen and Genentech

Amgen, Biogen and Genentech represent three pioneering biopharmaceutical companies that remain in business.

Founded in the 1980s as AMGen (Applied Molecular Genetics), Amgen now employs over 9000 people worldwide, making it one of the largest dedicated biotechnology companies in existence. Its headquarters are situated in Thousand Oaks, California, although it has research, manufacturing, distribution and sales facilities worldwide. Company activities focus upon developing novel (mainly protein) therapeutics for application in oncology, inflammation, bone disease, neurology, metabolism and nephrology. By mid 2006, seven of its recombinant products had been approved for general medical use: the EPO-based products ‘Aranesp’ and ‘Epogen’ (Chapter 10), the CSF-based products ‘Neupogen’ and ‘Neulasta’ (Chapter 10), the interleukin-1 (IL-1) receptor antagonist ‘Kineret’, the anti-rheumatoid arthritis fusion protein ‘Enbrel’ (Chapter 9) and the keratinocyte growth factor ‘Kepivance’, indicated for the treatment of severe oral mucositis. Total product sales for 2004 reached US$9.9 billion. In July 2002, Amgen acquired Immunex Corporation, another dedicated biopharmaceutical company founded in Seattle in the early 1980s.

Biogen was founded in Geneva, Switzerland, in 1978 by a group of leading molecular biologists. Currently, its global headquarters are located in Cambridge, MA, and it employs in excess of 2000 people worldwide. The company developed and directly markets the interferon-based product ‘Avonex’ (Chapter 8), but also generates revenues from sales of other Biogen-discovered products that are licensed to various other pharmaceutical companies. These include Schering Plough’s ‘Intron A’ (Chapter 8) and a number of hepatitis B-based vaccines sold by SmithKline Beecham (SKB) and Merck (Chapter 13).

Genentech was founded in 1976 by scientist Herbert Boyer and the venture capitalist Robert Swanson. Headquartered in San Francisco, it employs almost 5000 staff worldwide and has 10 protein-based products on the market. These include the hGH product ‘Nutropin’ (Chapter 11), the antibody-based products ‘Herceptin’ and ‘Rituxan’ (Chapter 13) and the thrombolytic agents ‘Activase’ and ‘TNKase’ (Chapter 12). The company also has 20 or so products in clinical trials. In 2004, it generated some US$4.6 billion in revenues.

1.5 Biopharmaceuticals: current status and future prospects

Approximately one in every four new drugs now coming on the market is a biopharmaceutical. By mid 2006, some 160 biopharmaceutical products had gained marketing approval in the USA and/or EU. Collectively, these represent a global biopharmaceutical market in the region of US$35 billion (Table 1.5), and the market value is estimated to surpass US$50 billion by 2010. The products include a range of hormones, blood factors and thrombolytic agents, as well as vaccines and monoclonal antibodies (Table 1.6). All but two are protein-based therapeutic agents. The exceptions are two nucleic-acid-based products: ‘Vitravene’, an antisense oligonucleotide, and ‘Macugen’, an aptamer (Chapter 14). Many additional nucleic-acid-based products for use in gene therapy or antisense technology are in clinical trials, although the range of technical difficulties that still beset this class of therapeutics means that protein-based products will overwhelmingly predominate for the foreseeable future (Chapter 14).
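The market figures above imply a particular rate of growth, which can be made explicit as a compound annual growth rate. The short sketch below assumes that the US$35 billion figure refers to 2006 and that growth compounds smoothly each year up to 2010; both the four-year horizon and the smooth-growth model are simplifying assumptions rather than figures taken from the text.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate needed to go from start_value to end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumed inputs: ~US$35 billion (taken as 2006) rising to US$50 billion by 2010.
rate = implied_annual_growth(35.0, 50.0, 2010 - 2006)
print(f"Implied compound annual growth: {rate:.1%}")
```

Under these assumptions the projection corresponds to market growth of a little over 9% per year.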

Many of the initial biopharmaceuticals approved were simple replacement proteins (e.g. blood factors and human insulin). The ability to alter the amino acid sequence of a protein, coupled with an increased understanding of the relationship between protein structure and function (Chapters 2 and 3), has facilitated the more recent introduction of several engineered therapeutic proteins (Table 1.3). Thus far, the vast majority of approved recombinant proteins have been produced in the bacterium E. coli, the yeast S. cerevisiae or in animal cell lines (most notably Chinese hamster ovary (CHO) cells or baby hamster kidney (BHK) cells). These production systems are discussed in Chapter 5.

Although most biopharmaceuticals approved to date are intended for human use, a number of products destined for veterinary application have also come on the market. An early example is recombinant bovine GH (somatotrophin), which was approved in the USA in the early 1990s and used to increase milk yields from dairy cattle. Additional examples of approved veterinary biopharmaceuticals include a range of recombinant vaccines and an interferon-based product (Table 1.7).

Table 1.5 Approximate annual market values of some leading approved biopharmaceutical products. Data gathered from various sources, including company home pages, annual reports and industry reports

Product (Company) | Product description (use) | Annual sales value (US$, billions)
Procrit (Amgen/Johnson & Johnson) | EPO (treatment of anaemia) | 4.0
Epogen & Aranesp combined (Amgen) | EPO (treatment of anaemia) | 4.0
Intron A (Schering Plough) | IFN-α (treatment of leukaemia) | 0.3
Remicade (Johnson & Johnson) | Monoclonal antibody based (treatment of Crohn’s disease) |
Avonex (Biogen) | Interferon-β (IFN-β; treatment of multiple sclerosis) |
Enbrel (Wyeth) | TNF receptor-Fc fusion protein (treatment of rheumatoid arthritis) |
Rituxan (Genentech) | Monoclonal antibody based (treatment of non-Hodgkin’s lymphoma) |
Humulin (Eli Lilly) | Insulin (diabetes) | 1.0

Table 1.6 Summary categorization of biopharmaceuticals approved for general medical use in the EU and/or USA by 2006

Product type | Examples | No. approved | Refer to
Blood factors | Factors VIII and IX | 8 | Chapter 12
Thrombolytic agents | tPA | 6 | Chapter 12
Hormones | Insulin, GH, gonadotrophins | 33 | Chapter 11
Haematopoietic growth factors | EPO, CSFs | 8 | Chapter 10
Interferons | IFN-α, -β, -γ | 16 | Chapter 8
Interleukin-based products | IL-2 | 3 | Chapter 9
Vaccines | Hepatitis B surface antigen | 20 | Chapter 13
Monoclonal antibodies | Various | 30 | Chapter 13
Nucleic acid based | Antisense and aptamer | 2 | Chapter 14
Additional products | Tumour necrosis factor (TNF), therapeutic enzymes | 18 | Various chapters

At least 1000 potential biopharmaceuticals are currently being evaluated in clinical trials, although the majority of these are in early-stage trials. Vaccines and monoclonal antibody-based products represent the two biggest product categories. Regulatory factors (e.g. hormones and cytokines) and gene therapy and antisense-based products also represent significant groupings. Although most protein-based products likely to gain marketing approval over the next 2–3 years will be produced in engineered E. coli, S. cerevisiae or animal cell lines, some products now in clinical trials are being produced in the milk of transgenic animals (Chapter 5). Additionally, plant-based transgenic expression systems may potentially come to the fore, particularly for the production of oral vaccines (Chapter 5).

Table 1.7 Some recombinant (r) biopharmaceuticals recently approved for veterinary application in the EU

Product | Company | Indication
Virbagen Omega (r-feline interferon omega; IFN-ω) | Virbac | Reduction of mortality/clinical symptoms associated with canine parvovirus
Fevaxyl Pentafel (combination vaccine containing r-feline leukaemia viral antigen as one component) | Fort Dodge Laboratories | Immunization of cats against various feline pathogens
Porcilis porcoli (combination vaccine containing r-E. coli adhesins) | Intervet | Active immunization of sows
Porcilis AR-T DF (combination vaccine containing a recombinant modified toxin from Pasteurella multocida) | Intervet | Reduction in clinical signs of progressive atrophic rhinitis in piglets
Porcilis pesti (combination vaccine containing r-classical swine fever virus E2 subunit antigen) | Intervet | Immunization of pigs against classical swine fever
Bayovac CSF E2 (combination vaccine containing r-classical swine fever virus E2 subunit antigen) | |

Interestingly, the first generic biopharmaceuticals are already entering the market. Patent protection for many first-generation biopharmaceuticals (including recombinant human GH (rhGH), insulin, EPO, interferon-α (IFN-α) and granulocyte-CSF (G-CSF)) has expired or is now expiring. Most of these drugs command an overall annual market value in excess of US$1 billion, rendering them attractive potential products for many biotechnology/pharmaceutical companies. Companies already producing, or soon to produce, generic biopharmaceuticals include Biopartners (Switzerland), Genemedix (UK), Sicor and Ivax (USA), Congene and Microbix (Canada) and BioGenerix (Germany). Genemedix, for example, secured approval for sale of a recombinant CSF in China in 2001 and is also commencing the manufacture of recombinant EPO. Sicor currently markets hGH and IFN-α in eastern Europe and various developing nations. A generic hGH also gained approval in both Europe and the USA in 2006.

To date (mid 2006), no gene-therapy-based product has been approved for general medical use in the EU or USA, although one such product (‘Gendicine’; Chapter 14) has been approved in China. Although gene therapy trials were initiated as far back as 1989, the results have been disappointing. Many technical difficulties remain in relation to, for example, gene delivery and regulation of expression. Product effectiveness was not apparent in the majority of trials undertaken, and safety concerns have been raised in several trials.

Only one antisense-based product has been approved to date (in 1998) and, although several such antisense agents continue to be clinically evaluated, it is unlikely that a large number of such products will be approved over the next 3–4 years. Aptamers represent an additional emerging class of nucleic-acid-based therapeutic. These are short DNA- or RNA-based sequences that adopt a specific three-dimensional structure, enabling them to bind (and thereby inhibit) specific target molecules. One such product (Macugen) has been approved to date. RNA interference (RNAi) represents a further mechanism of achieving downregulation of gene expression (Chapter 14). It shares many characteristics with antisense technology and, like antisense, provides a potential means of treating medical conditions triggered or exacerbated by the inappropriate overexpression of specific gene products. Despite the disappointing results thus far generated by nucleic-acid-based products, future technical advances will almost certainly ensure the approval of gene therapy and antisense-based products in the medium to longer term.

Technological developments in areas such as genomics, proteomics and high-throughput screening are also beginning to impact significantly upon the early stages of drug development (Chapter 4). By linking changes in gene/protein expression to various disease states, for example, these technologies will identify new drug targets for such diseases. Many, if not most, such targets will themselves be proteins, and drugs will be designed and developed specifically to interact with them. These drugs may be protein based or (more often) low molecular mass ligands.

Additional future innovations likely to impact upon pharmaceutical biotechnology include the development of alternative product production systems, alternative methods of delivery and the development of engineered cell-based therapies, particularly stem cell therapy. As mentioned previously, protein-based biotechnology products have to date been produced either in microbial systems or in animal cell lines. Work continues on the production of such products in transgenic-based production systems, specifically either transgenic plants or animals (Chapter 5).

Virtually all therapeutic proteins must enter the blood in order to promote a therapeutic effect. Such products must usually be administered parenterally. However, research continues on the development of non-parenteral routes, which may prove more convenient and less costly and may improve patient compliance. Alternative potential delivery routes include transdermal, nasal, oral and buccal approaches, although most progress to date has been recorded with pulmonary-based delivery systems (Chapter 4). An inhaled insulin product (‘Exubera’, Chapters 4 and 11) was approved in 2006 for the treatment of type I and II diabetes.

A small number of whole-cell-based therapeutic products have also been approved to date (Chapter 14). All contain mature, fully differentiated cells extracted from a native biological source. Improved techniques now allow the harvest of embryonic and, indeed, adult stem cells, bringing the development of stem-cell-based drugs one step closer. However, the use of stem cells to replace human cells or even entire tissues/organs remains a long-term goal (Chapter 14). Overall, therefore, products of pharmaceutical biotechnology play an important role in the clinic and are likely to assume an even greater relative importance in the future.

Books

Crommelin, D. and Sindelar, R. 2002. Pharmaceutical Biotechnology, second edition. Taylor and Francis, London, UK.

Goldberg, R. 2001. Pharmaceutical Medicine, Biotechnology and European Law. Cambridge University Press.

Grindley, J. and Ogden, J. 2000. Understanding Biopharmaceuticals. Manufacturing and Regulatory Issues. Interpharm Press.

Kayser, O. and Muller, R.H. 2004. Pharmaceutical Biotechnology. Wiley VCH, Weinheim, Germany.

Oxender, D. and Post, L. 1999. Novel Therapeutics from Modern Biotechnology. Springer Verlag.

Spada, S. and Walsh, G. 2005. Directory of Approved Biopharmaceutical Products. CRC Press, Florida, USA.

Articles

Mayhall, E., Paffett-Lugassy, N. and Zon, L.I. 2004. The clinical potential of stem cells. Current Opinion in Cell Biology 16, 713–720.

Reichert, J. and Paquette, C. 2003. Therapeutic recombinant proteins: trends in US approvals 1982–2002. Current Opinion in Molecular Therapeutics 5, 139–147.

Reichert, J. and Pavlov, A. 2004. Recombinant therapeutics – success rates, market trends and values to 2010. Nature Biotechnology 22, 1513–1519.

Walsh, G. 2005. Biopharmaceuticals: recent approvals and likely directions. Trends in Biotechnology 23, 553–558.

Walsh, G. 2006. Biopharmaceutical benchmarks 2006. Nature Biotechnology 24, 769–776.

Weng, Z. and DeLisi, C. 2000. Protein therapeutics: promises and challenges of the 21st century. Trends in Biotechnology 20, 29–36.

2 Protein structure

2.1 Introduction

Almost all products of modern pharmaceutical biotechnology, be they on the market or likely to gain approval in the short to intermediate term, are protein based. As such, an understanding of protein structure is central to this topic. A comprehensive treatment of the subject would easily constitute a book on its own, and many such publications are available. The aim of this chapter is to provide a basic overview of the subject in order to equip the reader with a knowledge of protein science sufficient to understand relevant concepts outlined in the remaining chapters of this book. The interested reader is also referred to the ‘Further reading’ section, which lists several excellent specialist publications in the field. Much additional information may also be sourced via the web sites mentioned within the chapter.

2.2 Overview of protein structure

Proteins are macromolecules consisting of one or more polypeptides (Table 2.1). Each polypeptide consists of a chain of amino acids linked together by peptide (amide) bonds. The exact amino acid sequence is determined by the gene coding for that specific polypeptide. When synthesized, a polypeptide chain folds up, assuming a specific three-dimensional shape (i.e. a specific conformation) that is unique to it. The conformation adopted is dependent upon the polypeptide’s amino acid sequence, and this conformation is largely stabilized by multiple, weak non-covalent interactions. Any influence (e.g. certain chemicals and heat) that disrupts such weak interactions results in disruption of the polypeptide’s native conformation, a process termed denaturation. Denaturation usually results in loss of functional activity, clearly demonstrating the dependence of protein function upon protein structure. A protein’s structure currently cannot be predicted solely from its amino acid sequence. Its conformation can, however, be determined by techniques such as X-ray diffraction and nuclear magnetic resonance (NMR) spectroscopy.

Proteins are sometimes classified as ‘simple’ or ‘conjugated’. Simple proteins consist exclusively of polypeptide chain(s), with no additional chemical components present or required for biological activity. Conjugated proteins, in addition to their polypeptide component(s), contain one or more non-polypeptide constituents known as prosthetic group(s). The most common prosthetic groups found in association with proteins include carbohydrates (glycoproteins), phosphate groups (phosphoproteins), vitamin derivatives (e.g. flavoproteins) and metal ions (metalloproteins).

Table 2.1 Selected examples of proteins. The number of polypeptide chains and amino acid residues constituting the protein are listed, along with its molecular mass and biological function

Protein | No. polypeptide chains | Total no. amino acids | Molecular mass (Da) | Biological function
Insulin (human) | 2 | 51 | 5 800 | Complex, includes regulation of blood glucose levels
Lysozyme (egg) | 1 | 129 | 13 900 | Enzyme capable of degrading peptidoglycan in bacterial cell walls
IL-2 (human) | 1 | 133 | 15 400 | T-lymphocyte-derived polypeptide that regulates many aspects of immunity
EPO (human) | 1 | 165 | 36 000 | Hormone that stimulates red blood cell production
Chymotrypsin (bovine) | 3 | 241 | 21 600 | Digestive proteolytic enzyme
Subtilisin (Bacillus amyloliquefaciens) | 1 | 274 | 27 500 | Bacterial proteolytic enzyme
Tumour necrosis factor (human TNF-α) | 3 | 471 | 52 000 | Mediator of inflammation and immunity
Haemoglobin (human) | 4 | 574 | 64 500 | Gas transport
Hexokinase (yeast) | 2 | 800 | 102 000 | Enzyme capable of phosphorylating selected monosaccharides
Glutamate dehydrogenase (bovine) | 40 | 8 300 | 1 000 000 | Enzyme that interconverts glutamate, and α-ketoglutarate plus NH4+

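Table 2.2 The 20 commonly occurring amino acids. They may be subdivided into five groups on the basis of side-chain structure. Their three- and one-letter abbreviations are also listed (one-letter abbreviations are generally used only when compiling extended sequence data, mainly to minimize writing space and effort). In addition to their individual molecular masses, the percentage occurrence of each amino acid in an ‘average’ protein is also presented. These data were generated from sequence analysis of over 1000 different proteins

R group classification | Amino acid | Abbreviation (3 letters) | Abbreviation (1 letter) | Molecular mass (Da) | Occurrence in ‘average’ protein (%)
Nonpolar, aliphatic | Glycine | Gly | G | 75 | 7.2
Nonpolar, aliphatic | Alanine | Ala | A | 89 | 8.3
Nonpolar, aliphatic | Valine | Val | V | 117 | 6.6
Nonpolar, aliphatic | Leucine | Leu | L | 131 | 9.0
Nonpolar, aliphatic | Isoleucine | Ile | I | 131 | 5.2
Nonpolar, aliphatic | Proline | Pro | P | 115 | 5.1
Aromatic | Tyrosine | Tyr | Y | 181 | 3.2
Aromatic | Phenylalanine | Phe | F | 165 | 3.9
Aromatic | Tryptophan | Trp | W | 204 | 1.3
Polar but uncharged | Cysteine | Cys | C | 121 | 1.7
Polar but uncharged | Serine | Ser | S | 105 | 6.0
Polar but uncharged | Methionine | Met | M | 149 | 2.4
Polar but uncharged | Threonine | Thr | T | 119 | 5.8
Polar but uncharged | Asparagine | Asn | N | 132 | 4.4
Polar but uncharged | Glutamine | Gln | Q | 146 | 4.0
Positively charged | Arginine | Arg | R | 174 | 5.7
Positively charged | Lysine | Lys | K | 146 | 5.7
Positively charged | Histidine | His | H | 155 | 2.2
Negatively charged | Aspartic acid | Asp | D | 133 | 5.3
Negatively charged | Glutamic acid | Glu | E | 147 | 6.2

2.2.1 Primary structure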

2.2.1

Primary structure

Polypeptides are linear, unbranched polymers, potentially containing up to 20 different monomer types (i.e. the 20 commonly occurring amino acids) linked together in a precise predefined sequence. The primary structure of a polypeptide refers to its exact amino acid sequence, along with the exact positioning of any disulfide bonds present (described later). The 20 commonly occurring amino acids are listed in Table 2.2, along with their three-letter and one-letter abbreviations. The structures of these amino acids are presented in Figure 2.1. Nineteen of these amino acids contain a central (α) carbon atom, to which is attached a hydrogen atom (H), an amino group (NH2), a carboxyl group (COOH) and an additional side chain (R) group, which differs from amino acid to amino acid. The amino acid proline is unusual in that its R group forms a direct covalent bond with the nitrogen atom of what is the free amino group in other amino acids (Figure 2.1).
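Because extended sequences are normally written with the one-letter codes of Table 2.2, converting between the two notations is a routine step when handling sequence data. The sketch below is a minimal, hypothetical helper (the function name and the hyphen-separated input format are illustrative assumptions); the code mapping itself is taken directly from Table 2.2.

```python
# Three-letter to one-letter codes for the 20 common amino acids (Table 2.2).
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I", "Pro": "P",
    "Tyr": "Y", "Phe": "F", "Trp": "W",
    "Cys": "C", "Ser": "S", "Met": "M", "Thr": "T", "Asn": "N", "Gln": "Q",
    "Arg": "R", "Lys": "K", "His": "H",
    "Asp": "D", "Glu": "E",
}


def to_one_letter(sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'Gly-Ile-Val' into the compact one-letter form 'GIV'."""
    residues = sequence.split(sep)
    try:
        # capitalize() accepts 'GLY', 'gly' or 'Gly' alike.
        return "".join(THREE_TO_ONE[r.strip().capitalize()] for r in residues)
    except KeyError as err:
        raise ValueError(f"Unrecognized residue abbreviation: {err}") from None


# First five residues of the human insulin A chain.
print(to_one_letter("Gly-Ile-Val-Glu-Gln"))  # GIVEQ
```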

Figure 2.1 The chemical structure of the 20 amino acids commonly found in proteins

As will be evident from Section 2.2.2, peptide bond formation between adjacent amino acid residues entails the establishment of covalent linkages between the amino and carboxyl groups attached to their respective central (α) carbon atoms. Hence, apart from the free amino and carboxyl groups at the two ends of the chain, the free functional (i.e. chemically reactive) groups in polypeptides are almost entirely present as part of the constituent amino acids’ side chains (R groups). In addition to determining the chemical reactivity of a polypeptide, these R groups also very largely dictate the final conformation adopted by a polypeptide. Stabilizing and repulsive forces between different R groups (as well as between R groups and the surrounding aqueous medium) largely dictate what final shape the polypeptide adopts, as will be described later.


The R groups of the non-polar, aliphatic amino acids (Gly, Ala, Val, Leu, Ile and Pro) are devoid of chemically reactive functional groups. These R groups are noteworthy in that, when present in a polypeptide, they tend to interact with each other non-covalently (via hydrophobic interactions). These interactions have a significant stabilizing influence on protein conformation.

Glycine is noteworthy in that its R group is a hydrogen atom. This means that the α-carbon of glycine is not asymmetric, i.e. is not a chiral centre. (To be a chiral centre the carbon would have to have four different chemical groups attached to it; in this case, two of its four attached groups are identical.) As a consequence, glycine does not occur in multiple stereo-isomeric forms, unlike the remaining amino acids, which occur as either D or L isomers. Only L-amino acids are naturally found in polypeptides.

The side chains of the aromatic amino acids (Phe, Tyr and Trp) are not particularly reactive chemically, but they all absorb ultraviolet (UV) light. Tyr and Trp in particular absorb strongly at 280 nm, allowing detection and quantification of proteins in solution by measuring the absorbance at this wavelength.
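Quantification at 280 nm relies on the Beer-Lambert law, A = ε·c·l, where ε is the protein's molar extinction coefficient, c its molar concentration and l the light path length. The sketch below estimates ε from a protein's Trp and Tyr content (plus disulfide-bonded cysteines, which absorb weakly), using per-residue values of 5500, 1490 and 125 M⁻¹ cm⁻¹ published by Pace and co-workers. These values and the example protein's composition are assumptions for illustration; for accurate work ε should be determined experimentally for the protein in question.

```python
def extinction_coefficient_280(n_trp: int, n_tyr: int, n_cystine: int = 0) -> float:
    """Estimate a protein's molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Assumed per-residue values: Trp 5500, Tyr 1490, cystine
    (a disulfide-bonded Cys pair) 125, as published by Pace and co-workers.
    """
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law rearranged: c = A / (epsilon * l)."""
    if epsilon <= 0:
        raise ValueError("no Trp/Tyr/cystine: A280 cannot be used for quantification")
    return a280 / (epsilon * path_cm)


# Hypothetical protein with 2 Trp, 4 Tyr and 1 disulfide bond, measured in a 1 cm cuvette.
eps = extinction_coefficient_280(n_trp=2, n_tyr=4, n_cystine=1)
conc = molar_concentration(a280=0.5, epsilon=eps)
print(eps, f"{conc * 1e6:.1f} uM")
```

Because Trp contributes several times more than Tyr, proteins lacking both residues cannot be quantified this way, which is why the helper raises an error for ε ≤ 0.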

Of the six polar but uncharged amino acids, two (cysteine and methionine) are unusual in that they contain a sulfur atom. The side chain of methionine is non-polar and relatively unreactive, although the sulfur atom is susceptible to oxidation. In contrast, the thiol (-SH) portion of cysteine’s R group is the most reactive functional group of any amino acid side chain. In vivo, this group can form complexes with various metal ions and is readily oxidized, forming ‘disulfide linkages’ (covalent linkages between two cysteine residues). Intrachain disulfide linkages, formed between two cysteines within the same polypeptide, help stabilize the three-dimensional structure of that polypeptide. Interchain disulfide linkages can also form, in which cysteines from two different polypeptides participate. This is a very effective way of covalently linking adjacent polypeptides.

Of the four remaining polar but uncharged amino acids, the R groups of serine and threonine contain hydroxyl (OH) groups and the R groups of asparagine and glutamine contain amide (CONH2) groups. None are particularly reactive chemically; however, upon exposure to high temperatures or extremes of pH, the latter two can deamidate, yielding aspartic acid and glutamic acid respectively.

Aspartic and glutamic acids are themselves negatively charged under physiological conditions. This allows them to chelate certain metal ions, and also to markedly influence the conformation adopted by polypeptide chains in which they are found.

Lysine, arginine and histidine are positively charged amino acids. The lysine R group consists of a hydrophobic chain of four CH2 groups (Figure 2.1), capped with an amino (NH2) group, which is ionized (NH3+) under most physiological conditions. However, within most polypeptides there is normally a fraction of un-ionized lysines, and these (unlike their ionized counterparts) are quite chemically reactive. Such lysine side chains can be chemically converted into various analogues. The arginine side chain is also quite bulky, consisting of three CH2 groups capped with an ionized guanidino group (-NH-C(=NH2+)-NH2). The ‘imidazole’ side chain of histidine can be described chemically as a tertiary amine (R3N), and thus it can act as a strong nucleophilic catalyst (the nitrogen atom houses a lone pair of electrons, making it a ‘nucleus lover’ or nucleophile; it can donate its electron pair to an ‘electron lover’ or electrophile). As such, the histidine side chain frequently forms an essential part of enzyme active sites.
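The size of the un-ionized lysine fraction can be estimated with the Henderson-Hasselbalch equation, which relates the ratio of deprotonated to protonated forms of a group to the difference between pH and the group's pKa. The sketch below assumes a side-chain pKa of 10.5 for lysine, a typical free-amino-acid value; the pKa of any particular lysine in a folded polypeptide depends on its local environment and may differ appreciably.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of a titratable group in its deprotonated (base) form.

    From Henderson-Hasselbalch: [base]/[acid] = 10**(pH - pKa).
    For lysine the base form is the neutral, un-ionized -NH2 group.
    """
    ratio = 10 ** (ph - pka)
    return ratio / (1.0 + ratio)


LYSINE_SIDE_CHAIN_PKA = 10.5  # assumed typical value, not measured for any specific protein

for ph in (7.4, 9.0, 10.5):
    f = fraction_deprotonated(ph, LYSINE_SIDE_CHAIN_PKA)
    print(f"pH {ph}: {f:.2%} of lysine side chains un-ionized")
```

At physiological pH the estimate is well under 1%, consistent with most lysines being ionized while a small, chemically reactive un-ionized population persists; raising the pH towards the pKa increases that population sharply.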

In addition to the 20 ‘common’ amino acids, some modified amino acids are also found in several proteins. These are normally generated by post-translational modification (PTM) reactions, i.e. they are modified after protein synthesis is complete. Almost 200 such modified amino acids have been characterized to date. The more common such modifications are discussed separately in Section 2.5.


Figure 2.2 (a) Peptide bond formation. (b) Polypeptides consist of a linear chain of amino acids successively linked via peptide bonds. (c) The peptide bond displays partial double-bonded character

2.2.2 The peptide bond

Successive amino acids are joined together during protein synthesis via a ‘peptide’ (i.e. amide) bond (Figure 2.2). This is a condensation reaction, as a water molecule is eliminated during bond formation. Each amino acid in the resultant polypeptide is termed a ‘residue’, and the polypeptide chain will display a free amino (NH2) group at one end and a free carboxyl (COOH) group at the other end. These are termed the amino and carboxyl termini respectively.
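Because one water molecule (18 Da) is lost for each peptide bond formed, the approximate mass of an unmodified polypeptide of n residues is the sum of its free amino acid masses minus 18 × (n − 1). The sketch below applies this to the rounded masses listed in Table 2.2; it is an approximation that ignores disulfide bonds (each removes a further 2 Da) and any PTMs, and the function name is illustrative only.

```python
# Molecular masses (Da) of the free amino acids, rounded as in Table 2.2.
AMINO_ACID_MASS = {
    "G": 75, "A": 89, "V": 117, "L": 131, "I": 131, "P": 115,
    "Y": 181, "F": 165, "W": 204,
    "C": 121, "S": 105, "M": 149, "T": 119, "N": 132, "Q": 146,
    "R": 174, "K": 146, "H": 155,
    "D": 133, "E": 147,
}
WATER_MASS = 18  # Da, eliminated in each condensation (peptide bond) reaction


def polypeptide_mass(sequence: str) -> int:
    """Approximate mass of an unmodified linear polypeptide (one-letter codes)."""
    if not sequence:
        raise ValueError("empty sequence")
    residues = sequence.upper()
    unknown = set(residues) - set(AMINO_ACID_MASS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    peptide_bonds = len(residues) - 1
    return sum(AMINO_ACID_MASS[r] for r in residues) - WATER_MASS * peptide_bonds


print(polypeptide_mass("GA"))  # 75 + 89 - 18 = 146
```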

The peptide bond has a rigid, planar structure and is in the region of 1.33 Å in length. Its rigid nature is a reflection of the fact that the amide nitrogen lone pair of electrons is delocalized across the bond (i.e. the bond structure is a halfway house between the two forms illustrated in Figure 2.2c). In most instances, peptide groups assume a ‘trans’ configuration (Figure 2.2b). This minimizes steric interference between the R groups of successive amino acid residues.

Figure 2.3 Fragment of polypeptide chain backbone illustrating rigid (planar) peptide bonds and the intervening N-Cα and Cα-C backbone linkages, which are free to rotate (angles of rotation φ and ψ respectively)

Whereas the peptide bond is rigid, the other two bond types found in the polypeptide backbone (i.e. the N-Cα bond and the Cα-C bond, Figure 2.3) are free to rotate. The polypeptide backbone can thus be viewed as a series of planar ‘plates’ that can rotate relative to one another. The angle of rotation around the N-Cα bond is termed φ (phi) and that around the Cα-C bond is termed ψ (psi) (Figure 2.3). These angles are also known as rotation angles, dihedral angles or torsion angles. By convention, these angles are defined as being 180° when the polypeptide chain is in its fully extended, trans form. In principle, each bond can rotate to any value between -180° and +180°. However, the degrees of rotation actually observed are restricted due to the occurrence of steric hindrance between atoms of the polypeptide backbone and those of amino acid side chains. For each amino acid residue in a polypeptide backbone, the φ and ψ angles that are physically possible can be calculated, and these angle pairs are often plotted against each other in a diagram termed a Ramachandran plot. Sterically allowable angles fall within relatively narrow bands in most instances. A greater than average degree of φ/ψ rotational freedom is observed around glycine residues, as the small R group of glycine minimizes steric hindrance. On the other hand, bond angle freedom around proline residues is quite restricted due to this amino acid’s unusual structure (Figure 2.1). The φ and ψ angles allowable around each Cα in a polypeptide backbone obviously exert a major influence upon the final three-dimensional shape assumed by the polypeptide.
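Given atomic coordinates (for example from an X-ray or NMR structure), each torsion angle is computed from four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). The sketch below uses a common atan2-based construction that returns angles between -180° and +180°; the helper names are hypothetical and the coordinates in the usage lines are toy values chosen to show the cis (0°) and trans (180°) limits, not real protein geometry.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle (degrees, -180..180) about the p1-p2 bond.

    For phi pass C(i-1), N(i), Calpha(i), C(i);
    for psi pass N(i), Calpha(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: eclipsed (cis) gives 0, fully extended (trans) gives 180.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))   # 0.0
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # 180.0
```

Computing φ/ψ pairs in this way for every residue of a solved structure yields the points plotted on a Ramachandran plot.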

2.2.3 Amino acid sequence determination

The amino acid sequence of a polypeptide may be determined directly via chemical sequencing or by physical fragmentation and analysis, usually by mass spectrometry. Direct chemical sequencing was the only method available until the 1970s. Insulin was the first protein to be sequenced by this approach (in 1953), requiring several years and several hundred grams of protein to complete. The method has been refined and automated over the years, such that, today, polypeptides containing 100 amino acids or more can be automatically sequenced within a few hours, using microgram to milligram levels of protein. The actual chemical sequencing procedure employed is termed the Edman degradation method.

Table 2.3 Representative organisms whose genomes have been or will soon be completely/almost completely sequenced. Data taken largely from http://wit.integratedgenomics.com/GOLD/eucaryoticgenomes.html and http://www.tigr.org/tdb/mdb/mdcomplete.html. Updated information is available on these sites

Organism | Classification | Genome size (Mb)(a)
Aeropyrum pernix | Archaea | 1.67
Archaeoglobus fulgidus | Archaea | 2.18
Pyrococcus horikoshii | Archaea | 1.80
Pyrococcus furiosus | Archaea | 2.10
Sulfolobus solfataricus | Archaea | 2.99
Thermoplasma acidophilum | Archaea | 1.56
Aquifex aeolicus | Eubacteria | 1.50
Bacillus subtilis | Eubacteria | 4.20
Bacillus anthracis | Eubacteria | 4.50
Bordetella pertussis | Eubacteria | 3.88
Brucella suis | Eubacteria | 3.30
Chlamydia pneumoniae | Eubacteria | 1.23
Clostridium tetani | Eubacteria | 4.40
Corynebacterium diphtheriae | Eubacteria | 3.10
E. coli | Eubacteria | 5.23
Lactobacillus acidophilus | Eubacteria | 1.90
Listeria monocytogenes | Eubacteria | 2.94
Mycobacterium leprae | Eubacteria | 2.80
Mycobacterium tuberculosis | Eubacteria | 4.40
Neisseria meningitidis | Eubacteria | 2.18
Pseudomonas aeruginosa | Eubacteria | 6.3
Salmonella enterica | Eubacteria | NL
Staphylococcus aureus | Eubacteria | 2.80
Streptococcus pneumoniae | Eubacteria | 2.04
Treponema pallidum | Eubacteria | 1.14
Vibrio cholerae | Eubacteria | 4.0
Aspergillus nidulans | Fungi | 31.0
Candida albicans | Fungi | 15.0
Neurospora crassa | Fungi | 47.0
Schizosaccharomyces pombe | Fungi | 14.0
Babesia bovis | Protozoa | NL
Cryptosporidium parvum | Protozoa | 10.4
Leishmania major | Protozoa | 33.6
Arabidopsis thaliana (thale cress) | Plant | 70.0
Hordeum vulgare (barley) | Plant | 5.0
Gossypium hirsutum (cotton) | Plant | NL
Triticum aestivum (wheat) | Plant | NL
Zea mays (maize) | Plant | NL
Danio rerio (zebra fish) | Fish | NL
Gallus gallus (chicken) | Bird | NL
Bos taurus (cow) | Mammal | NL
Canis familiaris (dog) | Mammal | NL
Rattus norvegicus (rat) | Mammal | NL
Ovis aries (sheep) | Mammal | NL
Sus scrofa (pig) | Mammal | NL
Ape | Primate | NL
Homo sapiens | Primate | NL

(a) NL: not listed in source publication.

Table 2.4 The major primary sequence (protein and nucleic acid) databases and the web addresses from which they may be accessed

Database | Web address
Protein
PIR | http://www-nbrf.georgetown.edu/
Swiss-Prot | http://www.ebi.ac.uk/swissprot/
MIPS | http://www.mips.biochem.mpg.de/
NRL-3D | http://www-nbrf.georgetown.edu/pirwww/dbinfo/nrl3d.html
TrEMBL | http://www.ebi.ac.uk/index.html
Owl | http://www.bis.med.jhmi.edu/Dan/proteins/owl.html
Nucleic acid
EMBL | http://www.ebi.ac.uk/embl/index.html/
GenBank | http://www.ncbi.nlm.nih.gov
DDBJ | http://www.ddbj.nig.ac.jp/

An alternative approach to amino acid sequence determination is to sequence the gene coding for the polypeptide (Chapter 3); the amino acid sequence can then be inferred from the nucleotide sequence obtained. This approach has gained favour in recent years, as refinements to DNA sequencing methodologies and equipment have made such sequence analysis both rapid and relatively inexpensive. The ongoing genome projects continue to generate enormous amounts of sequence data. By the early 2000s, substantial or complete sequence data for some 300 organisms were available (Table 2.3). As a result, the putative amino acid sequences of an enormous number of proteins (most of unknown function/structure) had been determined.
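Inferring an amino acid sequence from a gene amounts to reading the coding sequence three nucleotides (one codon) at a time and looking each codon up in the genetic code. The sketch below builds the standard genetic code from its conventional TCAG ordering and translates until the first stop codon. It assumes the standard code applies, that the input begins in the correct reading frame and that it contains only A, C, G and T (or U); the function name is illustrative. As noted in the text, such inference reveals nothing about PTMs in the mature polypeptide.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids listed in TCAG x TCAG x TCAG codon order
# ('*' marks the three stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Infer a polypeptide sequence from an in-frame coding sequence.

    Translation stops at the first stop codon; trailing bases that do not
    form a complete codon are ignored.
    """
    seq = coding_dna.upper().replace("U", "T")  # also accept mRNA sequences
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGTTTGTGAACCAATAA"))  # MFVNQ
```

Because several codons encode the same amino acid, the reverse step (protein to DNA) is ambiguous, whereas DNA-to-protein inference is unambiguous once the reading frame is known.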

Once generated, sequence information is normally submitted to one or more databases. The major databases in which protein primary sequence data are available are listed in Table 2.4. Also included in this table are the major nucleic acid sequence databases, as amino acid sequence information can potentially be derived from these.

The Swiss-Prot database is probably the most widely used protein database. It is maintained collaboratively by the European Bioinformatics Institute (EBI) and the Swiss Institute of Bioinformatics. It is relatively easy to access and search via the World Wide Web (Table 2.4). A sample entry for human insulin is provided in Figure 2.4. Additional information detailing such databases is available via the web addresses provided in Table 2.4 and in the bioinformatics publications listed at the end of this chapter.
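Swiss-Prot entries are also distributed as plain-text "flat files", in which each line starts with a two-letter line code (ID, AC, DE, OS, SQ and so on), the sequence follows the SQ line in blocks of ten residues, and `//` ends the entry. The minimal sketch below groups an entry's lines by code and extracts the sequence. The example entry, its accession and the function name `parse_swissprot_entry` are hypothetical, and the entry is heavily abbreviated compared with a real record.

```python
from collections import defaultdict

# A hypothetical, heavily abbreviated entry in the Swiss-Prot flat-file layout
# (not a real database record; real entries carry many more line types).
EXAMPLE_ENTRY = """\
ID   HYPO_HUMAN              Reviewed;          12 AA.
AC   HYPO01;
DE   RecName: Full=Hypothetical example protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;
     MSTKLVAGHE ED
//
"""


def parse_swissprot_entry(text: str) -> dict:
    """Group one entry's lines by their two-letter line code and join the
    sequence lines that follow 'SQ' up to the '//' terminator."""
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end of entry
            break
        if in_sequence:                    # sequence lines: residues in blocks of 10
            sequence_parts.append(line.replace(" ", ""))
            continue
        code, content = line[:2], line[5:].strip()   # code in columns 1-2, data from column 6
        fields[code].append(content)
        if code == "SQ":
            in_sequence = True
    return {"fields": dict(fields), "sequence": "".join(sequence_parts)}


entry = parse_swissprot_entry(EXAMPLE_ENTRY)
print(entry["fields"]["DE"][0])   # RecName: Full=Hypothetical example protein;
print(entry["sequence"])          # MSTKLVAGHEED
```

For real files, an established parser such as Biopython's `Bio.SwissProt` module is a safer choice than a hand-written one; the sketch only illustrates how the format is organized.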

A polypeptide’s amino acid sequence can thus be determined by direct chemical (Edman) or physical (mass spectrometry) means, or indirectly via gene sequencing. In practice, these methods are complementary and can be used to cross-check sequence accuracy. If the target gene or messenger RNA (mRNA) has already been isolated, DNA sequencing is usually the most convenient approach. However, it reveals little information regarding any PTMs present in the mature polypeptide, many of which are of critical significance in the context of therapeutic proteins (discussed in Section 2.5).
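One simple form of the cross-check described above is to slide a directly determined fragment (for example, an N-terminal Edman read) along the gene-inferred sequence and report where it fits best and which residues disagree. Sliding matters because the mature protein may have been processed, for instance by removal of a signal peptide, so the fragment need not start at residue 1 of the inferred sequence. This is a hypothetical helper (`cross_check`) using a gapless comparison only; real sequence comparison tools use proper alignment algorithms that also handle insertions and deletions. The sequences are toy examples.

```python
def cross_check(fragment: str, inferred: str) -> tuple[int, list[tuple[int, str, str]]]:
    """Align a directly determined fragment (e.g. an Edman read) against a
    gene-inferred sequence without gaps, returning the 0-based offset with the
    fewest mismatches and those mismatches as (residue number, inferred, observed)."""
    if not fragment or len(fragment) > len(inferred):
        raise ValueError("fragment must be non-empty and no longer than the inferred sequence")
    best_offset, best_mismatches = 0, None
    for offset in range(len(inferred) - len(fragment) + 1):
        mismatches = [
            (offset + i + 1, expected, observed)     # 1-based residue numbering
            for i, (observed, expected) in enumerate(zip(fragment, inferred[offset:]))
            if observed != expected
        ]
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


inferred = "MSTKLVAGHEEDQ"                    # hypothetical gene-derived sequence
print(cross_check("KLVAG", inferred))         # (3, [])
print(cross_check("KLVSG", inferred))         # (3, [(7, 'A', 'S')])
```

A reported mismatch does not say which method is wrong; it simply flags a residue position that deserves re-examination by one of the complementary techniques.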

Figure 2.4 Sample entry for human insulin as present in the Swiss-Prot database. Refer to text for further details. Reproduced from the Swiss-Prot database on the UniProt website http://www.ebi.uniprot.org/

2.2.4 Polypeptide synthesis

Full-scale polypeptide characterization usually requires moderate to large (milligram to gram) amounts of the purified target polypeptide. Even larger quantities are generally required if the polypeptide has a commercial application. In some cases a polypeptide can be obtained in sufficient quantities by direct extraction from its natural source. Alternatively, polypeptides may be produced by direct chemical synthesis, as long as their amino acid sequence (and any PTMs) have been elucidated. Synthesis can also be undertaken via a biological route (recombinant DNA technology), as is the case for virtually all modern therapeutic proteins.

2.3 Higher level structure

Thus far we have concentrated on the primary structure (amino acid sequence) of a polypeptide. Higher-level protein structure can be described at three further levels: secondary, tertiary and quaternary.

Secondary structure can be described as the local spatial conformation of a polypeptide’s backbone, excluding the side chains of its constituent amino acids. The major elements of secondary structure are the α-helix and β-strands, as described below.

Tertiary structure refers to the three-dimensional arrangement of all the atoms that make up the polypeptide.

Quaternary structure refers to the overall spatial arrangement of polypeptide subunits within a protein composed of two or more polypeptides.

2.3.1 Secondary structure

When the backbones of most proteins are examined, stretches of amino acids that adopt a regular, recurring shape usually become evident. The most commonly observed secondary structural elements are the α-helix and β-strands, which are usually separated by stretches largely devoid of regular, recurring conformation. The α-helix and β-sheets form so commonly because they maximize the formation of stabilizing intramolecular hydrogen bonds and minimize steric repulsion between adjacent side chain groups, while remaining compatible with the rigid, planar nature of the peptide bond.

The α-helix contains 3.6 amino acid residues per full turn (Figure 2.5). This corresponds to a distance of approximately 0.56 nm along the long axis of the helix. The participating amino acid side chains protrude outward from the helical backbone. Amino acids most conducive to α-helix formation include alanine, leucine, methionine and glutamate. Proline tends to disrupt α-helix formation, as does the close proximity of multiple residues with either bulky side groups or side groups of the same charge. The helical structure is stabilized by hydrogen bonding, with every backbone C=O group forming a hydrogen bond with the N-H group four residues ahead of it in the helix. Stretches of α-helix found in globular (i.e. tightly folded, approximately spherical) polypeptides can vary in length from a single helical turn to more than 10 consecutive turns. The average length is about three turns.
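The geometric figures in the preceding paragraph can be turned into quick estimates. The sketch below assumes an idealized helix with exactly the values quoted in the text (3.6 residues and 0.56 nm per turn) and uses hypothetical helper names; it lists the backbone hydrogen bonds implied by the "four residues ahead" rule. Because the first four N-H groups and the last four C=O groups have no partner inside the helix, an n-residue helix has n - 4 such bonds.

```python
# Idealized alpha-helix geometry using the figures quoted in the text.
RESIDUES_PER_TURN = 3.6
AXIAL_LENGTH_PER_TURN_NM = 0.56


def helix_turns(n_residues: int) -> float:
    return n_residues / RESIDUES_PER_TURN


def helix_length_nm(n_residues: int) -> float:
    """Approximate length along the helix axis for a stretch of n residues."""
    return helix_turns(n_residues) * AXIAL_LENGTH_PER_TURN_NM


def backbone_hbonds(n_residues: int) -> list[tuple[int, int]]:
    """Pairs (i, i + 4): the C=O of residue i is hydrogen bonded to the
    N-H of residue i + 4. Residues are numbered from 1."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


# Angle between successive residues when viewed down the helix axis.
DEGREES_PER_RESIDUE = 360 / RESIDUES_PER_TURN   # about 100 degrees

n = 11   # roughly the average three-turn helix (3 x 3.6 = 10.8 residues)
print(f"{helix_turns(n):.2f} turns, {helix_length_nm(n):.2f} nm, "
      f"{len(backbone_hbonds(n))} backbone H-bonds")
# 3.06 turns, 1.71 nm, 7 backbone H-bonds
```

The roughly 100° step between consecutive residues is also why residues three or four positions apart in the sequence end up on the same face of the helix.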

Figure 2.5 Ball-and-stick and ribbon representations of an α-helix. Reproduced from Sun, P. and Boyington. 1997. Current Protocols in Protein Science by kind permission of the publisher, John Wiley and Sons

Stretches of α-helix are most often positioned on the protein’s surface, with one face of the helix facing the hydrophobic interior and the other facing the surrounding aqueous medium. The amino acid sequence of these helices is such that hydrophobic amino acid residues are positioned on one

Figure 2.6 The β-sheet. (a) Two segments of β-strands (antiparallel) forming a β-sheet via hydrogen bonding. The β-strand is drawn schematically as a thick arrow. By convention the arrowhead points in the direction of the polypeptide’s C terminus. (b) Schematic illustration of a two-strand β-sheet in parallel and antiparallel modes
