118

BIOLOGY

6.9 HUMAN GENOME PROJECT

In the preceding sections you have learnt that it is the sequence of bases in DNA that determines the genetic information of a given organism. In other words, the genetic make-up of an organism or an individual lies in its DNA sequences. If two individuals differ, then their DNA sequences should also be different, at least at some places. These assumptions led to the quest to find the complete DNA sequence of the human genome. With the establishment of genetic engineering techniques that made it possible to isolate and clone any piece of DNA, and the availability of simple and fast techniques for determining DNA sequences, a very ambitious project of sequencing the human genome was launched in the year 1990.

Human Genome Project (HGP) was called a mega project. You can imagine the magnitude and the requirements of the project if we consider the following estimates:

The human genome is said to have approximately 3 × 10⁹ bp, and if the cost of sequencing is US $3 per bp (the estimated cost in the beginning), the total estimated cost of the project would be approximately 9 billion US dollars. Further, if the obtained sequences were to be stored in typed form in books, and if each page of a book contained 1000 letters and each book contained 1000 pages, then 3300 such books would be required to store the information of the DNA sequence from a single human cell. The enormous amount of data expected to be generated also necessitated the use of high-speed computational devices for data storage, retrieval and analysis. HGP was closely associated with the rapid development of a new area in biology called Bioinformatics.
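The estimates above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original text; the function and constant names are chosen here for illustration. With the rounded genome size of 3 × 10⁹ bp the book count comes to 3000, and the figure of 3300 quoted in the text corresponds to a genome size of about 3.3 × 10⁹ bp.

```python
# Back-of-the-envelope estimates for the scale of the HGP.
# All names below are illustrative; the numbers come from the text.
COST_PER_BP_USD = 3        # early estimated sequencing cost per base pair
LETTERS_PER_PAGE = 1000
PAGES_PER_BOOK = 1000


def sequencing_cost_usd(genome_bp: int) -> int:
    """Total cost if every base pair costs COST_PER_BP_USD to sequence."""
    return genome_bp * COST_PER_BP_USD


def books_needed(genome_bp: int) -> int:
    """Number of books needed to print one letter per base pair (rounded up)."""
    letters_per_book = LETTERS_PER_PAGE * PAGES_PER_BOOK
    return -(-genome_bp // letters_per_book)  # ceiling division


if __name__ == "__main__":
    rounded_genome = 3 * 10**9
    print(f"Estimated cost: US ${sequencing_cost_usd(rounded_genome):,}")
    print(f"Books for 3.0 x 10^9 bp: {books_needed(rounded_genome)}")
    print(f"Books for 3.3 x 10^9 bp: {books_needed(3_300_000_000)}")
```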

Goals of HGP

Some of the important goals of HGP were as follows:

(i) Identify all the approximately 20,000-25,000 genes in human DNA;

(ii) Determine the sequences of the 3 billion chemical base pairs that

make up human DNA;

(iii) Store this information in databases;

(iv) Improve tools for data analysis;

(v) Transfer related technologies to other sectors, such as industries;

(vi) Address the ethical, legal, and social issues (ELSI) that may arise

from the project.

The Human Genome Project was a 13-year project coordinated by

the U.S. Department of Energy and the National Institutes of Health. During

the early years of the HGP, the Wellcome Trust (U.K.) became a major

partner; additional contributions came from Japan, France, Germany,

China and others. The project was completed in 2003. Knowledge about

the effects of DNA variations among individuals can lead to revolutionary

new ways to diagnose, treat and someday prevent the thousands of

2022-23


MOLECULAR BASIS OF INHERITANCE

disorders that affect human beings. Besides providing clues to understanding human biology, learning about the DNA sequences of non-human organisms can lead to an understanding of their natural capabilities that can be applied toward solving challenges in health care, agriculture, energy production and environmental remediation. Many non-human model organisms, such as bacteria, yeast, Caenorhabditis elegans (a free-living non-pathogenic nematode), Drosophila (the fruit fly) and plants (rice and Arabidopsis), have also been sequenced.

Methodologies: The methods involved two major approaches. One approach focused on identifying all the genes that are expressed as RNA (referred to as Expressed Sequence Tags, ESTs). The other took the blind approach of simply sequencing the whole genome, containing all the coding and non-coding sequences, and later assigning functions to different regions in the sequence (a process referred to as Sequence Annotation). For sequencing, the total DNA from a cell is isolated and converted into random fragments of relatively smaller sizes (recall that DNA is a very long polymer, and there are technical limitations in sequencing very long pieces of DNA), which are cloned in a suitable host using specialised vectors. The cloning resulted in amplification of each DNA fragment so that it could subsequently be sequenced with ease. The commonly used hosts were bacteria and yeast, and the vectors were called BAC (bacterial artificial chromosomes) and YAC (yeast artificial chromosomes).

The fragments were sequenced using automated DNA sequencers that worked on the principle of a method developed by Frederick Sanger. (Remember, Sanger is also credited with developing a method for determining amino acid sequences in proteins.) These sequences were then arranged based on some overlapping regions present in them. This required generation of overlapping fragments for sequencing. Aligning these sequences manually was not humanly possible. Therefore, specialised computer-based programs were developed (Figure 6.15). These sequences were subsequently annotated and assigned to each chromosome. The sequence of chromosome 1 was completed only in May 2006 (this was the last of the 24 human chromosomes, 22 autosomes and X and Y, to be


Figure 6.15 A representative diagram of human

genome project


sequenced). Another challenging task was assigning the genetic and physical maps on the genome. These maps were generated using information on polymorphism of restriction endonuclease recognition sites and of some repetitive DNA sequences known as microsatellites (one application of polymorphism in repetitive DNA sequences is explained in the next section, on DNA fingerprinting).
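The idea of arranging sequenced fragments by their overlapping regions can be illustrated with a toy program. The sketch below is a simple greedy merge written for illustration only; it is not the software used in the HGP, the function names are hypothetical, and it ignores sequencing errors, repeats and reverse-complement strands.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments that share the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three overlapping reads taken from the made-up sequence ATGCGTACGTTAGC.
    reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(reads))
```

Even in this toy form every pair of fragments has to be compared, which hints at why aligning millions of real fragments required specialised computer programs.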

6.9.1 Salient Features of Human Genome

Some of the salient observations drawn from human genome project are

as follows:

(i) The human genome contains 3164.7 million bp.

(ii) The average gene consists of 3000 bases, but sizes vary greatly, with

the largest known human gene being dystrophin at 2.4 million bases.

(iii) The total number of genes is estimated at 30,000, much lower than previous estimates of 80,000 to 1,40,000 genes. Almost all (99.9 per cent) nucleotide bases are exactly the same in all people.

(iv) The functions are unknown for over 50 per cent of the discovered

genes.

(v) Less than 2 per cent of the genome codes for proteins.

(vi) Repeated sequences make up a very large portion of the human genome.

(vii) Repetitive sequences are stretches of DNA that are repeated many times, sometimes hundreds to thousands of times. They are thought to have no direct coding functions, but they shed light on chromosome structure, dynamics and evolution.

(viii) Chromosome 1 has most genes (2968), and the Y has the fewest (231).

(ix) Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs, single nucleotide polymorphisms, pronounced as ‘snips’) occur in humans. This information promises to revolutionise the processes of finding chromosomal locations for disease-associated sequences and tracing human history.
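Some of the figures in this list can be combined to get a feel for their scale. The short sketch below is an illustrative calculation only; the variable names are chosen here and the inputs are the values quoted in items (i), (v) and (ix). It suggests that fewer than roughly 63 million bases code for proteins, and that the catalogued SNPs are spaced, on average, a little over two thousand base pairs apart.

```python
# Illustrative arithmetic on the salient features listed above.
GENOME_BP = 3_164_700_000    # item (i): 3164.7 million bp
CODING_FRACTION_MAX = 0.02   # item (v): less than 2 per cent codes for proteins
SNP_SITES = 1_400_000        # item (ix): about 1.4 million SNP locations

# Upper bound on the number of protein-coding bases.
max_coding_bp = GENOME_BP * CODING_FRACTION_MAX

# Average spacing between catalogued SNP locations.
bp_per_snp = GENOME_BP / SNP_SITES

if __name__ == "__main__":
    print(f"Protein-coding bases: fewer than {max_coding_bp / 1e6:.1f} million")
    print(f"Average spacing between SNP locations: about {bp_per_snp:.1f} bp")
```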

6.9.2 Applications and Future Challenges

Deriving meaningful knowledge from the DNA sequences will define

research through the coming decades leading to our understanding of

biological systems. This enormous task will require the expertise and

creativity of tens of thousands of scientists from varied disciplines in both

the public and private sectors worldwide. One of the greatest impacts of

having the human genome sequence may well be enabling a radically new approach

to biological research. In the past, researchers studied one or a few genes

at a time. With whole-genome sequences and new high-throughput

technologies, we can approach questions systematically and on a much


broader scale. They can study all the genes in a genome, for example, all

the transcripts in a particular tissue or organ or tumor, or how tens of

thousands of genes and proteins work together in interconnected networks

to orchestrate the chemistry of life.

6.10 DNA FINGERPRINTING

As stated in the preceding section, 99.9 per cent of base sequence among

humans is the same. Assuming human genome as 3 × 10⁹ bp, in how

many base sequences would there be differences? It is these differences

in sequence of DNA which make every individual unique in their

phenotypic appearance. If one aims to find out genetic differences

between two individuals or among individuals of a population,

sequencing the DNA every time would be a daunting and expensive

task. Imagine trying to compare two sets of 3 × 10⁶ base pairs. DNA

fingerprinting is a very quick way to compare the DNA sequences of any

two individuals.
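The question posed in this paragraph can be answered with simple arithmetic: if 99.9 per cent of the bases are the same, the remaining 0.1 per cent differ. A minimal sketch, using integer arithmetic and illustrative variable names:

```python
# How many base positions differ if 99.9 per cent of the genome is identical?
GENOME_BP = 3 * 10**9            # approximate genome size used in the text
IDENTICAL_PER_THOUSAND = 999     # 99.9 per cent expressed as parts per thousand

differing_bp = GENOME_BP * (1000 - IDENTICAL_PER_THOUSAND) // 1000

if __name__ == "__main__":
    print(f"Differing positions: {differing_bp:,}")  # 3 x 10^6
```

So about 3 × 10⁶ positions differ between two individuals, which is still far too many to compare by sequencing every time.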

DNA fingerprinting involves identifying differences in some specific regions of the DNA sequence called repetitive DNA, because in these sequences a small stretch of DNA is repeated many times. Repetitive DNA is separated from bulk genomic DNA as distinct peaks during density gradient centrifugation. The bulk DNA forms a major peak, and the other small peaks are referred to as satellite DNA. Depending on base composition (A:T rich or G:C rich), length of segment and number of repetitive units, satellite DNA is classified into many categories, such as micro-satellites, mini-satellites, etc. These sequences normally do not code for any proteins, but they form a large portion of the human genome. They show a high degree of polymorphism and form the basis of DNA fingerprinting. Since DNA from every tissue of an individual (such as blood, hair follicle, skin, bone, saliva, sperm, etc.) shows the same degree of polymorphism, these sequences become a very useful identification tool in forensic applications. Further, as the polymorphisms are inherited by children from their parents, DNA fingerprinting is the basis of paternity testing in case of disputes.

As polymorphism in DNA sequence is the basis of genetic mapping of the human genome as well as of DNA fingerprinting, it is essential that we understand what DNA polymorphism means in simple terms. Polymorphism (variation at the genetic level) arises due to mutations. (Recall the different kinds of mutations and their effects that you have already studied in Chapter 5 and in the preceding sections of this chapter.) New mutations may arise in an individual either in somatic cells or in the germ cells (cells that generate gametes in sexually reproducing organisms). If a germ cell mutation does not seriously impair an individual’s ability to have offspring who can transmit the mutation, it can spread to


the other members of the population (through sexual reproduction). Allelic (again, recall the definition of alleles from Chapter 5) sequence variation has traditionally been described as a DNA polymorphism if more than one variant (allele) at a locus occurs in the human population with a frequency greater than 0.01. In simple terms, if an inheritable mutation is observed in a population at high frequency, it is referred to as a DNA polymorphism. The probability of observing such variation is higher in non-coding DNA sequences, as mutations in these sequences may not have any immediate effect on an individual’s reproductive ability. These mutations keep accumulating generation after generation and form one of the bases of variability/polymorphism. There is a variety of polymorphisms, ranging from single nucleotide changes to very large-scale changes. Such polymorphisms play a very important role in evolution and speciation, and you will study these in detail in higher classes.
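The frequency criterion described above can be expressed as a small check. The sketch below is illustrative only: the function names and the sample data are hypothetical, and it reads the definition as requiring at least two alleles at a locus that each occur with a frequency greater than 0.01.

```python
from collections import Counter

POLYMORPHISM_THRESHOLD = 0.01  # frequency cut-off quoted in the text


def allele_frequencies(alleles):
    """Relative frequency of each allele observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(alleles, threshold=POLYMORPHISM_THRESHOLD):
    """True if more than one allele occurs with a frequency above the threshold."""
    freqs = allele_frequencies(alleles)
    common = [allele for allele, freq in freqs.items() if freq > threshold]
    return len(common) > 1


if __name__ == "__main__":
    # Made-up samples of 1000 chromosomes typed at a single-base locus.
    common_variant = ["A"] * 970 + ["G"] * 30   # G at frequency 0.03
    rare_variant = ["A"] * 995 + ["G"] * 5      # G at frequency 0.005
    print(is_polymorphic(common_variant))
    print(is_polymorphic(rare_variant))
```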

The technique of DNA fingerprinting was initially developed by Alec Jeffreys. He used a satellite DNA that shows a very high degree of polymorphism as a probe. It was called Variable Number of Tandem Repeats (VNTR). The technique, as used earlier, involved Southern blot hybridisation using radiolabelled VNTR as a probe. It included:

(i) isolation of DNA,

(ii) digestion of DNA by restriction endonucleases,

(iii) separation of DNA fragments by electrophoresis,

(iv) transferring (blotting) of separated DNA fragments to synthetic

membranes, such as nitrocellulose or nylon,

(v) hybridisation using labelled VNTR probe, and

(vi) detection of hybridised DNA fragments by autoradiography. A schematic

representation of DNA fingerprinting is shown in Figure 6.16.
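The steps listed above can be mimicked with a toy in-silico model. This is a rough sketch under stated assumptions, not a description of the laboratory protocol: the repeat unit, the flanking sequences, the probe and all function names are made up for illustration; the only real-world detail used is the EcoRI recognition sequence GAATTC, which is cut between G and A. Electrophoresis is modelled simply as sorting fragments by length, and hybridisation as an exact substring match.

```python
import re

SITE = "GAATTC"      # EcoRI recognition sequence
CUT_OFFSET = 1       # EcoRI cuts after the first base of the site (G^AATTC)
REPEAT_UNIT = "AGGTC"            # hypothetical repeat unit
PROBE = REPEAT_UNIT * 2          # hypothetical probe: two copies of the unit


def digest(dna: str, site: str = SITE, cut_offset: int = CUT_OFFSET) -> list[str]:
    """Step (ii): cut the DNA at every occurrence of the recognition site."""
    fragments, start = [], 0
    for match in re.finditer(site, dna):
        cut = match.start() + cut_offset
        fragments.append(dna[start:cut])
        start = cut
    fragments.append(dna[start:])
    return [fragment for fragment in fragments if fragment]


def band_sizes(dna: str, probe: str = PROBE) -> list[int]:
    """Steps (iii) to (vi): order fragments by size, keep those the probe binds."""
    separated = sorted(digest(dna), key=len, reverse=True)  # largest first
    return [len(fragment) for fragment in separated if probe in fragment]


def make_locus(copies: int, unit: str = REPEAT_UNIT) -> str:
    """Hypothetical locus: a tandem repeat flanked by two restriction sites."""
    return "TTGAATTCAC" + unit * copies + "CAGAATTCTT"


def fingerprint(copy_numbers: list[int]) -> list[int]:
    """Band pattern of one individual, given the copy number on each chromosome."""
    bands = []
    for copies in copy_numbers:
        bands.extend(band_sizes(make_locus(copies)))
    return sorted(bands, reverse=True)


if __name__ == "__main__":
    print(fingerprint([4, 7]))   # individual 1
    print(fingerprint([4, 9]))   # individual 2 shares one band with individual 1
```

In this model the band size depends only on the number of repeat copies between the two cut sites, so individuals with different copy numbers give different band patterns.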

The VNTR belongs to a class of satellite DNA referred to as mini-satellite. A small DNA sequence is arranged tandemly in many copies. The copy number varies from chromosome to chromosome in an individual. The number of repeats shows a very high degree of polymorphism. As a result, the size of a VNTR varies from 0.1 to 20 kb. Consequently, after hybridisation with the VNTR probe, the autoradiogram gives many bands of differing sizes. These bands give a characteristic pattern for an individual’s DNA (Figure 6.16). It differs from individual to individual in a population, except in the case of monozygotic (identical) twins. The sensitivity of the technique has been increased by the use of the polymerase chain reaction (PCR; you will study it in Chapter 11). Consequently, DNA from a single cell is enough to perform DNA fingerprinting analysis. In addition to application in forensic
