Proceedings - Wednesday, October 9, 2002
WeOE2
Emerging Approaches to Data Integration and
Bioinformatics in Toxicology and Disease Studies
Clary Clish, Beyond Genomics
Background
While understanding the human genome has yielded significant
insight into factors related to disease, it is only one piece of a
complex system. This presentation goes beyond genomics and looks for
correlations, pathways, and connections among thousands of measurable
molecular components, including genes, proteins, and metabolites, in
clinical samples. A Systems Biology approach is used to reveal unique
disease markers and drug targets.
Systems Biology is the study of biology as an integrated system of
genetic, protein, metabolite, cellular, and pathway events that are in
flux and interdependent. While genes provide informative clues to disease,
they are not the active agents; insight into disease pathology is
gained through knowledge of protein and metabolite levels. Beyond
Genomics uses its proprietary technologies to perform parallel,
comparative analyses of protein, metabolite and mRNA levels in complex
biological samples, such as peripheral fluids and tissues.
In drug discovery, the technology is applied to clinical samples from
diseased and healthy (normal) patients to uncover BioSystem Markers™
and BioSelective Targets™, which are indicators of disease and
potential targets for therapeutic intervention. These markers and
targets link gene response, protein activity, and metabolite dynamics
information, giving rise to new knowledge of pathways and systems.
Premise
An integrated analytical approach lends itself to uniform
experimental designs where sample collection methodologies can be
standardized and sample handling and tracking are consistent. Often the
same samples may be used for more than one type of analysis; e.g., the
same tissue sample can be surveyed for metabolite, protein, and mRNA
profiles. This enables both consistent data normalization across
platforms and complex correlation analyses that draw on profiling
data from each biomolecule class. Analytical platforms are
designed to be versatile, sensitive, and robust. Also important are
dynamic range, coverage, and the ability to identify analytes of
interest.
Metabolite profiling uses both global NMR and LC/MS platforms, developed
to survey analytes present at higher abundance in biological samples,
and more targeted LC/MS platforms, designed to focus on particular
classes of lower-abundance molecules. Beyond Genomics' approach includes
both protein- and peptide-level separation strategies that allow for
global and targeted coverage of the proteome, coupled either off-line or
on-line to mass spectrometry for quantitative measurement. A number of
platforms for quantitative profiling of proteins are also available.
Given the complexity of the datasets, where a single LC/MS chromatogram
may contain peaks from thousands of analytes, it is essential to have
informatics tools to extract relevant information from raw data. Beyond
Genomics uses the IMPRESS™ algorithm for this data pre-processing step.
The algorithm performs both background noise removal and peak
detection, integrating all chromatographic peaks across all
mass-to-charge ratios (m/z). The output is a peak list in which each
signal has a unique identifier that tracks chromatographic retention
time and peak m/z, a measured intensity, and an assessment of peak
quality.
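IMPRESS™ is proprietary and its internals are not described here, so the following is only a minimal Python sketch of the two pre-processing steps named above, under assumed choices: the background of one extracted-ion chromatogram is taken as its median intensity, noise as the median absolute deviation, and a peak as a contiguous run of points above a signal-to-noise threshold. The names `Peak`, `detect_peaks`, `build_peak_list`, and the parameter `snr_threshold` are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Peak:
    peak_id: str      # unique identifier built from retention time and m/z
    rt: float         # retention time at the peak apex
    mz: float         # m/z of the extracted-ion trace
    intensity: float  # summed background-subtracted intensity over the peak
    quality: float    # crude quality score: apex height divided by noise


def detect_peaks(rts, intensities, mz, snr_threshold=5.0):
    """Detect peaks in one extracted-ion chromatogram (a single m/z trace).

    Assumption (not IMPRESS): flat median background, MAD noise estimate.
    """
    baseline = median(intensities)
    noise = median(abs(y - baseline) for y in intensities) or 1e-9  # avoid divide-by-zero
    corrected = [max(y - baseline, 0.0) for y in intensities]       # background removal
    cutoff = snr_threshold * noise
    peaks, i, n = [], 0, len(corrected)
    while i < n:
        if corrected[i] <= cutoff:
            i += 1
            continue
        start = i
        while i < n and corrected[i] > cutoff:  # extend the run above threshold
            i += 1
        apex = max(range(start, i), key=lambda k: corrected[k])
        peaks.append(Peak(
            peak_id=f"RT{rts[apex]:.2f}_MZ{mz:.4f}",
            rt=rts[apex],
            mz=mz,
            intensity=sum(corrected[start:i]),
            quality=corrected[apex] / noise,
        ))
    return peaks


def build_peak_list(traces, snr_threshold=5.0):
    """Run detect_peaks over every m/z trace; traces maps mz -> (rts, intensities)."""
    peak_list = []
    for mz, (rts, intensities) in sorted(traces.items()):
        peak_list.extend(detect_peaks(rts, intensities, mz, snr_threshold))
    return peak_list
```

For a toy trace at m/z 445.12 with a flat background near 10 and a single bump peaking at retention time 0.5, `build_peak_list` returns one `Peak` whose identifier is `RT0.50_MZ445.1200`. A production tool would also smooth the signal, model a drifting baseline, and align peaks across samples.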
Following extraction of peak information, the next challenge is to
analyze the data to determine if there are signals that differentiate
groups of samples, e.g., a diseased population from a healthy
population. The BioSystematics™ approach incorporates computational
strategies including normalization of the datasets; exploratory
principal component analysis (PCA) to identify differences between
groups of samples; parametric and non-parametric tests of significance
for a focused, signal-by-signal comparative analysis; and linear and
non-linear correlation analyses to identify signals whose variations
may be biologically related.
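The comparison steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the BioSystematics™ implementation: total-signal normalization stands in for the unspecified normalization, Welch's t statistic and the Mann-Whitney U statistic stand in for the parametric and non-parametric tests (p-values are omitted), and the Pearson coefficient and the rank-based Spearman coefficient stand in for the linear and non-linear correlation analyses. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def total_signal_normalize(sample):
    """Scale one sample so its signal intensities sum to 1 (assumed normalization)."""
    total = sum(sample.values())
    return {sig: value / total for sig, value in sample.items()}


def welch_t(a, b):
    """Welch's t statistic (parametric); needs >= 2 values and nonzero spread."""
    se2 = stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(se2)


def mann_whitney_u(a, b):
    """Mann-Whitney U for group a (non-parametric): pairs with a > b, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def pearson(x, y):
    """Pearson correlation coefficient (linear association)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: captures monotonic, possibly non-linear, trends."""
    return pearson(ranks(x), ranks(y))


def compare_groups(disease, healthy):
    """Normalize each sample, then test every signal between the two groups.

    disease, healthy: lists of samples, each a dict {signal_id: intensity}
    sharing the same signal_ids. Returns {signal_id: (welch_t, mann_whitney_u)}.
    """
    d = [total_signal_normalize(s) for s in disease]
    h = [total_signal_normalize(s) for s in healthy]
    out = {}
    for sig in d[0]:
        a, b = [s[sig] for s in d], [s[sig] for s in h]
        out[sig] = (welch_t(a, b), mann_whitney_u(a, b))
    return out
```

With many signals tested one by one, a real analysis would also convert the statistics to p-values and correct for multiple comparisons; that step is left out of this sketch.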
Analysis
Results from applying this methodology are shown below for a study of a
transgenic mouse model of atherosclerosis, in which comparative
metabolite, protein, and mRNA expression profiling was performed. The
transgenic mice were sampled before the development of clinical
symptoms of disease and compared against age-matched wild type controls
to identify early biomarkers and to gain insight into
pathogenesis. The figure shows a principal component analysis of LC/MS profiles from
tryptic digests of a plasma protein fraction. The transgenic mice (TG)
form a distinct cluster apart from the wild type controls (WT) and the
difference factor spectrum indicates the differentially abundant tryptic
peptides that give rise to the separation. These peptides are then
targeted for sequencing via tandem mass spectrometry (MS/MS).
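A PCA of this kind can be sketched with power iteration for the first principal component (PC1). Reading the "difference factor spectrum" as the PC1 loading vector is our assumption, as are the toy intensities, the peptide labels, and the function names `first_principal_component` and `top_loadings`; real LC/MS datasets with thousands of peaks would call for a linear algebra library.

```python
import random
from math import sqrt


def first_principal_component(X, iters=500, seed=0):
    """Return (scores, loadings) for PC1 of X via power iteration.

    X is a list of samples (rows), each a list of peak intensities (columns).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centre each peak
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]  # project samples onto PC1
    return scores, v


def top_loadings(loadings, names, k=2):
    """Names of the k peaks with the largest absolute PC1 loading."""
    ranked = sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)
    return [names[j] for j in ranked[:k]]


# Hypothetical toy data: three peptide peaks, only the first differs by group.
peptides = ["pep_A", "pep_B", "pep_C"]
tg = [[10.0, 5.0, 3.0], [11.0, 5.2, 2.9], [10.5, 4.8, 3.1]]
wt = [[2.0, 5.1, 3.0], [2.5, 4.9, 3.05], [1.8, 5.0, 2.95]]
scores, loadings = first_principal_component(tg + wt)
print(top_loadings(loadings, peptides, k=1))  # expected: ['pep_A'] for this toy data
```

The TG and WT scores fall on opposite sides of zero, and the peak with the largest absolute loading is the one that would be queued for MS/MS. The sign of a principal component is arbitrary, so only the relative placement of the groups is meaningful.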
Value of the Technology
Systems biology requires comparative analyses that span a number of
molecular classes, and therefore a variety of robust analytical methods
are necessary. Equally
important are software tools for data warehousing and pre-processing, as
well as bioinformatic approaches for data analysis and integration. When
applied to disease studies, systems biology enables the discovery of
biomarkers as well as potential avenues for therapeutic intervention.
References and/or Links
Beyond Genomics website at www.beyondgenomics.com