Over the past few years,
"genetic genomics" has provided a new and powerful paradigm
for understanding the effect of sequence polymorphisms, by analyzing their genome-wide
effect on the expression profiles of potential target genes. This approach has
promise both for providing basic biological insight into gene regulation and as a
starting point for understanding human diseases.
Classical computational methods for analyzing these data have been direct
extensions of genetic analysis, viewing each gene expression profile as an isolated,
quantitative trait. In collaboration with
Daphne Koller at Stanford, we developed "Geronemo",
a novel computational method designed specifically for gene expression
quantitative traits. Our premise is that the influence of genotype on phenotype is induced by
fine-grained perturbations to the complex regulatory network that governs a
cell's activity. We provide a computational method that deciphers both the cell's
regulatory network and perturbations to it that result from sequence variability.
Our method builds on our successful
Module Networks procedure and offers several significant
advantages over eQTL mapping.
- Geronemo can distinguish between associations directly induced by sequence
variation and those induced indirectly via the abundance of a regulator,
leading to a better causal understanding of the observed variation.
- Geronemo exploits the modularity of biological systems, allowing discovery of
complex combinatorial regulation programs that are undetectable when considering
each gene in isolation.
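
The distinction between direct and indirect associations can be illustrated with a small simulation. The sketch below is not Geronemo itself; it is a minimal example, assuming synthetic haploid genotypes, Gaussian noise, and made-up effect sizes, that uses a plain partial correlation as a stand-in for the idea. Both simulated targets correlate with the marker, but only the direct target remains associated once the regulator's abundance is accounted for. All variable names and parameters are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)
n_strains = 2000  # synthetic sample size, chosen large to keep noise low

# One marker with two alleles (0 or 1) per haploid strain.
genotype = [rng.randint(0, 1) for _ in range(n_strains)]

# The marker changes the abundance of a regulator.
regulator = [2.0 * g + rng.gauss(0, 1) for g in genotype]

# Indirect target: responds only to the regulator's abundance.
indirect_target = [1.5 * r + rng.gauss(0, 1) for r in regulator]

# Direct target: responds to the marker itself, not to the regulator.
direct_target = [2.0 * g + rng.gauss(0, 1) for g in genotype]

for name, target in [("indirect", indirect_target), ("direct", direct_target)]:
    marginal = pearson(genotype, target)
    conditional = partial_corr(genotype, target, regulator)
    print(f"{name:9s} marginal r = {marginal:.2f}, given regulator r = {conditional:.2f}")
```

A single-gene scan sees only the marginal correlations, which are strong for both targets. Conditioning on the regulator separates the two cases.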
We applied Geronemo to a
dataset containing expression and genotype data for 116 S. cerevisiae strains,
generated by crossing a lab strain (BY) with a wild vineyard strain (RM).
Our method produced a range of interesting biological findings regarding both the
yeast regulatory network and how
perturbations to it result in the variation observed between the strains. Two of our
most interesting findings are:
- Variation in a small number of chromatin-modifying factors
plays a key role in modulating a large fraction of the variance in gene expression between strains.
Our global, module-based analysis suggests that evolutionary forces use changes in a small set of chromatin modification
proteins to drive coordinated global changes in the regulatory network.
(PNAS 2006)
- Geronemo predicted a novel mechanism involving regulation of mRNA degradation,
connecting Puf3 to P-bodies, which we subsequently verified experimentally.
This connection was uncovered thanks to our ability to concurrently infer
the regulatory network and analyze changes to it between strains.
Expression variation among individuals is a powerful resource that is well suited
both for detecting regulatory interactions and for uncovering complex phenotypes.
Unlike other types of data (e.g., gene deletions or environmental stimuli), functional
assays from divergent strains represent small, natural perturbations to the system,
allowing subtle changes to manifest. Moreover, each individual represents a large set
of such perturbations, providing a rich source of statistical variation that helps
clarify the signal. Interestingly, many perturbations are only revealed in the
offspring, with the parents showing no variation.
Such data are rapidly accumulating for a number of model systems, yet much of the
variation remains unexplained, even by the best models. We are working on improving
and extending Geronemo, and developing entirely new methods. Our
computational efforts are aimed at a number of biological questions and directions.
- Understanding the flow of genetic information, from genotype to phenotype and fitness.
We take a multi-layer approach to understand how genotype manifests in phenotypic
diversity, using the regulatory network and changes in gene expression patterns as
an intermediate to facilitate our understanding.
- De-convolving genetic complexity: Despite the clear heritability of many phenotypes
and diseases, association of multi-locus traits has remained an unresolved challenge.
We are developing new approaches to detect
causal relationships in multi-locus settings that standard techniques obscure.
- Scaling to mammalian systems and clinically relevant problems. Such scaling
entails considerable computational and statistical challenges: mammalian genomes
are 100-fold larger, have a higher degree of combinatorial regulation and have a complex
landscape of variation. Nevertheless, our success in yeast holds much promise for its
extension to mammalian systems.
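
To see why multi-locus traits can be obscured by standard single-locus techniques, consider the following toy simulation. It is only an illustrative sketch under stated assumptions: two unlinked biallelic loci, a synthetic phenotype that is elevated only when the two loci carry different alleles, and Gaussian noise. It does not represent the new approaches mentioned above, and all names and effect sizes are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n_strains = 2000  # synthetic sample size

# Two unlinked loci, each with alleles 0 or 1 at equal frequency.
locus_a = [rng.randint(0, 1) for _ in range(n_strains)]
locus_b = [rng.randint(0, 1) for _ in range(n_strains)]

# The phenotype is elevated only when the alleles at the two loci differ
# (a purely epistatic, XOR-like interaction), plus noise.
phenotype = [2.0 * (a ^ b) + rng.gauss(0, 1) for a, b in zip(locus_a, locus_b)]

# Single-locus scans: each locus on its own carries almost no signal.
r_a = pearson(locus_a, phenotype)
r_b = pearson(locus_b, phenotype)

# Joint test: the interaction term recovers the association.
interaction = [a ^ b for a, b in zip(locus_a, locus_b)]
r_joint = pearson(interaction, phenotype)

print(f"locus A alone: r = {r_a:.2f}")
print(f"locus B alone: r = {r_b:.2f}")
print(f"A-B interaction: r = {r_joint:.2f}")
```

Testing every pair of loci this way grows quadratically with the number of markers, which is one reason such scans become costly at mammalian genome scale.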