Understanding functional organization of genetic information is a major
challenge in modern biology. Following the initial publication of the human
genome sequence in 2001, advances in high-throughput measurement technologies
and efficient sharing of research material through community databases have
opened up new views to the study of living organisms and the structure of life.
In this thesis, novel computational strategies have been developed to
investigate a key functional layer of genetic information, the human
transcriptome, which regulates the function of living cells through protein
synthesis. The key contributions of the thesis are general exploratory tools
for high-throughput data analysis that have provided new insights to
cell-biological networks, cancer mechanisms and other aspects of genome
function.
A central challenge in functional genomics is that high-dimensional genomic
observations are associated with high levels of complex and largely unknown
sources of variation. By combining statistical evidence across multiple
measurement sources and the wealth of background information in genomic data
repositories it has been possible to solve some the uncertainties associated
with individual observations and to identify functional mechanisms that could
not be detected based on individual measurement sources. Statistical learning
and probabilistic models provide a natural framework for such modeling tasks.
Open source implementations of the key methodological contributions have been
released to facilitate further adoption of the developed methods by the
research community.Comment: Doctoral thesis. 103 pages, 11 figure