Probabilistic analysis of the human transcriptome with side information
Understanding functional organization of genetic information is a major
challenge in modern biology. Following the initial publication of the human
genome sequence in 2001, advances in high-throughput measurement technologies
and efficient sharing of research material through community databases have
opened up new perspectives on the study of living organisms and the structure of life.
In this thesis, novel computational strategies have been developed to
investigate a key functional layer of genetic information, the human
transcriptome, which regulates the function of living cells through protein
synthesis. The key contributions of the thesis are general exploratory tools
for high-throughput data analysis that have provided new insights into
cell-biological networks, cancer mechanisms and other aspects of genome
function.
A central challenge in functional genomics is that high-dimensional genomic
observations are associated with high levels of complex and largely unknown
sources of variation. By combining statistical evidence across multiple
measurement sources and the wealth of background information in genomic data
repositories, it has been possible to resolve some of the uncertainties associated
with individual observations and to identify functional mechanisms that could
not be detected based on individual measurement sources. Statistical learning
and probabilistic models provide a natural framework for such modeling tasks.
Open source implementations of the key methodological contributions have been
released to facilitate further adoption of the developed methods by the
research community.
Comment: Doctoral thesis. 103 pages, 11 figures.
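As a deliberately minimal illustration of combining statistical evidence across measurement sources, consider inverse-variance (precision-weighted) pooling of independent noisy measurements of one quantity; the thesis itself develops far richer latent-variable models, so this sketch only shows the underlying principle:

```python
import numpy as np

def combine_measurements(means, variances):
    """Inverse-variance (precision-weighted) pooling of independent noisy
    measurements of the same quantity. More precise sources get more weight,
    and the pooled variance is smaller than any individual variance."""
    means = np.asarray(means, float)
    w = 1.0 / np.asarray(variances, float)   # precision weights
    pooled_mean = np.sum(w * means) / np.sum(w)
    pooled_var = 1.0 / np.sum(w)
    return pooled_mean, pooled_var
```

For two equally precise sources the pool is their plain average; as one source's variance grows, its influence fades smoothly.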
Iterative Random Forests to detect predictive and stable high-order interactions
Genomics has revolutionized biology, enabling the interrogation of whole
transcriptomes, genome-wide binding sites for proteins, and many other
molecular processes. However, individual genomic assays measure elements that
interact in vivo as components of larger molecular machines. Understanding how
these high-order interactions drive gene expression presents a substantial
statistical challenge. Building on Random Forests (RF), Random Intersection
Trees (RITs), and through extensive, biologically inspired simulations, we
developed the iterative Random Forest algorithm (iRF). iRF trains a
feature-weighted ensemble of decision trees to detect stable, high-order
interactions with the same order of computational cost as RF. We demonstrate the
utility of iRF for high-order interaction discovery in two prediction problems:
enhancer activity in the early Drosophila embryo and alternative splicing of
primary transcripts in human-derived cell lines. In Drosophila, among the 20
pairwise transcription factor interactions iRF identifies as stable (returned
in more than half of bootstrap replicates), 80% have been previously reported
as physical interactions. Moreover, novel third-order interactions, e.g.
between Zelda (Zld), Giant (Gt), and Twist (Twi), suggest high-order
relationships that are candidates for follow-up experiments. In human-derived
cells, iRF re-discovered a central role of H3K36me3 in chromatin-mediated
splicing regulation, and identified novel 5th and 6th order interactions,
indicative of multi-valent nucleosomes with specific roles in splicing
regulation. By decoupling the order of interactions from the computational cost
of identification, iRF opens new avenues of inquiry into the molecular
mechanisms underlying genome biology.
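The two ingredients sketched in the abstract, iterative feature reweighting and stability scoring over bootstrap replicates, can be illustrated with a toy approximation. This is not the published iRF implementation: weighted feature sampling is emulated here by subsampling feature sets (scikit-learn's forests do not expose per-feature sampling weights), and only single-feature stability is scored, not high-order interactions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def toy_irf(X, y, n_iter=5, n_boot=20, top_k=2, seed=0):
    """Toy iRF-style procedure: iteratively concentrate feature weights on
    important features, then score how often each feature ranks in the
    top-k across bootstrap replicates (its 'stability')."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = max(2, p // 2)                  # features sampled per fit
    w = np.full(p, 1.0 / p)             # feature weights
    for it in range(n_iter):
        # emulate feature-weighted sampling: draw a subset w.p. proportional to w
        feats = rng.choice(p, size=k, replace=False, p=w)
        rf = RandomForestClassifier(n_estimators=100, random_state=it)
        rf.fit(X[:, feats], y)
        imp = np.zeros(p)
        imp[feats] = rf.feature_importances_
        if imp.sum() > 0:               # mix new importances into the weights
            w = 0.5 * w + 0.5 * imp / imp.sum()
    counts = np.zeros(p)
    for b in range(n_boot):             # stability over bootstrap replicates
        rows = rng.integers(0, n, n)
        feats = rng.choice(p, size=k, replace=False, p=w)
        rf = RandomForestClassifier(n_estimators=100, random_state=b)
        rf.fit(X[rows][:, feats], y[rows])
        top = feats[np.argsort(rf.feature_importances_)[-top_k:]]
        counts[top] += 1
    return counts / n_boot              # fraction of replicates in the top-k
```

On synthetic data where the label depends on two of six features, the two signal features should come back with stability near one while noise features stay near zero.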
You can't always sketch what you want: Understanding Sensemaking in Visual Query Systems
Visual query systems (VQSs) empower users to interactively search for line
charts with desired visual patterns, typically specified using intuitive
sketch-based interfaces. Despite decades of past work on VQSs, these efforts
have not translated to adoption in practice, possibly because VQSs are largely
evaluated in unrealistic lab-based settings. To remedy this gap in adoption, we
collaborated with experts from three diverse domains---astronomy, genetics, and
material science---via a year-long user-centered design process to develop a
VQS that supports their workflow and analytical needs, and evaluate how VQSs
can be used in practice. Our study results reveal that ad-hoc sketch-only
querying is not as commonly used as prior work suggests, since analysts are
often unable to precisely express their patterns of interest. In addition, we
characterize three essential sensemaking processes supported by our enhanced
VQS. We discover that participants employ all three processes, but in different
proportions, depending on the analytical needs in each domain. Our findings
suggest that all three sensemaking processes must be integrated in order to
make future VQSs useful for a wide range of analytical inquiries.
Comment: Accepted for presentation at IEEE VAST 2019, to be held October 20-25
in Vancouver, Canada; the paper will also be published in a special issue of
IEEE Transactions on Visualization and Computer Graphics (TVCG). IEEE VIS
(InfoVis/VAST/SciVis) 2019. ACM 2012 CCS: Human-centered computing,
Visualization, Visualization design and evaluation methods.
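As a toy illustration of the kind of primitive that sketch-based querying in a VQS rests on (not the system evaluated in this paper), one can slide a z-normalized sketched pattern along a series and return the best-matching offset:

```python
import numpy as np

def best_match(series, sketch):
    """Slide a z-normalized sketch along a series and return the offset
    with the smallest Euclidean distance -- a minimal shape-matching
    primitive of the sort a visual query system might use under the hood."""
    def znorm(x):
        s = x.std()
        return (x - x.mean()) / s if s > 0 else x - x.mean()
    q = znorm(np.asarray(sketch, float))
    w = len(q)
    best, best_i = np.inf, 0
    for i in range(len(series) - w + 1):
        d = np.linalg.norm(znorm(np.asarray(series[i:i + w], float)) - q)
        if d < best:
            best, best_i = d, i
    return best_i
```

Z-normalizing both the sketch and each window makes the match invariant to the absolute scale and offset of the data, which matters because users sketch shapes, not values.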
Exploration of human search behaviour: a multidisciplinary perspective
The following work presents an exploration of human search behaviour both from biological
and computational perspectives. Search behaviour is defined as the movements
made by an organism while attempting to find a resource. This work describes some of
the principal procedures used to record movement, methods for analysing the data and
possible ways of interpreting the data. In order to obtain a database of searching behaviour,
an experimental setup was built and tested to generate the search paths of human
participants. The test arena occupied part of a football field and the targets consisted of
an array of 20 golf balls. In the first set of experiments, random and regular
distributions of targets were tested. For each distribution, three distinct
conspicuity levels were
constructed: a cryptic level, in which targets were painted the same colour as the grass,
a semi-conspicuous level in which targets were left white and a conspicuous condition in
which the position of each target was marked by a red flag, protruding one metre from the
ground. The subjects tested were 9-11-year-old children, and their search paths
were collected using a GPS device. Subjects did not recognise the spatial cues
regarding how the targets were spatially distributed. A minimal decision model,
the bouncing search model, was built based on the characteristics of the
children's search paths. The model produced an outstanding fit to the
children's behavioural data. In the second set of experiments, a new group of
children was tested on two new distributions obtained by arranging the targets
in patches. Again, the children appeared unable to recognise spatial
information during the collection process. The children's behaviour once again
produced a good match with that of the bouncing search model. This work
introduces several new methodological aspects to be explored to further
understand the decision processes involved when humans search. It also
illustrates that integrating biology and computational science can result in
innovative research.
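The abstract does not spell out the rules of the bouncing search model, so the following is only a hypothetical sketch of what a minimal "bouncing" searcher could look like: an agent that moves in straight lines across a bounded arena and picks a fresh random heading whenever it hits a wall. The wall-bounce rule, arena size, and step length are all assumptions, not details from the thesis.

```python
import numpy as np

def bouncing_search(n_steps=1000, arena=50.0, step=1.0, seed=0):
    """Hypothetical minimal searcher: ballistic straight-line motion with a
    new random heading drawn at each wall contact ('bounce'). Returns the
    path as an (n_steps+1, 2) array of positions inside the arena."""
    rng = np.random.default_rng(seed)
    pos = np.array([arena / 2, arena / 2])   # start at the arena centre
    heading = rng.uniform(0, 2 * np.pi)
    path = [pos.copy()]
    for _ in range(n_steps):
        vel = step * np.array([np.cos(heading), np.sin(heading)])
        pos = pos + vel
        for d in range(2):                   # reflect at the walls
            if pos[d] < 0 or pos[d] > arena:
                pos[d] = np.clip(pos[d], 0, arena)
                heading = rng.uniform(0, 2 * np.pi)  # bounce: new heading
        path.append(pos.copy())
    return np.array(path)
```

A simulated path from such a model could then be compared against recorded GPS tracks, e.g. by matching step-length or turning-angle distributions.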
Spaceflight and the Differential Gene Expression of Human Stem Cell-Derived Cardiomyocytes
The National Aeronautics and Space Administration (NASA) has performed many
experiments on the International Space Station (ISS) to further understand how
conditions in space can affect life on Earth. This project analyzed GLDS-258, a
dataset from NASA's GeneLab repository that examines the impact of microgravity
on human induced pluripotent stem-cell-derived cardiomyocytes (hiPSC-CMs).
While many datasets have been run through NASA's RNA-Seq Consensus Pipeline
(RCP) to study differential gene expression in space, a Homo sapiens dataset
had yet to be analyzed using the RCP. The aim of this project was to run the
first Homo sapiens dataset, GLDS-258, through the RCP on the San Jose State
University College of Engineering High Performance Computing Cluster and to
investigate the biological significance of the results. In this study, a total
of 18 hiPSC-CM samples from ground control, flight, and post-flight groups were
run through the RCP. The resulting differential gene expression data were
further analyzed for biological significance using the Database for Annotation,
Visualization, and Integrated Discovery (DAVID) and Gene Set Enrichment
Analysis (GSEA). Results showed that most genes were differentially expressed
in ground control versus flight groups, while post-flight and ground control
groups did not have as many differentially expressed genes. Gene set analysis
showed significant expression of genes in mitochondrial pathways as well as
genes related to neurodegenerative diseases such as Alzheimer's, Huntington's,
and Parkinson's disease. These results indicate that exposure to microgravity
may play a role in altering the expression of genes related to
neurodegenerative pathways in cardiac cells. Our results demonstrate that it is
possible to process Homo sapiens data through the RCP, and suggest that
exposure to microgravity may exacerbate neurodegenerative disease progression
in cardiomyocytes.
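As a toy illustration of the differential-expression step (nothing like the full RCP, which is a complete RNA-seq pipeline), one can call differentially expressed genes between two groups with a per-gene Welch t-test on log-transformed counts and Benjamini-Hochberg FDR control:

```python
import numpy as np
from scipy import stats

def diff_expression(ground, flight, alpha=0.05):
    """Toy differential-expression call: per-gene Welch t-test on
    log2(count + 1) values, then Benjamini-Hochberg step-up to control
    the false discovery rate at level alpha. Rows are genes, columns
    are samples. Returns (p-values, boolean significance mask)."""
    g = np.log2(np.asarray(ground, float) + 1.0)
    f = np.log2(np.asarray(flight, float) + 1.0)
    t, p = stats.ttest_ind(g, f, axis=1, equal_var=False)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m        # BH step-up thresholds
    passed = p[order] <= thresh
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    sig = np.zeros(m, bool)
    sig[order[:k]] = True                            # all genes up to rank k
    return p, sig
```

Real pipelines model counts with negative-binomial distributions and estimate dispersion across genes; the simple t-test here is only to make the group-versus-group comparison concrete.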
Learning stable and predictive structures in kinetic systems: Benefits of a causal approach
Learning kinetic systems from data is one of the core challenges in many
fields. Identifying stable models is essential for the generalization
capabilities of data-driven inference. We introduce a computationally efficient
framework, called CausalKinetiX, that identifies structure from discrete time,
noisy observations, generated from heterogeneous experiments. The algorithm
assumes the existence of an underlying, invariant kinetic model, a key
criterion for reproducible research. Results on both simulated and real-world
examples suggest that learning the structure of kinetic systems benefits from a
causal perspective. The identified variables and models allow for a concise
description of the dynamics across multiple experimental settings and can be
used for prediction in unseen experiments. We observe significant improvements
compared to well-established approaches that focus solely on predictive
performance, especially for out-of-sample generalization.
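The invariance idea can be caricatured in a few lines: score a candidate predictor set by how well a single pooled fit transfers to each experimental environment. This is only a linear toy stand-in, not CausalKinetiX itself, which targets noisy ODE-based kinetic models.

```python
import numpy as np

def invariance_score(X, y, env, model_vars):
    """Fit one pooled linear model on the candidate variables, then score
    it by its worst per-environment mean squared error. Invariant (causal)
    predictor sets transfer across environments, so lower is better."""
    A = np.column_stack([X[:, model_vars], np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)     # pooled fit
    score = 0.0
    for e in np.unique(env):
        m = env == e
        resid = y[m] - A[m] @ beta
        score = max(score, np.mean(resid ** 2))      # worst-environment error
    return score
```

A variable whose relation to the response flips between environments may predict well in-sample but gets a poor (large) invariance score, which is exactly the failure mode a purely predictive criterion misses.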
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data of Plasmodium falciparum, but also drawing on the
millions of genomic records available from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal.
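As a small example of point 1 (handling compositionally atypical sequences), one way to flag a protein whose amino-acid composition deviates from a background distribution is the KL divergence of its composition from that background. The function and background model here are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def composition_bias(seq, background):
    """KL divergence of a sequence's amino-acid composition from a
    background distribution. Higher values flag compositionally atypical
    (e.g. low-complexity) sequences, which are common in P. falciparum
    and can confound standard sequence-comparison scores."""
    counts = Counter(seq)
    n = len(seq)
    kl = 0.0
    for aa, p_bg in background.items():
        p = counts.get(aa, 0) / n
        if p > 0:
            kl += p * math.log(p / p_bg)
    return kl
```

In practice the background would be estimated from a large protein database, and sequences above a bias threshold would be masked or compared with composition-adjusted scoring.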