453 research outputs found

    A comparative phylogenetic approach to Austronesian cultural evolution

    Machine Learning Models for Deciphering Regulatory Mechanisms and Morphological Variations in Cancer

    The exponential growth of multi-omics biological datasets is driving a paradigm shift in fundamental biological research. In recent years, imaging and transcriptomics datasets have been increasingly incorporated into biological studies, pushing biology further into the domain of data-intensive sciences. New approaches and tools from statistics, computer science, and data engineering are profoundly influencing biological research. Harnessing this ever-growing deluge of multi-omics biological data requires the development of novel and creative computational approaches. In parallel, fundamental research in data sciences and Artificial Intelligence (AI) has advanced tremendously, allowing the scientific community to generate a massive amount of knowledge from data. Advances in Deep Learning (DL), in particular, are transforming many branches of engineering, science, and technology. Several of these methodologies have already been adapted for harnessing biological datasets; however, there is still a need to further adapt and tailor these techniques to new and emerging technologies. In this dissertation, we present computational algorithms and tools that we have developed to study gene regulation and cellular morphology in cancer. The models and platforms that we have developed are general and widely applicable to several problems relating to dysregulation of gene expression in diseases. Our pipelines and software packages are disseminated in public repositories for use by the wider scientific community. This dissertation is organized in three main projects. In the first project, we present Causal Inference Engine (CIE), an integrated platform for the identification and interpretation of active regulators of transcriptional response. The platform offers visualization tools and pathway enrichment analysis to map predicted regulators to Reactome pathways.
We provide a parallelized R-package for fast and flexible directional enrichment analysis to run the inference on custom regulatory networks. Next, we designed and developed MODEX, a fully automated text-mining system to extract and annotate causal regulatory interaction between Transcription Factors (TFs) and genes from the biomedical literature. MODEX uses putative TF-gene interactions derived from high-throughput ChIP-Seq or other experiments and seeks to collect evidence and meta-data in the biomedical literature to validate and annotate the interactions. MODEX is a complementary platform to CIE that provides auxiliary information on CIE inferred interactions by mining the literature. In the second project, we present a Convolutional Neural Network (CNN) classifier to perform a pan-cancer analysis of tumor morphology, and predict mutations in key genes. The main challenges were to determine morphological features underlying a genetic status and assess whether these features were common in other cancer types. We trained an Inception-v3 based model to predict TP53 mutation in five cancer types with the highest rate of TP53 mutations. We also performed a cross-classification analysis to assess shared morphological features across multiple cancer types. Further, we applied a similar methodology to classify HER2 status in breast cancer and predict response to treatment in HER2 positive samples. For this study, our training slides were manually annotated by expert pathologists to highlight Regions of Interest (ROIs) associated with HER2+/- tumor microenvironment. Our results indicated that there are strong morphological features associated with each tumor type. Moreover, our predictions highly agree with manual annotations in the test set, indicating the feasibility of our approach in devising an image-based diagnostic tool for HER2 status and treatment response prediction. 
We have validated our model using samples from an independent cohort, which demonstrates the generalizability of our approach. Finally, in the third project, we present an approach to use spatial transcriptomics data to predict spatially-resolved active gene regulatory mechanisms in tissues. Using spatial transcriptomics, we identified tissue regions with differentially expressed genes and applied our CIE methodology to predict active TFs that can potentially regulate the marker genes in the region. This project bridged the gap between inference of active regulators using molecular data and morphological studies using images. The results demonstrate a significant local pattern in TF activity across the tissue, indicating differential spatial regulation in tissues. The results suggest that the integrative analysis of spatial transcriptomics data with CIE can capture discriminant features and identify localized TF-target links in the tissue.

    Variability of grid-cell activity

    Action potentials of grid cells in the entorhinal cortex of navigating rodents occur every two seconds on average. If one considers the precise temporal sequence of these events, however, it can be seen that they rarely occur in isolation. In fact, the intervals between successive action potentials can be on the order of a few milliseconds. Mapped to the trajectory of the animal, a clear clustering of the action potentials in space can be observed as well. The places where the density of such events is particularly high are called firing fields and are arranged in a hexagonal grid. Regardless of the cell characteristics, the number of spikes observed on different crossings of a field varies strongly. The time between subsequent field crossings is on the order of seconds. We found that one cause of spike-count variability is that the exact position of the firing fields is not stable over time. In addition, the shifts of the fields were correlated across simultaneously recorded cells. This kind of non-stationarity in the grid-cell network allows conclusions to be drawn about the functioning of this system. Furthermore, dynamic field locations imply that common methods for data analysis of grid-cell recordings can be problematic. We also found that a subset of grid cells, which have particularly high firing rates when crossing a field, can be associated with a peculiarity in the shape of their action potentials: the spikes of some cells are followed by a short afterdepolarization (DAP). At the same time, we discovered cells with even smaller and extremely stereotypical intervals between their spikes. This group of neurons, however, exhibited less pronounced DAPs. Cells with and without DAP did not differ in their spatial firing behavior. Our results imply that different burst behaviors are not directly related to different types of spatial coding.
In addition, we suggest that bursting of grid cells could be altered via the mechanisms of DAP formation. In summary, this work shows how details of neuronal activity on two different time scales provide fundamental insights into the processes of spatial navigation.
Untethered firing fields and intermittent silences: Why grid‐cell discharge is so variable - Grid cells in medial entorhinal cortex are notoriously variable in their responses, despite the striking hexagonal arrangement of their spatial firing fields. Indeed, when the animal moves through a firing field, grid cells often fire much more vigorously than predicted or do not fire at all. The source of this trial‐to‐trial variability is not completely understood. By analyzing grid‐cell spike trains from mice running in open arenas and on linear tracks, we characterize the phenomenon of “missed” firing fields using the statistical theory of zero inflation. We find that one major cause of grid‐cell variability lies in the spatial representation itself: firing fields are not as strongly anchored to spatial location as the averaged grid suggests. In addition, grid fields from different cells drift together from trial to trial, regardless of whether the environment is real or virtual, or whether the animal moves in light or darkness. Spatial realignment across trials sharpens the grid representation, yielding firing fields that are more pronounced and significantly narrower. These findings indicate that ensembles of grid cells encode relative position more reliably than absolute position.
Spike Afterpotentials Shape the In Vivo Burst Activity of Principal Cells in Medial Entorhinal Cortex - Principal neurons in rodent medial entorhinal cortex (MEC) generate high-frequency bursts during natural behavior. While in vitro studies point to potential mechanisms that could support such burst sequences, it remains unclear whether these mechanisms are effective under in vivo conditions.
In this study, we focused on the membrane-potential dynamics immediately following action potentials (APs), as measured in whole-cell recordings from male mice running in virtual corridors (Domnisoru et al., 2013). These afterpotentials consisted either of a hyperpolarization, an extended ramp-like shoulder, or a depolarization reminiscent of depolarizing afterpotentials (DAPs) recorded in vitro in MEC principal neurons. Next, we correlated the afterpotentials with the cells' propensity to fire bursts. All DAP cells with known location resided in Layer II, generated bursts, and their interspike intervals (ISIs) were typically between 5 and 15 ms. The ISI distributions of Layer-II cells without DAPs peaked sharply at around 4 ms and varied only minimally across that group. This dichotomy in burst behavior is explained by cell-group-specific DAP dynamics. The same two groups of bursting neurons also emerged when we clustered extracellular spike-train autocorrelations measured in real 2D arenas (Latuske et al., 2015). Apart from slight variations in grid spacing, no difference in the spatial coding properties of the grid cells across all three groups was discernible. Layer III neurons were only sparsely bursting (SB) and had no DAPs. As various mechanisms for modulating ion-channels underlying DAPs exist, our results suggest that temporal features of MEC activity can be altered while maintaining the cells' overall spatial tuning characteristics.

    Statistical inference on evolutionary processes in Alpine ibex (Capra ibex): mutation, migration and selection

    The thesis begins with a general introduction to population genetics in chapter 1. I review the fundamental processes of evolution - mutation, recombination, selection, gene flow and genetic drift - and give an overview of Bayesian inference in statistical population genetics. Later, I introduce the studied species, Alpine ibex (Capra ibex), and its recent history. This history is intimately linked to the structured population in the Swiss Alps that provides the source of genetic data for this thesis. A particular focus is devoted to approximate Bayesian computation (ABC) in chapter 2, a method of inference that has become important over the last 15 years and is convenient for complex problems of inference. In chapter 3, the biological focus is on estimating the distribution of mutation rates across neutral genetic variation (microsatellites), and on inferring the proportion of male ibex that obtain access to matings each breeding season. The latter is an important determinant of genetic drift. Methodologically, I compare different methods for the choice of summary statistics in ABC. One of the approaches, proposed by collaborators and me and based on boosting (a technique developed in machine learning), is found to perform best in this case. Applying that method to microsatellite data from Alpine ibex, I estimate the scaled ancestral mutation rate (θ_anc = 4N_e u) at about 1:288, and find that most of the variation across loci of the ancestral mutation rate u is between 7.7 × 10^-4 and 3.5 × 10^-3. The proportion of males with access to matings per breeding season is estimated at about 21%. Chapter 4 is devoted to the estimation of migration rates between a large number of pairs of populations. Again, I use ABC for inference. Estimating all rates jointly comes with substantial methodological problems.
Therefore, I assess whether, by dividing the whole problem into smaller ones and assuming that those are approximately independent, more accuracy may be achieved overall. The net accuracy of the second approach increases with the number of migration rates. Applying that approach to microsatellite data from Alpine ibex, and accounting for the possibility that a model without migration could also explain the data, I find no evidence for substantial gene flow via migration, except for one pair of demes in one direction. While chapters 3 and 4 deal with neutral variation, in chapter 5 I investigate whether an allele of the Major Histocompatibility Complex (MHC) has been under selection over the last ten generations. Short- and medium-term methods for detecting signals of selection are combined. For the medium-term analysis, I adapt a matrix iteration approach that allows for joint estimation of the initial allele frequency, the dominance coefficient, and the strength of selection. The focal MHC allele is shared with domestic goat, and an interesting side issue is whether this reflects an ancestral polymorphism or is due to recent introgression via hybridization. I find most evidence for asymmetric overdominance (selection coefficient s: 0.974; equilibrium frequency: 0.125) or directional selection against the 'goat' allele (s: 0.5) with partial recessivity. Both scenarios suggest a disadvantage of the 'goat' homozygote, but differ in the relative fitness of the heterozygotes. Overall, two aspects play a dominating role in this thesis: the biological questions and the process of inference. They are linked, yet while the proximate motivation for the biological component is given by a specific system - the structured population of Alpine ibex in the Swiss Alps - the methods used and advanced here are fairly general and may well be applied in different contexts.

    Evaluation of an evaluation list for model complexity

    This study (‘WOt-werkdocument’) builds on the project ‘Evaluation model complexity’, in which a list was developed to assess the ‘equilibrium’ of a model or database. This list compares the complexity of a model or database with the availability and quality of data and the application area. A model or database is said to be in equilibrium if the uncertainty in the predictions by the model or database is appropriately small for the intended application, while the data availability supports this complexity. In this study the prototype of the list is reviewed and tested by applying it to test cases. The review has been performed by modelling experts from within and outside Wageningen University & Research centre (Wageningen UR). The test cases have been selected from the scientific literature in order to evaluate the various elements of the list. The results are used to update the list to a new version.