181 research outputs found

    Data Mining of Biomedical Databases

    Data mining can be defined as the nontrivial extraction of implicit, previously unknown and potentially useful information from data. This thesis focuses on data mining in biomedicine, one of its most interesting fields of application. Different kinds of biomedical data sets require different data mining approaches. Two approaches are treated in this thesis, which is divided into two separate and independent parts. The first part deals with Bayesian networks, one of the most successful tools for medical diagnosis and therapy follow-up. Formally, a Bayesian network (BN) is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph. An algorithm for Bayesian network structure learning that is a variation of the standard search-and-score approach has been developed. The proposed approach avoids the creation of redundant network structures that may include non-significant connections between variables. In particular, the algorithm determines which relationships between variables must be prevented by binarizing a square matrix containing the mutual information (MI) among all pairs of variables. Four different binarization methods are implemented. The binary MI matrix serves as a pre-conditioning step for the subsequent greedy search procedure that optimizes the network score, reducing the number of possible search paths. This approach has been tested on four different datasets and compared against the standard search-and-score algorithm as implemented in the DEAL package, with successful results. Moreover, a comparison among different network scores has been performed. The second part of this thesis focuses on data mining of microarray databases. An algorithm able to perform the analysis of Illumina microRNA microarray data in a systematic and easy way has been developed.
The algorithm includes two parts. The first part is pre-processing, which consists of two steps: variance stabilization and normalization. Variance stabilization is performed to remove or at least reduce heteroskedasticity, while normalization is performed to minimize systematic effects that are not constant among the samples of an experiment and that are not due to the factors under investigation. Three alternative variance stabilization strategies and three alternative normalization approaches are included, so combining them yields 9 different ways to pre-process the data. The second part of the algorithm deals with the statistical analysis for detecting differential expression, using linear models and empirical Bayes methods. The final result is the list of microRNAs that are significantly differentially expressed between two conditions. The algorithm has been tested on three different real datasets and partially validated with an independent approach (quantitative real-time PCR). Moreover, the influence of different pre-processing methods on the discovery of differentially expressed microRNAs has been studied, and the normalization methods have been compared. This is the first study comparing normalization techniques for Illumina microRNA microarray data.
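
The pre-conditioning idea in the first part of the abstract above (binarize a pairwise mutual information matrix, then let the greedy search consider only the surviving pairs) can be illustrated with a minimal sketch. The abstract does not say which four binarization methods the thesis uses, so the mean-MI threshold below is an assumption made purely for illustration, and the names `mutual_information` and `allowed_edges` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def allowed_edges(columns, threshold=None):
    """Binarize the pairwise MI matrix; pairs below the threshold are forbidden.

    threshold=None falls back to the mean pairwise MI. That rule is an
    assumption of this sketch, not one of the thesis's four methods.
    """
    names = list(columns)
    mi = {
        (a, b): mutual_information(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    if threshold is None:
        threshold = sum(mi.values()) / len(mi)
    return {pair for pair, value in mi.items() if value >= threshold}


data = {
    "A": [0, 0, 1, 1, 0, 1, 0, 1],
    "B": [0, 0, 1, 1, 0, 1, 0, 1],  # copy of A, so MI(A, B) is high
    "C": [0, 1, 0, 1, 1, 0, 0, 1],  # empirically independent of A and B
}
print(allowed_edges(data))
```

A search-and-score learner would then restrict its candidate edge additions to the returned pairs, which is what shrinks the search space.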

    Mechanisms of receptor tyrosine kinase signaling diversity: a focus in cardiac growth

    To understand organism function and disease and to target perturbed processes for therapy, comprehensive knowledge of the underlying cell signaling networks is required. However, mapping the interplay of the vast number of biomolecules involved in these networks remains challenging. As a result, efforts have focused on identifying the structural elements within biomolecules that facilitate signal transmission. Receptor tyrosine kinases (RTKs) regulate the function of several important organs and are most recognized as oncogenes in cancer. Research into the structural determinants of RTKs that govern their signaling has led to clinically approved therapies. However, some structural regions of these kinases remain poorly understood. In this thesis, the diversity of cell signaling arising from variation in an overlooked region in RTKs known as the extracellular juxtamembrane region was explored. A sequence motif that controls the cell surface location and the signaling of RTKs was identified, presenting a potential novel way to target RTKs for therapy. The cell signaling pathways that regulate myocardial growth could be putatively re-activated to treat heart failure or inhibited to treat pathological hypertrophy. Additionally, these pathways may hold the key to regenerating the myocardium post-injury. A pathway promoting myocardial growth involving STAT5b and the RTK ErbB4 was uncovered in this thesis. VEGFB, traditionally associated with endothelial cells, was additionally observed to elicit myocardial growth through paracrine signaling involving ErbB RTKs. Activation of ErbB4 pathways in the heart with NRG-1 has improved the cardiac function of heart failure patients implying that the discoveries made in this thesis may aid in heart failure therapy development. Finally, recent developments in omics technologies have facilitated the detection and quantification of the different layers of cell signaling networks. 
Consequently, a growing need for computational analyses capable of reverse-engineering cell signaling pathways from multi-omics data has emerged. In this thesis, a new computational approach specifically designed to discover cell signaling pathways from multi-omics data without the use of prior information was developed. These types of de novo methods remain essential for uncovering new cell signaling connections, which, in turn, can unveil potential new drug targets to treat disease.
Mechanisms of receptor tyrosine kinase signaling diversity: a focus on cardiac muscle growth. Understanding the function and diseases of the body, as well as drug development, requires comprehensive knowledge of cellular signaling networks. Because signaling molecules are numerous, cell signaling research has focused on finding recurring structural regions in signaling molecules that mediate signaling. Receptor tyrosine kinases (RTKs) are cell-surface signaling molecules that regulate several important bodily functions, and research on their structure has led to several drugs in clinical use. The structure of RTKs contains a region outside the cell whose significance has hardly been studied before. This thesis examined the effect of this region on the diversity of RTK signaling. A sequence motif was found in the region that regulates the location of RTKs on the cell surface as well as their signaling. In the future, drugs that alter RTK signaling could possibly be targeted to this region. Signaling pathways that regulate cardiac muscle growth could possibly be activated to treat heart failure or inhibited to alleviate harmful cardiac hypertrophy. In addition, these pathways could be used to regenerate cardiomyocytes after injury. Findings related to the signaling pathways of cardiac muscle growth were made in this thesis.
The RTK ErbB4 was found to induce cardiac muscle growth through STAT5b signaling. The RTK ligand VEGF-B, in turn, was found to affect cardiac muscle growth via ErbB RTK signaling. Because activation of ErbB4 signaling has improved cardiac function in heart failure patients, these findings may aid the development of heart failure treatments. Omics technologies can measure the different layers of cell signaling networks almost comprehensively. However, computational tools are needed so that the data produced by omics technologies can be modeled into signaling pathways. A new signaling pathway modeling program was developed in this thesis. The program uses only data obtained with omics technologies to model signaling pathways. Methods of this kind, based only on measured data, are needed to discover new signaling pathways. New connections in signaling pathways can, in turn, reveal new disease mechanisms and serve as new drug targets.

    Research summary, January 1989 - June 1990

    The Research Institute for Advanced Computer Science (RIACS) was established at NASA ARC in June of 1983. RIACS is privately operated by the Universities Space Research Association (USRA), a consortium of 62 universities with graduate programs in the aerospace sciences, under a Cooperative Agreement with NASA. RIACS serves as the representative of the USRA universities at ARC. This document reports our activities and accomplishments for the period 1 Jan. 1989 - 30 Jun. 1990. The following topics are covered: learning systems, networked systems, and parallel systems.

    Super-resolution: A comprehensive survey

    A survey of uncertainty in deep neural networks

    Over the last decade, neural networks have reached almost every field of science and become a crucial part of various real world applications. Due to the increasing spread, confidence in neural network predictions has become more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over- or under-confidence, i.e. are badly calibrated. To overcome this, many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified and various approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. For that, a comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and irreducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks (BNNs), ensemble of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty, approaches for calibrating neural networks, and give an overview of existing baselines and available implementations. Different examples from the wide spectrum of challenges in the fields of medical image analysis, robotics, and earth observation give an idea of the needs and challenges regarding uncertainties in the practical applications of neural networks. 
Additionally, the practical limitations of uncertainty quantification methods in neural networks for mission- and safety-critical real world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given.
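
The split into reducible model uncertainty and irreducible data uncertainty mentioned in the abstract above is often illustrated for ensembles with an entropy-based decomposition: the entropy of the averaged prediction is the total uncertainty, the average entropy of the individual members approximates data uncertainty, and their difference approximates model uncertainty. The sketch below assumes this common formulation; it is not necessarily the one the survey uses, and the function names are hypothetical.

```python
from math import log


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * log(p) for p in probs if p > 0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty for a single input.

    member_probs holds one class-probability list per ensemble member.
    Returns (total, data, model), where total = data + model.
    """
    m = len(member_probs)
    mean = [sum(column) / m for column in zip(*member_probs)]
    total = entropy(mean)                              # entropy of the averaged prediction
    data = sum(entropy(p) for p in member_probs) / m   # average member entropy
    return total, data, total - data                   # remainder: member disagreement


# Members agree on an ambiguous input: the uncertainty is all data uncertainty.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
# Members are individually confident but disagree: all model uncertainty.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
print(agree, disagree)
```

Both toy inputs have the same total uncertainty, yet only the second could in principle be reduced by resolving the disagreement between members.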

    EEG Based Inference of Spatio-Temporal Brain Dynamics

    Deep Neural Networks and Tabular Data: Inference, Generation, and Explainability

    Over the last decade, deep neural networks have enabled remarkable technological advancements, potentially transforming a wide range of aspects of our lives in the future. It is becoming increasingly common for deep-learning models to be used in a variety of situations in modern life, ranging from search and recommendations to financial and healthcare solutions, and the number of applications utilizing deep neural networks is still on the rise. However, many recent research efforts in deep learning have focused primarily on neural networks and domains in which they excel, including computer vision, audio processing, and natural language processing. Data in these areas tend to be homogeneous, whereas heterogeneous tabular datasets have received relatively scant attention despite being extremely prevalent. In fact, more than half of the datasets on the Google dataset platform are structured and can be represented in a tabular form. The first aim of this study is to provide a thoughtful and comprehensive analysis of deep neural networks' application to modeling and generating tabular data. In addition, an open-source performance benchmark on tabular data is presented, in which we thoroughly compare over twenty machine and deep learning models on heterogeneous tabular datasets. The second contribution relates to synthetic tabular data generation. Inspired by their success in other homogeneous data modalities, deep generative models such as variational autoencoders and generative adversarial networks are also commonly applied for tabular data generation. However, the use of Transformer-based large language models (which are also generative) for tabular data generation has received scant research attention.
Our contribution to this literature is a novel method for generating tabular data based on this family of autoregressive generative models that, on multiple challenging benchmarks, outperformed the current state-of-the-art methods for tabular data generation. Another crucial requirement for a deep-learning data system is that it be reliable and trustworthy in order to gain broader acceptance in practice, especially in life-critical fields. One possible way to bring trust into a data-driven system is to use explainable machine-learning methods. However, current explanation methods often fail to provide robust explanations because of their high sensitivity to hyperparameter selection or even to changes of the random seed. Furthermore, most of these methods are based on feature-wise importance, ignoring the crucial relationships between variables in a sample. The third aim of this work is to address both of these issues by offering more robust and stable explanations and by taking into account the relationships between variables using a graph structure. In summary, this thesis contributes to several areas related to deep neural networks and heterogeneous tabular data, as well as to the usage of explainable machine learning methods.

    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and for mobile communications.

    Data aware sparse non-negative signal processing

    Greedy techniques are a well-established framework for reconstructing signals that are sparse in some domain of representation. They are renowned for their relatively low computational cost, which makes them appealing for real-time applications. In this work we focus on the specific case of sparse non-negative signals, which find applications in several aspects of daily life, e.g. food analysis and hazardous materials detection. The conventional way of deploying this type of algorithm does not exploit properties that characterise natural data, such as lower-dimensional representations and underlying structures. Motivated by these properties, we aim to incorporate methodologies into greedy techniques that boost their performance in terms of 1) computational efficiency and 2) signal recovery (for the remainder of the thesis we use the term acceleration for the first goal and robustness for the second). These benefits can be obtained via data-aware methodologies that arise from the Machine Learning and Deep Learning community. In this work we aim to establish a link between conventional sparse non-negative signal decomposition frameworks that rely on greedy techniques and data-aware methodologies. We explain the connection between data-aware methodologies and the challenges associated with sparse non-negative signal decompositions: 1) acceleration and 2) robustness. We also introduce the standard data-aware methodologies that are relevant to our problem, together with their theoretical properties. Practical implementations of the proposed frameworks are provided. The main findings of the current work can be summarised as follows:
    • We introduce novel algorithms and theory for the Nearest Neighbor problem.
    • We accelerate a greedy algorithm for sparse non-negative signal decomposition by incorporating our algorithms within its structure.
    • We introduce a novel reformulation of greedy techniques from the perspective of a Deep Neural Network that boosts the robustness of greedy techniques.
    • We introduce a theoretical framework that characterises the conditions under which exact recovery of the signal is possible.
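
As a rough illustration of the greedy, non-negative setting this abstract refers to, the sketch below implements a plain non-negative matching pursuit: at each step it picks the dictionary atom with the largest positive correlation with the residual and adds a positive coefficient for it. This is a generic textbook-style variant assumed for illustration only, not the accelerated or network-based algorithms of the thesis, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nn_matching_pursuit(atoms, signal, max_iter=10, tol=1e-9):
    """Greedy non-negative matching pursuit over a list of nonzero atoms.

    Only atoms positively correlated with the residual are selected, so every
    coefficient update is positive and the coefficients stay non-negative.
    """
    coeffs = [0.0] * len(atoms)
    residual = list(signal)
    for _ in range(max_iter):
        # Score each atom by its normalized correlation with the residual.
        scores = [dot(a, residual) / dot(a, a) ** 0.5 for a in atoms]
        k = max(range(len(atoms)), key=scores.__getitem__)
        if scores[k] <= tol:
            break  # no atom can reduce the residual with a positive weight
        step = dot(atoms[k], residual) / dot(atoms[k], atoms[k])
        coeffs[k] += step
        residual = [r - step * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = nn_matching_pursuit(atoms, [2.0, 0.0, 3.0])
print(coeffs, residual)
```

The atom-selection line is a maximum-correlation search over the whole dictionary, which is the costly step in this sketch when the dictionary is large.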