
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
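    As a toy illustration of the dimensionality problem the review highlights, the hedged sketch below performs a naive "early integration" of two synthetic omics matrices by feature concatenation followed by PCA; the modality names, dimensions and the use of scikit-learn are illustrative assumptions, not taken from the review.

```python
# Minimal sketch (not from the review): "early integration" of two omics
# matrices by feature concatenation, followed by PCA to mitigate the curse
# of dimensionality. Variable names and dimensions are illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 100
X_transcriptome = rng.normal(size=(n_samples, 2000))  # e.g. gene expression
X_methylome = rng.normal(size=(n_samples, 5000))      # e.g. CpG methylation

# Scale each modality separately so neither dominates the joint representation.
X_joint = np.hstack([
    StandardScaler().fit_transform(X_transcriptome),
    StandardScaler().fit_transform(X_methylome),
])

# Project the concatenated (samples << features) matrix to a low-dimensional space.
Z = PCA(n_components=10).fit_transform(X_joint)
print(Z.shape)  # (100, 10)
```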

    Prognostic modelling of breast cancer patients: a benchmark of predictive models with external validation

    Dissertation presented to obtain the degree of Doctor in Electrical and Computer Engineering – Digital and Perceptual Systems at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
    There are several clinical prognostic models in the medical field. Prior to clinical use, outcome models built from longitudinal cohort data need to undergo a multi-centre evaluation of their predictive accuracy. This thesis evaluates the possible gain in predictive accuracy from multi-centre evaluation of a flexible model with Bayesian regularisation, PLANN-ARD, using a reference data set for breast cancer comprising 4016 records from patients diagnosed during 1989-93 and reported by the BCCA, Canada, with follow-up of 10 years. The method is compared with the widely used Cox regression model. Both methods were fitted to routinely acquired data from 743 patients diagnosed during 1990-94 at the Christie Hospital, UK, with follow-up of 5 years following surgery. Methodological advances developed to support the external validation of this neural network with clinical data include imputation of missing data in both the training and validation data sets, and a prognostic index for stratification of patients into risk groups that can be extended to non-linear models. Predictive accuracy was measured empirically with a standard discrimination index, Ctd, and with a calibration measure based on the Hosmer-Lemeshow test statistic. Cox regression and the PLANN-ARD model are found to have similar discrimination, but the neural network showed marginally better predictive accuracy over the 5-year follow-up period. In addition, the regularised neural network has the substantial advantage of being suited to making predictions of hazard rates and survival for individual patients. Four different approaches to stratifying patients into risk groups are also proposed, each with a different foundation. While the four methodologies were found to broadly agree, there are important differences between them. Rule sets were extracted and compared for the two stratification methods, the log-rank bootstrap and direct application of regression trees, using two rule extraction methodologies, OSRE and CART, respectively. In addition, widely used clinical breast cancer prognostic indexes, such as the NPI, TNM and St. Gallen consensus rules, were compared with the proposed prognostic models expressed as regression trees, concluding that the suggested approaches may enhance current practice. Finally, a Web-based clinical decision support system is proposed for clinical oncologists and breast cancer patients making prognostic assessments, tailored to the particular characteristics of the individual patient. This system comprises three different prognostic modelling methodologies: the NPI, Cox regression and PLANN-ARD. For a given patient, all three models yield a generally consistent but not identical set of prognostic indices that can be analysed together in order to obtain a consensus and achieve a more robust prognostic assessment of the expected patient outcome.
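    The sketch below illustrates the Cox-regression baseline and the prognostic-index stratification into risk groups described in the abstract. It uses the `lifelines` package and synthetic covariates (not the BCCA or Christie Hospital data) and is a hedged illustration rather than the thesis's actual pipeline, which also covers PLANN-ARD, missing-data imputation and calibration testing.

```python
# Hedged sketch: Cox regression fit plus stratification of patients into risk
# groups by tertiles of the prognostic index (linear predictor). Data are toy.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "tumour_size": rng.gamma(2.0, 1.0, n),
    "nodes_involved": rng.poisson(2, n),
    "time": rng.exponential(60, n),   # months of follow-up
    "event": rng.integers(0, 2, n),   # 1 = death observed, 0 = censored
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")

# Prognostic index = linear predictor; cut at tertiles to form risk groups.
df["pi"] = cph.predict_log_partial_hazard(df)
df["risk_group"] = pd.qcut(df["pi"], q=3, labels=["low", "intermediate", "high"])

# Kaplan-Meier summary per risk group.
for name, group in df.groupby("risk_group", observed=True):
    km = KaplanMeierFitter().fit(group["time"], group["event"], label=str(name))
    print(name, round(km.median_survival_time_, 1))
```

    Cutting the linear predictor into groups mirrors the idea of a prognostic index for risk stratification, which the thesis extends to non-linear models such as PLANN-ARD.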

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis (SFA). This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are put forward for each hotel studied.
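    For context, SFA separates random noise from one-sided inefficiency in the composed error of a frontier model. The canonical cross-sectional production-frontier specification (Aigner, Lovell and Schmidt, 1977) is sketched below; the paper's exact functional form and inefficiency distribution may differ.

```latex
% Canonical stochastic production frontier for hotel unit i (illustrative,
% not necessarily the specification used in the paper):
\ln y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i - u_i,
\qquad v_i \sim \mathcal{N}(0,\sigma_v^2),
\qquad u_i \sim \mathcal{N}^{+}(0,\sigma_u^2),
\qquad \mathrm{TE}_i = \exp(-u_i) \in (0,1].
```

    Here v_i captures measurement error and random shocks, while u_i >= 0 captures systematic inefficiency; estimating the two components separately is what allows the hotel units to be ranked by technical efficiency.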

    Complexity, Emergent Systems and Complex Biological Systems: Complex Systems Theory and Biodynamics. [Edited book by I.C. Baianu, with listed contributors (2011)]

    An overview is presented of system dynamics (the study of the behaviour of complex systems), dynamical systems in mathematics, dynamic programming in computer science and control theory, complex systems biology, neurodynamics and psychodynamics.

    Compositionality, stability and robustness in probabilistic machine learning

    Probability theory plays an integral part in the field of machine learning. Its use has been advocated by many [MacKay, 2002; Jaynes, 2003], as it allows for the quantification of uncertainty and the incorporation of prior knowledge by simply applying the rules of probability [Kolmogorov, 1950]. While probabilistic machine learning was originally restricted to simple models, the advent of new computational technologies, such as automatic differentiation, and advances in approximate inference, such as Variational Inference [Blei et al., 2017], have made it more viable in complex settings. Despite this progress, there remain many challenges to its application to real-world tasks. Among these are questions about the ability of probabilistic models to model complex tasks and their reliability both in training and in the face of unexpected data perturbations. These issues can be addressed by examining three properties of such models: compositionality, stability and robustness. Hence, this thesis explores these three key properties and their application to probabilistic models, while validating their importance on a range of applications. The first contribution of this thesis studies compositionality. Compositionality enables the construction of complex and expressive probabilistic models from simple components. This increases the types of phenomena that one can model and provides the modeller with a wide array of modelling options. This thesis examines this property through the lens of Gaussian processes [Rasmussen and Williams, 2006]. It proposes a generic compositional Gaussian process model to address the problem of multi-task learning in the non-linear setting. Additionally, this thesis contributes two methods addressing the issue of stability. Stability determines the reliability of inference algorithms in the presence of noise. More stable training procedures lead to faster, more reliable inferences, especially for complex models. The two proposed methods aim at stabilising stochastic gradient estimation in Variational Inference using the method of control variates [Owen, 2013]. Finally, the last contribution of this thesis considers robustness. Robust machine learning methods are unaffected by unaccounted-for phenomena in the data. This makes such methods essential when deploying machine learning on real-world datasets. This thesis examines the problem of robust inference in sequential probabilistic models by combining the ideas of Generalised Bayesian Inference [Bissiri et al., 2016] and Sequential Monte Carlo sampling [Doucet and Johansen, 2011].
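    To make the stability contribution concrete, the following minimal numerical sketch (not taken from the thesis) shows how a simple baseline-style control variate reduces the variance of the score-function gradient estimator used in black-box Variational Inference; the toy choice q_theta = N(theta, 1) with f(z) = z^2 is an assumption for illustration.

```python
# Variance reduction with a baseline (control variate) on the score-function
# gradient estimator. Here q_theta = N(theta, 1), f(z) = z^2, and the true
# gradient of E_q[f(z)] with respect to theta is 2 * theta.
import numpy as np

rng = np.random.default_rng(2)
theta = 1.5
n_samples, n_repeats = 100, 2000

naive, baselined = [], []
for _ in range(n_repeats):
    z = rng.normal(theta, 1.0, n_samples)
    f = z ** 2
    score = z - theta                      # d/dtheta log N(z | theta, 1)
    naive.append(np.mean(f * score))       # plain score-function estimator

    z_b = rng.normal(theta, 1.0, n_samples)
    b = np.mean(z_b ** 2)                  # baseline from an independent batch (keeps estimator unbiased)
    baselined.append(np.mean((f - b) * score))

print("true gradient      :", 2 * theta)
print("naive     mean/var :", np.mean(naive), np.var(naive))
print("baselined mean/var :", np.mean(baselined), np.var(baselined))
```

    The baselined estimator remains unbiased because the score has zero mean, while its variance is substantially lower; this kind of variance reduction is what stabilises stochastic-gradient estimation in Variational Inference.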

    Supervised machine learning algorithms for the estimation of the probability of default in corporate credit risk

    This thesis investigates the application of non-linear supervised machine learning algorithms for estimating the Probability of Default (PD) of corporate clients. To achieve this, the thesis is separated into three experiments: 1. The first experiment investigates a wrapper feature selection method and its application to support vector machines (SVMs) and logistic regression (LR). The logistic regression model is the most popular approach for estimating PD in a rich default portfolio; however, other alternatives for PD estimation are available. The SVM method is compared to the logistic regression model using the proposed feature selection method. 2. The second experiment investigates the application of artificial neural networks (ANNs) for estimating the PD of corporate clients. In particular, ANNs are regularized and trained with both classical and Bayesian approaches. Furthermore, different network architectures are explored, and the Bayesian estimation and regularization are compared to their classical counterparts. 3. The third experiment investigates the k-Nearest Neighbours (KNN) algorithm. This algorithm is trained using both Bayesian and classical methods, and KNN can be efficiently applied to estimating PD. In addition, other supervised machine learning algorithms, such as decision trees (DTs), linear discriminant analysis (LDA) and Naive Bayes (NB), were applied and their performance summarized and compared to that of the SVMs, ANNs, KNN and logistic regression. The contribution of this thesis is to provide efficient and at the same time applicable methods for estimating the PD of corporate clients. This thesis contributes to the existing literature in a number of ways: 1. First, this research proposes an innovative feature selection method for SVMs. 2. Second, it proposes an innovative Bayesian estimation method to regularize ANNs. 3. Third, it proposes an innovative Bayesian approach to the estimation of KNN. Ultimately, the objective of the research is to promote the use of Bayesian non-linear supervised machine learning methods that are currently not heavily applied in the industry for the PD estimation of corporate clients.
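    A hedged sketch of the first experiment's theme follows: wrapper-style forward feature selection around an SVM, scored by cross-validated AUC and compared against a logistic-regression baseline. The synthetic imbalanced data, scikit-learn components and parameter choices are illustrative assumptions, not the thesis's actual data or exact procedure.

```python
# Wrapper feature selection (forward) around an SVM on an imbalanced toy
# "default" data set, compared with logistic regression on all features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Imbalanced toy portfolio: few defaults (class 1) among many performing loans.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.95, 0.05], random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
selector = SequentialFeatureSelector(svm, n_features_to_select=6,
                                     direction="forward", scoring="roc_auc", cv=5)
selector.fit(X, y)
X_sel = selector.transform(X)

candidates = [
    ("SVM (selected features)", svm, X_sel),
    ("Logistic regression (all features)",
     make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)), X),
]
for name, model, data in candidates:
    auc = cross_val_score(model, data, y, scoring="roc_auc", cv=5).mean()
    print(f"{name}: AUC = {auc:.3f}")
```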

    Estimation of Dose Distribution for Lu-177 Therapies in Nuclear Medicine

    In nuclear medicine, two frequent applications of 177Lu therapy exist: DOTATOC therapy for patients with a neuroendocrine tumor and PSMA therapy for prostate cancer. During the therapy a pharmaceutical is injected intravenously, which attaches to tumor cells due to its molecular composition. Since the pharmaceutical contains a radioactive 177Lu isotope, tumor cells are destroyed through irradiation. Afterwards the substance is excreted via the kidneys. Since the latter are very sensitive to high-energy radiation, it is necessary to compute exactly how much radioactivity can be administered to the patient without endangering healthy organs. This calculation is called dosimetry and is currently performed according to the state-of-the-art MIRD method. At the beginning of this work, an error assessment of the established method is presented, which determined an overall error of 25% in the renal dose value. The presented study improves and personalizes the MIRD method in several respects and reduces the individual error estimates considerably. To estimate the amount of activity that can be administered, a test dose is first injected into the patient. Subsequently, SPECT images are taken after 4 h, 24 h, 48 h and 72 h. From these images the activity at each voxel can be obtained at the specified time points, i.e. the physical decay and physiological metabolization of the pharmaceutical can be followed over time. To calculate the number of decays in each voxel from the four SPECT registrations, a time-activity curve must be integrated. In this work, a statistical method was developed to estimate the time-dependent activity and then integrate the time-activity curve voxel by voxel (a simplified sketch is given below). This procedure results in a decay map for all 26 available patients (13 PSMA/13 DOTATOC). After the decay maps have been estimated, a full Monte Carlo simulation is carried out on their basis to determine the related dose distribution. The simulation results are taken as reference (“Gold Standard”) and compared with methods for an approximate but faster estimation of the dose distribution. Recently, convolution with Dose Voxel Kernels (DVKs) has been established as a standard dose estimation method (Soft Tissue Scaling, STS). For this, a radioactive lutetium isotope is placed in a cube of soft tissue and the radiation interactions are simulated for 10^10 decays. The resulting DVK is then convolved with the estimated decay map. The result is a dose distribution, which, however, does not take any tissue density differences into account. To take tissue inhomogeneities into account, three methods are described in the literature, namely Center Scaling (CS), Density Scaling (DS) and Percentage Scaling (PS). However, their application did not improve on the results of the STS method, as is demonstrated in this study. Consequently, a neural network was finally trained to estimate DVKs adapted to the individual tissue density distribution. During the convolution, it uses for each voxel an adapted DVK deduced from the corresponding tissue density kernel. This method outperformed the MIRD method: whereas the latter resulted in a renal dose uncertainty between -42.37% and 10.22%, the proposed approach reduced the uncertainty to a range between -26.00% and 7.93%. These dose deviations were calculated for the 26 patients and relate to the mean renal dose compared with the respective result of the Monte Carlo simulation.
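    The per-voxel step referenced above ("a simplified sketch is given below") can be illustrated as follows. The thesis develops a dedicated statistical estimator for the time-dependent activity; the sketch instead fits a plain mono-exponential time-activity curve to the four SPECT time points and integrates it analytically, with toy activity values as assumptions.

```python
# Simplified per-voxel decay estimation: fit A(t) = A0 * exp(-lambda_eff * t)
# to the four SPECT time points and integrate from 0 to infinity.
import numpy as np

t = np.array([4.0, 24.0, 48.0, 72.0]) * 3600.0   # acquisition times [s]
# activity[v, k]: activity of voxel v at time point k, in Bq (toy values).
activity = np.array([[5.0e3, 3.1e3, 1.9e3, 1.2e3],
                     [8.0e3, 5.2e3, 3.3e3, 2.1e3]])

def total_decays(times, a):
    """Log-linear least-squares fit of A0 and lambda_eff; integral 0..inf = A0 / lambda_eff."""
    slope, intercept = np.polyfit(times, np.log(a), 1)
    lam_eff = -slope            # effective decay constant [1/s]
    a0 = np.exp(intercept)      # back-extrapolated activity at t = 0 [Bq]
    return a0 / lam_eff         # total number of decays in the voxel

decay_map = np.array([total_decays(t, a) for a in activity])
print(decay_map)
```

    A decay map obtained this way is what is subsequently fed to the Monte Carlo simulation, or convolved with a DVK, to obtain the dose distribution.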
In order to improve the estimates of the dose distribution even further, a 3D-2D neural network was trained in the second part of the work. This network predicts the dose distribution of an entire patient. In combination with an Empirical Mode Decomposition, this method achieved deviations of only -12.21% to 2.13%. The mean deviation of the dose estimates is in the range of the statistical error of the Monte Carlo simulation. In the third part of the work, a neural network was used to automatically segment the kidneys, spleen and tumors. Compared to an established segmentation algorithm, the method developed in this work can also segment tumors, because it uses not only the CT image as input but also the SPECT image.