87 research outputs found

    Looking beyond the hype: applied AI and machine learning in translational medicine

    Big data problems are becoming more prevalent for laboratory scientists who aim to make a clinical impact. This is largely due to increased computing power, in parallel with new technologies for high-quality data generation. Both new and established techniques of artificial intelligence (AI) and machine learning (ML) can now help increase the success of translational studies in three areas: drug discovery, imaging, and genomic medicine. However, ML technologies do not come without limitations and shortcomings. This article discusses current technical limitations as well as broader issues of governance, reproducibility, and interpretation. Overcoming these limitations will make ML methods more powerful for discovery and reduce ambiguity within translational medicine, allowing data-informed decision-making to deliver the next generation of diagnostics and therapeutics to patients more quickly, at lower cost, and at scale.

    Nonnegative matrix factorization with polynomial signals via hierarchical alternating least squares

    Nonnegative matrix factorization (NMF) is a widely used tool in data analysis due to its ability to extract significant features from data vectors. Among the algorithms developed to solve NMF, hierarchical alternating least squares (HALS) is often used to obtain state-of-the-art results. We generalize HALS to tackle an NMF problem where both the input data and the features consist of nonnegative polynomial signals. Compared to standard HALS applied to a discretization of the problem, our algorithm recovers smoother features, and its computational time grows more moderately with the number of observations than that of existing approaches.
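
    The polynomial-signal extension is specific to this work, but the HALS updates it generalizes are standard. Below is a minimal NumPy sketch of plain HALS for a factorization X ≈ WH with nonnegative W and H; the matrix sizes, iteration count, and initialization are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def hals_nmf(X, r, n_iter=200, eps=1e-12):
    """Plain HALS for X ~= W @ H with W, H >= 0 (not the polynomial variant)."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Update each column of W, then each row of H, keeping everything else fixed.
        XHt, HHt = X @ H.T, H @ H.T
        for k in range(r):
            W[:, k] = np.maximum(
                W[:, k] + (XHt[:, k] - W @ HHt[:, k]) / max(HHt[k, k], eps), 0)
        WtX, WtW = W.T @ X, W.T @ W
        for k in range(r):
            H[k, :] = np.maximum(
                H[k, :] + (WtX[k, :] - WtW[k, :] @ H) / max(WtW[k, k], eps), 0)
    return W, H

X = np.abs(np.random.default_rng(1).normal(size=(50, 40)))
W, H = hals_nmf(X, r=5)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))   # relative reconstruction error
```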

    Joint optimization of predictive performance and selection stability

    Current feature selection methods, especially when applied to high-dimensional data, tend to suffer from instability, since marginal modifications of the data may result in largely distinct selected feature sets. Such instability strongly limits a sound interpretation of the selected variables by domain experts. We address this issue by jointly optimizing predictive accuracy and selection stability and by deriving Pareto-optimal trajectories. Our approach extends the Recursive Feature Elimination algorithm by enforcing the selection of some features based on a stable, univariate criterion. Experiments conducted on several high-dimensional microarray datasets illustrate that large stability gains are obtained with no significant drop in accuracy.
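
    The instability being addressed is easy to reproduce. The sketch below (not the paper's method) runs scikit-learn's standard Recursive Feature Elimination on bootstrap resamples of a synthetic wide dataset and reports the pairwise Jaccard overlap of the selected feature sets as a simple stability score; the dataset sizes and the number of retained features are assumed for illustration.

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic "wide" data standing in for a high-dimensional microarray study (assumed).
X, y = make_classification(n_samples=100, n_features=500, n_informative=10, random_state=0)

def selected_features(seed, n_keep=20):
    """Run standard RFE on one bootstrap resample and return the selected feature set."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=len(y), replace=True)
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_keep, step=0.1)
    rfe.fit(X[idx], y[idx])
    return frozenset(np.flatnonzero(rfe.support_))

sets = [selected_features(seed) for seed in range(5)]
stability = np.mean([len(a & b) / len(a | b) for a, b in combinations(sets, 2)])
print("mean pairwise Jaccard stability of the selected features:", stability)
```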

    Compressive Learning of Generative Networks

    Generative networks implicitly approximate complex densities from their samples with impressive accuracy. However, because of the enormous scale of modern datasets, this training process is often computationally expensive. We cast generative network training into the recent framework of compressive learning: we reduce the computational burden of large-scale datasets by first heavily compressing them, in a single pass, into a single sketch vector. We then propose a cost function that approximates the Maximum Mean Discrepancy metric but requires only this sketch, making it time- and memory-efficient to optimize.
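
    A common way to build such a sketch is to average random Fourier features over the dataset in a single pass; the squared distance between sketches then acts as a proxy for the MMD between distributions. The NumPy sketch below illustrates this idea only; the cost function and feature design proposed in the paper differ in their details, and all sizes are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 256                                 # data dimension and sketch size (assumed)
Omega = rng.normal(size=(m, d))                # fixed random frequencies

def sketch(X, batch=10_000):
    """Average of random Fourier features exp(i * Omega @ x), accumulated in one pass."""
    z = np.zeros(m, dtype=complex)
    n = 0
    for start in range(0, len(X), batch):
        B = X[start:start + batch]
        z += np.exp(1j * (B @ Omega.T)).sum(axis=0)
        n += len(B)
    return z / n

X_data = rng.normal(size=(100_000, d))         # stands in for the large training set
z_data = sketch(X_data)                        # computed once; the data can then be discarded

X_gen = rng.normal(size=(5_000, d))            # samples drawn from a generative network
cost = np.linalg.norm(sketch(X_gen) - z_data) ** 2   # sketch-space proxy for the MMD
print(cost)
```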

    Improving accuracy by reducing the importance of hubs in nearest-neighbor recommendations

    A traditional approach for recommending items to users includes a step that forms neighborhoods of users/items. This work focuses on such nearest-neighbor approaches and, more specifically, on a particular type of neighbors: those frequently appearing in the neighborhoods of users/items (i.e., very similar to many other users/items in the data set), referred to as hubs in the literature. The aim of this paper is to explore through experiments how the presence of hubs affects the accuracy of nearest-neighbor recommendations.
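
    Hubs are usually detected through their k-occurrence, i.e., how often an item appears in the k-nearest-neighbor lists of other items. The short scikit-learn sketch below computes k-occurrence counts on random item vectors and flags unusually frequent neighbors as hubs; the data, the value of k, and the hub threshold are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 32))            # stands in for item (or user) latent vectors
k = 10

nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(items)
_, idx = nn.kneighbors(items)
neighbors = idx[:, 1:]                         # drop each item itself from its own list

# k-occurrence: how often each item appears in the k-NN lists of the other items.
k_occurrence = np.bincount(neighbors.ravel(), minlength=len(items))

# Items whose k-occurrence is far above the expected value k act as hubs.
hubs = np.flatnonzero(k_occurrence > k + 2 * k_occurrence.std())
print(len(hubs), "hub items out of", len(items))
```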

    Survival Analysis with Cox Regression and Random Non-linear Projections

    Cox proportional hazards models are commonly used in survival analysis, since they define risk scores that can be directly interpreted in terms of hazards. Yet they cannot account for non-linearities in their covariates. This paper shows how to use random non-linear projections to efficiently address this limitation.
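
    A minimal sketch of the idea, assuming a fixed random projection followed by a tanh non-linearity and a standard Cox fit with the lifelines library on synthetic data; the exact construction of the projections in the paper may differ.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, d, m = 500, 5, 30                       # samples, covariates, projected features (assumed)
X = rng.normal(size=(n, d))

# Synthetic survival times with a non-linear effect of the covariates.
risk = np.exp(0.8 * np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2)
T = rng.exponential(1.0 / risk)
cap = np.quantile(T, 0.8)
E = (T < cap).astype(int)                  # administrative censoring at the 80th percentile
T = np.minimum(T, cap)

# Random non-linear projection: fixed random weights followed by a tanh non-linearity.
W = rng.normal(size=(d, m)) / np.sqrt(d)
Z = np.tanh(X @ W)

df = pd.DataFrame(Z, columns=[f"z{i}" for i in range(m)])
df["T"], df["E"] = T, E

cph = CoxPHFitter(penalizer=0.1)           # linear Cox model fitted on the projected features
cph.fit(df, duration_col="T", event_col="E")
print(cph.concordance_index_)
```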

    Lower bounds on the nonnegative rank using a nested polytopes formulation

    Computing the nonnegative rank of a nonnegative matrix has been proven to be, in general, NP-hard. However, this quantity has many interesting applications; for example, it can be used to compute the extension complexity of polytopes. Researchers have therefore been trying to approximate this quantity as closely as possible with strong lower and upper bounds. In this work, we introduce a new lower bound on the nonnegative rank based on a representation of the matrix as a pair of nested polytopes. The nonnegative rank then corresponds to the minimum number of vertices of any polytope nested between these two polytopes. Using the geometric concept of a supporting corner, we introduce a parametrized family of computable lower bounds and present preliminary numerical results on slack matrices of regular polygons.
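
    The slack matrices used in the experiments are easy to construct: for a polytope with facets a_i·x ≤ b_i and vertices v_j, the slack matrix has entries S_ij = b_i − a_i·v_j ≥ 0. The NumPy sketch below builds this matrix for a regular n-gon; it does not implement the supporting-corner bound itself.

```python
import numpy as np

def polygon_slack_matrix(n):
    """Slack matrix S[i, j] = b_i - a_i . v_j of a regular n-gon (entries >= 0)."""
    theta_v = 2 * np.pi * np.arange(n) / n                    # vertex angles
    theta_f = theta_v + np.pi / n                             # outward facet-normal angles
    V = np.stack([np.cos(theta_v), np.sin(theta_v)])          # vertices, shape (2, n)
    A = np.stack([np.cos(theta_f), np.sin(theta_f)], axis=1)  # facet normals, shape (n, 2)
    b = np.cos(np.pi / n)                                     # facet offsets: a_i . x <= b
    return b - A @ V                                          # shape (n, n), nonnegative

S = polygon_slack_matrix(8)
print(S.min())                   # ~0 up to rounding: each vertex lies on exactly two facets
print(np.linalg.matrix_rank(S))  # ordinary rank is 3; the nonnegative rank can be larger
```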

    The Sum-over-Forests clustering

    This work introduces a novel way to identify dense regions in a graph based on a mode-seeking clustering technique, relying on the Sum-over-Forests (SoF) density index (which can easily be computed in closed form through a simple matrix inversion) as a local density estimator. We first identify the modes of the SoF density in the graph. Then, the nodes of the graph are assigned to the cluster corresponding to the nearest mode, according to a new kernel, also based on the SoF framework. Experiments on artificial and real datasets show that the proposed index performs well in node clustering.
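
    As a rough illustration of graph-based mode seeking (not the paper's exact index or kernel), the sketch below uses the row sums of the matrix-forest kernel (I + L)^{-1} as a stand-in local density score, then lets each node climb to its densest neighbor until it reaches a mode.

```python
import numpy as np

# Toy graph: two dense triangles connected by a single weak edge (adjacency matrix).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1                       # weak bridge between the two triangles

L = np.diag(A.sum(axis=1)) - A                # graph Laplacian
K = np.linalg.inv(np.eye(len(A)) + L)         # matrix-forest kernel (I + L)^-1

density = K.sum(axis=1)                       # stand-in local density score per node

def mode_of(i):
    """Hill-climb from node i to the neighboring node of highest density (a mode)."""
    while True:
        nbrs = np.flatnonzero(A[i])
        j = nbrs[np.argmax(density[nbrs])]
        if density[j] <= density[i]:
            return i
        i = j

print([mode_of(i) for i in range(len(A))])    # nodes sharing a mode form a cluster
```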

    The stability of feature selection and class prediction from ensemble tree classifiers

    The bootstrap aggregating procedure at the core of ensemble tree classifiers reduces, in most cases, the variance of such models while offering good generalization capabilities. The average predictive performance of these ensembles is known to improve up to a certain point as the ensemble size increases. The present work studies this convergence in contrast with the stability of the class predictions and of the variable selection performed during and after growing the ensemble. Experiments on several biomedical datasets, using random forests or bagging of decision trees, show that class prediction and, most notably, variable selection typically require orders of magnitude more trees to become stable.
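
    One simple way to observe this effect (a sketch under assumed data and settings, not the paper's protocol) is to grow two forests of the same size with different random seeds and measure how much their top-ranked features overlap as the number of trees increases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic wide data standing in for a biomedical dataset (assumed setup).
X, y = make_classification(n_samples=200, n_features=300, n_informative=15, random_state=0)

def top_features(n_trees, seed, k=15):
    """Indices of the k most important features of one random forest."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(X, y)
    return set(np.argsort(rf.feature_importances_)[-k:])

for n_trees in (10, 100, 1000):
    a, b = top_features(n_trees, seed=1), top_features(n_trees, seed=2)
    print(n_trees, "trees -> top-feature overlap:", len(a & b) / len(a | b))
```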