
    Integrative analysis of large-scale biological data sets

    We present two novel web applications for microarray and gene/protein set analysis, ArrayMining.net and TopoGSA. These bioinformatics tools use integrative analysis methods, including ensemble and consensus machine learning techniques as well as modular combinations of different analysis types, to extract new biological insights from experimental transcriptomics and proteomics data. They enable researchers to combine related algorithms and datasets to increase the robustness and accuracy of statistical analyses and to exploit synergies between different computational methods, ranging from statistical learning to optimization and topological network analysis.

    Parameter Tuning Using Gaussian Processes

    Most machine learning algorithms require their parameter values to be set before they can be applied to a problem. Appropriate parameter settings bring good performance, while inappropriate settings generally result in poor models. Hence, it is necessary to find the “best” parameter values for a particular algorithm before building the model. The “best” model not only reflects the underlying function and fits the existing points well, but also performs well when making predictions for new, previously unseen points. A number of methods have been proposed to optimize parameter values. The basic idea behind all of them is a trial-and-error process; the work presented in this thesis employs Gaussian process (GP) regression to optimize the parameter values of a given machine learning algorithm. We consider only two-parameter learning algorithms, with all candidate parameter values specified on a 2-dimensional grid. To avoid brute-force search, Gaussian Process Optimization (GPO) uses “expected improvement” to pick promising points rather than validating every point of the grid in turn. The point with the highest expected improvement is evaluated using cross-validation, and the resulting data point is added to the training set of the Gaussian process model. This process is repeated until a stopping criterion is met. The final model is then built with the learning algorithm, using the best parameter values identified in this process. To test the effectiveness of this optimization method on regression and classification problems, we use it to optimize the parameters of several well-known machine learning algorithms, such as decision tree learning, support vector machines and boosting with trees.
Through the analysis of experimental results obtained on datasets from the UCI repository, we find that GPO yields performance competitive with a brute-force approach, while having a distinct advantage in training time and number of cross-validation runs. Overall, GPO is a promising method for optimizing parameter values in machine learning.
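    The expected-improvement loop described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the thesis code: the toy `score` function (standing in for a cross-validated score surface), the RBF kernel length scale, and the evaluation budget are all assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(gp, candidates, y_best):
    # EI for maximizing the modelled score surface
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

def gpo(score_fn, grid, n_init=5, n_iter=15, seed=0):
    # Evaluate a few random grid points, then repeatedly fit a GP to the
    # observations and evaluate the point with the highest expected improvement.
    rng = np.random.default_rng(seed)
    X = grid[rng.choice(len(grid), size=n_init, replace=False)]
    y = np.array([score_fn(p) for p in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True).fit(X, y)
        best = int(np.argmax(expected_improvement(gp, grid, y.max())))
        X = np.vstack([X, grid[best]])
        y = np.append(y, score_fn(grid[best]))
    return X[int(np.argmax(y))], float(y.max())

# Toy stand-in for "cross-validated score as a function of two parameters",
# peaking at (0.3, 0.7); a real run would call cross-validation here instead.
peak = np.array([0.3, 0.7])
score = lambda p: -np.sum((p - peak) ** 2)
g = np.linspace(0.0, 1.0, 10)
grid = np.array([[a, b] for a in g for b in g])
best_params, best_score = gpo(score, grid)
```

    With 20 evaluations instead of 100, the loop locates a grid point close to the peak, which is the advantage over brute-force search that the abstract reports.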

    Efficient Optimization of Dominant Set Clustering with Frank-Wolfe Algorithms

    We study Frank-Wolfe algorithms (standard, pairwise, and away-steps) for efficient optimization of Dominant Set Clustering. We present a unified and computationally efficient framework that accommodates the different variants of the Frank-Wolfe method, and we investigate its effectiveness in several experimental studies. In addition, we provide explicit convergence rates for the algorithms in terms of the so-called Frank-Wolfe gap. The theoretical analysis is specialized to Dominant Set Clustering and consistently covers the different variants.
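    As a concrete illustration of the standard variant, here is a minimal sketch (not the authors' framework) of Frank-Wolfe applied to the quadratic program behind dominant-set clustering: maximize x^T A x over the probability simplex, stopping on the Frank-Wolfe gap mentioned in the abstract. The toy affinity matrix and iteration budget are assumptions.

```python
import numpy as np

def frank_wolfe_dominant_set(A, n_iter=1000, tol=1e-8):
    # Standard Frank-Wolfe for max_x x^T A x over the simplex,
    # the quadratic program underlying dominant-set clustering.
    n = A.shape[0]
    x = np.full(n, 1.0 / n)                  # start at the barycentre of the simplex
    for _ in range(n_iter):
        grad = 2.0 * A @ x                   # gradient of x^T A x
        s = np.zeros(n)
        s[np.argmax(grad)] = 1.0             # linear maximization oracle: best simplex vertex
        d = s - x
        gap = grad @ d                       # Frank-Wolfe gap: certifies approximate optimality
        if gap < tol:
            break
        q = d @ A @ d                        # curvature of the objective along d
        gamma = 1.0 if q >= 0 else min(1.0, gap / (-2.0 * q))  # exact line search
        x = x + gamma * d
    return x

# Toy affinity matrix: nodes {0,1,2} form a triangle, nodes {3,4} share one edge.
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
x = frank_wolfe_dominant_set(A)
```

    The iterate concentrates its mass on the triangle {0, 1, 2}, the dominant set of this toy graph, with objective value approaching the Motzkin-Straus optimum 2/3.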

    Riemannian optimization and multidisciplinary design optimization

    Riemannian Optimization (RO) generalizes standard optimization methods from Euclidean spaces to Riemannian manifolds. Multidisciplinary Design Optimization (MDO) problems exist on Riemannian manifolds, and with the differential geometry framework we have previously developed, RO techniques can now be applied to MDO. Here, we provide background theory and a literature review for RO, and give the formulae necessary to implement the Steepest Descent Method (SDM), Newton’s Method (NM), and the Conjugate Gradient Method (CGM), in Riemannian form, on MDO problems. We then compare the performance of the Riemannian and Euclidean SDM, NM, and CGM algorithms on several test problems (including a satellite design problem from the MDO literature), using a calculated step size, line search, and geodesic search in our comparisons. With the framework’s induced metric, the RO algorithms are generally not as effective as their Euclidean counterparts, and line search is consistently better than geodesic search. In our post-experimental analysis, we also show how the optimization trajectories of the Riemannian SDM and CGM relate to design coupling, thereby providing some explanation for the observed optimization behaviour. This work is only a first step in applying RO to MDO, however, and the use of quasi-Newton methods and different metrics should be explored in future research.
    This is the author accepted manuscript. It is currently under an indefinite embargo pending publication by Springer.
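    To make the Riemannian machinery concrete, here is a minimal sketch of Riemannian steepest descent, using the unit sphere and a Rayleigh quotient as a stand-in manifold and objective rather than the paper's MDO problems: the Euclidean gradient is projected onto the tangent space, and a normalization retraction pulls each step back onto the manifold. The matrix, step size, and iteration count are assumptions.

```python
import numpy as np

def riemannian_sd_sphere(M, x0, step=0.1, n_iter=500):
    # Riemannian steepest descent on the unit sphere S^{n-1},
    # minimizing the Rayleigh quotient f(x) = x^T M x.
    x = x0 / np.linalg.norm(x0)
    for _ in range(n_iter):
        egrad = 2.0 * M @ x                # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x    # project onto the tangent space at x
        x = x - step * rgrad               # move along the descent direction
        x = x / np.linalg.norm(x)          # retraction: back onto the sphere
    return x

# The minimizer is the eigenvector of the smallest eigenvalue (here 1.0).
M = np.diag([1.0, 2.0, 3.0])
x = riemannian_sd_sphere(M, np.ones(3))
```

    A geodesic update would replace the retraction with the exponential map along the great circle in the direction of -rgrad; the paper's comparison of line search against geodesic search contrasts exactly these two choices.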