9 research outputs found

    Rangiranje s hkratnim učenjem več nalog

    Ranking by Multitask Learning

    Instance ranking is a subfield of supervised machine learning and is concerned with inferring predictive models that can rank a set of data instances. We focus on multipartite ranking, where instances belong to one of a limited set of rank classes, study different approaches on synthetic and real data sets, and propose a ranking-specific evaluation framework and a new learning approach that combines multitask learning and binary decomposition. The thesis starts with an analysis of existing ranking approaches. These are used in a practical application of ranking within the domain of molecular biology. In particular, we study embryonic stem cell differentiation posed as a multipartite ranking problem. We critically evaluate several ranking approaches and demonstrate how they can be used to construct accurate predictive models that can rank samples based on their stage of differentiation. For testing, we introduce a framework for the evaluation of ranking methods, including a generalization of the popular performance measure AUC (area under the ROC curve) that can be used for multipartite problems. We proceed by analysing the family of methods based on binary decomposition, which reduces a ranking problem to a set of binary classification tasks. To improve the process of learning models for these tasks, we suggest combining it with a multitask learning technique. Specifically, we propose a new ranking method, which we name BDMT, that combines one-against-one decomposition and multitask feature learning. The decomposition allows us to model nonlinear patterns and to simplify the learning domain to problems that are suitable for classical machine learning approaches. Multitask learning allows us to generalize across the decomposed tasks, making them interdependent through a joint regularized optimization.
Our experiments show that the addition of multitask learning can greatly improve the performance of one-against-one decomposition and can even outperform state-of-the-art ranking approaches in certain settings. Learning the models of decomposed tasks simultaneously increases the stability of model estimation and reduces the sensitivity to perturbations of the training data set. Compared with other ranking methods that are also able to model complex patterns, BDMT remains efficient and can achieve low training times with the use of fast linear base learners. We also show how the method and the patterns it learns can be interpreted. New features learned during the training of BDMT can reveal important hidden factors and hence map the problem into a low-dimensional subspace spanned by a set of novel features. Individual models of decomposed tasks and the similarities between them can be studied to further elucidate the distribution of rank classes in this subspace.
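
The abstract does not spell out how AUC is generalized to multipartite problems. As a rough illustration only, one common construction averages the ordinary two-class AUC over every pair of rank classes, mirroring the one-against-one view of the problem. The sketch below assumes that construction; the function names and the toy scores are hypothetical, and the measure defined in the thesis may differ.

```python
from itertools import combinations


def pairwise_auc(scores_low, scores_high):
    """Share of (low, high) instance pairs where the higher-ranked
    instance gets the larger score; ties count as half a win."""
    wins = 0.0
    for low in scores_low:
        for high in scores_high:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(scores_low) * len(scores_high))


def multipartite_auc(scores, ranks):
    """Unweighted mean of pairwise AUCs over all pairs of rank classes.

    This is an assumed generalization, not necessarily the thesis' measure.
    """
    by_rank = {}
    for score, rank in zip(scores, ranks):
        by_rank.setdefault(rank, []).append(score)
    # sorted() puts the lower rank class first in every pair
    pairs = list(combinations(sorted(by_rank), 2))
    total = sum(pairwise_auc(by_rank[a], by_rank[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical model scores for six instances in three rank classes
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
ranks = [0, 0, 1, 1, 2, 2]
print(multipartite_auc(scores, ranks))
```

A value of 1.0 means every instance of a higher rank class is scored above every instance of a lower one; 0.5 corresponds to a random ordering.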

    Data representation and mining using multi-layered networks

    We present a new technique for network visualization and network-based data mining. Standard network visualization techniques most often focus on relations of a single type and are used to visualize a single data set. In practical problem solving, however, additional data sets and relations that connect them are available. Our specific goal in this thesis was to address the problem of visualizing multiple data sets from a relational database. Our proposed approach is based on multi-layer networks. In this study we use only two layers, representing two different data sets. We propose a method for optimizing the layout of a multi-layer network and develop several objective criteria for evaluating network visualizations. Simulations on synthetic data sets showed that the proposed optimization technique performs well in the simultaneous optimization of a two-layer network with respect to the structure of both layers. We also studied the performance of the technique in a bioinformatics application, where a gene network was successfully complemented with a network of MeSH terms, resulting in an informative two-layer network. After the optimization step, several MeSH terms were placed near related gene clusters and thus provided additional insight into the identified gene sets.

    Orange: data mining toolbox in Python

    Orange is a machine learning and data mining suite for data analysis through Python scripting and visual programming. Here we report on the scripting part, which features interactive data analysis and component-based assembly of data mining procedures. In the selection and design of components, we focus on the flexibility of their reuse: our principal intention is to let the user write simple and clear scripts in Python, which build upon C++ implementations of computationally intensive tasks. Orange is intended for experienced users and programmers as well as for students of data mining.

    Long-Term Stability Predictions of Therapeutic Monoclonal Antibodies in Solution Using Arrhenius Based Kinetics

    Long-term stability of monoclonal antibodies is a key aspect in their development for use as (bio)pharmaceutical products; therefore, the possibility of predicting long-term stability from accelerated stability studies is of major interest, even though such prediction is currently regarded as not sufficiently robust. In this work, using a combination of accelerated stability studies (up to 6 months) and first-order degradation kinetic modelling, we are able to predict long-term stability (up to 3 years), including the temperature dependence of changes in multiple quality attributes, for multiple monoclonal antibody formulations. More specifically, we can robustly predict the long-term stability behavior of a protein at the intended storage condition (5°C), based on up to six months of data obtained at different temperatures, usually at intended (5°C), accelerated (25°C) and stress (40°C) conditions. We have performed stability studies and evaluated the stability data of several mAbs, including IgG1, IgG2, IgG4 and fusion proteins, and validated our model by overlaying the 95% prediction interval and experimental stability data from up to 36 months. We demonstrated improved robustness, speed and accuracy of kinetic long-term stability prediction compared with the classical linear extrapolation used today, justifying long-term stability prediction and shelf life extrapolation for some biological products such as monoclonal antibodies. More generally, this work aims to contribute towards further development and refinement of the regulatory landscape that will allow extrapolation for biological products during the developmental phase, clinical phase and also in marketing authorization applications, as is already established today for small molecules.
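
The two ingredients named in the abstract are first-order decay, A(t) = A0 * exp(-k t), and the Arrhenius relation for the rate constant, k(T) = A * exp(-Ea / (R T)). The minimal sketch below shows how they combine to extrapolate from accelerated conditions to storage at 5°C. It is only an illustration under stated assumptions: the rate constants are made-up numbers, the two-point fit stands in for whatever regression over all time points and temperatures the authors actually use, and all function names are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def to_kelvin(temp_c):
    return temp_c + 273.15


def fit_arrhenius(temp1_c, k1, temp2_c, k2):
    """Solve ln k = ln A - Ea/(R*T) exactly through two (T, k) points.

    Returns (pre_exponential_factor, activation_energy_J_per_mol).
    """
    inv_t1 = 1.0 / to_kelvin(temp1_c)
    inv_t2 = 1.0 / to_kelvin(temp2_c)
    ea = -R * (math.log(k2) - math.log(k1)) / (inv_t2 - inv_t1)
    pre_exp = k1 * math.exp(ea * inv_t1 / R)
    return pre_exp, ea


def arrhenius_rate(pre_exp, ea, temp_c):
    """Rate constant at the given temperature."""
    return pre_exp * math.exp(-ea / (R * to_kelvin(temp_c)))


def remaining_fraction(k, months):
    """First-order decay: fraction of the attribute left after `months`."""
    return math.exp(-k * months)


# Hypothetical first-order rate constants (per month), NOT from the paper
k_25, k_40 = 0.004, 0.020
pre_exp, ea = fit_arrhenius(25.0, k_25, 40.0, k_40)

k_5 = arrhenius_rate(pre_exp, ea, 5.0)  # extrapolated to storage at 5°C
print(f"Ea = {ea / 1000:.1f} kJ/mol, k(5°C) = {k_5:.2e} per month")
print(f"predicted fraction left after 36 months: {remaining_fraction(k_5, 36):.4f}")
```

With real data one would fit all temperatures and time points jointly and report a prediction interval, as the abstract describes, rather than pass a curve through two points.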

    Democratized image analytics by visual programming through integration of deep models and small-scale machine learning

    Analysis of biomedical images requires computational expertise that is uncommon among biomedical scientists. Deep learning approaches for image analysis provide an opportunity to develop user-friendly tools for exploratory data analysis. Here, we use the visual programming toolbox Orange (http://orange.biolab.si) to simplify image analysis by integrating deep-learning embedding, machine learning procedures, and data visualization. Orange supports the construction of data analysis workflows by assembling components for data preprocessing, visualization, and modeling. We equipped Orange with components that use pre-trained deep convolutional networks to profile images with vectors of features. These vectors are used in image clustering and classification in a framework that enables mining of image sets for both novice and experienced users. We demonstrate the utility of the tool in image analysis of progenitor cells in mouse bone healing, identification of developmental competence in mouse oocytes, subcellular protein localization in yeast, and developmental morphology of social amoebae.