21 research outputs found

    Evaluation of kernel matrix similarity measures applied to large-margin classifiers for model selection

    Get PDF
    The purpose of this work is to investigate the behavior of two similarity measures, Kernel Target Alignment (KTA) and the Feature Space-based Kernel Matrix Evaluation Measure (FSM), with respect to their correlation with a large-margin classifier (the support vector machine), in order to propose and implement a model selection method built in two steps: a hyper-parameter selection model and a feature selection model. KTA and FSM quantify the degree of similarity between kernel matrices as an alignment value. This alignment value is used as the reference for a wrapper model selection procedure that uses simulated annealing as the optimizer. Initial tests are presented to assess the performance of the similarity measures relative to a large-margin classifier, in order to identify the better measure to adopt in the proposed selection model. The components of the described selection model are then tested separately, and their results are analyzed in depth.
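    As a concrete reference for the first of these measures, the sketch below computes the standard Kernel Target Alignment between a kernel matrix and the ideal target matrix built from the labels. It is a minimal illustration of the published KTA formula, not code from the thesis, and all names are illustrative.

        import numpy as np

        def kernel_target_alignment(K, y):
            """KTA: <K, yy^T>_F / (||K||_F * ||yy^T||_F), for labels y in {-1, +1}."""
            Y = np.outer(y, y)                            # ideal target kernel
            num = np.sum(K * Y)                           # Frobenius inner product <K, Y>_F
            den = np.linalg.norm(K) * np.linalg.norm(Y)
            return num / den

        # Toy usage: alignment of an RBF kernel with binary labels.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 3))
        y = np.where(X[:, 0] > 0, 1.0, -1.0)              # labels in {-1, +1}
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-0.5 * sq)                             # RBF kernel, gamma = 0.5
        print(kernel_target_alignment(K, y))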

    Design and HPC implementation of unsupervised Kernel methods in the context of molecular dynamics

    Get PDF
    The thesis presents extensive research in the multidisciplinary domain formed by the cross-fertilization of unsupervised learning and molecular dynamics, two research fields that are drawing ever closer and creating a breeding ground for valuable new concepts and methods. In this context, we first describe a novel engine for performing large-scale kernel k-means clustering. We introduce a two-fold approximation strategy for minimizing the kernel k-means cost function, in which the trade-off between accuracy and execution time is governed automatically by the available system memory.
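    For orientation, the following is a minimal, exact kernel k-means iteration operating directly on a precomputed kernel matrix; the thesis's engine adds a two-fold approximation and memory-aware execution on top of this basic scheme, neither of which is reproduced here.

        import numpy as np

        def kernel_kmeans(K, k, n_iter=20, seed=0):
            """Cluster points given only their kernel matrix K (n x n)."""
            n = K.shape[0]
            rng = np.random.default_rng(seed)
            labels = rng.integers(k, size=n)
            for _ in range(n_iter):
                dist = np.zeros((n, k))
                for c in range(k):
                    mask = labels == c
                    m = mask.sum()
                    if m == 0:
                        dist[:, c] = np.inf               # empty cluster: never chosen
                        continue
                    # ||phi(x_i) - mu_c||^2 = K_ii - (2/m) sum_j K_ij + (1/m^2) sum_jl K_jl
                    dist[:, c] = (np.diag(K)
                                  - 2.0 * K[:, mask].sum(1) / m
                                  + K[np.ix_(mask, mask)].sum() / m**2)
                new_labels = dist.argmin(1)
                if np.array_equal(new_labels, labels):
                    break
                labels = new_labels
            return labels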

    Coupled Multiple Kernel Learning for Supervised Classification

    Get PDF
    Multiple kernel learning (MKL) has recently received significant attention because it can automatically fuse the information embedded in multiple base kernels and then find a new kernel for classification or regression. In this paper, we propose a coupled multiple kernel learning method for supervised classification (CMKL-C) that comprehensively models the intra-coupling within each kernel, the inter-coupling among different kernels, and the coupling between target labels and true labels in MKL. Specifically, the intra-coupling controls the class distribution in a kernel space, the inter-coupling captures the co-information of the base kernel matrices, and the last type of coupling determines whether the newly learned kernel can make correct decisions. Furthermore, we derive analytical solutions to the CMKL-C optimization problem for highly efficient learning. Experimental results on eight UCI data sets and three bioinformatics data sets demonstrate the superior classification accuracy of CMKL-C.
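    The coupling terms are specific to CMKL-C, but the ingredient all MKL methods share is the fused kernel itself: a weighted combination of base kernels. A minimal sketch of that shared step follows; the uniform weights are placeholders, not the CMKL-C analytical solution.

        import numpy as np

        def combine_kernels(kernels, weights):
            """Fused kernel K = sum_m w_m K_m with w_m >= 0, sum_m w_m = 1."""
            w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
            w = w / w.sum()                               # project onto the simplex
            return sum(wi * Ki for wi, Ki in zip(w, kernels))

        # Toy usage with three base kernels on the same data.
        rng = np.random.default_rng(1)
        X = rng.normal(size=(30, 4))
        lin = X @ X.T                                     # linear kernel
        poly = (1.0 + lin) ** 2                           # polynomial kernel, degree 2
        rbf = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))
        K = combine_kernels([lin, poly, rbf], np.ones(3)) # uniform placeholder weights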

    Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems

    Full text link
    We propose two new methods to address the weak-scaling problems of kernel ridge regression (KRR): Balanced KRR (BKRR) and K-means KRR (KKRR). These methods consider alternative ways to partition the input dataset into p different parts, generating p different models, and then selecting the best model among them. Compared to a conventional implementation, KKRR2 (an optimized version of KKRR) improves the weak-scaling efficiency from 0.32% to 38% and achieves a 591× speedup for reaching the same accuracy, using the same data and the same hardware (1536 processors). BKRR2 (an optimized version of BKRR) achieves higher accuracy than the currently fastest method while using less training time, across a variety of datasets. For applications requiring only approximate solutions, BKRR2 improves the weak-scaling efficiency to 92% and achieves a 3505× speedup (theoretical speedup: 4096×). Comment: This paper has been accepted by the ACM International Conference on Supercomputing (ICS) 2018.
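    A single-node sketch in the spirit of KKRR is shown below: partition the data with k-means, fit one kernel ridge regressor per part, and keep the model with the lowest validation error. scikit-learn is used here as a stand-in; the paper's distributed implementation and the BKRR balancing strategy are not reproduced.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.kernel_ridge import KernelRidge
        from sklearn.metrics import mean_squared_error

        def kkrr_style_fit(X, y, X_val, y_val, p=4, lam=1e-2):
            """Fit p per-partition KRR models and return the best on validation data."""
            parts = KMeans(n_clusters=p, n_init=10, random_state=0).fit_predict(X)
            best_err, best_model = np.inf, None
            for c in range(p):
                mask = parts == c
                model = KernelRidge(alpha=lam, kernel="rbf").fit(X[mask], y[mask])
                err = mean_squared_error(y_val, model.predict(X_val))
                if err < best_err:
                    best_err, best_model = err, model
            return best_model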

    A Policy Gradient Method for Confounded POMDPs

    Full text link
    In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result for non-parametrically estimating any history-dependent policy gradient under POMDPs from offline data. The identification enables us to solve a sequence of conditional moment restrictions and to adopt a min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample, non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, the length of the horizon, the concentrability coefficient, and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in a gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs in the offline setting. Comment: 95 pages, 3 figures.
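    The paper's core contribution is the min-max estimator of the policy gradient itself; treated as a black box, its role in the final algorithm is just the ascent loop sketched below. `estimate_policy_gradient` is a hypothetical placeholder, not the paper's estimator.

        import numpy as np

        def offline_policy_gradient_ascent(estimate_policy_gradient, theta0,
                                           offline_data, lr=0.1, n_steps=100):
            """Gradient ascent on policy parameters using a black-box gradient estimator."""
            theta = np.asarray(theta0, dtype=float)
            for _ in range(n_steps):
                grad = estimate_policy_gradient(offline_data, theta)  # estimated from offline data
                theta = theta + lr * grad                             # ascend on the policy value
            return theta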

    Low-rank matrix factorization in multiple kernel learning

    Full text link
    The increased rate of data collection, storage, and availability results in a corresponding interest in data analyses and predictive models based on the simultaneous inclusion of multiple data sources. This tendency is ubiquitous in practical applications of machine learning, including recommender systems, social network analysis, finance, and computational biology. The heterogeneity and size of typical datasets call for simultaneous dimensionality reduction and inference from multiple data sources in a single model. Matrix factorization and multiple kernel learning are two general approaches that satisfy this goal. This work focuses on two specific goals: i) finding interpretable, non-overlapping (orthogonal) data representations through matrix factorization, and ii) regression with multiple kernels through low-rank approximation of the corresponding kernel matrices, providing non-linear outputs and an interpretation of kernel selection. The motivation for the models and algorithms designed in this work stems from RNA biology and the rich complexity of protein-RNA interactions. Although the regulation of RNA fate happens at many levels - bringing in various possible data views - we show how different questions can be answered directly through constraints in the model design. We developed an integrative orthogonality nonnegative matrix factorization (iONMF) to integrate multiple data sources and discover non-overlapping, class-specific RNA binding patterns of varying strengths. We show that the integration of multiple data sources improves predictive accuracy in the retrieval of RNA binding sites, and we report a number of inferred protein-specific patterns consistent with experimentally determined properties. Kernel methods are a principled way to extend linear models to non-linear settings. Multiple kernel learning enables modelling with different data views, but it is limited by the quadratic computation and storage complexity of the kernel matrix. Considerable savings in time and memory can be expected if kernel approximation and multiple kernel learning are performed simultaneously. We present the Mklaren algorithm, which achieves this goal via incomplete Cholesky decomposition, where the selection of basis functions is based on least-angle regression, resulting in linear complexity in both the number of data points and the number of kernels. Considerable savings in approximation rank are observed when compared to general kernel matrix decompositions, with ranks comparable to those of methods specialized to particular kernel function families. The principal advantages of Mklaren are independence of the kernel function form, robust inducing-point selection, and the ability to use different kernels in different regions of both continuous and discrete input spaces, such as numeric vector spaces, strings, or trees, providing a platform for bioinformatics. In summary, we design novel models and algorithms based on matrix factorization and kernel learning, combining regression with insight into the domain of interest by identifying relevant patterns, kernels, and inducing points, while scaling to millions of data points and data views.
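    As a reference point for the decomposition Mklaren builds on, the sketch below runs pivoted incomplete Cholesky on a single kernel matrix with a greedy largest-residual pivot rule; Mklaren's least-angle-regression pivot selection and multi-kernel bookkeeping are not reproduced here.

        import numpy as np

        def incomplete_cholesky(K, rank, tol=1e-9):
            """Return G (n x rank) with K ~ G @ G.T, choosing pivots greedily."""
            n = K.shape[0]
            G = np.zeros((n, rank))
            d = np.diag(K).astype(float)                  # residual diagonal
            for j in range(rank):
                i = int(np.argmax(d))                     # pivot: largest residual entry
                if d[i] < tol:
                    return G[:, :j]                       # residual negligible; stop early
                G[:, j] = (K[:, i] - G @ G[i]) / np.sqrt(d[i])
                d -= G[:, j] ** 2                         # update residual diagonal
            return G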