35,292 research outputs found

    Discovering user access pattern based on probabilistic latent factor model

    Full text link
    There has been an increased demand for characterizing user access patterns using web mining techniques since the informative knowledge extracted from web server log files can not only offer benefits for web site structure improvement but also for better understanding of user navigational behavior. In this paper, we present a web usage mining method, which utilize web user usage and page linkage information to capture user access pattern based on Probabilistic Latent Semantic Analysis (PLSA) model. A specific probabilistic model analysis algorithm, EM algorithm, is applied to the integrated usage data to infer the latent semantic factors as well as generate user session clusters for revealing user access patterns. Experiments have been conducted on real world data set to validate the effectiveness of the proposed approach. The results have shown that the presented method is capable of characterizing the latent semantic factors and generating user profile in terms of weighted page vectors, which may reflect the common access interest exhibited by users among same session cluster. © 2005, Australian Computer Society, Inc

    Semi-supervised feature extraction using independent factor analysis

    No full text
    International audienceDimensionality reduction can be efficiently achieved by generative latent variable models such as probabilistic principal component analysis (PPCA) or independent component analysis (ICA), aiming to extract a reduced set of variables (latent variables) from the original ones. In most cases, the learning of these methods is achieved within the unsupervised framework where only unlabeled samples are used. In this paper we investigate the possibility of estimating independent factor analysis model (IFA) and thus projecting original data onto a lower dimensional space, when prior knowledge on the cluster membership of some training samples is incorporated. In the basic IFA model, latent variables are only recovered from their linear observed mixtures (original features). Both the mapping matrix (assumed to be linear) and the latent variable densities (that are assumed to be mutually independent and generated according to mixtures of Gaussians) are learned from observed data. We propose to learn this model within semisupervised framework where the likelihood of both labeled and unlabeled samples is maximized by a generalized expectation-maximization (GEM) algorithm. Experimental results on real data sets are provided to demonstrate the ability of our approach to find law dimensional manifold with good explanatory power

    Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration

    Get PDF
    In the current study, we introduce an integrative machine learning strategy for the autonomous molecular design of protein kinase inhibitors using variational autoencoders and a novel cluster-based perturbation approach for exploration of the chemical latent space. The proposed strategy combines autoencoder-based embedding of small molecules with a cluster-based perturbation approach for efficient navigation of the latent space and a feature-based kinase inhibition likelihood classifier that guides optimization of the molecular properties and targeted molecular design. In the proposed generative approach, molecules sharing similar structures tend to cluster in the latent space, and interpolating between two molecules in the latent space enables smooth changes in the molecular structures and properties. The results demonstrated that the proposed strategy can efficiently explore the latent space of small molecules and kinase inhibitors along interpretable directions to guide the generation of novel family-specific kinase molecules that display a significant scaffold diversity and optimal biochemical properties. Through assessment of the latent-based and chemical feature-based binary and multiclass classifiers, we developed a robust probabilistic evaluator of kinase inhibition likelihood that is specifically tailored to guide the molecular design of novel SRC kinase molecules. The generated molecules originating from LCK and ABL1 kinase inhibitors yielded ~40% of novel and valid SRC kinase compounds with high kinase inhibition likelihood probability values (p \u3e 0.75) and high similarity (Tanimoto coefficient \u3e 0.6) to the known SRC inhibitors. By combining the molecular perturbation design with the kinase inhibition likelihood analysis and similarity assessments, we showed that the proposed molecular design strategy can produce novel valid molecules and transform known inhibitors of different kinase families into potential chemical probes of the SRC kinase with excellent physicochemical profiles and high similarity to the known SRC kinase drugs. The results of our study suggest that task-specific manipulation of a biased latent space may be an important direction for more effective task-oriented and target-specific autonomous chemical design models

    Dimensionality reduction of clustered data sets

    Get PDF
    We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets which contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets
    • …
    corecore