Modeling Relational Data via Latent Factor Blockmodel
In this paper we address the problem of modeling relational data, which
appear in many applications such as social network analysis, recommender
systems, and bioinformatics. Previous studies either adopt latent-feature-based
models while disregarding local structure in the network, or focus exclusively
on capturing the local structure of objects via latent blockmodels without
coupling it with the latent characteristics of objects. To combine the
benefits of the previous work, we propose a novel model that can simultaneously
incorporate the effect of latent features and covariates if any, as well as the
effect of latent structure that may exist in the data. To achieve this, we
model the relation graph as a function of both latent feature factors and
latent cluster memberships of objects to collectively discover globally
predictive intrinsic properties of objects and capture latent block structure
in the network to improve prediction performance. We also develop an
optimization transfer algorithm based on the generalized EM-style strategy to
learn the latent factors. We demonstrate the efficacy of the proposed model on
link prediction and cluster analysis tasks; extensive experiments on synthetic
data and several real-world datasets suggest that the proposed LFBM model
outperforms other state-of-the-art approaches on the evaluated tasks.

Comment: 10 pages, 12 figures
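The abstract describes modeling the relation graph as a function of both latent feature factors and latent cluster memberships. As a rough illustrative sketch (not the authors' exact parameterization; all variable names and dimensions here are hypothetical), the probability of a link between objects i and j could combine a feature affinity term with a cluster-level block effect through a logistic link:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 6, 3, 2                        # objects, latent-feature dims, clusters
U = rng.normal(size=(n, d))              # latent feature factors, one row per object
Z = np.eye(k)[rng.integers(k, size=n)]   # one-hot latent cluster memberships
B = rng.normal(size=(k, k))              # block-interaction matrix between clusters

def link_probability(i, j):
    """P(edge i -> j): feature affinity plus block effect, logistic link."""
    score = U[i] @ U[j] + Z[i] @ B @ Z[j]
    return 1.0 / (1.0 + np.exp(-score))

P = np.array([[link_probability(i, j) for j in range(n)] for i in range(n)])
```

Learning in such a model would alternate between updating the continuous factors U and the discrete memberships Z, which is where an EM-style optimization-transfer scheme like the one the abstract mentions would come in.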
Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce
The Web abounds with dyadic data that keeps growing by the second. Previous work has repeatedly shown the usefulness of extracting the interaction structure inside dyadic data [21, 9, 8]. A commonly used tool for extracting the underlying structure is matrix factorization, whose fame was further boosted by the Netflix challenge [26]. When we tried to replicate the same success on real-world Web dyadic data, we were seriously challenged by the scalability of available tools. In this paper we therefore report our efforts on scaling up the nonnegative matrix factorization (NMF) technique. We show that by carefully partitioning the data and arranging the computations to maximize data locality and parallelism, factorizing a matrix with tens of millions of rows, hundreds of millions of columns, and billions of nonzero cells can be accomplished within tens of hours. This result effectively assures practitioners of the scalability of NMF on Web-scale dyadic data.
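The paper's contribution is the MapReduce partitioning and data-locality scheme; the serial computation being distributed is standard NMF. A minimal sketch of the classic Lee–Seung multiplicative updates (a small dense toy problem here, whereas the paper targets huge sparse matrices) shows the per-iteration matrix products that such a system must parallelize:

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.random((20, 15))        # nonnegative dyadic data matrix (toy scale)
r = 4                           # factorization rank

W = rng.random((20, r)) + 1e-3  # nonnegative init
H = rng.random((r, 15)) + 1e-3
eps = 1e-9                      # guard against division by zero

for _ in range(200):
    # Lee-Seung multiplicative updates for min ||V - WH||_F^2 s.t. W, H >= 0;
    # each update keeps factors nonnegative and never increases the loss.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The products `W.T @ V` and `V @ H.T` are the expensive terms at Web scale, and partitioning them over the nonzero cells of V is exactly what determines data locality in a MapReduce implementation.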
Probabilistic Structured Models for Plant Trait Analysis
University of Minnesota Ph.D. dissertation. March 2017. Major: Communication Sciences and Disorders. Advisor: Arindam Banerjee. 1 computer file (PDF); xii, 171 pages.

Many fields in modern science and engineering, such as ecology, computational biology, astronomy, signal processing, climate science, brain imaging, and natural language processing, involve collecting data sets in which the dimensionality of the data p exceeds the sample size n. Since it is usually impossible to obtain consistent procedures unless p < n, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse structured graphical models, low-rank matrices, and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with a regularization function that encourages the assumed structure. Of particular interest is structure learning of graphical models in the high-dimensional setting. Most statistical analyses of graphical model estimation assume that the data are fully observed and that the data points are sampled from the same distribution, and they provide sample complexity and convergence rates by considering a single graphical structure for all observations. In this thesis, we extend these results to estimate the structure of graphical models when the data are partially observed or sampled from multiple distributions. First, we consider the problem of estimating the change in the dependency structure between two p-dimensional models, based on samples drawn from two graphical models. The change is assumed to be structured, e.g., sparse, block sparse, or node-perturbed sparse, such that it can be characterized by a suitable (atomic) norm.
We present and analyze a norm-regularized estimator for directly estimating the change in structure, without having to estimate the structures of the individual graphical models. Next, we consider the problem of estimating the sparse structure of Gaussian copula distributions (corresponding to nonparanormal distributions) using samples with missing values. We prove that our proposed estimators consistently estimate the nonparanormal correlation matrix, where the convergence rate depends on the probability of missing values. In the second part of the thesis, we consider the matrix completion problem. Low-rank matrix completion methods have been successful in a variety of settings such as recommendation systems. However, most existing matrix completion methods only provide a point estimate of the missing entries and do not characterize the uncertainty of the predictions. First, we show that the posterior distribution in latent factor models, such as probabilistic matrix factorization, when marginalized over one latent factor, is the Matrix Generalized Inverse Gaussian (MGIG) distribution. We show that the MGIG is unimodal and that the mode can be obtained by solving an Algebraic Riccati Equation. This characterization leads to a novel Collapsed Monte Carlo inference algorithm for such latent factor models. Next, we propose a Bayesian Hierarchical Probabilistic Matrix Factorization (BHPMF) model to 1) incorporate hierarchical side information and 2) provide uncertainty-quantified predictions. The former yields significant performance improvements in plant trait prediction, a key problem in ecology, by leveraging the taxonomic hierarchy of the plant kingdom. The latter helps identify low-confidence predictions, which can in turn be used to guide field work for data collection efforts. Finally, we consider applications of probabilistic structured models to plant trait analysis. We apply the BHPMF model to fill the gaps in the TRY database.
The BHPMF model is the state-of-the-art model for plant trait prediction and is gaining increasing visibility and usage in plant trait analysis. We have submitted an R package for BHPMF to CRAN. Next, we apply the Gaussian graphical model structure estimators to obtain trait-trait interactions. We study the trait-trait interaction structure across climate zones and plant growth forms, and uncover the dependence of traits on climate and vegetation.
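The abstract builds on probabilistic matrix factorization for filling gaps in trait matrices. As a minimal sketch of the point-estimate core of such models (plain alternating ridge regression on a synthetic low-rank matrix, not the thesis's Bayesian hierarchical BHPMF; all dimensions and the regularizer are hypothetical), the MAP factors under Gaussian priors can be computed as:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic low-rank "trait" matrix with a mask M of observed entries.
n, m, r = 30, 20, 3
truth = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))
M = rng.random((n, m)) < 0.6          # ~60% of entries observed
X = np.where(M, truth, 0.0)

lam = 0.1                              # ridge regularizer (Gaussian prior)
U = rng.normal(size=(n, r))
V = rng.normal(size=(m, r))

for _ in range(30):
    # Alternately solve a regularized least-squares problem for each row
    # factor, using only that row's (or column's) observed entries --
    # the MAP point estimate in probabilistic matrix factorization.
    for i in range(n):
        Vi = V[M[i]]
        U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(r), Vi.T @ X[i, M[i]])
    for j in range(m):
        Uj = U[M[:, j]]
        V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(r), Uj.T @ X[M[:, j], j])

rmse_missing = np.sqrt(np.mean((U @ V.T - truth)[~M] ** 2))
```

The thesis goes beyond this point estimate: it characterizes the marginalized posterior (the MGIG result) and adds the taxonomic hierarchy and uncertainty quantification that plain factorization lacks.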
Probabilistic Models of Student Learning and Forgetting
This thesis uses statistical machine learning techniques to construct predictive models of human learning and to improve human learning by discovering optimal teaching methodologies. In Chapters 2 and 3, I present and evaluate models for predicting the changing memory strength of material being studied over time. The models combine a psychological theory of memory with Bayesian methods for inferring individual differences. In Chapter 4, I develop methods for delivering efficient, systematic, personalized review using the statistical models. Results are presented from three large semester-long experiments with middle school students which demonstrate how this "big data" approach to education yields substantial gains in the long-term retention of course material. In Chapter 5, I focus on optimizing various aspects of instruction for populations of students. This involves a novel experimental paradigm which combines Bayesian nonparametric modeling techniques and probabilistic generative models of student performance. In Chapters 6 and 7, I present supporting laboratory behavioral studies and theoretical analyses. These include an examination of the relationship between study format and the testing effect, and a parsimonious theoretical account of long-term recency effects.
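The combination the abstract describes, a psychological forgetting model plus Bayesian inference of individual differences, can be sketched in miniature (this is a generic exponential-forgetting toy with a hypothetical prior and made-up data, not the thesis's actual model):

```python
import numpy as np

def recall_probability(t, decay):
    """Exponential forgetting: P(recall at lag t days) = exp(-decay * t)."""
    return np.exp(-decay * t)

# Hypothetical study lags (days) and binary recall outcomes for one student.
lags = np.array([1.0, 3.0, 7.0, 14.0])
observed = np.array([1, 1, 0, 0])

# Grid-search MAP estimate of this student's decay rate under an Exp(1) prior,
# which encodes a weak preference for slow forgetting.
decays = np.linspace(0.01, 1.0, 200)

def log_posterior(d):
    p = np.clip(recall_probability(lags, d), 1e-6, 1 - 1e-6)
    loglik = np.sum(observed * np.log(p) + (1 - observed) * np.log(1 - p))
    return loglik - d            # log Exp(1) prior, up to a constant

best_decay = decays[int(np.argmax([log_posterior(d) for d in decays]))]
```

A personalized review scheduler would then use each student's inferred decay rate to decide when an item's predicted recall probability has dropped enough to warrant review.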
A Recommendation System for Preconditioned Iterative Solvers
Solving linear systems of equations is an integral part of most scientific simulations. In
recent years, there has been a considerable interest in large scale scientific simulation of
complex physical processes. Iterative solvers are usually preferred for solving linear systems
of such magnitude due to their lower computational requirements. Currently, computational
scientists have access to a multitude of iterative solver options available as
"plug-and-play" components in various problem-solving environments. Choosing the right solver
configuration from the available choices is critical for ensuring convergence and achieving
good performance, especially for large complex matrices. However, identifying the
"best" preconditioned iterative solver and parameters is challenging even for an expert due
to issues such as the lack of a unified theoretical model, complexity of the solver configuration
space, and multiple selection criteria. Therefore, it is desirable to have principled
practitioner-centric strategies for identifying solver configuration(s) for solving large linear
systems.
The current dissertation presents a general practitioner-centric framework for (a) problem
independent retrospective analysis, and (b) problem-specific predictive modeling of
performance data. Our retrospective performance analysis methodology introduces new
metrics such as area under performance-profile curve and conditional variance-based finetuning
score that facilitate a robust comparative performance evaluation as well as parameter
sensitivity analysis. We present results using this analysis approach on a number of popular
preconditioned iterative solvers available in packages such as PETSc, Trilinos, Hypre, ILUPACK, and WSMP. The predictive modeling of performance data is an integral part
of our multi-stage approach for solver recommendation. The key novelty of our approach
lies in our modular learning-based formulation that comprises three subproblems: (a)
solvability modeling, (b) performance modeling, and (c) performance optimization, which
provides the flexibility to effectively target challenges such as software failure and multiobjective
optimization. Our choice of a "solver trial" instance space represented in terms
of the characteristics of the corresponding "linear system", "solver configuration" and their
interactions, leads to a scalable and elegant formulation. Empirical evaluation of our approach
on performance datasets associated with fairly large groups of solver configurations
demonstrates that one can obtain high quality recommendations that are close to the ideal
choices.
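The area-under-performance-profile-curve metric mentioned above builds on Dolan–Moré performance profiles, where each solve time is normalized by the best time for that problem. A minimal sketch (solve times and the τ range are made up; failed solves are marked with infinity) of computing the profiles and their areas:

```python
import numpy as np

# Hypothetical solve times (seconds): rows = linear systems,
# columns = solver configurations; np.inf marks a failed solve.
times = np.array([
    [1.0, 2.0, np.inf],
    [3.0, 1.5, 4.0],
    [2.0, 2.0, 2.5],
    [5.0, 4.0, 3.0],
])

best = times.min(axis=1, keepdims=True)
ratios = times / best                  # Dolan-More performance ratios

taus = np.linspace(1.0, 4.0, 300)

def profile(s):
    """rho_s(tau): fraction of problems solver s solves within tau x best time."""
    return np.array([(ratios[:, s] <= t).mean() for t in taus])

# Approximate area under each profile over [1, 4]:
# a larger area means the configuration is more efficient and robust overall.
auc = [profile(s).mean() * (taus[-1] - taus[0]) for s in range(times.shape[1])]
```

Summarizing a whole profile by its area gives a single robust score per solver configuration, which is what makes the metric usable inside a recommendation pipeline; a failure (infinite ratio) simply caps that solver's profile below 1.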