2,450 research outputs found
Covariance Selection Quality and Approximation Algorithms
Ph.D. Thesis. University of Hawaiʻi at Mānoa 2018
Multi-Scale Link Prediction
The automated analysis of social networks has become an important problem due
to the proliferation of social networks, such as LiveJournal, Flickr and
Facebook. The scale of these social networks is massive and continues to grow
rapidly. An important problem in social network analysis is proximity
estimation that infers the closeness of different users. Link prediction, in
turn, is an important application of proximity estimation. However, many
methods for computing proximity measures have high computational complexity and
are thus prohibitive for large-scale link prediction problems. One way to
address this problem is to estimate proximity measures via low-rank
approximation. However, a single low-rank approximation may not be sufficient
to represent the behavior of the entire network. In this paper, we propose
Multi-Scale Link Prediction (MSLP), a framework for link prediction, which can
handle massive networks. The basis idea of MSLP is to construct low rank
approximations of the network at multiple scales in an efficient manner. Based
on this approach, MSLP combines predictions at multiple scales to make robust
and accurate predictions. Experimental results on real-life datasets with more
than a million nodes show the superior performance and scalability of our
method.Comment: 20 pages, 10 figure
Imbalanced Ensemble Classifier for learning from imbalanced business school data set
Private business schools in India face a common problem of selecting quality
students for their MBA programs to achieve the desired placement percentage.
Generally, such data sets are biased towards one class, i.e., imbalanced in
nature. And learning from the imbalanced dataset is a difficult proposition.
This paper proposes an imbalanced ensemble classifier which can handle the
imbalanced nature of the dataset and achieves higher accuracy in case of the
feature selection (selection of important characteristics of students) cum
classification problem (prediction of placements based on the students'
characteristics) for Indian business school dataset. The optimal value of an
important model parameter is found. Numerical evidence is also provided using
Indian business school dataset to assess the outstanding performance of the
proposed classifier
Encrypted statistical machine learning: new privacy preserving methods
We present two new statistical machine learning methods designed to learn on
fully homomorphic encrypted (FHE) data. The introduction of FHE schemes
following Gentry (2009) opens up the prospect of privacy preserving statistical
machine learning analysis and modelling of encrypted data without compromising
security constraints. We propose tailored algorithms for applying extremely
random forests, involving a new cryptographic stochastic fraction estimator,
and na\"{i}ve Bayes, involving a semi-parametric model for the class decision
boundary, and show how they can be used to learn and predict from encrypted
data. We demonstrate that these techniques perform competitively on a variety
of classification data sets and provide detailed information about the
computational practicalities of these and other FHE methods.Comment: 39 page
PAC-Bayesian High Dimensional Bipartite Ranking
This paper is devoted to the bipartite ranking problem, a classical
statistical learning task, in a high dimensional setting. We propose a scoring
and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear
additive scoring functions, and we derive non-asymptotic risk bounds under a
sparsity assumption. In particular, oracle inequalities in probability holding
under a margin condition assess the performance of our procedure, and prove its
minimax optimality. An MCMC-flavored algorithm is proposed to implement our
method, along with its behavior on synthetic and real-life datasets
- …