A note on using performance and data profiles for training algorithms
It is shown how to use the performance and data profile benchmarking tools to
improve algorithms' performance. An illustration for the BFO derivative-free
optimizer suggests that the obtained gains are potentially significant.
Comment: 8 pages, 4 tables, 4 figures
Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction.
Tumor heterogeneity is a limiting factor in cancer treatment and in the discovery of biomarkers to personalize it. We describe a computational purification tool, ISOpure, to directly address the effects of variable normal tissue contamination in clinical tumor specimens. ISOpure uses a set of tumor expression profiles and a panel of healthy tissue expression profiles to generate a purified cancer profile for each tumor sample and an estimate of the proportion of RNA originating from cancerous cells. Applying ISOpure before identifying gene signatures leads to significant improvements in the prediction of prognosis and other clinical variables in lung and prostate cancer.
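The purification idea can be illustrated with a toy marker-gene deconvolution (a minimal sketch under simplified assumptions, not ISOpure's actual probabilistic model): if some genes are assumed silent in cancer cells, the tumor/healthy expression ratio at those genes estimates the normal-tissue fraction, and subtracting the scaled healthy profile recovers a purified cancer profile.

```python
import numpy as np

def purify_profile(tumor, healthy_mean, normal_markers):
    """Toy deconvolution: estimate the fraction of RNA from contaminating
    normal cells using genes assumed expressed only in healthy tissue,
    then subtract the scaled healthy profile."""
    # contamination fraction from normal-only marker genes
    frac_normal = np.median(tumor[normal_markers] / healthy_mean[normal_markers])
    frac_normal = float(np.clip(frac_normal, 0.0, 1.0))
    purified = np.clip(tumor - frac_normal * healthy_mean, 0.0, None)
    cancer_fraction = 1.0 - frac_normal
    return purified / max(cancer_fraction, 1e-9), cancer_fraction

# synthetic check: mix a known cancer profile with 30% normal RNA
rng = np.random.default_rng(0)
healthy = rng.uniform(1.0, 5.0, size=20)
cancer = rng.uniform(1.0, 5.0, size=20)
markers = np.arange(5)               # genes silent in cancer cells
cancer[markers] = 0.0
mixed = 0.7 * cancer + 0.3 * healthy
estimate, frac = purify_profile(mixed, healthy, markers)
```

On this synthetic mixture the recovered cancer fraction is 0.7 and the purified profile matches the true cancer profile; the real method handles noise and per-sample variation that this sketch ignores.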
Sparsity-Based STAP Design Based on Alternating Direction Method with Gain/Phase Errors
We present a novel sparsity-based space-time adaptive processing (STAP)
technique based on the alternating direction method to overcome the severe
performance degradation caused by array gain/phase (GP) errors. The proposed
algorithm reformulates the STAP problem as a joint optimization problem of the
spatio-Doppler profile and GP errors in both single and multiple snapshots, and
introduces a target detector using the reconstructed spatio-Doppler profiles.
Simulations are conducted to illustrate the benefits of the proposed algorithm.
Comment: 7 figures, 1 table
A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives
In this paper we analyze boosting algorithms in linear regression from a new
perspective: that of modern first-order methods in convex optimization. We show
that classic boosting algorithms in linear regression, namely the incremental
forward stagewise algorithm (FS) and least squares boosting
(LS-Boost), can be viewed as subgradient descent to minimize the
loss function defined as the maximum absolute correlation between the features
and residuals. We also propose a modification of FS that yields
an algorithm for the Lasso, and that may be easily extended to an algorithm
that computes the Lasso path for different values of the regularization
parameter. Furthermore, we show that these new algorithms for the Lasso may
also be interpreted as the same master algorithm (subgradient descent), applied
to a regularized version of the maximum absolute correlation loss function. We
derive novel, comprehensive computational guarantees for several boosting
algorithms in linear regression (including LS-Boost and
FS) by using techniques of modern first-order methods in convex
optimization. Our computational guarantees inform us about the statistical
properties of boosting algorithms. In particular, they provide, for the first
time, a precise theoretical description of the amount of data-fidelity and
regularization imparted by running a boosting algorithm with a prespecified
learning rate for a fixed but arbitrary number of iterations, for any dataset.
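The incremental forward stagewise algorithm the abstract describes can be sketched in a few lines: each step nudges the coefficient of the feature most correlated with the current residual by a small step size, which is the coordinate-wise subgradient view of boosting (a minimal sketch; the paper's analysis covers step-size and iteration-count guarantees this omits).

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, iters=2000):
    """Incremental forward stagewise regression (FS).

    Each iteration finds the feature most correlated with the residual
    and moves its coefficient by eps in the sign of that correlation."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()
    for _ in range(iters):
        corr = X.T @ r                      # correlations with residual
        j = int(np.argmax(np.abs(corr)))    # steepest coordinate
        step = eps * np.sign(corr[j])
        beta[j] += step
        r -= step * X[:, j]                 # update residual
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
true_beta = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=100)
beta = forward_stagewise(X, y)
```

With a small eps and enough iterations, beta approaches the least-squares solution; stopping early yields the implicit regularization the paper quantifies.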
Analyzing User Preference for Social Image Recommendation
With the rapidly growing amount of multimedia data shared on social
media platforms, recommender systems have become an important tool for easing
users' information-overload burden. In this scenario, heterogeneous
information such as tags and image content, in addition to
user-to-item preferences, is extremely valuable for making effective
recommendations. In this paper, we explore a novel hybrid algorithm, termed
STM, for image recommendation. STM jointly considers image content analysis
and the users' preferences on the basis of sparse
representation. STM is able to tackle the challenges of highly sparse user
feedback and cold-start problems in the social network scenario. In addition,
our model is based on classical probabilistic matrix factorization and can
easily be extended to incorporate other useful information such as social
relationships. We evaluate our approach on a newly collected data set of 0.3
million social images from Flickr. The experimental results demonstrate that
sparse topic modeling of the image content leads to more effective
recommendations, with a significant performance gain over
state-of-the-art alternatives.
Dictionary Learning for Adaptive GPR Landmine Classification
Ground penetrating radar (GPR) target detection and classification is a
challenging task. Here, we consider online dictionary learning (DL) methods to
obtain sparse representations (SR) of the GPR data to enhance feature
extraction for target classification via support vector machines. Online
methods are preferred because traditional batch DL like K-SVD is not scalable
to high-dimensional training sets and infeasible for real-time operation. We
also develop Drop-Off MINi-batch Online Dictionary Learning (DOMINODL) which
exploits the fact that a lot of the training data may be correlated. The
DOMINODL algorithm iteratively considers elements of the training set in small
batches and drops samples that become less relevant. For the classification of
abandoned anti-personnel landmines, we compare the performance
of K-SVD with three online algorithms: classical Online Dictionary Learning,
its correlation-based variant, and DOMINODL. Our experiments with real data
from L-band GPR show that online DL methods reduce learning time by 36-93% and
increase mine detection by 4-28% over K-SVD. Our DOMINODL is the fastest and
retains similar classification performance as the other two online DL
approaches. We use a Kolmogorov-Smirnov test distance and the
Dvoretzky-Kiefer-Wolfowitz inequality for the selection of DL input parameters
leading to enhanced classification results. To further compare with
state-of-the-art classification approaches, we evaluate a convolutional neural
network (CNN) classifier which performs worse than the proposed approach.
Moreover, when the acquired samples are randomly reduced by 25%, 50% and 75%,
sparse decomposition based classification with DL remains robust while the CNN
accuracy is drastically compromised.
Comment: 16 pages, 11 figures, 10 tables
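The drop-off idea can be sketched with a toy mini-batch online dictionary learner (loosely inspired by the abstract, not the published DOMINODL algorithm): each sample is coded with its single best-matching atom, that atom is nudged toward the sample, and samples that are already well represented are dropped from later epochs.

```python
import numpy as np

rng = np.random.default_rng(2)

# ground truth: 4 atom directions in 20 dimensions; 40 one-sparse signals
D_true = rng.normal(size=(20, 4))
D_true /= np.linalg.norm(D_true, axis=0)
scales = rng.uniform(1.0, 3.0, size=40)
Y = D_true[:, rng.integers(0, 4, size=40)] * scales

def online_dl_dropoff(Y, n_atoms=8, batch=16, lr=0.1, epochs=5, drop_tol=0.2):
    """Toy mini-batch online dictionary learning with sample drop-off."""
    D = rng.normal(size=(Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    active = list(range(Y.shape[1]))
    for _ in range(epochs):
        rng.shuffle(active)
        kept = []
        for start in range(0, len(active), batch):
            for i in active[start:start + batch]:
                y = Y[:, i]
                coeffs = D.T @ y
                j = int(np.argmax(np.abs(coeffs)))   # 1-sparse code
                resid = y - coeffs[j] * D[:, j]
                D[:, j] += lr * coeffs[j] * resid    # nudge atom toward sample
                D[:, j] /= np.linalg.norm(D[:, j])
                if np.linalg.norm(resid) > drop_tol * np.linalg.norm(y):
                    kept.append(i)                   # sample still informative
        active = kept                                # drop well-represented samples
        if not active:
            break
    return D

D = online_dl_dropoff(Y)
# mean relative 1-sparse reconstruction error over all samples
rels = []
for i in range(Y.shape[1]):
    y = Y[:, i]
    c = D.T @ y
    j = int(np.argmax(np.abs(c)))
    rels.append(np.linalg.norm(y - c[j] * D[:, j]) / np.linalg.norm(y))
mean_rel = float(np.mean(rels))
```

Dropping converged samples is what saves time relative to batch methods like K-SVD, which revisit the full training set every iteration.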
Personalized Expertise Search at LinkedIn
LinkedIn is the largest professional network with more than 350 million
members. As the member base increases, searching for experts becomes more and
more challenging. In this paper, we propose an approach to address the problem
of personalized expertise search on LinkedIn, particularly for exploratory
search queries containing skills. In the offline phase, we introduce a
collaborative filtering approach based on matrix factorization. Our approach
estimates expertise scores for both the skills that members list on their
profiles as well as the skills they are likely to have but do not explicitly
list. In the online phase (at query time) we use expertise scores on these
skills as a feature in combination with other features to rank the results. To
learn the personalized ranking function, we propose a heuristic to extract
training data from search logs while handling position and sample selection
biases. We tested our models on two products: LinkedIn homepage and LinkedIn
recruiter. A/B tests showed significant improvements in click-through rates
(31% for CTR@1 for recruiter, 18% for homepage) as well as in downstream
messages sent from search (37% for recruiter, 20% for homepage). As of writing
this paper, these models serve nearly all live traffic for skills search on
LinkedIn homepage as well as LinkedIn recruiter.
Comment: 2015 IEEE International Conference on Big Data
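The offline step can be sketched as plain matrix factorization over observed member-skill expertise scores (an illustrative sketch, not LinkedIn's system: the data, scores, and hyperparameters below are made up): fitting low-rank factors over listed skills lets the model score skills a member is likely to have but did not list.

```python
import numpy as np

def factorize(R, mask, k=2, lr=0.05, reg=0.01, epochs=500, seed=3):
    """Minimal SGD matrix factorization over observed member-skill scores."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.normal(size=(n, k))   # member factors
    V = 0.1 * rng.normal(size=(m, k))   # skill factors
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * U[i] - reg * V[j])
    return U, V

# synthetic low-rank "expertise" matrix; hide one member-skill pair
rng = np.random.default_rng(4)
R = rng.uniform(0.2, 1.0, size=(4, 2)) @ rng.uniform(0.2, 1.0, size=(2, 5))
mask = np.ones_like(R, dtype=bool)
mask[0, 0] = False                      # a skill the member never listed
U, V = factorize(R, mask)
pred = float(U[0] @ V[0])               # inferred expertise score
rmse = float(np.sqrt(np.mean((R[mask] - (U @ V.T)[mask]) ** 2)))
```

At query time such inferred scores would be just one feature among many in the learned ranking function the abstract describes.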
Collaborative Ensemble Learning: Combining Collaborative and Content-Based Information Filtering via Hierarchical Bayes
Collaborative filtering (CF) and content-based filtering (CBF) have widely
been used in information filtering applications. Both approaches have their
strengths and weaknesses which is why researchers have developed hybrid
systems. This paper proposes a novel approach to unify CF and CBF in a
probabilistic framework, named collaborative ensemble learning. It uses
probabilistic SVMs to model each user's profile (as CBF does). At the
prediction phase, it combines a society of user profiles, represented by their
respective SVM models, to predict an active user's preferences (the CF idea).
The combination scheme is embedded in a probabilistic framework and retains an
intuitive explanation. Moreover, collaborative ensemble learning does not
require a global training stage and thus can incrementally incorporate new
data. We report results based on two data sets. For the Reuters-21578 text
data set, we simulate user ratings under the assumption that each user is
interested in only one category. In the second experiment, we use users'
opinions on a set of 642 art images that were collected through a web-based
survey. For both data sets, collaborative ensemble learning achieved excellent
performance in terms of recommendation accuracy.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in
Artificial Intelligence (UAI 2003)
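The combination scheme can be sketched with stand-in user models (a toy version: centroid profiles replace the paper's probabilistic SVMs, and the weights stand in for its probabilistic combination): each user's model scores the item, and the society of scores is averaged with weights reflecting how much each user resembles the active user.

```python
import numpy as np

def user_model(liked_items):
    # stand-in for a per-user probabilistic SVM: the user's "profile"
    # is the centroid of item feature vectors they liked (the CBF side)
    return np.mean(liked_items, axis=0)

def ensemble_predict(item, models, weights):
    # combine the society of user models, weighted by how much each
    # user resembles the active user (the CF side)
    scores = np.array([m @ item for m in models])
    w = np.asarray(weights, dtype=float)
    return float((w / w.sum()) @ scores)

# two users with opposite tastes in a 2-feature item space
m1 = user_model(np.array([[1.0, 0.0], [0.9, 0.1]]))
m2 = user_model(np.array([[0.0, 1.0], [0.1, 0.9]]))
item = np.array([1.0, 0.0])
# the active user resembles user 1, so user 1's model dominates
score = ensemble_predict(item, [m1, m2], weights=[0.9, 0.1])
```

Because each user model is trained independently, a new user or new ratings only require fitting one model, which is why no global training stage is needed.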
DeepProf: Performance Analysis for Deep Learning Applications via Mining GPU Execution Patterns
Deep learning applications are computation-intensive and often employ GPUs as
the underlying computing devices. Deep learning frameworks provide powerful
programming interfaces, but the gap between source code and the actual GPU
operations makes it difficult to analyze the performance of deep learning
applications. In this paper, by examining the features of GPU traces and
deep learning applications, we use the suffix tree structure to extract the
repeated patterns in GPU traces. Performance analysis graphs can be generated
from the preprocessed GPU traces. We further present DeepProf, a novel
tool to automatically process GPU traces and generate performance analysis
reports for deep learning applications. An empirical study verifies the
effectiveness of DeepProf in performance analysis and diagnosis. We
also uncover some interesting properties of TensorFlow, which can be used to
guide deep learning system setup.
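The pattern-extraction step can be sketched as finding the dominant repeated kernel sequence in a trace (the paper uses a suffix tree; this brute-force n-gram count is a slower but equivalent sketch for short traces, and the kernel names are made up for illustration).

```python
from collections import Counter

def repeated_step(trace, max_len=6):
    """Return the repeated subsequence covering the most trace events.

    Counts every n-gram up to max_len, keeps those occurring at least
    twice, and picks the one maximizing length * occurrence count."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(trace) - n + 1):
            counts[tuple(trace[i:i + n])] += 1
    repeated = {g: c for g, c in counts.items() if c >= 2}
    return max(repeated, key=lambda g: len(g) * repeated[g])

# a toy trace: a setup prologue, then four identical training steps
step = ["input_h2d", "conv_fwd", "relu_fwd", "conv_bwd", "sgd_update"]
trace = ["cuda_init", "weights_h2d"] + step * 4
pattern = repeated_step(trace)
```

The recovered pattern corresponds to one training iteration, which is the unit a performance report would then summarize.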
Operationalizing the Legal Principle of Data Minimization for Personalization
Article 5(1)(c) of the European Union's General Data Protection Regulation
(GDPR) requires that "personal data shall be [...] adequate, relevant, and
limited to what is necessary in relation to the purposes for which they are
processed (`data minimisation')". To date, the legal and computational
definitions of `purpose limitation' and `data minimization' remain largely
unclear. In particular, the interpretation of these principles is an open issue
for information access systems that optimize for user experience through
personalization and do not strictly require personal data collection for the
delivery of the basic service.
In this paper, we identify a lack of a homogeneous interpretation of the data
minimization principle and explore two operational definitions applicable in
the context of personalization. The focus of our empirical study in the domain
of recommender systems is on providing foundational insights about the (i)
feasibility of different data minimization definitions, (ii) robustness of
different recommendation algorithms to minimization, and (iii) performance of
different minimization strategies. We find that the performance decrease
incurred by data minimization might not be substantial, but that it might
disparately impact different users, a finding which has implications for the
viability of different formal minimization definitions. Overall, our analysis
uncovers the complexities of the data minimization problem in the context of
personalization and maps the remaining computational and regulatory challenges.
Comment: SIGIR 2020 paper: In Proc. of the 43rd International ACM SIGIR
Conference on Research and Development in Information Retrieval
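One concrete way to operationalize minimization in a recommender setting (an illustrative candidate strategy, not necessarily one of the definitions the paper evaluates) is to retain only each user's k most recent ratings before training:

```python
from collections import defaultdict

def minimize_ratings(ratings, k):
    """Keep only each user's k most recent ratings.

    ratings: list of (user, item, rating, timestamp) tuples."""
    by_user = defaultdict(list)
    for r in ratings:
        by_user[r[0]].append(r)
    kept = []
    for rs in by_user.values():
        rs.sort(key=lambda r: r[3], reverse=True)  # newest first
        kept.extend(rs[:k])
    return kept

ratings = [
    ("u1", "a", 5, 1), ("u1", "b", 4, 2), ("u1", "c", 3, 3),
    ("u2", "a", 4, 1), ("u2", "c", 5, 2),
    ("u3", "a", 5, 9),
]
minimized = minimize_ratings(ratings, k=2)
```

Comparing recommendation quality trained on `minimized` versus the full log, per user, is exactly the kind of feasibility and disparate-impact measurement the abstract describes.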