A note on using performance and data profiles for training algorithms
It is shown how to use the performance and data profile benchmarking tools to
improve algorithms' performance. An illustration for the BFO derivative-free
optimizer suggests that the obtained gains are potentially significant.
Comment: 8 pages, 4 tables, 4 figures
Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction.
Tumor heterogeneity is a limiting factor in cancer treatment and in the discovery of biomarkers to personalize it. We describe a computational purification tool, ISOpure, to directly address the effects of variable normal tissue contamination in clinical tumor specimens. ISOpure uses a set of tumor expression profiles and a panel of healthy tissue expression profiles to generate a purified cancer profile for each tumor sample and an estimate of the proportion of RNA originating from cancerous cells. Applying ISOpure before identifying gene signatures leads to significant improvements in the prediction of prognosis and other clinical variables in lung and prostate cancer.
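The purification idea can be illustrated with a toy marker-gene deconvolution (a minimal sketch under simplified assumptions, not ISOpure's actual probabilistic model): if some genes are assumed silent in cancer cells, the tumor/healthy expression ratio at those genes estimates the normal-tissue fraction, and subtracting the scaled healthy profile recovers a purified cancer profile.

```python
import numpy as np

def purify_profile(tumor, healthy_mean, normal_markers):
    """Toy deconvolution: estimate the fraction of RNA from contaminating
    normal cells using genes assumed expressed only in healthy tissue,
    then subtract the scaled healthy profile."""
    # contamination fraction from normal-only marker genes
    frac_normal = np.median(tumor[normal_markers] / healthy_mean[normal_markers])
    frac_normal = float(np.clip(frac_normal, 0.0, 1.0))
    purified = np.clip(tumor - frac_normal * healthy_mean, 0.0, None)
    cancer_fraction = 1.0 - frac_normal
    return purified / max(cancer_fraction, 1e-9), cancer_fraction

# synthetic check: mix a known cancer profile with 30% normal RNA
rng = np.random.default_rng(0)
healthy = rng.uniform(1.0, 5.0, size=20)
cancer = rng.uniform(1.0, 5.0, size=20)
markers = np.arange(5)               # genes silent in cancer cells
cancer[markers] = 0.0
mixed = 0.7 * cancer + 0.3 * healthy
estimate, frac = purify_profile(mixed, healthy, markers)
```

On this synthetic mixture the recovered cancer fraction is 0.7 and the purified profile matches the true cancer profile; the real method handles noise and per-sample variation that this sketch ignores.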
Sparsity-Based STAP Design Based on Alternating Direction Method with Gain/Phase Errors
We present a novel sparsity-based space-time adaptive processing (STAP)
technique based on the alternating direction method to overcome the severe
performance degradation caused by array gain/phase (GP) errors. The proposed
algorithm reformulates the STAP problem as a joint optimization problem of the
spatio-Doppler profile and GP errors in both single and multiple snapshots, and
introduces a target detector using the reconstructed spatio-Doppler profiles.
Simulations are conducted to illustrate the benefits of the proposed algorithm.
Comment: 7 figures, 1 table
A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives
In this paper we analyze boosting algorithms in linear regression from a new
perspective: that of modern first-order methods in convex optimization. We show
that classic boosting algorithms in linear regression, namely the incremental
forward stagewise algorithm (FS) and least squares boosting
(LS-Boost), can be viewed as subgradient descent to minimize the
loss function defined as the maximum absolute correlation between the features
and residuals. We also propose a modification of FS that yields
an algorithm for the Lasso, and that may be easily extended to an algorithm
that computes the Lasso path for different values of the regularization
parameter. Furthermore, we show that these new algorithms for the Lasso may
also be interpreted as the same master algorithm (subgradient descent), applied
to a regularized version of the maximum absolute correlation loss function. We
derive novel, comprehensive computational guarantees for several boosting
algorithms in linear regression (including LS-Boost and
FS) by using techniques of modern first-order methods in convex
optimization. Our computational guarantees inform us about the statistical
properties of boosting algorithms. In particular, they provide, for the first
time, a precise theoretical description of the amount of data-fidelity and
regularization imparted by running a boosting algorithm with a prespecified
learning rate for a fixed but arbitrary number of iterations, for any dataset.
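The incremental forward stagewise algorithm the abstract describes can be sketched in a few lines: each step nudges the coefficient of the feature most correlated with the current residual by a small step size, which is the coordinate-wise subgradient view of boosting (a minimal sketch; the paper's analysis covers step-size and iteration-count guarantees this omits).

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, iters=2000):
    """Incremental forward stagewise regression (FS).

    Each iteration finds the feature most correlated with the residual
    and moves its coefficient by eps in the sign of that correlation."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()
    for _ in range(iters):
        corr = X.T @ r                      # correlations with residual
        j = int(np.argmax(np.abs(corr)))    # steepest coordinate
        step = eps * np.sign(corr[j])
        beta[j] += step
        r -= step * X[:, j]                 # update residual
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
true_beta = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=100)
beta = forward_stagewise(X, y)
```

With a small eps and enough iterations, beta approaches the least-squares solution; stopping early yields the implicit regularization the paper quantifies.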
Analyzing User Preference for Social Image Recommendation
With the rapidly growing amount of multimedia data shared on social
media platforms, recommender systems have become an important tool for easing
users' information-overload burden. In this scenario, heterogeneous
information such as tags and image content, in addition to
user-to-item preferences, is extremely valuable for making effective
recommendations. In this paper, we explore a novel hybrid algorithm, termed
STM, for image recommendation. STM jointly considers image content analysis
and the users' preferences on the basis of sparse
representation. STM is able to tackle the challenges of highly sparse user
feedback and cold-start problems in the social network scenario. In addition,
our model is based on classical probabilistic matrix factorization and can
easily be extended to incorporate other useful information such as social
relationships. We evaluate our approach on a newly collected data set of 0.3
million social images from Flickr. The experimental results demonstrate that
sparse topic modeling of the image content leads to more effective
recommendations, with a significant performance gain over
state-of-the-art alternatives.
Dictionary Learning for Adaptive GPR Landmine Classification
Ground penetrating radar (GPR) target detection and classification is a
challenging task. Here, we consider online dictionary learning (DL) methods to
obtain sparse representations (SR) of the GPR data to enhance feature
extraction for target classification via support vector machines. Online
methods are preferred because traditional batch DL like K-SVD is not scalable
to high-dimensional training sets and infeasible for real-time operation. We
also develop Drop-Off MINi-batch Online Dictionary Learning (DOMINODL) which
exploits the fact that a lot of the training data may be correlated. The
DOMINODL algorithm iteratively considers elements of the training set in small
batches and drops samples that become less relevant. For the classification of
abandoned anti-personnel landmines, we compare the performance
of K-SVD with three online algorithms: classical Online Dictionary Learning,
its correlation-based variant, and DOMINODL. Our experiments with real data
from L-band GPR show that online DL methods reduce learning time by 36-93% and
increase mine detection by 4-28% over K-SVD. Our DOMINODL is the fastest and
retains similar classification performance as the other two online DL
approaches. We use a Kolmogorov-Smirnov test distance and the
Dvoretzky-Kiefer-Wolfowitz inequality for the selection of DL input parameters
leading to enhanced classification results. To further compare with
state-of-the-art classification approaches, we evaluate a convolutional neural
network (CNN) classifier which performs worse than the proposed approach.
Moreover, when the acquired samples are randomly reduced by 25%, 50% and 75%,
sparse decomposition based classification with DL remains robust while the CNN
accuracy is drastically compromised.
Comment: 16 pages, 11 figures, 10 tables
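The drop-off idea can be sketched with a toy mini-batch online dictionary learner (loosely inspired by the abstract, not the published DOMINODL algorithm): each sample is coded with its single best-matching atom, that atom is nudged toward the sample, and samples that are already well represented are dropped from later epochs.

```python
import numpy as np

rng = np.random.default_rng(2)

# ground truth: 4 atom directions in 20 dimensions; 40 one-sparse signals
D_true = rng.normal(size=(20, 4))
D_true /= np.linalg.norm(D_true, axis=0)
scales = rng.uniform(1.0, 3.0, size=40)
Y = D_true[:, rng.integers(0, 4, size=40)] * scales

def online_dl_dropoff(Y, n_atoms=8, batch=16, lr=0.1, epochs=5, drop_tol=0.2):
    """Toy mini-batch online dictionary learning with sample drop-off."""
    D = rng.normal(size=(Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    active = list(range(Y.shape[1]))
    for _ in range(epochs):
        rng.shuffle(active)
        kept = []
        for start in range(0, len(active), batch):
            for i in active[start:start + batch]:
                y = Y[:, i]
                coeffs = D.T @ y
                j = int(np.argmax(np.abs(coeffs)))   # 1-sparse code
                resid = y - coeffs[j] * D[:, j]
                D[:, j] += lr * coeffs[j] * resid    # nudge atom toward sample
                D[:, j] /= np.linalg.norm(D[:, j])
                if np.linalg.norm(resid) > drop_tol * np.linalg.norm(y):
                    kept.append(i)                   # sample still informative
        active = kept                                # drop well-represented samples
        if not active:
            break
    return D

D = online_dl_dropoff(Y)
# mean relative 1-sparse reconstruction error over all samples
rels = []
for i in range(Y.shape[1]):
    y = Y[:, i]
    c = D.T @ y
    j = int(np.argmax(np.abs(c)))
    rels.append(np.linalg.norm(y - c[j] * D[:, j]) / np.linalg.norm(y))
mean_rel = float(np.mean(rels))
```

Dropping converged samples is what saves time relative to batch methods like K-SVD, which revisit the full training set every iteration.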
Personalized Expertise Search at LinkedIn
LinkedIn is the largest professional network with more than 350 million
members. As the member base increases, searching for experts becomes more and
more challenging. In this paper, we propose an approach to address the problem
of personalized expertise search on LinkedIn, particularly for exploratory
search queries containing skills. In the offline phase, we introduce a
collaborative filtering approach based on matrix factorization. Our approach
estimates expertise scores for both the skills that members list on their
profiles as well as the skills they are likely to have but do not explicitly
list. In the online phase (at query time) we use expertise scores on these
skills as a feature in combination with other features to rank the results. To
learn the personalized ranking function, we propose a heuristic to extract
training data from search logs while handling position and sample selection
biases. We tested our models on two products: LinkedIn homepage and LinkedIn
recruiter. A/B tests showed significant improvements in click-through rates
(31% for CTR@1 for recruiter, 18% for homepage) as well as in downstream
messages sent from search (37% for recruiter, 20% for homepage). As of writing
this paper, these models serve nearly all live traffic for skills search on
LinkedIn homepage as well as LinkedIn recruiter.
Comment: 2015 IEEE International Conference on Big Data
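The offline step can be sketched as plain matrix factorization over observed member-skill expertise scores (an illustrative sketch, not LinkedIn's system: the data, scores, and hyperparameters below are made up): fitting low-rank factors over listed skills lets the model score skills a member is likely to have but did not list.

```python
import numpy as np

def factorize(R, mask, k=2, lr=0.05, reg=0.01, epochs=500, seed=3):
    """Minimal SGD matrix factorization over observed member-skill scores."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.normal(size=(n, k))   # member factors
    V = 0.1 * rng.normal(size=(m, k))   # skill factors
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * U[i] - reg * V[j])
    return U, V

# synthetic low-rank "expertise" matrix; hide one member-skill pair
rng = np.random.default_rng(4)
R = rng.uniform(0.2, 1.0, size=(4, 2)) @ rng.uniform(0.2, 1.0, size=(2, 5))
mask = np.ones_like(R, dtype=bool)
mask[0, 0] = False                      # a skill the member never listed
U, V = factorize(R, mask)
pred = float(U[0] @ V[0])               # inferred expertise score
rmse = float(np.sqrt(np.mean((R[mask] - (U @ V.T)[mask]) ** 2)))
```

At query time such inferred scores would be just one feature among many in the learned ranking function the abstract describes.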
Collaborative Ensemble Learning: Combining Collaborative and Content-Based Information Filtering via Hierarchical Bayes
Collaborative filtering (CF) and content-based filtering (CBF) have widely
been used in information filtering applications. Both approaches have their
strengths and weaknesses which is why researchers have developed hybrid
systems. This paper proposes a novel approach to unify CF and CBF in a
probabilistic framework, named collaborative ensemble learning. It uses
probabilistic SVMs to model each user's profile (as CBF does). At the
prediction phase, it combines a society of user profiles, represented by their
respective SVM models, to predict an active user's preferences (the CF idea).
The combination scheme is embedded in a probabilistic framework and retains an
intuitive explanation. Moreover, collaborative ensemble learning does not
require a global training stage and thus can incrementally incorporate new
data. We report results based on two data sets. For the Reuters-21578 text
data set, we simulate user ratings under the assumption that each user is
interested in only one category. In the second experiment, we use users'
opinions on a set of 642 art images that were collected through a web-based
survey. For both data sets, collaborative ensemble learning achieved excellent
performance in terms of recommendation accuracy.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in
Artificial Intelligence (UAI 2003)
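The combination scheme can be sketched with stand-in user models (a toy version: centroid profiles replace the paper's probabilistic SVMs, and the weights stand in for its probabilistic combination): each user's model scores the item, and the society of scores is averaged with weights reflecting how much each user resembles the active user.

```python
import numpy as np

def user_model(liked_items):
    # stand-in for a per-user probabilistic SVM: the user's "profile"
    # is the centroid of item feature vectors they liked (the CBF side)
    return np.mean(liked_items, axis=0)

def ensemble_predict(item, models, weights):
    # combine the society of user models, weighted by how much each
    # user resembles the active user (the CF side)
    scores = np.array([m @ item for m in models])
    w = np.asarray(weights, dtype=float)
    return float((w / w.sum()) @ scores)

# two users with opposite tastes in a 2-feature item space
m1 = user_model(np.array([[1.0, 0.0], [0.9, 0.1]]))
m2 = user_model(np.array([[0.0, 1.0], [0.1, 0.9]]))
item = np.array([1.0, 0.0])
# the active user resembles user 1, so user 1's model dominates
score = ensemble_predict(item, [m1, m2], weights=[0.9, 0.1])
```

Because each user model is trained independently, a new user or new ratings only require fitting one model, which is why no global training stage is needed.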
DeepProf: Performance Analysis for Deep Learning Applications via Mining GPU Execution Patterns
Deep learning applications are computation-intensive and often employ GPUs as
the underlying computing devices. Deep learning frameworks provide powerful
programming interfaces, but the gap between source code and the actual GPU
operations makes it difficult to analyze the performance of deep learning
applications. In this paper, by examining the features of GPU traces and
deep learning applications, we use the suffix tree structure to extract the
repeated patterns in GPU traces. Performance analysis graphs can be generated
from the preprocessed GPU traces. We further present DeepProf, a novel
tool to automatically process GPU traces and generate performance analysis
reports for deep learning applications. An empirical study verifies the
effectiveness of DeepProf in performance analysis and diagnosis. We
also uncover some interesting properties of TensorFlow, which can be used to
guide deep learning system setup.
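The pattern-extraction step can be sketched as finding the dominant repeated kernel sequence in a trace (the paper uses a suffix tree; this brute-force n-gram count is a slower but equivalent sketch for short traces, and the kernel names are made up for illustration).

```python
from collections import Counter

def repeated_step(trace, max_len=6):
    """Return the repeated subsequence covering the most trace events.

    Counts every n-gram up to max_len, keeps those occurring at least
    twice, and picks the one maximizing length * occurrence count."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(trace) - n + 1):
            counts[tuple(trace[i:i + n])] += 1
    repeated = {g: c for g, c in counts.items() if c >= 2}
    return max(repeated, key=lambda g: len(g) * repeated[g])

# a toy trace: a setup prologue, then four identical training steps
step = ["input_h2d", "conv_fwd", "relu_fwd", "conv_bwd", "sgd_update"]
trace = ["cuda_init", "weights_h2d"] + step * 4
pattern = repeated_step(trace)
```

The recovered pattern corresponds to one training iteration, which is the unit a performance report would then summarize.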
Operationalizing the Legal Principle of Data Minimization for Personalization
Article 5(1)(c) of the European Union's General Data Protection Regulation
(GDPR) requires that "personal data shall be [...] adequate, relevant, and
limited to what is necessary in relation to the purposes for which they are
processed (`data minimisation')". To date, the legal and computational
definitions of `purpose limitation' and `data minimization' remain largely
unclear. In particular, the interpretation of these principles is an open issue
for information access systems that optimize for user experience through
personalization and do not strictly require personal data collection for the
delivery of the basic service.
In this paper, we identify a lack of a homogeneous interpretation of the data
minimization principle and explore two operational definitions applicable in
the context of personalization. The focus of our empirical study in the domain
of recommender systems is on providing foundational insights about the (i)
feasibility of different data minimization definitions, (ii) robustness of
different recommendation algorithms to minimization, and (iii) performance of
different minimization strategies. We find that the performance decrease
incurred by data minimization might not be substantial, but that it might
disparately impact different users, a finding which has implications for the
viability of different formal minimization definitions. Overall, our analysis
uncovers the complexities of the data minimization problem in the context of
personalization and maps the remaining computational and regulatory challenges.
Comment: SIGIR 2020 paper: In Proc. of the 43rd International ACM SIGIR
Conference on Research and Development in Information Retrieval
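One concrete way to operationalize minimization in a recommender setting (an illustrative candidate strategy, not necessarily one of the definitions the paper evaluates) is to retain only each user's k most recent ratings before training:

```python
from collections import defaultdict

def minimize_ratings(ratings, k):
    """Keep only each user's k most recent ratings.

    ratings: list of (user, item, rating, timestamp) tuples."""
    by_user = defaultdict(list)
    for r in ratings:
        by_user[r[0]].append(r)
    kept = []
    for rs in by_user.values():
        rs.sort(key=lambda r: r[3], reverse=True)  # newest first
        kept.extend(rs[:k])
    return kept

ratings = [
    ("u1", "a", 5, 1), ("u1", "b", 4, 2), ("u1", "c", 3, 3),
    ("u2", "a", 4, 1), ("u2", "c", 5, 2),
    ("u3", "a", 5, 9),
]
minimized = minimize_ratings(ratings, k=2)
```

Comparing recommendation quality trained on `minimized` versus the full log, per user, is exactly the kind of feasibility and disparate-impact measurement the abstract describes.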