Optimal estimation for Large-Eddy Simulation of turbulence and application to the analysis of subgrid models
The tools of optimal estimation are applied to the study of subgrid models
for Large-Eddy Simulation of turbulence. The concept of optimal estimator is
introduced and its properties are analyzed in the context of applications to a
priori tests of subgrid models. Attention is focused on the Cook and Riley
model in the case of a scalar field in isotropic turbulence. Using DNS data,
the relevance of the beta assumption is estimated by computing (i) generalized
optimal estimators and (ii) the error brought by this assumption alone. Optimal
estimators are computed for the subgrid variance using various sets of
variables and various techniques (histograms and neural networks). It is shown
that optimal estimators allow a thorough exploration of models. Neural networks
are shown to be relevant and very efficient in this framework, and further
uses are suggested.
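As a sketch of the histogram technique mentioned above, the following estimates the optimal estimator, i.e. the conditional mean E[tau | phi] of a subgrid quantity given a resolved variable, by binning DNS samples; the array names and bin count are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def optimal_estimator(phi, tau, n_bins=64):
    """Histogram estimate of E[tau | phi], the optimal (least-mean-square)
    estimator of the subgrid quantity tau among all functions of phi."""
    edges = np.linspace(phi.min(), phi.max(), n_bins + 1)
    idx = np.clip(np.digitize(phi, edges) - 1, 0, n_bins - 1)
    cond_mean = np.array([tau[idx == b].mean() if (idx == b).any() else np.nan
                          for b in range(n_bins)])
    pointwise = cond_mean[idx]                        # estimate at every sample
    irreducible = np.nanmean((tau - pointwise) ** 2)  # error no model of phi can beat
    return edges, cond_mean, irreducible
```

The quadratic error of any model built on phi alone cannot fall below `irreducible`, so comparing a model's error against it separates the error due to the choice of variables from the error due to the model's functional form.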
Regularizing Portfolio Optimization
The optimization of large portfolios displays an inherent instability to
estimation error. This poses a fundamental problem, because solutions that are
not stable under sample fluctuations may look optimal for a given sample, but
are, in effect, very far from optimal with respect to the average risk. In this
paper, we approach the problem from the point of view of statistical learning
theory. The occurrence of the instability is intimately related to over-fitting
which can be avoided using known regularization methods. We show how
regularized portfolio optimization with the expected shortfall as a risk
measure is related to support vector regression. The budget constraint dictates
a modification. We present the resulting optimization problem and discuss the
solution. The L2 norm of the weight vector is used as a regularizer, which
corresponds to a diversification "pressure". This means that diversification,
besides counteracting downward fluctuations in some assets by upward
fluctuations in others, is also crucial because it improves the stability of
the solution. The approach we provide here allows for the simultaneous
treatment of optimization and diversification in one framework that enables the
investor to trade off between the two, depending on the size of the available
data set.
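A minimal sketch of the optimization described, assuming a matrix of historical returns; it uses the standard Rockafellar-Uryasev linearization of expected shortfall rather than the paper's support-vector-regression formulation, with the L2 regularizer and budget constraint as stated.

```python
import cvxpy as cp

def regularized_es_portfolio(returns, alpha=0.95, reg=0.1):
    """Minimize expected shortfall (Rockafellar-Uryasev linearization) plus an
    L2 penalty on the weights, the 'diversification pressure' described above."""
    T, N = returns.shape
    w = cp.Variable(N)                   # portfolio weights
    z = cp.Variable()                    # VaR-like threshold
    u = cp.Variable(T, nonneg=True)      # losses in excess of z
    es = z + cp.sum(u) / ((1 - alpha) * T)
    constraints = [u >= -returns @ w - z,    # u_t >= loss_t - z
                   cp.sum(w) == 1]           # budget constraint
    cp.Problem(cp.Minimize(es + reg * cp.sum_squares(w)), constraints).solve()
    return w.value
```

With reg = 0 this reduces to plain expected-shortfall minimization; raising reg trades sample-optimality for stability, which is the trade-off the abstract refers to.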
A preliminary approach to the multilabel classification problem of Portuguese juridical documents
Portuguese juridical documents from Supreme Courts and the Attorney General's Office are manually classified by juridical experts into a set of classes belonging to a taxonomy of concepts. In this paper, a preliminary approach to developing techniques for automatically classifying these juridical documents is proposed. The basic strategy is to integrate natural language processing techniques with machine learning ones. Support Vector Machines (SVM) are used as the learning algorithm, and the results obtained are presented and compared with other approaches, such as C4.5 and Naive Bayes.
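A minimal sketch of the basic strategy, assuming a TF-IDF text representation and a one-vs-rest linear SVM; the two-document corpus and two-concept taxonomy are toy stand-ins for the juridical collection.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-ins for the juridical corpus and its concept taxonomy.
docs = ["appeal concerning a labour contract dispute",
        "ruling on criminal procedure and evidence"]
labels = np.array([[1, 0],    # one column per taxonomy concept
                   [0, 1]])

clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LinearSVC()))
clf.fit(docs, labels)
print(clf.predict(["dispute over a labour contract"]))  # multilabel prediction
```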
Application of support vector machines on the basis of the first Hungarian bankruptcy model
In our study we apply a data mining procedure known as the support vector machine (SVM) to the database of the first Hungarian bankruptcy model. The models constructed are then contrasted with earlier bankruptcy models using classification accuracy and the area under the ROC curve. In applying the SVM technique, in addition to conventional kernel functions, we also examine the ANOVA kernel function and take a detailed look at the data preparation tasks recommended for the SVM method (handling of outliers). The results of the models assembled suggest that a significant improvement in classification accuracy can be achieved on the database of the first Hungarian bankruptcy model when using the SVM method as opposed to neural networks.
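A hedged sketch of applying an ANOVA kernel within an SVM, using scikit-learn's support for callable kernels; the kernel form, its parameters, and the synthetic data are assumptions, since the abstract does not spell them out.

```python
import numpy as np
from sklearn.svm import SVC

def anova_kernel(X, Y, sigma=0.5, d=2):
    """ANOVA kernel K(x, y) = (sum_k exp(-sigma * (x_k - y_k)**2))**d."""
    diff = X[:, None, :] - Y[None, :, :]        # pairwise feature differences
    return np.exp(-sigma * diff ** 2).sum(axis=2) ** d

# Synthetic stand-in for the bankruptcy database: rows are firms,
# columns are financial ratios, y is 1 for bankrupt and 0 for solvent.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 5)), rng.integers(0, 2, 60)

clf = SVC(kernel=anova_kernel).fit(X, y)
scores = clf.decision_function(X)   # scores usable for ROC / AUC evaluation
```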
Active Sampling-based Binary Verification of Dynamical Systems
Nonlinear, adaptive, or otherwise complex control techniques are increasingly
relied upon to ensure the safety of systems operating in uncertain
environments. However, the nonlinearity of the resulting closed-loop system
complicates verification that the system does in fact satisfy the safety
requirements at all possible operating conditions. While analytical proof-based
techniques and finite abstractions can be used to provably verify the
closed-loop system's response at different operating conditions, they often
produce conservative approximations due to restrictive assumptions and are
difficult to construct in many applications. In contrast, popular statistical
verification techniques relax the restrictions and instead rely upon
simulations to construct statistical or probabilistic guarantees. This work
presents a data-driven statistical verification procedure that instead
constructs statistical learning models from simulated training data to separate
the set of possible perturbations into "safe" and "unsafe" subsets. Binary
evaluations of closed-loop system requirement satisfaction at various
realizations of the uncertainties are obtained through temporal logic
robustness metrics, which are then used to construct predictive models of
requirement satisfaction over the full set of possible uncertainties. As the
accuracy of these predictive statistical models is inherently coupled to the
quality of the training data, an active learning algorithm selects additional
sample points in order to maximize the expected change in the data-driven model
and thus, indirectly, minimize the prediction error. Various case studies
demonstrate the closed-loop verification procedure and highlight improvements
in prediction error over both existing analytical and statistical verification
techniques.
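A simplified sketch of the closed-loop verification loop, assuming a discretized perturbation set `sample_space` and a `simulate` function returning a binary safe/unsafe outcome; plain uncertainty sampling stands in here for the paper's expected-model-change criterion.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def active_verification(sample_space, simulate, n_init=20, n_rounds=80):
    """Fit a probabilistic model of requirement satisfaction over a discretized
    perturbation set, then greedily query the most uncertain point."""
    rng = np.random.default_rng(0)
    idx = list(rng.choice(len(sample_space), n_init, replace=False))
    labels = [simulate(sample_space[i]) for i in idx]  # 1 = safe, 0 = unsafe
    model = GaussianProcessClassifier()
    for _ in range(n_rounds):
        model.fit(sample_space[idx], labels)  # assumes both classes were observed
        p_safe = model.predict_proba(sample_space)[:, 1]
        gap = np.abs(p_safe - 0.5)
        gap[idx] = np.inf                     # do not resample known points
        nxt = int(np.argmin(gap))
        idx.append(nxt)
        labels.append(simulate(sample_space[nxt]))
    return model  # predicts P(safe) over the full set of perturbations
```

The returned model plays the role of the data-driven separator between "safe" and "unsafe" subsets; each round spends one simulation where the current prediction is least certain.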
Semantic Entities
Entity retrieval has seen a lot of interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity in the research community during the TREC Enterprise Track [10]. It has remained relevant ever since, while broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11], and, more generally, to a range of entity retrieval tasks. In the talk, which will be given by the second author, we will point out that existing methods for entity or expert retrieval fail to address key challenges: (1) Queries and expert documents use different representations to describe the same concepts [6, 7]. Term mismatches between entities and experts [7] occur due to the inability of widely used maximum-likelihood language models to make use of semantic similarities between words [9]. (2) As the amount of available data increases, the need for more powerful approaches with greater learning capabilities than smoothed maximum-likelihood language models is obvious [13]. (3) Supervised methods for entity or expertise retrieval [5, 8] were introduced at the turn of the last decade. However, the acceleration of data availability has the major disadvantage that, in the case of supervised methods, manual annotation efforts need to sustain a similar order of growth. This calls for the further development of unsupervised methods. (4) According to some entity or expertise retrieval methods, a language model is constructed for every document in the collection. These methods lack efficient query capabilities for large document collections, as each query term needs to be matched against every document [2]. In the talk we will discuss a recently proposed solution [12] that has a strong emphasis on unsupervised model construction, efficient query capabilities and, most importantly, semantic matching between query terms and candidate entities. We show that the proposed approach improves retrieval performance compared to generative language models mainly due to its ability to perform semantic matching [7]. The proposed method does not require any annotations or supervised relevance judgments and is able to learn from raw textual evidence and document-candidate associations alone. The purpose of the proposal is to provide insight into how we avoid explicit annotations and feature engineering and still obtain semantically meaningful retrieval results. In the talk we will provide a comparative error analysis between the proposed semantic entity retrieval model and traditional generative language models that perform exact matching, which yields important insights into the relative strengths of semantic matching and exact matching for the expert retrieval task in particular and entity retrieval in general. We will also discuss extensions of the proposed model that are meant to deal with scalability and dynamic aspects of entity and expert retrieval.
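To make the contrast between exact and semantic matching concrete, here is a minimal embedding-based ranking sketch; it illustrates semantic matching in general, not the model of [12], and `vectors` is assumed to be any pretrained token-to-vector mapping.

```python
import numpy as np

def embed(text, vectors):
    """Mean word embedding; `vectors` maps a token to a NumPy array."""
    toks = [t for t in text.lower().split() if t in vectors]
    return np.mean([vectors[t] for t in toks], axis=0) if toks else None

def semantic_rank(query, entity_docs, vectors):
    """Rank candidate entities by cosine similarity in embedding space, so
    related terms can match even when the exact query term never occurs
    in an entity's documents (the term-mismatch problem noted above)."""
    q = embed(query, vectors)
    scores = {}
    for entity, doc in entity_docs.items():
        e = embed(doc, vectors)
        if q is not None and e is not None:
            scores[entity] = float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e)))
    return sorted(scores.items(), key=lambda kv: -kv[1])
```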
Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers
The Support Vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights, and threshold so as to minimize an upper bound on the expected test error. The present study is devoted to an experimental comparison of these machines with a classical approach, where the centers are determined by k-means clustering and the weights are found using error backpropagation. We consider three machines, namely a classical RBF machine, an SV machine with Gaussian kernel, and a hybrid system with the centers determined by the SV method and the weights trained by error backpropagation. Our results show that on the US postal service database of handwritten digits, the SV machine achieves the highest test accuracy, followed by the hybrid approach. The SV approach is thus not only theoretically well-founded, but also superior in a practical application.
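A sketch of two of the machines compared, under toy data standing in for the USPS digits: the classical route with k-means centers and a trained linear output layer (logistic regression standing in for error backpropagation on the output weights), and the SV route, where the algorithm selects centers and weights itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def rbf_features(X, centers, gamma=0.5):
    """Gaussian activations around fixed centers, as in a classical RBF network."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)                # toy stand-in for the USPS digits
X, y = rng.normal(size=(200, 16)), rng.integers(0, 2, 200)

# Classical route: centers chosen by k-means, output weights trained separately.
centers = KMeans(n_clusters=20, n_init=10).fit(X).cluster_centers_
classical = LogisticRegression(max_iter=1000).fit(rbf_features(X, centers), y)

# SV route: the SV algorithm picks centers (support vectors) and weights itself.
svm = SVC(kernel="rbf", gamma=0.5).fit(X, y)
```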
Graph Distillation for Action Detection with Privileged Modalities
We propose a technique that tackles action detection in multimodal videos
under a realistic and challenging condition in which only limited training data
and partially observed modalities are available. Common methods in transfer
learning do not take advantage of the extra modalities potentially available in
the source domain. On the other hand, previous work on multimodal learning only
focuses on a single domain or task and does not handle the modality discrepancy
between training and testing. In this work, we propose a method termed graph
distillation that incorporates rich privileged information from a large-scale
multimodal dataset in the source domain, and improves the learning in the
target domain where training data and modalities are scarce. We evaluate our
approach on action classification and detection tasks in multimodal videos, and
show that our model outperforms the state-of-the-art by a large margin on the
NTU RGB+D and PKU-MMD benchmarks. The code is released at
http://alan.vision/eccv18_graph/.
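A hedged sketch of a distillation loss in the spirit described, where per-modality teacher predictions are combined through fixed edge weights; the learned, example-specific graph of the actual method is simplified to a static weighting here.

```python
import torch
import torch.nn.functional as F

def graph_distillation_loss(student_logits, teacher_logits_list, edge_weights, T=2.0):
    """Distill soft predictions from several privileged-modality teachers into a
    student that only sees the scarce modality; edge_weights set how strongly
    each source modality contributes (a fixed stand-in for the learned graph)."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    loss = student_logits.new_zeros(())
    for w, t_logits in zip(edge_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits.detach() / T, dim=1)   # teachers are frozen
        loss = loss + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return (T ** 2) * loss
```

In training, this term would be added to the student's ordinary classification loss, so the scarce-modality model benefits from the richer source-domain modalities without needing them at test time.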
Validating the detection of everyday concepts in visual lifelogs
The Microsoft SenseCam is a small lightweight wearable camera used to passively capture photos and other sensor readings from a user's day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer's life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, on the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept's presence. We apply detection of 27 everyday semantic concepts on a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision for detection of each semantic concept and to draw some interesting inferences on the lifestyles of those 5 users. We additionally present future applications of concept detection within the domain of lifelogging.
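A minimal sketch of per-concept supervised detection as described, with synthetic features standing in for the low-level visual features and three of the 27 concepts shown; the choice of SVM and the `probability=True` calibration are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-ins: low-level visual features per image and, for each
# everyday concept, a binary label saying whether the concept is present.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 64))
concept_labels = {c: rng.integers(0, 2, 300)
                  for c in ("indoors", "outdoors", "people")}

# One supervised detector per concept, as in the abstract.
detectors = {c: SVC(probability=True).fit(features, y)
             for c, y in concept_labels.items()}

def annotate(image_features):
    """Probability of each concept's presence in one unseen image
    (column 1 of predict_proba corresponds to the 'present' class)."""
    x = image_features.reshape(1, -1)
    return {c: clf.predict_proba(x)[0, 1] for c, clf in detectors.items()}
```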