Optimal estimation for Large-Eddy Simulation of turbulence and application to the analysis of subgrid models
The tools of optimal estimation are applied to the study of subgrid models
for Large-Eddy Simulation of turbulence. The concept of optimal estimator is
introduced and its properties are analyzed in the context of applications to a
priori tests of subgrid models. Attention is focused on the Cook and Riley
model in the case of a scalar field in isotropic turbulence. Using DNS data,
the relevance of the beta assumption is estimated by computing (i) generalized
optimal estimators and (ii) the error brought by this assumption alone. Optimal
estimators are computed for the subgrid variance using various sets of
variables and various techniques (histograms and neural networks). It is shown
that optimal estimators allow a thorough exploration of models. Neural networks
are shown to be relevant and very efficient in this framework, and further
uses are suggested.
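As a sketch of the histogram technique mentioned above, the following estimates the optimal estimator, i.e. the conditional mean E[tau | phi] of a subgrid quantity given a resolved variable, by binning DNS samples; the array names and bin count are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def optimal_estimator(phi, tau, n_bins=64):
    """Histogram estimate of E[tau | phi], the optimal (least-mean-square)
    estimator of the subgrid quantity tau among all functions of phi."""
    edges = np.linspace(phi.min(), phi.max(), n_bins + 1)
    idx = np.clip(np.digitize(phi, edges) - 1, 0, n_bins - 1)
    cond_mean = np.array([tau[idx == b].mean() if (idx == b).any() else np.nan
                          for b in range(n_bins)])
    pointwise = cond_mean[idx]                        # estimate at every sample
    irreducible = np.nanmean((tau - pointwise) ** 2)  # error no model of phi can beat
    return edges, cond_mean, irreducible
```

The quadratic error of any model built on phi alone cannot fall below `irreducible`, so comparing a model's error against it separates the error due to the choice of variables from the error due to the model's functional form.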
Regularizing Portfolio Optimization
The optimization of large portfolios displays an inherent instability to
estimation error. This poses a fundamental problem, because solutions that are
not stable under sample fluctuations may look optimal for a given sample, but
are, in effect, very far from optimal with respect to the average risk. In this
paper, we approach the problem from the point of view of statistical learning
theory. The occurrence of the instability is intimately related to over-fitting
which can be avoided using known regularization methods. We show how
regularized portfolio optimization with the expected shortfall as a risk
measure is related to support vector regression. The budget constraint dictates
a modification. We present the resulting optimization problem and discuss the
solution. The L2 norm of the weight vector is used as a regularizer, which
corresponds to a diversification "pressure". This means that diversification,
besides counteracting downward fluctuations in some assets by upward
fluctuations in others, is also crucial because it improves the stability of
the solution. The approach we provide here allows for the simultaneous
treatment of optimization and diversification in one framework that enables the
investor to trade off between the two, depending on the size of the available
data set.
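A minimal sketch of the optimization described, assuming a matrix of historical returns; it uses the standard Rockafellar-Uryasev linearization of expected shortfall rather than the paper's support-vector-regression formulation, with the L2 regularizer and budget constraint as stated.

```python
import cvxpy as cp

def regularized_es_portfolio(returns, alpha=0.95, reg=0.1):
    """Minimize expected shortfall (Rockafellar-Uryasev linearization) plus an
    L2 penalty on the weights, the 'diversification pressure' described above."""
    T, N = returns.shape
    w = cp.Variable(N)                   # portfolio weights
    z = cp.Variable()                    # VaR-like threshold
    u = cp.Variable(T, nonneg=True)      # losses in excess of z
    es = z + cp.sum(u) / ((1 - alpha) * T)
    constraints = [u >= -returns @ w - z,    # u_t >= loss_t - z
                   cp.sum(w) == 1]           # budget constraint
    cp.Problem(cp.Minimize(es + reg * cp.sum_squares(w)), constraints).solve()
    return w.value
```

With reg = 0 this reduces to plain expected-shortfall minimization; raising reg trades sample-optimality for stability, which is the trade-off the abstract refers to.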
A preliminary approach to the multilabel classification problem of Portuguese juridical documents
Portuguese juridical documents from Supreme Courts and the Attorney General's Office are manually classified by juridical experts into a set of classes belonging to a taxonomy of concepts. In this paper, a preliminary approach to developing techniques for automatically classifying these juridical documents is proposed. The basic strategy is to integrate natural language processing techniques with machine learning ones. Support Vector Machines (SVM) are used as the learning algorithm, and the results obtained are presented and compared with other approaches, such as C4.5 and Naive Bayes.
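A minimal sketch of the basic strategy, assuming a TF-IDF text representation and a one-vs-rest linear SVM; the two-document corpus and two-concept taxonomy are toy stand-ins for the juridical collection.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-ins for the juridical corpus and its concept taxonomy.
docs = ["appeal concerning a labour contract dispute",
        "ruling on criminal procedure and evidence"]
labels = np.array([[1, 0],    # one column per taxonomy concept
                   [0, 1]])

clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LinearSVC()))
clf.fit(docs, labels)
print(clf.predict(["dispute over a labour contract"]))  # multilabel prediction
```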
Application of support vector machines on the basis of the first Hungarian bankruptcy model
In our study we apply a data mining procedure known as the support vector machine (SVM) to the database of the first Hungarian bankruptcy model. The models constructed are then contrasted with earlier bankruptcy models using classification accuracy and the area under the ROC curve. In applying the SVM technique, in addition to conventional kernel functions, we also examine the ANOVA kernel function and take a detailed look at the data preparation tasks recommended for the SVM method (handling of outliers). The results of the models assembled suggest that a significant improvement in classification accuracy can be achieved on the database of the first Hungarian bankruptcy model when using the SVM method as opposed to neural networks.
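A hedged sketch of applying an ANOVA kernel within an SVM, using scikit-learn's support for callable kernels; the kernel form, its parameters, and the synthetic data are assumptions, since the abstract does not spell them out.

```python
import numpy as np
from sklearn.svm import SVC

def anova_kernel(X, Y, sigma=0.5, d=2):
    """ANOVA kernel K(x, y) = (sum_k exp(-sigma * (x_k - y_k)**2))**d."""
    diff = X[:, None, :] - Y[None, :, :]        # pairwise feature differences
    return np.exp(-sigma * diff ** 2).sum(axis=2) ** d

# Synthetic stand-in for the bankruptcy database: rows are firms,
# columns are financial ratios, y is 1 for bankrupt and 0 for solvent.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 5)), rng.integers(0, 2, 60)

clf = SVC(kernel=anova_kernel).fit(X, y)
scores = clf.decision_function(X)   # scores usable for ROC / AUC evaluation
```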
Active Sampling-based Binary Verification of Dynamical Systems
Nonlinear, adaptive, or otherwise complex control techniques are increasingly
relied upon to ensure the safety of systems operating in uncertain
environments. However, the nonlinearity of the resulting closed-loop system
complicates verification that the system does in fact satisfy the safety
requirements at all possible operating conditions. While analytical proof-based
techniques and finite abstractions can be used to provably verify the
closed-loop system's response at different operating conditions, they often
produce conservative approximations due to restrictive assumptions and are
difficult to construct in many applications. In contrast, popular statistical
verification techniques relax the restrictions and instead rely upon
simulations to construct statistical or probabilistic guarantees. This work
presents a data-driven statistical verification procedure that instead
constructs statistical learning models from simulated training data to separate
the set of possible perturbations into "safe" and "unsafe" subsets. Binary
evaluations of closed-loop system requirement satisfaction at various
realizations of the uncertainties are obtained through temporal logic
robustness metrics, which are then used to construct predictive models of
requirement satisfaction over the full set of possible uncertainties. As the
accuracy of these predictive statistical models is inherently coupled to the
quality of the training data, an active learning algorithm selects additional
sample points in order to maximize the expected change in the data-driven model
and thus, indirectly, minimize the prediction error. Various case studies
demonstrate the closed-loop verification procedure and highlight improvements
in prediction error over both existing analytical and statistical verification
techniques.
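A simplified sketch of the closed-loop verification loop, assuming a discretized perturbation set `sample_space` and a `simulate` function returning a binary safe/unsafe outcome; plain uncertainty sampling stands in here for the paper's expected-model-change criterion.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def active_verification(sample_space, simulate, n_init=20, n_rounds=80):
    """Fit a probabilistic model of requirement satisfaction over a discretized
    perturbation set, then greedily query the most uncertain point."""
    rng = np.random.default_rng(0)
    idx = list(rng.choice(len(sample_space), n_init, replace=False))
    labels = [simulate(sample_space[i]) for i in idx]  # 1 = safe, 0 = unsafe
    model = GaussianProcessClassifier()
    for _ in range(n_rounds):
        model.fit(sample_space[idx], labels)  # assumes both classes were observed
        p_safe = model.predict_proba(sample_space)[:, 1]
        gap = np.abs(p_safe - 0.5)
        gap[idx] = np.inf                     # do not resample known points
        nxt = int(np.argmin(gap))
        idx.append(nxt)
        labels.append(simulate(sample_space[nxt]))
    return model  # predicts P(safe) over the full set of perturbations
```

The returned model plays the role of the data-driven separator between "safe" and "unsafe" subsets; each round spends one simulation where the current prediction is least certain.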
Semantic Entities
Entity retrieval has seen a lot of interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity in the research community during the TREC Enterprise Track [10]. It has remained relevant ever since, while broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11], and, more generally, to a range of entity retrieval tasks. In the talk, which will be given by the second author, we will point out that existing methods for entity or expert retrieval fail to address key challenges: (1) Queries and expert documents use different representations to describe the same concepts [6, 7]. Term mismatches between entities and experts [7] occur due to the inability of widely used maximum-likelihood language models to make use of semantic similarities between words [9]. (2) As the amount of available data increases, the need for more powerful approaches with greater learning capabilities than smoothed maximum-likelihood language models is obvious [13]. (3) Supervised methods for entity or expertise retrieval [5, 8] were introduced at the turn of the last decade. However, the acceleration of data availability has the major disadvantage that, in the case of supervised methods, manual annotation efforts need to sustain a similar order of growth. This calls for the further development of unsupervised methods. (4) According to some entity or expertise retrieval methods, a language model is constructed for every document in the collection. These methods lack efficient query capabilities for large document collections, as each query term needs to be matched against every document [2]. In the talk we will discuss a recently proposed solution [12] that has a strong emphasis on unsupervised model construction, efficient query capabilities and, most importantly, semantic matching between query terms and candidate entities. We show that the proposed approach improves retrieval performance compared to generative language models mainly due to its ability to perform semantic matching [7]. The proposed method does not require any annotations or supervised relevance judgments and is able to learn from raw textual evidence and document-candidate associations alone. The purpose of the proposal is to provide insight into how we avoid explicit annotations and feature engineering and still obtain semantically meaningful retrieval results. In the talk we will provide a comparative error analysis between the proposed semantic entity retrieval model and traditional generative language models that perform exact matching, which yields important insights into the relative strengths of semantic matching and exact matching for the expert retrieval task in particular and entity retrieval in general. We will also discuss extensions of the proposed model that are meant to deal with scalability and dynamic aspects of entity and expert retrieval.
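To make the contrast between exact and semantic matching concrete, here is a minimal embedding-based ranking sketch; it illustrates semantic matching in general, not the model of [12], and `vectors` is assumed to be any pretrained token-to-vector mapping.

```python
import numpy as np

def embed(text, vectors):
    """Mean word embedding; `vectors` maps a token to a NumPy array."""
    toks = [t for t in text.lower().split() if t in vectors]
    return np.mean([vectors[t] for t in toks], axis=0) if toks else None

def semantic_rank(query, entity_docs, vectors):
    """Rank candidate entities by cosine similarity in embedding space, so
    related terms can match even when the exact query term never occurs
    in an entity's documents (the term-mismatch problem noted above)."""
    q = embed(query, vectors)
    scores = {}
    for entity, doc in entity_docs.items():
        e = embed(doc, vectors)
        if q is not None and e is not None:
            scores[entity] = float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e)))
    return sorted(scores.items(), key=lambda kv: -kv[1])
```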
Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers
The Support Vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights, and threshold so as to minimize an upper bound on the expected test error. The present study is devoted to an experimental comparison of these machines with a classical approach, where the centers are determined by k-means clustering and the weights are found using error backpropagation. We consider three machines, namely a classical RBF machine, an SV machine with Gaussian kernel, and a hybrid system with the centers determined by the SV method and the weights trained by error backpropagation. Our results show that on the US postal service database of handwritten digits, the SV machine achieves the highest test accuracy, followed by the hybrid approach. The SV approach is thus not only theoretically well-founded, but also superior in a practical application.
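A sketch of two of the machines compared, under toy data standing in for the USPS digits: the classical route with k-means centers and a trained linear output layer (logistic regression standing in for error backpropagation on the output weights), and the SV route, where the algorithm selects centers and weights itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def rbf_features(X, centers, gamma=0.5):
    """Gaussian activations around fixed centers, as in a classical RBF network."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)                # toy stand-in for the USPS digits
X, y = rng.normal(size=(200, 16)), rng.integers(0, 2, 200)

# Classical route: centers chosen by k-means, output weights trained separately.
centers = KMeans(n_clusters=20, n_init=10).fit(X).cluster_centers_
classical = LogisticRegression(max_iter=1000).fit(rbf_features(X, centers), y)

# SV route: the SV algorithm picks centers (support vectors) and weights itself.
svm = SVC(kernel="rbf", gamma=0.5).fit(X, y)
```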
Graph Distillation for Action Detection with Privileged Modalities
We propose a technique that tackles action detection in multimodal videos
under a realistic and challenging condition in which only limited training data
and partially observed modalities are available. Common methods in transfer
learning do not take advantage of the extra modalities potentially available in
the source domain. On the other hand, previous work on multimodal learning only
focuses on a single domain or task and does not handle the modality discrepancy
between training and testing. In this work, we propose a method termed graph
distillation that incorporates rich privileged information from a large-scale
multimodal dataset in the source domain, and improves the learning in the
target domain where training data and modalities are scarce. We evaluate our
approach on action classification and detection tasks in multimodal videos, and
show that our model outperforms the state-of-the-art by a large margin on the
NTU RGB+D and PKU-MMD benchmarks. The code is released at
http://alan.vision/eccv18_graph/.
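A hedged sketch of a distillation loss in the spirit described, where per-modality teacher predictions are combined through fixed edge weights; the learned, example-specific graph of the actual method is simplified to a static weighting here.

```python
import torch
import torch.nn.functional as F

def graph_distillation_loss(student_logits, teacher_logits_list, edge_weights, T=2.0):
    """Distill soft predictions from several privileged-modality teachers into a
    student that only sees the scarce modality; edge_weights set how strongly
    each source modality contributes (a fixed stand-in for the learned graph)."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    loss = student_logits.new_zeros(())
    for w, t_logits in zip(edge_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits.detach() / T, dim=1)   # teachers are frozen
        loss = loss + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return (T ** 2) * loss
```

In training, this term would be added to the student's ordinary classification loss, so the scarce-modality model benefits from the richer source-domain modalities without needing them at test time.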
Validating the detection of everyday concepts in visual lifelogs
The Microsoft SenseCam is a small lightweight wearable camera used to passively capture photos and other sensor readings from a user's day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer's life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, on the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept's presence. We apply detection of 27 everyday semantic concepts on a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision for detection of each semantic concept and to draw some interesting inferences on the lifestyles of those 5 users. We additionally present future applications of concept detection within the domain of lifelogging.
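A minimal sketch of per-concept supervised detection as described, with synthetic features standing in for the low-level visual features and three of the 27 concepts shown; the choice of SVM and the `probability=True` calibration are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-ins: low-level visual features per image and, for each
# everyday concept, a binary label saying whether the concept is present.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 64))
concept_labels = {c: rng.integers(0, 2, 300)
                  for c in ("indoors", "outdoors", "people")}

# One supervised detector per concept, as in the abstract.
detectors = {c: SVC(probability=True).fit(features, y)
             for c, y in concept_labels.items()}

def annotate(image_features):
    """Probability of each concept's presence in one unseen image
    (column 1 of predict_proba corresponds to the 'present' class)."""
    x = image_features.reshape(1, -1)
    return {c: clf.predict_proba(x)[0, 1] for c, clf in detectors.items()}
```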