Model Selection for Support Vector Machine Classification
We address the problem of model selection for Support Vector Machine (SVM)
classification. For fixed functional form of the kernel, model selection
amounts to tuning kernel parameters and the slack penalty coefficient C. We
begin by reviewing a recently developed probabilistic framework for SVM
classification. An extension to the case of SVMs with quadratic slack penalties
is given and a simple approximation for the evidence is derived, which can be
used as a criterion for model selection. We also derive the exact gradients of
the evidence in terms of posterior averages and describe how they can be
estimated numerically using Hybrid Monte Carlo techniques. Though
computationally demanding, the resulting gradient ascent algorithm is a useful
baseline tool for probabilistic SVM model selection, since it can locate maxima
of the exact (unapproximated) evidence. We then perform extensive experiments
on several benchmark data sets. The aim of these experiments is to compare the
performance of probabilistic model selection criteria with alternatives based
on estimates of the test error, namely the so-called "span estimate" and
Wahba's Generalized Approximate Cross-Validation (GACV) error. We find that all
the "simple" model criteria (Laplace evidence approximations, and the span
and GACV error estimates) exhibit multiple local optima with respect to the
hyperparameters. While some of these give performance that is competitive with
results from other approaches in the literature, a significant fraction lead to
rather higher test errors. The results for the evidence gradient ascent method
show that the exact evidence also exhibits local optima, but these give test
errors which are much less variable and also consistently lower than for the
simpler model selection criteria.
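The role of local optima in gradient-based model selection can be illustrated with a toy sketch. The one-dimensional `criterion` below is a hypothetical stand-in chosen only because it has two local maxima; it is not the SVM evidence, and the step size and start points are likewise assumptions.

```python
def criterion(x):
    # Hypothetical stand-in for a model selection criterion (NOT the SVM evidence).
    # It has two local maxima, one near x = -0.96 and one near x = 1.04.
    return -(x * x - 1.0) ** 2 + 0.3 * x


def criterion_gradient(x):
    # Analytic derivative of the toy criterion above.
    return -4.0 * x * (x * x - 1.0) + 0.3


def gradient_ascent(start, step=0.05, iterations=500):
    # Plain fixed-step gradient ascent from a single starting point.
    x = start
    for _ in range(iterations):
        x += step * criterion_gradient(x)
    return x


def ascend_from(starts):
    # Run gradient ascent from several starts and keep the best optimum found.
    optima = [gradient_ascent(s) for s in starts]
    best = max(optima, key=criterion)
    return optima, best


if __name__ == "__main__":
    optima, best = ascend_from([-1.5, 1.5])
    print("local optima found:", optima)
    print("best hyperparameter value:", best)
```

Different starting points converge to different maxima, so a single run may report a suboptimal hyperparameter setting; restarting from several points is one simple way to reduce that risk.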
PhysicsGP: A Genetic Programming Approach to Event Selection
We present a novel multivariate classification technique based on Genetic
Programming. The technique is distinct from Genetic Algorithms and offers
several advantages compared to Neural Networks and Support Vector Machines. The
technique optimizes a set of human-readable classifiers with respect to some
user-defined performance measure. We calculate the Vapnik-Chervonenkis
dimension of this class of learning machines and consider a practical example:
the search for the Standard Model Higgs Boson at the LHC. The resulting
classifier is very fast to evaluate, human-readable, and easily portable. The
software may be downloaded at: http://cern.ch/~cranmer/PhysicsGP.html
Comment: 16 pages, 9 figures, 1 table. Submitted to Comput. Phys. Commun.
From average case complexity to improper learning complexity
The basic problem in the PAC model of computational learning theory is to
determine which hypothesis classes are efficiently learnable. There is
presently a dearth of results showing hardness of learning problems. Moreover,
the existing lower bounds fall short of the best known algorithms.
The biggest challenge in proving complexity results is to establish hardness
of improper learning (a.k.a. representation-independent learning). The
difficulty in proving lower bounds for improper learning is that the standard
reductions from NP-hard problems do not seem to apply in this
context. There is essentially only one known approach to proving lower bounds
on improper learning. It was initiated in (Kearns and Valiant 89) and relies on
cryptographic assumptions.
We introduce a new technique for proving hardness of improper learning, based
on reductions from problems that are hard on average. We put forward a (fairly
strong) generalization of Feige's assumption (Feige 02) about the complexity of
refuting random constraint satisfaction problems. Combining this assumption
with our new technique yields far reaching implications. In particular,
1. Learning DNF's is hard.
2. Agnostically learning halfspaces with a constant approximation ratio is
hard.
3. Learning an intersection of halfspaces is hard.
Comment: 34 pages
Fake View Analytics in Online Video Services
Online video-on-demand (VoD) services invariably maintain a view count for
each video they serve, and it has become an important currency for various
stakeholders, from viewers to content owners, advertisers, and the online
service providers themselves. There is often significant financial incentive to
use a robot (or a botnet) to artificially create fake views. How can we detect
the fake views? Can we detect them (and stop them) using online algorithms as
they occur? What is the extent of fake views with current VoD service
providers? These are the questions we study in the paper. We develop some
algorithms and show that they are quite effective for this problem.
Comment: 25 pages, 15 figures
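The abstract does not describe the algorithms themselves. As a minimal sketch of what an online check might look like, the following assumes a simple sliding-window rate limit per (source, video) pair; the class name `SlidingWindowViewFilter`, its parameters, and the thresholds are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque


class SlidingWindowViewFilter:
    """Hypothetical online filter: flags a view as suspicious when one source
    requests the same video more than `max_views` times within
    `window_seconds`. This is an illustrative heuristic, not the paper's method."""

    def __init__(self, window_seconds=60.0, max_views=3):
        self.window_seconds = window_seconds
        self.max_views = max_views
        # Timestamps of recent views, keyed by (source, video).
        self._recent = defaultdict(deque)

    def observe(self, source, video, timestamp):
        """Record a view; return True if it is counted, False if suspicious."""
        recent = self._recent[(source, video)]
        # Drop views that have fallen out of the time window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        recent.append(timestamp)
        return len(recent) <= self.max_views


if __name__ == "__main__":
    view_filter = SlidingWindowViewFilter(window_seconds=10.0, max_views=2)
    for t in (0.0, 1.0, 2.0, 20.0):
        print(t, view_filter.observe("10.0.0.1", "video-42", t))
```

Each view is processed in amortized constant time as it arrives, which is the property an online detector needs; a production system would presumably combine many more signals than request rate alone.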
Anatomy determines etiology in thoracic aortic aneurysm
BACKGROUND: It is well established that thoracic aortic aneurysms (TAA) and abdominal aortic aneurysms (AAA) have different risk factors, clinical features, and genetic influences. Differences between and amongst subtypes of TAAs have received less attention. Despite observations of divergent clinical outcomes between ascending thoracic aortic aneurysms (ATAAs) and descending thoracic aortic aneurysms (DTAAs), etiologic factors determining the anatomic distribution of these aneurysms are not well understood.
METHODS: From 3,247 patients registered in an institutional Thoracic Aortic Center Database from July 1992 through August 2013, we identified 921 patients with full aortic dimensional imaging by CT or MRI scan with TAA > 3.5 cm and without evidence of aortic dissection (AoD). Patients were analyzed in three groups: isolated ATAA (n=677), isolated DTAA (n=97), and combined ATAA and DTAA (n=146).
RESULTS: Patients with a DTAA, alone or with coexistent ATAA, had significantly more hypertension (80.6% vs. 61.8%, p<.001) and a higher burden of atherosclerotic disease (86.7% vs. 7.5%, p<.001) and were more likely to be female (59.3% vs. 29.5%, p<.001). Conversely, patients with isolated ATAA were significantly younger (average age 59.5 vs. 71, p<.001), and this group contained almost every case of overt genetically-triggered TAA. Patients with isolated DTAA were demographically indistinguishable from patients with combined ATAA and DTAA. In follow-up, patients with isolated DTAA, or with ATAA and DTAA, experienced significantly more aortic events (aortic dissection/rupture) and had higher mortality than patients with isolated ATAA.
CONCLUSIONS: Based on patient characteristics and outcomes, subtypes of TAA emerge. DTAA with or without associated ATAA or AAA appears to be a disease more highly associated with atherosclerosis, hypertension, and advanced age. In contrast, isolated ATAA appears to be a clinically distinct entity with a higher burden of genetically triggered disease. These data have important implications for familial screening recommendations for TAA.
1,4-Diazabicyclo[2.2.2]octane (DABCO) as a useful catalyst in organic synthesis
1,4-Diazabicyclo[2.2.2]octane (DABCO) has been used in many organic preparations as a good solid catalyst. DABCO has received considerable attention as an inexpensive, eco-friendly, highly reactive, easy-to-handle, and non-toxic base catalyst for various organic transformations, affording the corresponding products in excellent yields with high selectivity. In this review, some applications of this catalyst in organic reactions are discussed.
Second-Generation Objects in the Universe: Radiative Cooling and Collapse of Halos with Virial Temperatures Above 10^4 Kelvin
The first generation of protogalaxies likely formed out of primordial gas via
H2-cooling in cosmological minihalos with virial temperatures of a few 1000K.
However, their abundance is likely to have been severely limited by feedback
processes which suppressed H2 formation. The formation of the protogalaxies
responsible for reionization and metal-enrichment of the intergalactic medium
then had to await the collapse of larger halos. Here we investigate the
radiative cooling and collapse of gas in halos with virial temperatures Tvir >
10^4K. In these halos, efficient atomic line radiation allows rapid cooling of
the gas to 8000 K; subsequently the gas can contract nearly isothermally at
this temperature. Without an additional coolant, the gas would likely settle
into a locally gravitationally stable disk; only disks with unusually low spin
would be unstable. However, we find that the initial atomic line cooling leaves
a large, out-of-equilibrium residual free electron fraction. This allows the
molecular fraction to build up to a universal value of about x(H2) = 10^-3,
almost independently of initial density and temperature. We show that this is a
non-equilibrium freezeout value that can be understood in terms of timescale
arguments. Furthermore, unlike in less massive halos, H2 formation is largely
impervious to feedback from external UV fields, due to the high initial
densities achieved by atomic cooling. The H2 molecules cool the gas further to
about 100K, and allow the gas to fragment on scales of a few 100 Msun. We
investigate the importance of various feedback effects such as
H2-photodissociation from internal UV fields and radiation pressure due to
Ly-alpha photon trapping, which are likely to regulate the efficiency of star
formation.
Comment: Revised version accepted by ApJ; some reorganization for clarity
On the Chromatic Thresholds of Hypergraphs
Let F be a family of r-uniform hypergraphs. The chromatic threshold of F is
the infimum of all non-negative reals c such that the subfamily of F comprising
hypergraphs H with minimum degree at least c \binom{|V(H)|}{r-1} has bounded
chromatic number. This parameter has a long history for graphs (r=2), and in
this paper we begin its systematic study for hypergraphs.
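The definition above can be written compactly as follows. The notation is an assumption made for illustration: thr for the chromatic threshold, \delta(H) for the minimum vertex degree, \chi(H) for the chromatic number, and the normalization \binom{|V(H)|}{r-1} for the degree condition, which for r=2 reduces to the familiar graph condition \delta(G) \ge c|V(G)|.

```latex
\[
  \operatorname{thr}(\mathcal{F})
  \;=\;
  \inf\Bigl\{\, c \ge 0 \;:\;
    \sup\bigl\{\, \chi(H) \;:\; H \in \mathcal{F},\
      \delta(H) \ge c \tbinom{|V(H)|}{r-1} \,\bigr\} < \infty
  \,\Bigr\}
\]
```

In this notation, the statement that the chromatic threshold of triangle-free graphs is 1/3 says that triangle-free graphs G with \delta(G) \ge c|V(G)| have bounded chromatic number for every c > 1/3, and that this fails for every c < 1/3.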
Łuczak and Thomassé recently proved that the chromatic threshold of the
so-called near bipartite graphs is zero, and our main contribution is to
generalize this result to r-uniform hypergraphs. For this class of hypergraphs,
we also show that the exact Turán number is achieved uniquely by the complete
(r+1)-partite hypergraph with nearly equal part sizes. This is one of very few
infinite families of nondegenerate hypergraphs whose Turán number is
determined exactly. In an attempt to generalize Thomassen's result that the
chromatic threshold of triangle-free graphs is 1/3, we prove bounds for the
chromatic threshold of the family of 3-uniform hypergraphs not containing {abc,
abd, cde}, the so-called generalized triangle.
In order to prove upper bounds we introduce the concept of fiber bundles,
which can be thought of as a hypergraph analogue of directed graphs. This leads
to the notion of fiber bundle dimension, a structural property of fiber bundles
that is based on the idea of Vapnik-Chervonenkis dimension in hypergraphs. Our
lower bounds follow from explicit constructions, many of which use a hypergraph
analogue of the Kneser graph. Using methods from extremal set theory, we prove
that these Kneser hypergraphs have unbounded chromatic number. This generalizes
a result of Szemerédi for graphs and might be of independent interest. Many
open problems remain.
Comment: 37 pages, 4 figures
Classification of partial discharge signals by combining adaptive local iterative filtering and entropy features
Electro-Magnetic Interference (EMI) is a measurement technique for Partial Discharge (PD) signals which arise in operating electrical machines, generators and other auxiliary equipment due to insulation degradation. Assessment of PD can help to reduce machine downtime and circumvent high replacement and maintenance costs. EMI signals can be complex to analyze due to their nonstationary nature. In this paper, a software condition-monitoring model is presented and a novel feature extraction technique, suitable for nonstationary EMI signals, is developed. This method maps signals from multiple discharge sources, including PD, from the time domain to a feature space which aids interpretation of subsequent fault information. Results show excellent performance in classifying the different discharge sources.
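The abstract does not say which entropy measure is used. As one illustration of mapping a time-domain signal to an entropy feature, the sketch below assumes normalized permutation entropy; that choice, the function name `permutation_entropy`, and the default `order` are assumptions rather than details from the paper.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3):
    """Normalized permutation entropy of a 1-D sequence, in [0, 1].

    Assumed illustrative feature: each window of `order` consecutive samples
    is reduced to its ordinal pattern (the index order that sorts it), and the
    Shannon entropy of the pattern distribution is computed.
    """
    if order < 2 or len(signal) < order:
        raise ValueError("need order >= 2 and at least `order` samples")
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: signal[i + k]))
        for i in range(len(signal) - order + 1)
    )
    total = sum(patterns.values())
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in patterns.values()
    )
    # Divide by the maximum possible entropy, log2(order!), to normalize.
    return entropy / math.log2(math.factorial(order))


if __name__ == "__main__":
    print(permutation_entropy([1, 2, 3, 4, 5, 6]))       # monotone ramp
    print(permutation_entropy([4, 7, 9, 10, 6, 11, 3]))  # irregular sequence
```

A perfectly monotone signal produces a single ordinal pattern and therefore zero entropy, while irregular signals score higher; a feature vector for classification could collect such values over several sub-bands or components of the signal.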