8,304 research outputs found

    Model Selection for Support Vector Machine Classification

    We address the problem of model selection for Support Vector Machine (SVM) classification. For a fixed functional form of the kernel, model selection amounts to tuning the kernel parameters and the slack penalty coefficient C. We begin by reviewing a recently developed probabilistic framework for SVM classification. An extension to the case of SVMs with quadratic slack penalties is given and a simple approximation for the evidence is derived, which can be used as a criterion for model selection. We also derive the exact gradients of the evidence in terms of posterior averages and describe how they can be estimated numerically using Hybrid Monte Carlo techniques. Though computationally demanding, the resulting gradient ascent algorithm is a useful baseline tool for probabilistic SVM model selection, since it can locate maxima of the exact (unapproximated) evidence. We then perform extensive experiments on several benchmark data sets. The aim of these experiments is to compare the performance of probabilistic model selection criteria with alternatives based on estimates of the test error, namely the so-called "span estimate" and Wahba's Generalized Approximate Cross-Validation (GACV) error. We find that all the "simple" model selection criteria (the Laplace evidence approximations and the span and GACV error estimates) exhibit multiple local optima with respect to the hyperparameters. While some of these give performance that is competitive with results from other approaches in the literature, a significant fraction lead to rather higher test errors. The results for the evidence gradient ascent method show that the exact evidence also exhibits local optima, but these give test errors which are much less variable and also consistently lower than for the simpler model selection criteria.
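
    For a concrete point of comparison, the cross-validation style baseline that such criteria are usually measured against can be sketched in a few lines. This is an illustrative grid search over C and the RBF kernel width using scikit-learn, not the evidence-based procedure of the paper; the data set and parameter grid are placeholders.

```python
# Illustrative baseline: cross-validated grid search over the RBF kernel
# width (gamma) and the slack penalty coefficient C. This is NOT the
# evidence-based model selection procedure described in the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data set standing in for the benchmark sets used in the paper.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

param_grid = {
    "C": np.logspace(-2, 3, 6),       # slack penalty coefficient C
    "gamma": np.logspace(-4, 1, 6),   # RBF kernel width parameter
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("selected hyperparameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
```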

    PhysicsGP: A Genetic Programming Approach to Event Selection

    We present a novel multivariate classification technique based on Genetic Programming. The technique is distinct from Genetic Algorithms and offers several advantages compared to Neural Networks and Support Vector Machines. The technique optimizes a set of human-readable classifiers with respect to some user-defined performance measure. We calculate the Vapnik-Chervonenkis dimension of this class of learning machines and consider a practical example: the search for the Standard Model Higgs Boson at the LHC. The resulting classifier is very fast to evaluate, human-readable, and easily portable. The software may be downloaded at: http://cern.ch/~cranmer/PhysicsGP.html. Comment: 16 pages, 9 figures, 1 table. Submitted to Comput. Phys. Commun.
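
    To make the distinction from Genetic Algorithms concrete: Genetic Programming evolves expression trees rather than fixed-length parameter strings, and the evolved expression can be read directly as a classifier. The following is a minimal, self-contained sketch of that idea for a toy binary classification task; it is not the PhysicsGP software and omits crossover, typed operators, and the user-defined performance measures discussed in the paper.

```python
# Minimal Genetic Programming sketch: evolve small expression trees over the
# input features and use the sign of their output as the predicted class.
# Illustration only; not the PhysicsGP implementation.
import random, operator
random.seed(0)

OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]

def random_tree(depth=3):
    # Terminal nodes are either a feature reference ("x", index) or a constant.
    if depth == 0 or random.random() < 0.3:
        if random.random() < 0.7:
            return ("x", random.randint(0, 1))
        return ("c", random.uniform(-1.0, 1.0))
    fn, name = random.choice(OPS)
    return ("op", fn, name, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    kind = tree[0]
    if kind == "x":
        return x[tree[1]]
    if kind == "c":
        return tree[1]
    _, fn, _, left, right = tree
    return fn(evaluate(left, x), evaluate(right, x))

def to_string(tree):
    # Human-readable rendering of the evolved classifier.
    if tree[0] == "x":
        return f"x{tree[1]}"
    if tree[0] == "c":
        return f"{tree[1]:.2f}"
    _, _, name, left, right = tree
    return f"({to_string(left)} {name} {to_string(right)})"

def fitness(tree, data):
    # Accuracy when sign(tree(x)) is used as the predicted class label.
    return sum((evaluate(tree, x) > 0) == y for x, y in data) / len(data)

def mutate(tree):
    # Replace a random subtree with a freshly generated one.
    if tree[0] != "op" or random.random() < 0.3:
        return random_tree(2)
    _, fn, name, left, right = tree
    if random.random() < 0.5:
        return ("op", fn, name, mutate(left), right)
    return ("op", fn, name, left, mutate(right))

# Toy separable problem: class 1 whenever x0 > x1.
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
data = [(p, p[0] > p[1]) for p in points]

population = [random_tree() for _ in range(50)]
for _ in range(30):
    population.sort(key=lambda t: fitness(t, data), reverse=True)
    survivors = population[:10]        # truncation selection
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]

best = max(population, key=lambda t: fitness(t, data))
print("best classifier:", to_string(best), "accuracy:", fitness(best, data))
```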

    From average case complexity to improper learning complexity

    The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation-independent learning). The difficulty in proving lower bounds for improper learning is that the standard reductions from NP-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in (Kearns and Valiant 89) and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption (Feige 02) about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far-reaching implications. In particular: 1. Learning DNFs is hard. 2. Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of ω(1) halfspaces is hard. Comment: 34 pages.

    Fake View Analytics in Online Video Services

    Online video-on-demand (VoD) services invariably maintain a view count for each video they serve, and it has become an important currency for various stakeholders, from viewers, to content owners, advertisers, and the online service providers themselves. There is often significant financial incentive to use a robot (or a botnet) to artificially create fake views. How can we detect the fake views? Can we detect them (and stop them) using online algorithms as they occur? What is the extent of fake views with current VoD service providers? These are the questions we study in the paper. We develop some algorithms and show that they are quite effective for this problem. Comment: 25 pages, 15 figures.
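
    The abstract does not spell out the detection algorithms, but the flavour of an online check can be illustrated with a deliberately simple sketch: flag a video whenever its view-count increments are far above its own recent rate or implausibly regular. The sliding window, thresholds, and simulated robot traffic below are assumptions for illustration only, not the paper's method.

```python
# Illustrative online check on a stream of per-minute view-count increments.
# A real system would use far richer signals; the thresholds here are made up.
from collections import deque
import statistics
import random

def online_fake_view_monitor(increments, window=60, spike_factor=10.0, min_history=30):
    """Yield (minute_index, increment, flagged) while consuming the stream once."""
    history = deque(maxlen=window)
    for i, x in enumerate(increments):
        flagged = False
        if len(history) >= min_history:
            mean = statistics.mean(history)
            stdev = statistics.pstdev(history)
            # Flag sudden spikes relative to the video's own recent rate,
            # and suspiciously constant (robot-like) traffic.
            if x > spike_factor * max(mean, 1.0):
                flagged = True
            if stdev < 0.01 * max(mean, 1.0) and mean > 10:
                flagged = True
        history.append(x)
        yield i, x, flagged

# Example: organic-looking traffic followed by a simulated botnet burst.
random.seed(1)
stream = [random.randint(0, 20) for _ in range(120)] + [500] * 10
for minute, views, flagged in online_fake_view_monitor(stream):
    if flagged:
        print(f"minute {minute}: {views} views/min flagged as suspicious")
```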

    Anatomy determines etiology in thoracic aortic aneurysm

    BACKGROUND: It is well established that thoracic aortic aneurysms (TAA) and abdominal aortic aneurysms (AAA) have different risk factors, clinical features, and genetic influences. Differences between and amongst subtypes of TAAs have received less attention. Despite observations of divergent clinical outcomes between ascending thoracic aortic aneurysms (ATAAs) and descending thoracic aortic aneurysms (DTAAs), etiologic factors determining the anatomic distribution of these aneurysms are not well understood. METHODS: From 3,247 patients registered in an institutional Thoracic Aortic Center Database from July 1992 through August 2013, we identified 921 patients with full aortic dimensional imaging by CT or MRI scan with TAA > 3.5 cm and without evidence of aortic dissection (AoD). Patients were analyzed in three groups: isolated ATAA (n=677), isolated DTAA (n=97), and combined ATAA and DTAA (n=146). RESULTS: Patients with a DTAA, alone or with coexistent ATAA, had significantly more hypertension (80.6% vs. 61.8%, p<.001), had a higher burden of atherosclerotic disease (86.7% vs. 7.5%, p<.001), and were more likely to be female (59.3% vs. 29.5%, p<.001). Conversely, patients with isolated ATAA were significantly younger (average age 59.5 vs. 71, p<.001) and accounted for almost every case of overt genetically triggered TAA. Patients with isolated DTAA were demographically indistinguishable from patients with combined ATAA and DTAA. In follow-up, patients with isolated DTAA, or with ATAA and DTAA, experienced significantly more aortic events (aortic dissection/rupture) and had higher mortality than patients with isolated ATAA. CONCLUSIONS: Based on patient characteristics and outcomes, distinct subtypes of TAA emerge. DTAA, with or without associated ATAA or AAA, appears to be a disease more highly associated with atherosclerosis, hypertension, and advanced age. In contrast, isolated ATAA appears to be a clinically distinct entity with a higher burden of genetically triggered disease. These data have important implications for familial screening recommendations for TAA.

    1,4-Diazabicyclo[2.2.2]octane (DABCO) as a useful catalyst in organic synthesis

    1,4-Diazabicyclo[2.2.2]octane (DABCO) has been used in many organic preparations as a good solid catalyst. DABCO has received considerable attention as an inexpensive, eco-friendly, highly reactive, easy-to-handle, and non-toxic base catalyst for various organic transformations, affording the corresponding products in excellent yields with high selectivity. In this review, some applications of this catalyst in organic reactions are discussed.

    Second-Generation Objects in the Universe: Radiative Cooling and Collapse of Halos with Virial Temperatures Above 10^4 Kelvin

    The first generation of protogalaxies likely formed out of primordial gas via H2 cooling in cosmological minihalos with virial temperatures of a few thousand Kelvin. However, their abundance is likely to have been severely limited by feedback processes which suppressed H2 formation. The formation of the protogalaxies responsible for reionization and metal enrichment of the intergalactic medium then had to await the collapse of larger halos. Here we investigate the radiative cooling and collapse of gas in halos with virial temperatures Tvir > 10^4 K. In these halos, efficient atomic line radiation allows rapid cooling of the gas to 8000 K; subsequently the gas can contract nearly isothermally at this temperature. Without an additional coolant, the gas would likely settle into a locally gravitationally stable disk; only disks with unusually low spin would be unstable. However, we find that the initial atomic line cooling leaves a large, out-of-equilibrium residual free electron fraction. This allows the molecular fraction to build up to a universal value of about x(H2) = 10^-3, almost independently of initial density and temperature. We show that this is a non-equilibrium freeze-out value that can be understood in terms of timescale arguments. Furthermore, unlike in less massive halos, H2 formation is largely impervious to feedback from external UV fields, due to the high initial densities achieved by atomic cooling. The H2 molecules cool the gas further to about 100 K, and allow the gas to fragment on scales of a few hundred Msun. We investigate the importance of various feedback effects, such as H2 photodissociation by internal UV fields and radiation pressure due to Ly-alpha photon trapping, which are likely to regulate the efficiency of star formation. Comment: Revised version accepted by ApJ; some reorganization for clarity.
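
    A rough sketch of the kind of timescale argument referred to above, in schematic notation of our own (not the paper's detailed rate network): write k_- for the rate coefficient of H2 formation through the H^- channel, k_rec for hydrogen recombination, and x_e, x_H, x_H2 for the electron, atomic hydrogen, and molecular fractions.

```latex
% Schematic freeze-out estimate: H2 builds up while the residual electrons
% recombine away. With dx_{H2}/dt ~ k_- x_e x_H n and dx_e/dt ~ -k_rec x_e^2 n,
\[
  \frac{dx_{\mathrm{H_2}}}{dx_e}
    \approx -\frac{k_-}{k_{\mathrm{rec}}}\,\frac{x_{\mathrm{H}}}{x_e}
  \quad\Longrightarrow\quad
  x_{\mathrm{H_2}}
    \approx \frac{k_-}{k_{\mathrm{rec}}}\, x_{\mathrm{H}}\,
      \ln\!\frac{x_e(0)}{x_e(t)} .
\]
% The gas density n cancels and the initial ionization enters only through a
% logarithm, consistent with a nearly universal frozen-out H2 fraction.
```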

    On the Chromatic Thresholds of Hypergraphs

    Let F be a family of r-uniform hypergraphs. The chromatic threshold of F is the infimum of all non-negative reals c such that the subfamily of F comprising hypergraphs H with minimum degree at least c (|V(H)| choose r-1) has bounded chromatic number. This parameter has a long history for graphs (r=2), and in this paper we begin its systematic study for hypergraphs. Łuczak and Thomassé recently proved that the chromatic threshold of the so-called near-bipartite graphs is zero, and our main contribution is to generalize this result to r-uniform hypergraphs. For this class of hypergraphs, we also show that the exact Turán number is achieved uniquely by the complete (r+1)-partite hypergraph with nearly equal part sizes. This is one of very few infinite families of nondegenerate hypergraphs whose Turán number is determined exactly. In an attempt to generalize Thomassen's result that the chromatic threshold of triangle-free graphs is 1/3, we prove bounds for the chromatic threshold of the family of 3-uniform hypergraphs not containing {abc, abd, cde}, the so-called generalized triangle. In order to prove upper bounds we introduce the concept of fiber bundles, which can be thought of as a hypergraph analogue of directed graphs. This leads to the notion of fiber bundle dimension, a structural property of fiber bundles that is based on the idea of Vapnik-Chervonenkis dimension in hypergraphs. Our lower bounds follow from explicit constructions, many of which use a hypergraph analogue of the Kneser graph. Using methods from extremal set theory, we prove that these Kneser hypergraphs have unbounded chromatic number. This generalizes a result of Szemerédi for graphs and might be of independent interest. Many open problems remain. Comment: 37 pages, 4 figures.
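
    For reference, the opening definition can be written out as a display; the symbol δ_χ(F) is shorthand introduced here for this restatement, and δ(H) denotes the minimum degree of H.

```latex
% Chromatic threshold of a family F of r-uniform hypergraphs,
% restating the definition from the abstract above.
\[
  \delta_\chi(\mathcal{F}) \;=\; \inf\left\{\, c \ge 0 \;:\;
    \left\{ H \in \mathcal{F} \,:\, \delta(H) \ge c \binom{|V(H)|}{r-1} \right\}
    \text{ has bounded chromatic number} \,\right\}
\]
```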

    Classification of partial discharge signals by combining adaptive local iterative filtering and entropy features

    Electro-Magnetic Interference (EMI) is a measurement technique for Partial Discharge (PD) signals which arise in operating electrical machines, generators, and other auxiliary equipment due to insulation degradation. Assessment of PD can help to reduce machine downtime and circumvent high replacement and maintenance costs. EMI signals can be complex to analyze due to their nonstationary nature. In this paper, a software condition-monitoring model is presented and a novel feature extraction technique, suitable for nonstationary EMI signals, is developed. This method maps signals from multiple discharge sources, including PD, from the time domain to a feature space which aids interpretation of subsequent fault information. Results show excellent performance in classifying the different discharge sources.
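
    The paper's adaptive local iterative filtering decomposition is not reproduced here, but the general shape of an entropy-feature pipeline can be sketched with an ordinary band-pass filter bank standing in for the decomposition step. The band edges, entropy estimator, and synthetic signal below are illustrative assumptions, not the authors' method.

```python
# Sketch of an entropy-feature pipeline for nonstationary discharge signals.
# A fixed band-pass filter bank stands in for adaptive local iterative filtering.
import numpy as np
from scipy.signal import butter, filtfilt

def band_components(signal, fs, bands):
    """Split the signal into band-limited components with zero-phase filtering."""
    comps = []
    for lo, hi in bands:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        comps.append(filtfilt(b, a, signal))
    return comps

def shannon_entropy(component, n_bins=64):
    """Shannon entropy of the amplitude distribution of one component."""
    hist, _ = np.histogram(component, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_features(signal, fs, bands):
    """Map a raw signal to a small feature vector of per-band entropies."""
    return np.array([shannon_entropy(c) for c in band_components(signal, fs, bands)])

# Example with a synthetic signal; real EMI recordings would replace this.
fs = 10_000.0
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 120 * t) + 0.3 * np.random.default_rng(0).standard_normal(t.size)
bands = [(50, 500), (500, 2000), (2000, 4500)]
print("entropy feature vector:", entropy_features(signal, fs, bands))
```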