
    A Study of SVM Kernel Functions for Sensitivity Classification Ensembles with POS Sequences

    Freedom of Information (FOI) laws legislate that government documents should be opened to the public. However, many government documents contain sensitive information, such as confidential information, that is exempt from release. Therefore, government documents must be sensitivity reviewed prior to release, to identify and close any sensitive information. With the adoption of born-digital documents, such as email, there is a need for automatic sensitivity classification to assist digital sensitivity review. SVM classifiers and Part-of-Speech sequences have separately been shown to be promising for sensitivity classification. However, sequence classification methodologies, and specifically SVM kernel functions, have not been fully investigated for sensitivity classification. Therefore, in this work, we present an evaluation of five SVM kernel functions for sensitivity classification using POS sequences. Moreover, we show that an ensemble classifier that combines POS sequence classification with text classification can significantly improve sensitivity classification effectiveness (+6.09% F2) compared with a text classification baseline, according to McNemar's test of significance.
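    To illustrate the kind of sequence kernel this line of work builds on (this is a generic sketch, not the paper's implementation), a spectrum-style kernel over POS tag sequences can be computed as the inner product of tag n-gram counts; the tag sequences and n-gram order below are hypothetical:

```python
from collections import Counter

def pos_ngram_spectrum_kernel(seq_a, seq_b, n=2):
    """Spectrum kernel: inner product of the two sequences' tag n-gram count vectors."""
    grams_a = Counter(tuple(seq_a[i:i + n]) for i in range(len(seq_a) - n + 1))
    grams_b = Counter(tuple(seq_b[i:i + n]) for i in range(len(seq_b) - n + 1))
    return sum(count * grams_b[gram] for gram, count in grams_a.items())

# Hypothetical POS tag sequences for two sentences.
s1 = ["DT", "NN", "VBZ", "JJ"]
s2 = ["DT", "NN", "VBD", "DT", "NN"]
k = pos_ngram_spectrum_kernel(s1, s2)  # shared bigram ("DT", "NN") appears 1 x 2 times
```

A Gram matrix built from such a kernel can be supplied to an SVM with a precomputed kernel, alongside standard choices such as linear or RBF kernels.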

    Domain Adaptive Neural Networks for Object Recognition

    We propose a simple neural network model to deal with the domain adaptation problem in object recognition. Our model incorporates the Maximum Mean Discrepancy (MMD) measure as a regularizer in supervised learning to reduce the distribution mismatch between the source and target domains in the latent space. From experiments, we demonstrate that the MMD regularization is an effective tool to provide good domain adaptation models on both SURF features and raw image pixels of a particular image data set. We also show that our proposed model, preceded by denoising auto-encoder pretraining, achieves better performance than recent benchmark models on the same data sets. This work represents the first study of the MMD measure in the context of neural networks.
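    A minimal sketch of the (biased) squared MMD with a Gaussian kernel, of the kind that could serve as such a regularizer between source and target representations; the bandwidth and toy data below are assumptions, not the paper's setup:

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y (rows = points)."""
    def rbf(A, B):
        # Pairwise squared Euclidean distances, then Gaussian kernel values.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return rbf(X, X).mean() + rbf(Y, Y).mean() - 2.0 * rbf(X, Y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(100, 5))  # toy "source" features
tgt = rng.normal(0.5, 1.0, size=(100, 5))  # toy "target" features, shifted mean
gap = mmd2_rbf(src, tgt)  # larger when the two distributions differ
```

Adding this quantity to the supervised loss penalizes hidden representations whose source and target distributions diverge.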

    Improving Sequential Determinantal Point Processes for Supervised Video Summarization

    It is now easier than ever before to produce videos. While the ubiquitous video data is a great source for information discovery and extraction, the computational challenges are unparalleled. Automatically summarizing the videos has become a substantial need for browsing, searching, and indexing visual content. This paper is in the vein of supervised video summarization using sequential determinantal point processes (SeqDPP), which model diversity by a probabilistic distribution. We improve this model in two respects. In terms of learning, we propose a large-margin algorithm to address the exposure bias problem in SeqDPP. In terms of modeling, we design a new probabilistic distribution such that, when it is integrated into SeqDPP, the resulting model accepts user input about the expected length of the summary. Moreover, we also significantly extend a popular video summarization dataset by 1) more egocentric videos, 2) dense user annotations, and 3) a refined evaluation scheme. We conduct extensive experiments on this dataset (about 60 hours of videos in total) and compare our approach to several competitive baselines.
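    For intuition about the building block here: a determinantal point process assigns a subset S of items probability det(L_S)/det(L + I), which favors diverse subsets. A toy enumeration (the kernel values are made up) verifies that these probabilities normalize:

```python
import itertools

import numpy as np

# Hypothetical 3-item DPP kernel (symmetric positive definite).
L = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])

def dpp_prob(L, subset):
    """P(S) = det(L_S) / det(L + I); the empty subset's minor has determinant 1."""
    S = list(subset)
    det_minor = np.linalg.det(L[np.ix_(S, S)]) if S else 1.0
    return det_minor / np.linalg.det(L + np.eye(len(L)))

# Summing over all 2^3 subsets should give exactly 1.
total = sum(dpp_prob(L, s)
            for r in range(len(L) + 1)
            for s in itertools.combinations(range(len(L)), r))
```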

    Job matching quality effects of employment promotion measures for people with disabilities

    In this article, we evaluate the influence that employment promotion measures designed for disabled people have on their job matching quality through the use of matching analysis. We focus on two aspects of quality: the type of contract held (either permanent or temporary) and whether or not the individual is searching for another job. We find that employment promotion measures do not improve job match quality. Furthermore, the use of specialized labour market intermediation services by disabled individuals does not affect their job matching quality. As an additional contribution, our definition of disability eludes the self-justification bias.
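    The matching-analysis idea can be sketched as nearest-neighbour matching on a propensity score, estimating the average effect on the treated; the numbers below are illustrative toys, not the paper's data:

```python
import numpy as np

def att_nearest_neighbor(ps_treated, y_treated, ps_control, y_control):
    """Average treatment effect on the treated via 1-NN propensity-score matching."""
    ps_treated = np.asarray(ps_treated)
    y_treated = np.asarray(y_treated)
    ps_control = np.asarray(ps_control)
    y_control = np.asarray(y_control)
    # For each treated unit, pick the control with the closest propensity score.
    idx = np.abs(ps_treated[:, None] - ps_control[None, :]).argmin(axis=1)
    return float((y_treated - y_control[idx]).mean())

# Toy example: each treated unit has an exact-score control; outcomes differ by 2.
att = att_nearest_neighbor([0.3, 0.8], [5.0, 9.0], [0.8, 0.3, 0.5], [7.0, 3.0, 4.0])
```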

    Optimized Blind Gamma-ray Pulsar Searches at Fixed Computing Budget

    The sensitivity of blind gamma-ray pulsar searches in multiple years' worth of photon data, as from the Fermi LAT, is primarily limited by the finite computational resources available. Addressing this "needle in a haystack" problem, we here present methods for optimizing blind searches to achieve the highest sensitivity at fixed computing cost. For both coherent and semicoherent methods, we consider their statistical properties and study their search sensitivity under computational constraints. The results validate a multistage strategy, where the first stage scans the entire parameter space using an efficient semicoherent method and promising candidates are then refined through a fully coherent analysis. We also find that for the first stage of a blind search, incoherent harmonic summing of powers is not worthwhile at fixed computing cost for typical gamma-ray pulsars. Further enhancing sensitivity, we present efficiency-improved interpolation techniques for the semicoherent search stage. Via realistic simulations we demonstrate that overall these optimizations can lower the minimum detectable pulsed fraction by almost 50% at the same computational expense.
    Comment: 22 pages, 13 figures; includes ApJ proof correction
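    Incoherent harmonic summing, mentioned above, adds the spectral power at integer multiples of each trial frequency so that power spread across harmonics is accumulated; a minimal numpy sketch (the array layout is assumed, not the authors' code):

```python
import numpy as np

def harmonic_sum(power, num_harmonics):
    """Sum power at bins k, 2k, ..., H*k for each fundamental frequency bin k."""
    # Keep only fundamentals whose highest harmonic still lies inside the spectrum.
    m = len(power) // num_harmonics
    out = np.zeros(m)
    for h in range(1, num_harmonics + 1):
        out += power[h * np.arange(m)]
    return out

spectrum = np.arange(8.0)            # stand-in for a Fourier power spectrum
summed = harmonic_sum(spectrum, 2)   # power[k] + power[2k] for k = 0..3
```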

    Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes

    During the last half decade, convolutional neural networks (CNNs) have triumphed in semantic segmentation, one of the core tasks in many applications such as autonomous driving. However, training CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models' performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach.
    Comment: This is the extended version of the ICCV 2017 paper "Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes" with additional GTA experiments
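    The global-label-distribution idea can be sketched as a divergence penalty between the network's average predicted class distribution and the distribution inferred for the target domain; the class names and toy probabilities below are assumptions for illustration:

```python
import numpy as np

def label_distribution(probs):
    """Average per-pixel class probabilities into a global label distribution.
    `probs` has shape (num_pixels, num_classes); each row sums to 1."""
    return probs.mean(axis=0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), used here to pull predictions toward the inferred distribution."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float((p * np.log(p / q)).sum())

# Hypothetical inferred target-domain distribution (e.g., road, building, car).
target_dist = np.array([0.6, 0.3, 0.1])
preds = np.array([[0.5, 0.4, 0.1],
                  [0.3, 0.5, 0.2]])   # toy per-pixel softmax outputs
penalty = kl_divergence(target_dist, label_distribution(preds))
```

In training, a penalty of this form would be added to the segmentation loss on target-domain images.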

    STEllar Content and Kinematics from high resolution galactic spectra via Maximum A Posteriori

    We introduce STECKMAP (STEllar Content and Kinematics via Maximum A Posteriori), a method to recover the kinematical properties of a galaxy simultaneously with its stellar content from integrated light spectra. It is an extension of STECMAP (astro-ph/0505209) to the general case where the velocity distribution of the underlying stars is also unknown. The reconstructions of the stellar age distribution, the age-metallicity relation, and the Line-Of-Sight Velocity Distribution (LOSVD) are all non-parametric, i.e. no specific shape is assumed. The only a priori constraints we use are positivity and the requirement that the solution be smooth enough. The smoothness parameter can be set by generalized cross-validation (GCV) according to the level of noise in the data in order to avoid overinterpretation. We use single stellar populations (SSP) from PEGASE-HR (R = 10000, λλ = 4000–6800 Å; Le Borgne et al. 2004) to test the method through realistic simulations. Non-Gaussianities in LOSVDs are reliably recovered with SNR as low as 20 per 0.2 Å pixel. It turns out that the recovery of the stellar content is not degraded by the simultaneous recovery of the kinematic distribution, so that the resolution in age and error estimates given in Ocvirk et al. (2005) remain appropriate when used with STECKMAP. We also explore the case of age-dependent kinematics (i.e. when each stellar component has its own LOSVD). We separate the bulge and disk components of an idealized spiral galaxy in integrated light from high-quality pseudo-data (SNR = 100 per pixel, R = 10000), and constrain the kinematics (mean projected velocity, projected velocity dispersion) and age of both components.
    Comment: 12 pages, 6 figures, accepted for publication in MNRAS
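    At its core, this kind of penalized MAP inversion balances data fit against smoothness. Dropping the positivity constraint for brevity, a ridge-like sketch with a second-difference smoothness operator (symbols and sizes are illustrative, not the STECKMAP implementation) is:

```python
import numpy as np

def smooth_map_solution(F, y, mu):
    """Minimize ||F x - y||^2 + mu * ||D x||^2, with D a second-difference operator."""
    n = F.shape[1]
    # Tridiagonal second-difference matrix penalizing curvature in x.
    D = (np.diag(-2.0 * np.ones(n))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    A = F.T @ F + mu * D.T @ D
    return np.linalg.solve(A, F.T @ y)

# With a trivial forward model and negligible smoothing, the data are recovered;
# a large mu instead drives the solution toward a smooth (here, shrunken) profile.
F = np.eye(5)
y = np.array([1.0, 3.0, 2.0, 4.0, 1.0])
x = smooth_map_solution(F, y, mu=1e-10)
```

The smoothness weight mu plays the role of the smoothness parameter that GCV selects in the paper.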