A Study of SVM Kernel Functions for Sensitivity Classification Ensembles with POS Sequences
Freedom of Information (FOI) laws legislate that government documents should be open to the public. However, many government documents contain sensitive information, such as confidential information, that is exempt from release. Therefore, government documents must be sensitivity reviewed prior to release, to identify and close any sensitive information. With the adoption of born-digital documents, such as email, there is a need for automatic sensitivity classification to assist digital sensitivity review. SVM classifiers and Part-of-Speech (POS) sequences have separately been shown to be promising for sensitivity classification. However, sequence classification methodologies, and specifically SVM kernel functions, have not been fully investigated for sensitivity classification. Therefore, in this work, we present an evaluation of five SVM kernel functions for sensitivity classification using POS sequences. Moreover, we show that an ensemble classifier that combines POS sequence classification with text classification can significantly improve sensitivity classification effectiveness (+6.09% F2) compared with a text classification baseline, according to McNemar's test of significance.
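The kernel-comparison idea above can be illustrated with a toy sequence kernel. The sketch below is not the paper's implementation; the tag sequences and the choice of an n-gram "spectrum" kernel are illustrative assumptions. It computes a similarity between two POS tag sequences of the kind an SVM could consume as a precomputed Gram matrix:

```python
from collections import Counter

def ngrams(seq, n):
    """All contiguous n-grams of a tag sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def spectrum_kernel(a, b, n=2):
    """n-spectrum kernel: inner product of the two n-gram count vectors."""
    ca, cb = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    return sum(ca[g] * cb[g] for g in ca)

# Hypothetical Penn Treebank-style POS sequences (illustrative only)
s1 = ["DT", "NN", "VBZ", "DT", "JJ", "NN"]
s2 = ["DT", "NN", "VBZ", "JJ"]
print(spectrum_kernel(s1, s2, n=2))  # shared bigrams: (DT,NN) and (NN,VBZ)
```

An SVM library that accepts precomputed kernels would then be trained on the Gram matrix built from pairwise `spectrum_kernel` values.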
Domain Adaptive Neural Networks for Object Recognition
We propose a simple neural network model to deal with the domain adaptation
problem in object recognition. Our model incorporates the Maximum Mean
Discrepancy (MMD) measure as a regularization in the supervised learning to
reduce the distribution mismatch between the source and target domains in the
latent space. From experiments, we demonstrate that the MMD regularization is
an effective tool to provide good domain adaptation models on both SURF
features and raw image pixels of a particular image data set. We also show that
our proposed model, preceded by the denoising auto-encoder pretraining,
achieves better performance than recent benchmark models on the same data sets.
This work represents the first study of the MMD measure in the context of neural
networks.
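For concreteness, the MMD statistic used above as a regularizer can be sketched in a few lines. This is a generic biased MMD^2 estimate with a Gaussian kernel on scalar features; the paper applies it to latent representations, and the bandwidth and toy samples here are illustrative assumptions:

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between samples."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

# Identical samples give (near) zero; well-separated samples give a large value.
print(mmd2([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))
print(mmd2([0.0, 0.1], [5.0, 5.1]))
```

In training, a term proportional to the MMD between source and target hidden representations would be added to the supervised loss, pulling the two latent distributions together.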
Improving Sequential Determinantal Point Processes for Supervised Video Summarization
It is now much easier than ever before to produce videos. While the
ubiquitous video data is a great source for information discovery and
extraction, the computational challenges are unparalleled. Automatically
summarizing the videos has become a substantial need for browsing, searching,
and indexing visual content. This paper is in the vein of supervised video
summarization using sequential determinantal point process (SeqDPP), which
models diversity by a probabilistic distribution. We improve this model in two
respects. In terms of learning, we propose a large-margin algorithm to address the
exposure bias problem in SeqDPP. In terms of modeling, we design a new
probabilistic distribution such that, when it is integrated into SeqDPP, the
resulting model accepts user input about the expected length of the summary.
Moreover, we also significantly extend a popular video summarization dataset by
1) more egocentric videos, 2) dense user annotations, and 3) a refined
evaluation scheme. We conduct extensive experiments on this dataset (about 60
hours of videos in total) and compare our approach to several competitive
baselines.
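The diversity-by-determinant idea behind a DPP can be made concrete with a tiny L-ensemble. In the hypothetical similarity kernel below (values invented for illustration), two near-duplicate frames receive a much lower unnormalized DPP score than a diverse pair:

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical L-ensemble kernel over 3 video frames:
# frames 0 and 1 are near-duplicates (similarity 0.9), frame 2 is distinct.
L = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]

def dpp_score(subset):
    """Unnormalized DPP probability: determinant of L restricted to subset."""
    return det2([[L[i][j] for j in subset] for i in subset])

print(dpp_score([0, 1]))  # redundant pair: low score
print(dpp_score([0, 2]))  # diverse pair: high score
```

SeqDPP applies this determinant-based selection sequentially over video segments, which is what makes the summaries both representative and non-redundant.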
Job matching quality effects of employment promotion measures for people with disabilities
In this article, we evaluate the influence that employment promotion measures designed for disabled people have on the latter's job matching quality through the use of matching analysis. We focus on two aspects of quality: the type of contract held (either permanent or temporary) and whether or not the individual is searching for another job. We find that employment promotion measures do not improve the match's job quality. Furthermore, the use of specialized labour market intermediation services by disabled individuals does not affect their job matching quality. As an additional contribution, our definition of disability avoids the self-justification bias.
Optimized Blind Gamma-ray Pulsar Searches at Fixed Computing Budget
The sensitivity of blind gamma-ray pulsar searches in multiple years' worth of
photon data, as from the Fermi LAT, is primarily limited by the finite
computational resources available. Addressing this "needle in a haystack"
problem, we here present methods for optimizing blind searches to achieve the
highest sensitivity at fixed computing cost. For both coherent and semicoherent
methods, we consider their statistical properties and study their search
sensitivity under computational constraints. The results validate a multistage
strategy, where the first stage scans the entire parameter space using an
efficient semicoherent method and promising candidates are then refined through
a fully coherent analysis. We also find that for the first stage of a blind
search incoherent harmonic summing of powers is not worthwhile at fixed
computing cost for typical gamma-ray pulsars. Further enhancing sensitivity, we
present efficiency-improved interpolation techniques for the semicoherent
search stage. Via realistic simulations we demonstrate that overall these
optimizations can significantly lower the minimum detectable pulsed fraction by
almost 50% at the same computational expense.
Comment: 22 pages, 13 figures; includes ApJ proof correction
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
During the last half decade, convolutional neural networks (CNNs) have
achieved remarkable success on semantic segmentation, one of the core tasks in many
applications such as autonomous driving. However, to train CNNs requires a
considerable amount of data, which is difficult to collect and laborious to
annotate. Recent advances in computer graphics make it possible to train CNNs
on photo-realistic synthetic imagery with computer-generated annotations.
Despite this, the domain mismatch between the real images and the synthetic
data cripples the models' performance. Hence, we propose a curriculum-style
learning approach to minimize the domain gap in urban scenery semantic
segmentation. The curriculum domain adaptation solves easy tasks first to infer
necessary properties about the target domain; in particular, the first task is
to learn global label distributions over images and local distributions over
landmark superpixels. These are easy to estimate because images of urban scenes
have strong idiosyncrasies (e.g., the size and spatial relations of buildings,
streets, cars, etc.). We then train a segmentation network while regularizing
its predictions in the target domain to follow those inferred properties. In
experiments, our method outperforms the baselines on two datasets and two
backbone networks. We also report extensive ablation studies about our
approach.
Comment: This is the extended version of the ICCV 2017 paper "Curriculum
Domain Adaptation for Semantic Segmentation of Urban Scenes" with additional
GTA experiments
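The step of regularizing target-domain predictions toward the inferred label distributions can be sketched as a divergence penalty. Below, a KL divergence between a hypothetical inferred target-domain label distribution and the network's average prediction serves as the regularizer; the class set and the numbers are invented for illustration:

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete label distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical inferred target label frequencies (road, building, car, other)
target_dist = [0.5, 0.3, 0.1, 0.1]
# Hypothetical average of the segmentation network's target-domain predictions
pred_dist = [0.7, 0.2, 0.05, 0.05]

penalty = kl(target_dist, pred_dist)
print(penalty)  # positive when the prediction deviates from the inferred prior
```

During training, this penalty (weighted against the supervised source-domain loss) would push target-domain predictions toward the global and superpixel-level distributions estimated in the first, easy tasks.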
STEllar Content and Kinematics from high resolution galactic spectra via Maximum A Posteriori
We introduce STECKMAP (STEllar Content and Kinematics via Maximum A
Posteriori), a method to recover the kinematical properties of a galaxy
simultaneously with its stellar content from integrated light spectra. It is an
extension of STECMAP (astro-ph/0505209) to the general case where the velocity
distribution of the underlying stars is also unknown.
The
reconstructions of the stellar age distribution, the age-metallicity relation,
and the Line-Of-Sight Velocity Distribution (LOSVD) are all non-parametric,
i.e. no specific shape is assumed. The only priors we use are positivity and
the requirement that the solution be smooth enough. The smoothness parameter
can be set by generalized cross-validation (GCV) according to the level of noise in the data in order to avoid
overinterpretation. We use single stellar populations (SSP) from PEGASE-HR
(R=10000, lambda lambda = 4000-6800 Angstrom, Le Borgne et al. 2004) to test
the method through realistic simulations. Non-Gaussianities in LOSVDs are
reliably recovered with SNR as low as 20 per 0.2 Angstrom pixel. It turns out
that the recovery of the stellar content is not degraded by the simultaneous
recovery of the kinematic distribution, so that the resolution in age and error
estimates given in Ocvirk et al. 2005 remain appropriate when used with
STECKMAP. We also explore the case of age-dependent kinematics (i.e. when each
stellar component has its own LOSVD). We separate the bulge and disk components
of an idealized simplified spiral galaxy in integrated light from high quality
pseudo data (SNR=100 per pixel, R=10000), and constrain the kinematics (mean
projected velocity, projected velocity dispersion) and age of both components.Comment: 12 pages, 6 figures, accepted for publication in MNRA