Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
Pretrained contextual representation models (Peters et al., 2018; Devlin et
al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new
release of BERT (Devlin, 2018) includes a model simultaneously pretrained on
104 languages with impressive performance for zero-shot cross-lingual transfer
on a natural language inference task. This paper explores the broader
cross-lingual potential of mBERT (multilingual) as a zero-shot language
transfer model on 5 NLP tasks covering a total of 39 languages from various
language families: NLI, document classification, NER, POS tagging, and
dependency parsing. We compare mBERT with the best published methods for
zero-shot cross-lingual transfer and find mBERT competitive on each task.
Additionally, we investigate the most effective strategy for utilizing mBERT in
this manner, determine to what extent mBERT generalizes away from
language-specific features, and measure factors that influence cross-lingual transfer.
Comment: EMNLP 2019 Camera Ready
A hierarchical loss and its problems when classifying non-hierarchically
Failing to distinguish between a sheepdog and a skyscraper should be worse
and penalized more than failing to distinguish between a sheepdog and a poodle;
after all, sheepdogs and poodles are both breeds of dogs. However, existing
metrics of failure (so-called "loss" or "win") used in textual or visual
classification/recognition via neural networks seldom leverage a-priori
information, such as a sheepdog being more similar to a poodle than to a
skyscraper. We define a metric that, inter alia, can penalize failure to
distinguish between a sheepdog and a skyscraper more than failure to
distinguish between a sheepdog and a poodle. Unlike previously employed
possibilities, this metric is based on an ultrametric tree associated with any
given tree organization into a semantically meaningful hierarchy of a
classifier's classes. An ultrametric tree is a tree with a so-called
ultrametric distance metric such that all leaves are at the same distance from
the root. Unfortunately, extensive numerical experiments indicate that the
standard practice of training neural networks via stochastic gradient descent
with random starting points often drives down the hierarchical loss nearly as
much when minimizing the standard cross-entropy loss as when trying to minimize
the hierarchical loss directly. Thus, this hierarchical loss is unreliable as
an objective for plain, randomly started stochastic gradient descent to
minimize; the main value of the hierarchical loss may be merely as a meaningful
metric of success of a classifier.
Comment: 19 pages, 4 figures, 7 tables
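The idea of an ultrametric tree over a classifier's classes can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the loss defined in the paper: the toy hierarchy, the node heights, and the helper names (`PARENT`, `HEIGHT`, `ultrametric_distance`, `expected_penalty`) are all hypothetical. The distance between two leaves is taken to be the height of their lowest common ancestor, so every leaf sits at the same distance from the root.

```python
# Toy class hierarchy (hypothetical): child -> parent.
PARENT = {
    "sheepdog": "dog",
    "poodle": "dog",
    "dog": "root",
    "skyscraper": "building",
    "building": "root",
}

# Assumed node heights: leaves at 0, root at 1, so all leaves are
# equally far from the root.
HEIGHT = {
    "sheepdog": 0.0, "poodle": 0.0, "skyscraper": 0.0,
    "dog": 0.5, "building": 0.5,
    "root": 1.0,
}


def ancestors(node):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def ultrametric_distance(a, b):
    """Height of the lowest common ancestor of nodes a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return HEIGHT[node]
    raise ValueError("nodes do not share an ancestor")


def expected_penalty(true_label, probs):
    """Average distance from the true class under predicted probabilities."""
    return sum(p * ultrametric_distance(true_label, label)
               for label, p in probs.items())
```

With these assumed heights, confusing a sheepdog with a poodle costs 0.5, while confusing it with a skyscraper costs 1.0, which is the ordering the abstract asks for.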
The Emission from Post-Shock Flows in mCVs
We re-examine the vertical structure of the post-shock flow in the accretion
region of mCVs, and the X-ray emission as a function of height. We then predict
X-ray light curves and phase-resolved spectra, taking into account the vertical
structure, examine the implications and check whether the predicted heights are
compatible with observation.
Comment: 6 pages, to be published in the Proc of the Symp. to mark the 60th
birthday of Brian Warne
Local times of multifractional Brownian sheets
Denote by $H(t)=(H_1(t),\ldots,H_N(t))$ a function in $t\in\mathbb{R}_+^N$
with values in $(0,1)^N$. Let $\{B^{H(t)}(t),\ t\in\mathbb{R}_+^N\}$
be an $(N,d)$-multifractional Brownian sheet (mfBs) with Hurst functional $H(t)$.
Under some regularity conditions on the function $H(t)$, we prove the
existence, joint continuity and the Hölder regularity of the local times of
$\{B^{H(t)}(t)\}$. We also determine the Hausdorff dimensions of the level sets
of $\{B^{H(t)}(t)\}$. Our results extend the corresponding results for
fractional Brownian sheets and multifractional Brownian motion to
multifractional Brownian sheets.
Comment: Published at http://dx.doi.org/10.3150/08-BEJ126 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
Frequency-dependent AVO attribute: theory and example
Fluid-saturated rocks generally have seismic velocities that depend upon frequency. Exploring this property may help us discriminate different fluids from seismic data. In this paper, we introduce a scheme to calculate a frequency-dependent AVO attribute in order to estimate seismic dispersion from pre-stack data, and apply it to North Sea data. The scheme essentially combines the two-term approximation of Smith and Gidlow (1987) with the method of spectral decomposition based on the Wigner-Ville distribution, which is used to achieve high resolution. The result suggests the potential of this method for detection of seismic dispersion due to fluid saturation.
CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures
The use of graphics processing units (GPUs) in
high-performance parallel computing continues to become more
prevalent, often as part of a heterogeneous system. For years,
CUDA has been the de facto programming environment for
nearly all general-purpose GPU (GPGPU) applications. In spite
of this, the framework is available only on NVIDIA GPUs,
traditionally requiring reimplementation in other frameworks
in order to utilize additional multi- or many-core devices.
On the other hand, OpenCL provides an open and vendor-neutral
programming environment and runtime system. With
implementations available for CPUs, GPUs, and other types of
accelerators, OpenCL therefore holds the promise of a “write
once, run anywhere” ecosystem for heterogeneous computing.
Given the many similarities between CUDA and OpenCL,
manually porting a CUDA application to OpenCL is typically
straightforward, albeit tedious and error-prone. In response
to this issue, we created CU2CL, an automated CUDA-to-OpenCL
source-to-source translator that possesses a novel design
and clever reuse of the Clang compiler framework. Currently,
the CU2CL translator covers the primary constructs found in
the CUDA runtime API, and we have successfully translated many
applications from the CUDA SDK and Rodinia benchmark suite.
The performance of our automatically translated applications via
CU2CL is on par with their manually ported counterparts.
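To give a feel for the similarities between CUDA and OpenCL that make translation feasible, here is a deliberately naive sketch that rewrites a few well-known CUDA device-code constructs into their OpenCL C counterparts. It is not how CU2CL works: the abstract describes a translator built on the Clang compiler framework, whereas this toy uses plain text substitution. The function name `translate_device_code` and the tables `KEYWORDS`, `BUILTINS` and `AXIS` are hypothetical, and host-side runtime API calls and address-space qualifiers on kernel pointer arguments are not handled.

```python
import re

# CUDA keywords/intrinsics and their OpenCL C equivalents.
KEYWORDS = {
    "__global__": "__kernel",
    "__shared__": "__local",
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
}

# CUDA built-in index variables and the OpenCL work-item functions
# that return the same quantity.
BUILTINS = {
    "threadIdx": "get_local_id",
    "blockIdx": "get_group_id",
    "blockDim": "get_local_size",
    "gridDim": "get_num_groups",
}

# CUDA component name -> OpenCL dimension index.
AXIS = {"x": 0, "y": 1, "z": 2}

_BUILTIN_RE = re.compile(r"\b(threadIdx|blockIdx|blockDim|gridDim)\.([xyz])\b")


def translate_device_code(src):
    """Rewrite a handful of CUDA device constructs as OpenCL C (toy version)."""
    for cuda, opencl in KEYWORDS.items():
        src = src.replace(cuda, opencl)
    # e.g. threadIdx.x -> get_local_id(0)
    return _BUILTIN_RE.sub(
        lambda m: f"{BUILTINS[m.group(1)]}({AXIS[m.group(2)]})", src
    )
```

For example, `translate_device_code("int i = blockIdx.x * blockDim.x + threadIdx.x;")` yields `int i = get_group_id(0) * get_local_size(0) + get_local_id(0);`. A text-level rewrite like this breaks easily on macros, comments and strings, which is one reason a compiler front end is a more robust basis for a real translator.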
On the Nature of Ultra-Luminous X-ray Sources from Optical/IR Measurements
We present a model for the prediction of the optical/infra-red emission from
ULXs. In the model, ULXs are binary systems with accretion taking place through
Roche lobe overflow. We show that irradiation effects and the presence of an
accretion disk significantly modify the optical/infrared flux compared to
single stars, and also that the system orientation is important. We include
additional constraints from the mass transfer rate to constrain the parameters
of the donor star, and to a lesser extent the mass of the BH. We apply the
model to fit photometric data for several ULX counterparts. We find that most
donor stars are of spectral type B and are older and less massive than reported
elsewhere, but that no late-type donors are admissible. The degeneracy of the
acceptable parameter space will be significantly reduced with observations over
a wider spectral range, and if time-resolved data become available.