
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Author: Dredze Mark
Wu Shijie
Publication date: 01/01/2019

Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state of the art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages, with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual BERT) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language-specific features, and measure factors that influence cross-lingual transfer. Comment: EMNLP 2019 Camera Ready

arXiv.org e-Print Archive

Crossref

A hierarchical loss and its problems when classifying non-hierarchically

Author: LeCun Yann
Tygert Mark
Wu Cinna
Publication date: 01/01/2019

Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called "loss" or "win") used in textual or visual classification/recognition via neural networks seldom leverage a priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given organization of a classifier's classes into a semantically meaningful tree hierarchy. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; its main value may be merely as a meaningful metric of a classifier's success. Comment: 19 pages, 4 figures, 7 tables
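One natural way to turn the ultrametric tree into a loss is the expected ultrametric distance between the true class and a class drawn from the classifier's predicted distribution. The sketch below assumes that reading (the paper's exact formula may weight the tree differently). It also assumes each class is identified by its root-to-leaf path, with every leaf at the same depth and each tree level contributing a distance of 1. The names `ultrametric_distance` and `hierarchical_loss` and the example tree are hypothetical.

```python
from typing import Dict, Tuple

# A class is identified by its root-to-leaf path, e.g. ("animal", "dog", "poodle").
Path = Tuple[str, ...]


def ultrametric_distance(a: Path, b: Path) -> int:
    """Tree depth minus the length of the shared root-to-leaf prefix.

    Because every leaf sits at the same depth, this distance is an ultrametric:
    d(a, c) <= max(d(a, b), d(b, c)) for all leaves a, b, c.
    """
    if len(a) != len(b):
        raise ValueError("all leaves must sit at the same depth")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break  # paths diverge here; deeper matches do not count
        shared += 1
    return len(a) - shared


def hierarchical_loss(true_leaf: Path, probs: Dict[Path, float]) -> float:
    """Expected ultrametric distance from the true class under the predicted distribution."""
    if abs(sum(probs.values()) - 1.0) > 1e-6:
        raise ValueError("predicted probabilities must sum to 1")
    return sum(p * ultrametric_distance(true_leaf, leaf) for leaf, p in probs.items())


if __name__ == "__main__":
    sheepdog = ("animal", "dog", "sheepdog")
    poodle = ("animal", "dog", "poodle")
    skyscraper = ("structure", "building", "skyscraper")
    print(hierarchical_loss(sheepdog, {poodle: 1.0}))
    print(hierarchical_loss(sheepdog, {skyscraper: 1.0}))
```

With this made-up tree, confidently predicting "poodle" for a sheepdog costs 1, while confidently predicting "skyscraper" costs 3, which is the ordering the abstract asks for.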

arXiv.org e-Print Archive

Directory of Open Access Journals

The Emission from Post-Shock Flows in mCVs

Author: Cropper Mark
Ramsay Gavin
Wu Kinwah
Publication venue: 'Elsevier BV'
Publication date: 21/07/1999

We re-examine the vertical structure of the post-shock flow in the accretion region of magnetic cataclysmic variables (mCVs), and the X-ray emission as a function of height. We then predict X-ray light curves and phase-resolved spectra, taking into account the vertical structure, examine the implications, and check whether the predicted heights are compatible with observations. Comment: 6 pages, to be published in the Proceedings of the Symposium to mark the 60th birthday of Brian Warner
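As background for such post-shock calculations, a standard estimate of the temperature just behind the shock treats the accreting gas as falling freely from infinity and applies the strong-shock jump condition, giving $kT_s = \tfrac{3}{8}\,\mu m_H G M_{\rm WD}/R_{\rm WD}$. The sketch below evaluates that textbook formula; it is not taken from the paper. It ignores the shock height by evaluating the free-fall speed at the stellar surface, and the mean molecular weight `mu = 0.615`, the function name `shock_temperature_kev`, and the example mass and radius are illustrative assumptions.

```python
# Physical constants in cgs units.
G = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2
M_SUN = 1.989e33    # solar mass, g
M_H = 1.6726e-24    # hydrogen atom mass, g
KEV = 1.602177e-9   # erg per keV


def shock_temperature_kev(m_wd_solar: float, r_wd_cm: float, mu: float = 0.615) -> float:
    """Strong-shock temperature kT_s = 3 mu m_H G M / (8 R), returned in keV.

    Assumes free fall from infinity, v_ff^2 = 2 G M / R, evaluated at the
    white dwarf surface, and the strong-shock limit kT_s = (3/16) mu m_H v_ff^2.
    mu = 0.615 is an assumed mean molecular weight for fully ionised
    solar-composition gas.
    """
    if m_wd_solar <= 0 or r_wd_cm <= 0:
        raise ValueError("mass and radius must be positive")
    m_wd = m_wd_solar * M_SUN
    return 3.0 * mu * M_H * G * m_wd / (8.0 * r_wd_cm) / KEV


if __name__ == "__main__":
    # Illustrative inputs only: 0.7 solar masses and an assumed radius of 7e8 cm.
    print(f"kT_s = {shock_temperature_kev(0.7, 7.0e8):.1f} keV")
```

Because the estimate scales as $M/R$, heavier (and therefore smaller) white dwarfs give hotter shocks. A finite shock height lowers the free-fall speed at the shock, which is one reason the vertical structure matters.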

arXiv.org e-Print Archive

Crossref

CERN Document Server

Local times of multifractional Brownian sheets

Author: Meerschaert Mark
Wu Dongsheng
Xiao Yimin
Publication venue: 'Bernoulli Society for Mathematical Statistics and Probability'
Publication date: 01/01/2007

Denote by $H(t)=(H_1(t),\dots,H_N(t))$ a function of $t\in\mathbb{R}_+^N$ with values in $(0,1)^N$. Let $\{B^{H(t)}(t)\}=\{B^{H(t)}(t),\ t\in\mathbb{R}^N_+\}$ be an $(N,d)$-multifractional Brownian sheet (mfBs) with Hurst functional $H(t)$. Under some regularity conditions on the function $H(t)$, we prove the existence, joint continuity and Hölder regularity of the local times of $\{B^{H(t)}(t)\}$. We also determine the Hausdorff dimensions of the level sets of $\{B^{H(t)}(t)\}$. Our results extend the corresponding results for fractional Brownian sheets and multifractional Brownian motion to multifractional Brownian sheets. Comment: Published at http://dx.doi.org/10.3150/08-BEJ126 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
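For reference, the local time of a random field $X:\mathbb{R}_+^N\to\mathbb{R}^d$ (here $X(t)=B^{H(t)}(t)$) is usually defined as an occupation density. The standard definition is sketched below; it is stated for orientation rather than quoted from the paper. $T\subseteq\mathbb{R}_+^N$ is a Borel set and $\lambda_N$, $\lambda_d$ denote Lebesgue measure.

```latex
\mu_T(A) = \lambda_N\{\, t \in T : X(t) \in A \,\}, \qquad A \subseteq \mathbb{R}^d \ \text{Borel},
\\
\mu_T \ll \lambda_d \;\Longrightarrow\; L(x, T) := \frac{d\mu_T}{d\lambda_d}(x),
\qquad
\int_T f\big(X(t)\big)\, dt = \int_{\mathbb{R}^d} f(x)\, L(x, T)\, dx
\quad \text{for all Borel } f \ge 0 .
```

In this language, joint continuity concerns the map $(x,t)\mapsto L\big(x,[0,t]\big)$ with $[0,t]=\prod_{i=1}^{N}[0,t_i]$. The level sets are $\{t\in\mathbb{R}_+^N : X(t)=x\}$.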

arXiv.org e-Print Archive

CiteSeerX

Crossref

Frequency-dependent AVO attribute: theory and example

Author: Chapman Mark
Li Xiang-Yang
Wu Xiaoyang
Publication venue: 'EAGE Publications'
Publication date: 01/06/2012

Fluid-saturated rocks generally have seismic velocities that depend on frequency. Exploiting this property may help us discriminate between different fluids using seismic data. In this paper, we introduce a scheme to calculate a frequency-dependent amplitude-versus-offset (AVO) attribute in order to estimate seismic dispersion from pre-stack data, and we apply it to North Sea data. The scheme essentially combines the two-term approximation of Smith and Gidlow (1987) with spectral decomposition based on the Wigner-Ville distribution, which is used to achieve high resolution. The result suggests the potential of this method for detecting seismic dispersion due to fluid saturation.
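The two-term step can be sketched as a least-squares fit repeated independently at each frequency. This assumes the commonly quoted form of Smith and Gidlow (1987), obtained from the Aki-Richards approximation with Gardner's density relation: $R(\theta) \approx A(\theta)\,\Delta\alpha/\alpha + B(\theta)\,\Delta\beta/\beta$, with $A = \tfrac58 - \tfrac12(\beta/\alpha)^2\sin^2\theta + \tfrac12\tan^2\theta$ and $B = -4(\beta/\alpha)^2\sin^2\theta$. It also assumes the Wigner-Ville spectral decomposition has already produced, for each frequency, one reflection amplitude per incidence angle. The function names, the fixed background ratio `vs_over_vp`, and the input layout are hypothetical, not the authors' code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def smith_gidlow_coeffs(theta: float, vs_over_vp: float) -> Tuple[float, float]:
    """Coefficients A(theta), B(theta) of the two-term form; theta in radians."""
    g2 = vs_over_vp ** 2
    s2 = math.sin(theta) ** 2
    a = 5.0 / 8.0 - 0.5 * g2 * s2 + 0.5 * math.tan(theta) ** 2
    b = -4.0 * g2 * s2
    return a, b


def invert_two_term(angles: Sequence[float], refl: Sequence[float],
                    vs_over_vp: float) -> Tuple[float, float]:
    """Least-squares estimate of (dVp/Vp, dVs/Vs) from R(theta) at one frequency."""
    # Accumulate the 2x2 normal equations  [saa sab; sab sbb] x = [sar; sbr].
    saa = sab = sbb = sar = sbr = 0.0
    for theta, r in zip(angles, refl):
        a, b = smith_gidlow_coeffs(theta, vs_over_vp)
        saa += a * a
        sab += a * b
        sbb += b * b
        sar += a * r
        sbr += b * r
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("angle range too narrow to separate the two terms")
    dvp = (sbb * sar - sab * sbr) / det
    dvs = (saa * sbr - sab * sar) / det
    return dvp, dvs


def dispersion_attributes(angles: Sequence[float], spectra: Dict[float, List[float]],
                          vs_over_vp: float) -> Dict[float, Tuple[float, float]]:
    """Invert each frequency slice; spectra maps frequency (Hz) -> amplitudes per angle."""
    return {f: invert_two_term(angles, r, vs_over_vp) for f, r in sorted(spectra.items())}
```

A fixed `vs_over_vp` keeps the problem linear in the two contrasts. Variation of the recovered contrasts across frequency is the kind of signal such an attribute is meant to capture. The exact attribute definition used in the paper is not reproduced here.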

Crossref

NERC Open Research Archive

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

Author: Feng Wu-chun
Gardner Mark
Martinez Gabriel
Publication date: 01/01/2011

The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. However, the framework is available only on NVIDIA GPUs, so using additional multi- or many-core devices has traditionally required reimplementation in other frameworks. OpenCL, on the other hand, provides an open and vendor-neutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL holds the promise of a “write once, run anywhere” ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator with a novel design that cleverly reuses the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and the Rodinia benchmark suite. The performance of the applications automatically translated by CU2CL is on par with that of their manually ported counterparts.
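The kind of mapping such a translator performs can be illustrated, for kernel-side code only, with a toy textual rewriter. This is a deliberately naive regex substitution, not CU2CL's method: the abstract says CU2CL is built on Clang, so it works on parsed source rather than raw text. The function name `translate_device_code` and the small mapping tables are illustrative assumptions. The tables use common CUDA-to-OpenCL correspondences: `__global__` to `__kernel`, `__shared__` to `__local`, `__syncthreads()` to `barrier(CLK_LOCAL_MEM_FENCE)`, and the thread-index built-ins to OpenCL work-item functions.

```python
import re

# Hypothetical, deliberately small tables of CUDA device-code spellings and
# commonly used OpenCL C counterparts.
_KEYWORDS = {
    "__global__": "__kernel",
    "__shared__": "__local",
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
}
_BUILTINS = {
    "threadIdx": "get_local_id",
    "blockIdx": "get_group_id",
    "blockDim": "get_local_size",
    "gridDim": "get_num_groups",
}
_AXES = {"x": 0, "y": 1, "z": 2}
# Matches e.g. "blockIdx.x" as a whole token.
_BUILTIN_RE = re.compile(r"\b(threadIdx|blockIdx|blockDim|gridDim)\.([xyz])\b")


def translate_device_code(src: str) -> str:
    """Rewrite a few CUDA kernel constructs into OpenCL C by plain text substitution."""
    for cuda, ocl in _KEYWORDS.items():
        src = src.replace(cuda, ocl)
    # threadIdx.x -> get_local_id(0), blockDim.y -> get_local_size(1), and so on.
    return _BUILTIN_RE.sub(
        lambda m: f"{_BUILTINS[m.group(1)]}({_AXES[m.group(2)]})", src
    )


if __name__ == "__main__":
    kernel = ("__global__ void add(float *a) { "
              "int i = blockIdx.x * blockDim.x + threadIdx.x; __syncthreads(); }")
    print(translate_device_code(kernel))
```

Even on this tiny kernel the output is not yet valid OpenCL C: pointer parameters still need the `__global` address-space qualifier, and host-side runtime calls such as `cudaMalloc` have no one-line equivalent. Gaps like these are why a compiler-based translator is used instead of text substitution.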

Computer Science Technical Reports @Virginia Tech

CiteSeerX

On the Nature of Ultra-Luminous X-ray Sources from Optical/IR Measurements

Author: Chris Copperwheat
Fabbiano
Kinwah Wu
Lejeune
Mark Cropper
Ritter
Roberto Soria
Wu
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 11/04/2007

We present a model for predicting the optical/infrared emission from ULXs. In the model, ULXs are binary systems with accretion taking place through Roche lobe overflow. We show that irradiation effects and the presence of an accretion disk significantly modify the optical/infrared flux compared to single stars, and that the system orientation is also important. We include additional constraints from the mass transfer rate to constrain the parameters of the donor star and, to a lesser extent, the mass of the BH. We apply the model to fit photometric data for several ULX counterparts. We find that most donor stars are of spectral type B and are older and less massive than reported elsewhere, but that no late-type donors are admissible. The degeneracy of the acceptable parameter space will be significantly reduced by observations over a wider spectral range and if time-resolved data become available.
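Roche lobe overflow ties the donor's radius to the orbit: a donor that is overflowing has a radius close to its Roche-lobe radius. A widely used approximation for that radius is Eggleton's (1983) fit, $R_L/a = 0.49\,q^{2/3}/\big(0.6\,q^{2/3} + \ln(1+q^{1/3})\big)$, where $a$ is the orbital separation and $q = M_{\rm donor}/M_{\rm BH}$. The sketch below evaluates it. The paper's model is not claimed to use this particular fit, and the function names are hypothetical.

```python
import math


def roche_lobe_fraction(q: float) -> float:
    """Eggleton (1983) approximation to R_L / a.

    q is the mass ratio M_star / M_companion for the star whose Roche lobe is
    wanted; a is the orbital separation. The fit is quoted as accurate to
    about 1% for all q > 0.
    """
    if q <= 0:
        raise ValueError("mass ratio must be positive")
    q13 = q ** (1.0 / 3.0)
    q23 = q13 * q13
    return 0.49 * q23 / (0.6 * q23 + math.log1p(q13))


def donor_radius(m_donor: float, m_bh: float, separation: float) -> float:
    """Radius of a donor that exactly fills its Roche lobe, in the units of separation."""
    return roche_lobe_fraction(m_donor / m_bh) * separation


if __name__ == "__main__":
    # Hypothetical system: 10 Msun donor, 20 Msun BH, separation 50 Rsun.
    print(f"donor radius ~ {donor_radius(10.0, 20.0, 50.0):.1f} Rsun")
```

Combined with stellar evolution tracks, a relation like this is one way a Roche-lobe-filling assumption narrows the allowed donor masses and ages for a given orbit.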

arXiv.org e-Print Archive

Crossref

UCL Discovery