15,652 research outputs found

    Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

    Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state of the art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual BERT) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language-specific features, and measure factors that influence cross-lingual transfer. Comment: EMNLP 2019 camera-ready.
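    As context for the zero-shot setup described in this abstract, here is a minimal sketch (using the Hugging Face transformers library, not the authors' code) of why fine-tune-on-English, evaluate-on-another-language transfer is possible with mBERT: the same multilingual checkpoint and tokenizer handle both languages, so only the labelled training data needs to be English. The checkpoint name, label count and example sentences are illustrative assumptions, and the classification head below is untrained.

        import torch
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        # Multilingual BERT checkpoint, pretrained on 104 languages.
        tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
        model = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-multilingual-cased", num_labels=3)  # e.g. NLI labels

        # English premise/hypothesis pair: the kind of data used for fine-tuning.
        en = tok("A man is playing a guitar.", "A person plays music.",
                 return_tensors="pt")
        # Spanish pair: evaluated zero-shot, with no Spanish training data at all.
        es = tok("Un hombre toca la guitarra.", "Una persona toca música.",
                 return_tensors="pt")

        with torch.no_grad():
            print(model(**en).logits)  # source-language prediction
            print(model(**es).logits)  # transfers because the encoder is shared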

    A hierarchical loss and its problems when classifying non-hierarchically

    Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called "loss" or "win") used in textual or visual classification/recognition via neural networks seldom leverage a priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given organization of a classifier's classes into a semantically meaningful hierarchy. An ultrametric tree is a tree equipped with a so-called ultrametric distance, under which all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier. Comment: 19 pages, 4 figures, 7 tables.
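    To make the idea concrete, the sketch below scores a classifier by the expected ultrametric distance between its prediction and the true class. This is only an illustration of how an ultrametric tree turns a class hierarchy into misclassification costs; the paper's own hierarchical loss is defined differently (from sums of predicted probabilities over subtrees), and the three-class hierarchy and distances here are invented.

        import numpy as np

        # Hand-built ultrametric distances: all leaves sit at the same height,
        # sheepdog and poodle share the "dog" ancestor, so their mutual distance
        # is smaller than either one's distance to skyscraper.
        classes = ["sheepdog", "poodle", "skyscraper"]
        D = np.array([[0.0, 0.5, 1.0],
                      [0.5, 0.0, 1.0],
                      [1.0, 1.0, 0.0]])

        def hierarchical_cost(probs, true_idx):
            """Expected ultrametric distance from the prediction to the truth."""
            return float(probs @ D[:, true_idx])

        # Confusing a sheepdog with a poodle is penalized less than confusing
        # a sheepdog with a skyscraper.
        print(hierarchical_cost(np.array([0.2, 0.8, 0.0]), true_idx=0))  # 0.4
        print(hierarchical_cost(np.array([0.2, 0.0, 0.8]), true_idx=0))  # 0.8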

    The Emission from Post-Shock Flows in mCVs

    We re-examine the vertical structure of the post-shock flow in the accretion region of mCVs (magnetic cataclysmic variables), and the X-ray emission as a function of height. We then predict X-ray light curves and phase-resolved spectra, taking into account the vertical structure, examine the implications, and check whether the predicted heights are compatible with observation. Comment: 6 pages, to be published in the Proceedings of the Symposium to mark the 60th birthday of Brian Warner.

    Local times of multifractional Brownian sheets

    Denote by $H(t)=(H_1(t),\ldots,H_N(t))$ a function of $t\in\mathbb{R}_+^N$ with values in $(0,1)^N$. Let $\{B^{H(t)}(t)\}=\{B^{H(t)}(t),\ t\in\mathbb{R}_+^N\}$ be an $(N,d)$-multifractional Brownian sheet (mfBs) with Hurst functional $H(t)$. Under some regularity conditions on the function $H(t)$, we prove the existence, joint continuity and Hölder regularity of the local times of $\{B^{H(t)}(t)\}$. We also determine the Hausdorff dimensions of the level sets of $\{B^{H(t)}(t)\}$. Our results extend the corresponding results for fractional Brownian sheets and multifractional Brownian motion to multifractional Brownian sheets. Comment: Published at http://dx.doi.org/10.3150/08-BEJ126 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
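    For reference, the local times studied here are the standard occupation-density local times; the following definition is supplied as background and is not quoted from the paper. Writing $X(t)=B^{H(t)}(t)$, for a Borel set $T\subset\mathbb{R}_+^N$ the occupation measure of $X$ on $T$ is
        $$\mu_T(A)=\lambda_N\{\,t\in T:\ X(t)\in A\,\},\qquad A\subset\mathbb{R}^d\ \text{Borel},$$
    where $\lambda_N$ denotes Lebesgue measure on $\mathbb{R}_+^N$. If $\mu_T$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^d$, its density $L(x,T)$ is the local time of $X$ on $T$, characterized by
        $$\int_{\mathbb{R}^d} f(x)\,L(x,T)\,dx=\int_T f(X(t))\,dt$$
    for every bounded measurable $f:\mathbb{R}^d\to\mathbb{R}$.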

    Frequency-dependent AVO attribute: theory and example

    Fluid-saturated rocks generally have seismic velocities that depend upon frequency. Exploiting this property may help us discriminate between different fluids from seismic data. In this paper, we introduce a scheme to calculate a frequency-dependent AVO attribute in order to estimate seismic dispersion from pre-stack data, and apply it to North Sea data. The scheme essentially combines the two-term approximation of Smith and Gidlow (1987) with spectral decomposition based on the Wigner-Ville distribution, which is used to achieve high resolution. The result suggests the potential of this method for detecting seismic dispersion due to fluid saturation.
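    The sketch below illustrates the general shape of such a workflow under stated simplifications: the pre-stack amplitudes are assumed to be already decomposed into frequency bands (the paper uses the Wigner-Ville distribution for that step), and a generic two-term fit R(theta, f) ~ A(f) + B(f) sin^2(theta) is solved per frequency by least squares. It does not reproduce the Smith and Gidlow (1987) weights, and the synthetic gather is invented.

        import numpy as np

        rng = np.random.default_rng(0)
        angles = np.deg2rad(np.arange(5, 41, 5))   # incidence angles of the gather
        freqs = np.arange(10, 61, 10)              # frequency bands (Hz)

        # Synthetic spectral amplitude vs. angle, one row per frequency band; in
        # practice these come from spectrally decomposing each pre-stack trace at
        # the target horizon. Here they are built from a known A(f) and B(f).
        A_true = 0.10 - 0.0005 * freqs
        B_true = -0.20 + 0.0010 * freqs
        R = A_true[:, None] + B_true[:, None] * np.sin(angles)[None, :] ** 2
        R += 0.002 * rng.standard_normal(R.shape)

        # Two-term least-squares fit per frequency band.
        G = np.column_stack([np.ones_like(angles), np.sin(angles) ** 2])
        (A_est, B_est), *_ = np.linalg.lstsq(G, R.T, rcond=None)

        # A systematic trend of the AVO gradient with frequency is the
        # dispersion indicator of interest.
        print(np.polyfit(freqs, B_est, 1)[0])      # slope dB/df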

    CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

    The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a “write once, run anywhere” ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator that possesses a novel design and makes clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and the Rodinia benchmark suite. The performance of the applications translated automatically by CU2CL is on par with that of their manually ported counterparts.
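    To give a flavor of the correspondences such a translator must handle, the listing below shows a small, illustrative subset of CUDA-runtime-to-OpenCL mappings. CU2CL itself performs these rewrites on the Clang AST rather than through a lookup table, so this is background rather than a description of its actual rewrite rules.

        # Illustrative CUDA-to-OpenCL correspondences (not CU2CL's internal rules).
        cuda_to_opencl = {
            "cudaMalloc":                    "clCreateBuffer",
            "cudaMemcpy (host to device)":   "clEnqueueWriteBuffer",
            "cudaMemcpy (device to host)":   "clEnqueueReadBuffer",
            "cudaFree":                      "clReleaseMemObject",
            "cudaThreadSynchronize":         "clFinish",
            "__global__ function":           "__kernel function",
            "kernel<<<grid, block>>>(args)": "clSetKernelArg + clEnqueueNDRangeKernel",
            "threadIdx.x":                   "get_local_id(0)",
            "blockIdx.x":                    "get_group_id(0)",
            "blockDim.x":                    "get_local_size(0)",
        }

        for cuda_construct, opencl_equivalent in cuda_to_opencl.items():
            print(f"{cuda_construct:32s} -> {opencl_equivalent}")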

    On the Nature of Ultra-Luminous X-ray Sources from Optical/IR Measurements

    We present a model for the prediction of the optical/infrared emission from ULXs. In the model, ULXs are binary systems with accretion taking place through Roche lobe overflow. We show that irradiation effects and the presence of an accretion disk significantly modify the optical/infrared flux compared to single stars, and also that the system orientation is important. We use additional constraints from the mass transfer rate to narrow down the parameters of the donor star and, to a lesser extent, the mass of the black hole. We apply the model to fit photometric data for several ULX counterparts. We find that most donor stars are of spectral type B and are older and less massive than reported elsewhere, but that no late-type donors are admissible. The degeneracy of the acceptable parameter space will be significantly reduced with observations over a wider spectral range, and if time-resolved data become available.
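    Because the model assumes mass transfer through Roche lobe overflow, the donor's radius is tied to its Roche lobe radius. The sketch below evaluates the widely used Eggleton (1983) approximation for that radius; it is supplied as background, and the example masses and separation are arbitrary rather than taken from the paper.

        import numpy as np

        def roche_lobe_radius(m_donor, m_accretor, separation):
            """Eggleton (1983) volume-equivalent Roche lobe radius of the donor,
            accurate to about 1% for all mass ratios; units follow `separation`."""
            q = m_donor / m_accretor
            return separation * 0.49 * q**(2 / 3) / (
                0.6 * q**(2 / 3) + np.log(1 + q**(1 / 3)))

        # Hypothetical example: a 10 M_sun B-type donor orbiting a 20 M_sun black
        # hole at a separation of 30 R_sun.
        print(roche_lobe_radius(10.0, 20.0, 30.0))  # Roche lobe radius in R_sun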