Large-scale Hierarchical Alignment for Data-driven Text Rewriting
We propose a simple unsupervised method for extracting pseudo-parallel
monolingual sentence pairs from comparable corpora representative of two
different text styles, such as news articles and scientific papers. Our
approach does not require a seed parallel corpus, but instead relies solely on
hierarchical search over pre-trained embeddings of documents and sentences. We
demonstrate the effectiveness of our method through automatic and extrinsic
evaluation on text simplification from the normal to the Simple Wikipedia. We
show that pseudo-parallel sentences extracted with our method not only
supplement existing parallel data, but can even lead to competitive performance
on their own.
Comment: RANLP 201
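The hierarchical search described in this abstract can be illustrated with a minimal sketch: match each source document to its nearest target document, then pair sentences only within matched documents. This is not the authors' implementation. The function names, the cosine-similarity criterion, the threshold values, and the toy two-dimensional embeddings are all assumptions made for illustration; a real system would use pre-trained document and sentence embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(query, candidates):
    """Return (index, similarity) of the candidate closest to query."""
    sims = [cosine(query, c) for c in candidates]
    idx = max(range(len(sims)), key=sims.__getitem__)
    return idx, sims[idx]


def hierarchical_align(src_docs, tgt_docs, doc_threshold=0.5, sent_threshold=0.8):
    """Extract pseudo-parallel sentence pairs in two stages.

    Each document is a dict with an 'embedding' and a 'sentences' list of
    (text, embedding) tuples. Both thresholds are hypothetical values.
    """
    pairs = []
    if not tgt_docs:
        return pairs
    tgt_doc_embs = [d["embedding"] for d in tgt_docs]
    for src in src_docs:
        # Stage 1: nearest target document.
        j, doc_sim = best_match(src["embedding"], tgt_doc_embs)
        if doc_sim < doc_threshold:
            continue
        tgt_sents = tgt_docs[j]["sentences"]
        if not tgt_sents:
            continue
        tgt_sent_embs = [emb for _, emb in tgt_sents]
        # Stage 2: nearest target sentence within the matched document.
        for text, emb in src["sentences"]:
            k, sent_sim = best_match(emb, tgt_sent_embs)
            if sent_sim >= sent_threshold:
                pairs.append((text, tgt_sents[k][0], sent_sim))
    return pairs


# Toy data with made-up 2-D embeddings.
news = [{
    "embedding": [1.0, 0.0],
    "sentences": [("The probe reached orbit.", [0.9, 0.1]),
                  ("Officials cheered.", [0.0, 1.0])],
}]
papers = [
    {"embedding": [0.9, 0.1],
     "sentences": [("The spacecraft achieved orbital insertion.", [1.0, 0.0])]},
    {"embedding": [0.0, 1.0],
     "sentences": [("We study protein folding.", [-1.0, 0.2])]},
]
aligned = hierarchical_align(news, papers)
```

Restricting the sentence search to matched documents keeps the number of comparisons small, which is what makes this kind of search feasible on large comparable corpora.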
Infinite dimensional Lie algebras in 4D conformal quantum field theory
The concept of global conformal invariance (GCI) opens the way of applying
algebraic techniques, developed in the context of 2-dimensional chiral
conformal field theory, to a higher (even) dimensional space-time. In
particular, a system of GCI scalar fields of conformal dimension two gives rise
to a Lie algebra of harmonic bilocal fields, V_m(x,y), where the m span a
finite dimensional real matrix algebra M closed under transposition. The
associative algebra M is irreducible iff its commutant M' coincides with one of
the three real division rings. The Lie algebra of (the modes of) the bilocal
fields is in each case an infinite dimensional Lie algebra: a central extension
of sp(∞,R) corresponding to the field R of reals, of u(∞,∞)
associated to the field C of complex numbers, and of so*(4∞) related to
the algebra H of quaternions. They give rise to quantum field theory models
with superselection sectors governed by the (global) gauge groups O(N), U(N),
and U(N,H)=Sp(2N), respectively.
Comment: 16 pages, with minor improvements as to appear in J. Phys.
Character-level Chinese-English Translation through ASCII Encoding
Character-level Neural Machine Translation (NMT) models have recently
achieved impressive results on many language pairs. They mainly do well for
Indo-European language pairs, where the languages share the same writing
system. However, for translating between Chinese and English, the gap between
the two different writing systems poses a major challenge because of a lack of
systematic correspondence between the individual linguistic units. In this
paper, we enable character-level NMT for Chinese, by breaking down Chinese
characters into linguistic units similar to those of Indo-European languages. We
use the Wubi encoding scheme, which preserves the original shape and semantic
information of the characters, while also being reversible. We show promising
results from training Wubi-based models on the character- and subword-level
with recurrent as well as convolutional models.
Comment: 7 pages, 3 figures, 3rd Conference on Machine Translation (WMT18), 201
Embedding-based Scientific Literature Discovery in a Text Editor Application
Each claim in a research paper requires all relevant prior knowledge to be
discovered, assimilated, and appropriately cited. However, despite the
availability of powerful search engines and sophisticated text editing
software, discovering relevant papers and integrating the knowledge into a
manuscript remain complex tasks associated with high cognitive load. Defining
comprehensive search queries requires strong motivation from authors,
irrespective of their familiarity with the research field. Moreover, switching
between independent applications for literature discovery, bibliography
management, reading papers, and writing text burdens authors further and
interrupts their creative process. Here, we present a web application that
combines text editing and literature discovery in an interactive user
interface. The application is equipped with a search engine that couples
Boolean keyword filtering with nearest neighbor search over text embeddings,
providing a discovery experience tuned to an author's manuscript and their
interests. Our application aims to take a step towards more enjoyable and
effortless academic writing.
The demo of the application (https://SciEditorDemo2020.herokuapp.com/) and a
short video tutorial (https://youtu.be/pkdVU60IcRc) are available online.
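As a rough illustration of how Boolean keyword filtering can be coupled with nearest neighbor search over text embeddings, the sketch below first filters candidates by required and excluded terms, then ranks the survivors by cosine similarity to a query embedding. It is not the application's actual code. The `search` function, its parameters, the record fields, and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(papers, query_embedding, must_include=(), must_exclude=(), k=3):
    """Boolean keyword filter followed by embedding-based ranking.

    papers: list of dicts with 'title', 'text', and 'embedding' keys
    (a hypothetical record layout).
    """
    def passes(text):
        words = set(text.lower().split())
        has_all = all(term.lower() in words for term in must_include)
        has_none = not any(term.lower() in words for term in must_exclude)
        return has_all and has_none

    # Step 1: Boolean filter on keywords.
    candidates = [p for p in papers if passes(p["text"])]
    # Step 2: rank the remaining papers by similarity to the query embedding.
    ranked = sorted(candidates,
                    key=lambda p: cosine(query_embedding, p["embedding"]),
                    reverse=True)
    return [p["title"] for p in ranked[:k]]


# Toy corpus with made-up 2-D embeddings.
corpus = [
    {"title": "A", "text": "neural machine translation with attention",
     "embedding": [1.0, 0.0]},
    {"title": "B", "text": "statistical machine translation",
     "embedding": [0.8, 0.6]},
    {"title": "C", "text": "starspot activity in transit photometry",
     "embedding": [0.99, 0.1]},
]
hits = search(corpus, [1.0, 0.0], must_include=["translation"])
```

In such a design the query embedding would typically be computed from the manuscript text being edited, so the ranking follows what the author is currently writing while the keywords keep hard constraints under the author's control.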
Identification of Distinct Bacillus thuringiensis 4A4 Nematicidal Factors Using the Model Nematodes Pristionchus pacificus and Caenorhabditis elegans
Bacillus thuringiensis has been extensively used for the biological control of insect pests. Nematicidal B. thuringiensis strains have also been identified; however, virulence factors of such strains are poorly investigated. Here, we describe virulence factors of the nematicidal B. thuringiensis 4A4 strain, using the model nematodes Pristionchus pacificus and Caenorhabditis elegans. We show that B. thuringiensis 4A4 kills both nematodes via intestinal damage. Whole genome sequencing of B. thuringiensis 4A4 identified Cry21Ha, Cry1Ba, Vip1/Vip2 and β-exotoxin as potential nematicidal factors. Only Cry21Ha showed toxicity to C. elegans, while neither Cry nor Vip toxins were active against P. pacificus, when expressed in E. coli. Purified crystals also failed to intoxicate P. pacificus, while autoclaved spore-crystal mixture of B. thuringiensis 4A4 retained toxicity, suggesting that primary β-exotoxin is responsible for P. pacificus killing. In support of this, we found that a β-exotoxin-deficient variant of B. thuringiensis 4A4, generated by plasmid curing, lost virulence to the nematodes. Thus, using two model nematodes we revealed virulence factors of the nematicidal strain B. thuringiensis 4A4 and showed the multifactorial nature of its virulence.
Physical properties, starspot activity, orbital obliquity, and transmission spectrum of the Qatar-2 planetary system from multi-colour photometry
We present seventeen high-precision light curves of five transits of the
planet Qatar-2 b, obtained from four defocussed 2m-class telescopes. Three of
the transits were observed simultaneously in the SDSS griz passbands using the
seven-beam GROND imager on the MPG/ESO 2.2-m telescope. A fourth was observed
simultaneously in Gunn grz using the CAHA 2.2-m telescope with BUSCA, and in r
using the Cassini 1.52-m telescope. Every light curve shows small anomalies due
to the passage of the planetary shadow over a cool spot on the surface of the
host star. We fit the light curves with the prism+gemc model to obtain the
photometric parameters of the system and the position, size and contrast of
each spot. We use these photometric parameters and published spectroscopic
measurements to obtain the physical properties of the system to high precision,
finding a larger radius and lower density for both star and planet than
previously thought. By tracking the change in position of one starspot between
two transit observations we measure the orbital obliquity of Qatar-2 b to be
4.3 ± 4.5 degrees, strongly indicating an alignment of the stellar spin with
the orbit of the planet. We calculate the rotation period and velocity of the
cool host star to be 11.4 ± 0.5 d and 3.28 ± 0.13 km/s at a colatitude of
74 degrees. We assemble the planet's transmission spectrum over the 386-976 nm
wavelength range and search for variations of the measured radius of Qatar-2 b
as a function of wavelength. Our analysis highlights a possible H2/He Rayleigh
scattering in the blue.
Comment: 20 pages, 14 figures, to appear in Monthly Notices of the Royal Astronomical Societ
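The rotation period and velocity quoted in this abstract can be cross-checked with a short calculation. The sketch below assumes solid-body rotation and assumes that the quoted velocity is the rotational speed at the stated colatitude, so that v = 2πR sin(colatitude) / P; neither assumption is confirmed by the abstract. The function name is hypothetical, and the solar radius used is the IAU nominal value.

```python
import math

R_SUN_KM = 695_700.0  # IAU nominal solar radius in km


def implied_stellar_radius_km(period_days, velocity_kms, colatitude_deg):
    """Stellar radius implied by v = 2*pi*R*sin(colatitude) / P.

    Assumes solid-body rotation; this relation is an illustrative
    assumption, not a formula quoted from the paper.
    """
    period_s = period_days * 86_400.0
    sin_colat = math.sin(math.radians(colatitude_deg))
    return velocity_kms * period_s / (2.0 * math.pi * sin_colat)


# Central values from the abstract: P = 11.4 d, v = 3.28 km/s, colatitude 74 deg.
radius_km = implied_stellar_radius_km(11.4, 3.28, 74.0)
radius_rsun = radius_km / R_SUN_KM
print(f"Implied stellar radius: {radius_rsun:.2f} solar radii")
```

Under these assumptions the central values imply a stellar radius of roughly 0.77 solar radii; the quoted uncertainties on period and velocity are not propagated here.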