Large-scale Hierarchical Alignment for Data-driven Text Rewriting
We propose a simple unsupervised method for extracting pseudo-parallel
monolingual sentence pairs from comparable corpora representative of two
different text styles, such as news articles and scientific papers. Our
approach does not require a seed parallel corpus, but instead relies solely on
hierarchical search over pre-trained embeddings of documents and sentences. We
demonstrate the effectiveness of our method through automatic and extrinsic
evaluation on text simplification from the standard Wikipedia to the Simple Wikipedia. We
show that pseudo-parallel sentences extracted with our method not only
supplement existing parallel data, but can even lead to competitive performance
on their own. Comment: RANLP 201
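The hierarchical search described above can be sketched as a two-stage nearest-neighbor lookup: match documents first, then match sentences only inside matched document pairs. This is a minimal illustration, not the authors' implementation. It assumes embeddings are already computed as NumPy arrays, uses cosine similarity, and the function names and both thresholds (`doc_threshold`, `sent_threshold`) are hypothetical choices.

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def hierarchical_align(src_docs, tgt_docs, src_sents, tgt_sents,
                       doc_threshold=0.5, sent_threshold=0.7):
    """Return (src_doc, src_sent, tgt_doc, tgt_sent, similarity) tuples.

    src_docs, tgt_docs: arrays of shape (n_docs, dim), one row per document.
    src_sents, tgt_sents: lists holding one (n_sents, dim) array per document.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    pairs = []
    doc_sim = cosine_matrix(src_docs, tgt_docs)
    for i in range(doc_sim.shape[0]):
        # Stage 1: pick the most similar target document.
        j = int(np.argmax(doc_sim[i]))
        if doc_sim[i, j] < doc_threshold:
            continue
        # Stage 2: search sentences only within the matched document pair.
        sent_sim = cosine_matrix(src_sents[i], tgt_sents[j])
        for s in range(sent_sim.shape[0]):
            t = int(np.argmax(sent_sim[s]))
            if sent_sim[s, t] >= sent_threshold:
                pairs.append((i, s, j, t, float(sent_sim[s, t])))
    return pairs
```

Restricting the sentence search to matched documents keeps the number of sentence comparisons far below that of an exhaustive all-pairs search, which is presumably what makes the approach practical at large scale.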
A Latent Source Model for Nonparametric Time Series Classification
For classifying time series, a nearest-neighbor approach is widely used in
practice with performance often competitive with or better than more elaborate
methods such as neural networks, decision trees, and support vector machines.
We develop theoretical justification for the effectiveness of
nearest-neighbor-like classification of time series. Our guiding hypothesis is
that in many applications, such as forecasting which topics will become trends
on Twitter, there aren't actually that many prototypical time series to begin
with, relative to the number of time series we have access to, e.g., topics
become trends on Twitter only in a few distinct manners whereas we can collect
massive amounts of Twitter data. To operationalize this hypothesis, we propose
a latent source model for time series, which naturally leads to a "weighted
majority voting" classification rule that can be approximated by a
nearest-neighbor classifier. We establish nonasymptotic performance guarantees
of both weighted majority voting and nearest-neighbor classification under our
model accounting for how much of the time series we observe and the model
complexity. Experimental results on synthetic data show weighted majority
voting achieving the same misclassification rate as nearest-neighbor
classification while observing less of the time series. We then use weighted
majority voting to forecast which news topics on Twitter become trends, where we are
able to detect such "trending topics" in advance of Twitter 79% of the time,
with a mean early advantage of 1 hour and 26 minutes, a true positive rate of
95%, and a false positive rate of 4%. Comment: Advances in Neural Information Processing Systems (NIPS 2013
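To illustrate the relationship between the two rules named above, the following sketch compares weighted majority voting with nearest-neighbor classification on a partially observed series. The abstract does not give the weighting formula, so the exponential weight `exp(-gamma * distance)`, the squared Euclidean distance, the labels +1/-1, and the parameter `gamma` are all assumptions made here for illustration. The sketch also ignores any time-shift handling.

```python
import numpy as np


def prefix_distances(train_x, query):
    """Squared Euclidean distance between the query and each training
    series, restricted to the observed prefix (the length of the query)."""
    t = len(query)
    return np.sum((train_x[:, :t] - query) ** 2, axis=1)


def weighted_majority_vote(train_x, train_y, query, gamma=1.0):
    """Every training series votes for its label (+1 or -1) with a weight
    that decays exponentially in its distance to the query (assumed form)."""
    weights = np.exp(-gamma * prefix_distances(train_x, query))
    positive = weights[train_y == 1].sum()
    negative = weights[train_y == -1].sum()
    return 1 if positive >= negative else -1


def nearest_neighbor(train_x, train_y, query):
    """Return the label of the single closest training series."""
    return int(train_y[np.argmin(prefix_distances(train_x, query))])
```

Under this assumed weighting, a large `gamma` lets the closest training series dominate the vote, so the rule behaves like nearest-neighbor classification. A small `gamma` lets many moderately close series outvote a single near one.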
Unitary Positive-Energy Representations of Scalar Bilocal Quantum Fields
The superselection sectors of two classes of scalar bilocal quantum fields in
D>=4 dimensions are explicitly determined by working out the constraints
imposed by unitarity. The resulting classification in terms of the dual of the
respective gauge groups U(N) and O(N) confirms the expectations based on
general results obtained in the framework of local nets in algebraic quantum
field theory, but the approach using standard Lie algebra methods rather than
abstract duality theory is complementary. The result indicates that one does
not lose interesting models if one postulates the absence of scalar fields of
dimension D-2 in models with global conformal invariance. Another remarkable
outcome is the observation that, with an appropriate choice of the Hamiltonian,
a Lie algebra embedded into the associative algebra of observables completely
fixes the representation theory. Comment: 27 pages, v3: result improved by eliminating redundant assumptio
New methods in conformal partial wave analysis
We report on progress concerning the partial wave analysis of higher
correlation functions in conformal quantum field theory. Comment: 16 page
Survey of Ecological Characteristics of Boreal Tree Species in Fennoscandia and the USSR
The paper presents results from a literature study on autecological characteristics of North European and Asian boreal and boreo-nemoral tree species. It also provides general ecological information about the main forest types in the boreal region of the USSR and Fennoscandia. The work was done mainly during the Young Scientist's Summer Program of 1988 and is a part of the Biosphere Dynamics Project activities.
Species natural history data have been collected and assembled in such a way that they can be used in parameterization and modification of existing (or newly formulated) mixed-species forest stand simulators (e.g., gap models).
The ecological survey involves 27 tree species divided into two groups. The first one, called "dominant tree species", includes 13 major forest-forming species of the present-day boreal forests of the USSR and Fennoscandia, while the second one, "important species", contains species which either dominate forests at the boreal-border areas (i.e. boreo-nemoral forests) or have restricted distribution within the boreal zone. Each species is characterized as completely as possible under the following categories: systematics (scientific name, author and synonymies), spatial distribution (description and maps of continuous range of natural growth), habitat requirements (climate, soil types, associated species, and forest types), life history (reproduction and growth), response to environmental factors (light, soil moisture, nutrients, frost, permafrost, fire, windstorm, flooding and paludification), races and hybrids, enemies and diseases.
The data from the autecological reviews are summarized as 24 input model parameters in the Appendix.
The paper should be considered as a first step in building a boreal tree species natural history database to be used with simulation models. It is also the first attempt to compile autecological data about North Asian tree species for modeling purposes.
Character-level Chinese-English Translation through ASCII Encoding
Character-level Neural Machine Translation (NMT) models have recently
achieved impressive results on many language pairs. They mainly do well for
Indo-European language pairs, where the languages share the same writing
system. However, for translating between Chinese and English, the gap between
the two different writing systems poses a major challenge because of a lack of
systematic correspondence between the individual linguistic units. In this
paper, we enable character-level NMT for Chinese, by breaking down Chinese
characters into linguistic units similar to those of Indo-European languages. We
use the Wubi encoding scheme, which preserves the original shape and semantic
information of the characters, while also being reversible. We show promising
results from training Wubi-based models on the character- and subword-level
with recurrent as well as convolutional models. Comment: 7 pages, 3 figures, 3rd Conference on Machine Translation (WMT18),
201
Infinite dimensional Lie algebras in 4D conformal quantum field theory
The concept of global conformal invariance (GCI) opens the way to applying
algebraic techniques, developed in the context of 2-dimensional chiral
conformal field theory, to a higher (even) dimensional space-time. In
particular, a system of GCI scalar fields of conformal dimension two gives rise
to a Lie algebra of harmonic bilocal fields, V_m(x,y), where the m span a
finite dimensional real matrix algebra M closed under transposition. The
associative algebra M is irreducible iff its commutant M' coincides with one of
the three real division rings. The Lie algebra of (the modes of) the bilocal
fields is in each case an infinite dimensional Lie algebra: a central extension
of sp(infty,R) corresponding to the field R of reals, of u(infty,infty)
associated to the field C of complex numbers, and of so*(4 infty) related to
the algebra H of quaternions. They give rise to quantum field theory models
with superselection sectors governed by the (global) gauge groups O(N), U(N),
and U(N,H)=Sp(2N), respectively. Comment: 16 pages, with minor improvements as to appear in J. Phys.
Jacobi Identity for Vertex Algebras in Higher Dimensions
Vertex algebras in higher dimensions provide an algebraic framework for
investigating axiomatic quantum field theory with global conformal invariance.
We develop further the theory of such vertex algebras by introducing formal
calculus techniques and investigating the notion of polylocal fields. We derive
a Jacobi identity which together with the vacuum axiom can be taken as an
equivalent definition of vertex algebra. Comment: 35 pages, references adde