A Continuously Growing Dataset of Sentential Paraphrases
A major challenge in paraphrase research is the lack of parallel corpora. In
this paper, we present a new method to collect large-scale sentential
paraphrases from Twitter by linking tweets through shared URLs. The main
advantage of our method is its simplicity: it removes the classifier or
human in the loop that previous work needed to select data before annotation
and before applying paraphrase identification algorithms. We
present the largest human-labeled paraphrase corpus to date, comprising 51,524 sentence
pairs, and the first cross-domain benchmarking for automatic paraphrase
identification. In addition, we show that more than 30,000 new sentential
paraphrases can be easily and continuously captured every month at ~70%
precision, and demonstrate their utility for downstream NLP tasks through
phrasal paraphrase extraction. We make our code and data freely available.
Comment: 11 pages, accepted to EMNLP 201
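The core linking idea (tweets that point to the same URL become candidate paraphrase pairs) can be sketched as a simple grouping step. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `candidate_pairs_by_url` and the `(text, url)` input format are hypothetical, and any filtering, URL normalization, or annotation the real method applies is not shown here.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs_by_url(tweets):
    """Hypothetical helper: group tweets by shared URL and emit candidate pairs.

    `tweets` is an iterable of (text, url) tuples (an assumed format).
    Texts are stripped and deduplicated within each URL group, so an exact
    repost never pairs with itself. Returns a sorted list of (a, b), a < b.
    """
    texts_by_url = defaultdict(set)
    for text, url in tweets:
        texts_by_url[url].add(text.strip())

    pairs = set()
    for texts in texts_by_url.values():
        # Every two distinct texts linked by the same URL form one candidate pair.
        for a, b in combinations(sorted(texts), 2):
            pairs.add((a, b))
    return sorted(pairs)


tweets = [
    ("Storm closes schools across the state", "http://example.com/storm"),
    ("All state schools shut due to the storm", "http://example.com/storm"),
    ("Storm closes schools across the state", "http://example.com/storm"),
    ("New phone released today", "http://example.com/phone"),
]
print(candidate_pairs_by_url(tweets))
```

In this toy input only the two distinct "storm" tweets form a pair; the lone "phone" tweet has no partner. The resulting pairs would still need human labels or a paraphrase classifier to separate true paraphrases from merely related sentences.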
“I” vs “me”: the urbanization of “post-80s” and “post-90s” Chinese migrant workers
The difference in self-identity among migrant workers of the new
generation leads them towards different desires regarding urbanization.
In this regard, it is imperative to explore the influence of
self-identity on migrant workers’ willingness to stay. To
explore the phenomenon empirically, the current study used data
from the 2017 China Migrants Dynamics Survey (CMDS). The study
employed the Heckman two-stage selection model to address this
objective and also employed machine learning methods as a
robustness check. The results show that the “I” identity has a more
significant impact on the urbanization of “post-90s” migrant
workers, whereas the “Me” identity has a more significant impact
on the urbanization of “post-80s” migrant workers, born in the
1980s. Moreover, if “post-80s” and “post-90s” migrant workers are
uniformly grouped into a single new generation, the differences
and characteristics within them may be concealed. The overall
findings suggest that policies to promote the residence of migrant
workers and boost urbanization should be formulated according to
the differences in self-identity between those born in the 1980s
and those born in the 1990s.
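For readers unfamiliar with the Heckman two-stage selection model, its standard textbook two-step form is sketched below. The abstract does not state which variables enter the selection equation ($\mathbf{z}_i$) or the outcome equation ($\mathbf{x}_i$), so the symbols here are generic assumptions rather than the study's specification.

```latex
\begin{aligned}
&\text{Selection (probit):} && s_i^{*} = \mathbf{z}_i^{\top}\boldsymbol{\gamma} + u_i,
  \qquad s_i = \mathbf{1}[\, s_i^{*} > 0 \,] \\
&\text{Outcome (observed only if } s_i = 1\text{):} && y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i \\
&\text{Assume } (u_i, \varepsilon_i) \text{ jointly normal,} && u_i \sim N(0,1),\;
  \operatorname{Var}(\varepsilon_i) = \sigma^2,\; \operatorname{corr}(u_i, \varepsilon_i) = \rho \\
&\text{Then:} && \mathbb{E}[\, y_i \mid \mathbf{x}_i, s_i = 1 \,]
  = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \rho\sigma\,\lambda(\mathbf{z}_i^{\top}\boldsymbol{\gamma}),
  \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)} \\
&\text{Stage 1:} && \text{probit of } s_i \text{ on } \mathbf{z}_i \;\Rightarrow\; \hat{\boldsymbol{\gamma}},\;
  \hat{\lambda}_i = \lambda(\mathbf{z}_i^{\top}\hat{\boldsymbol{\gamma}}) \\
&\text{Stage 2:} && \text{OLS of } y_i \text{ on } \mathbf{x}_i \text{ and } \hat{\lambda}_i
  \text{ over } s_i = 1;\; \text{the coefficient on } \hat{\lambda}_i \text{ estimates } \rho\sigma
\end{aligned}
```

Here $\phi$ and $\Phi$ are the standard normal density and distribution function, and $\lambda$ is the inverse Mills ratio. A coefficient on $\hat{\lambda}_i$ that is significantly different from zero indicates that selection bias would have distorted a naive OLS fit on the observed subsample.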
An exact solution of spherical mean-field plus orbit-dependent non-separable pairing model with two non-degenerate j-orbits
An exact solution of nuclear spherical mean-field plus orbit-dependent
non-separable pairing model with two non-degenerate j-orbits is presented. The
extended one-variable Heine-Stieltjes polynomials associated with the Bethe
ansatz equations of the solution are determined; their sets of zeros give the
solution of the model and can be determined relatively easily. The solution is
compared with that of the standard pairing interaction with constant
interaction strength among pairs in any orbit. It is shown
that the overlaps of eigenstates of the model with those of the standard
pairing model are always large, especially for the ground and the first excited
state. However, the quantum phase crossover in the non-separable pairing model
cannot be accounted for by the standard pairing interaction.
Comment: 5 pages, 1 figure, LaTeX
Learning to Predict the Cosmological Structure Formation
Matter evolved under the influence of gravity from minuscule density
fluctuations. Non-perturbative structure formed hierarchically over all scales
and developed non-Gaussian features in the Universe, known as the Cosmic Web.
Fully understanding the structure formation of the Universe is one of the holy
grails of modern astrophysics. Astrophysicists survey large volumes of the
Universe and employ a large ensemble of computer simulations to compare with
the observed data in order to extract the full information of our own Universe.
However, evolving trillions of galaxies over billions of years, even with the
simplest physics, is a daunting task. We build a deep neural network, the Deep
Density Displacement Model (hereafter D³M), to predict the non-linear
structure formation of the Universe from simple linear perturbation theory. Our
extensive analysis demonstrates that D³M outperforms second-order Lagrangian
perturbation theory (hereafter 2LPT), the commonly used fast approximate
simulation method, in point-wise comparison, 2-point correlation, and 3-point
correlation. We also show that D³M is able to accurately extrapolate far
beyond its training data and predict structure formation for significantly
different cosmological parameters. Our study proves, for the first time, that
deep learning is a practical and accurate alternative to approximate
simulations of the gravitational structure formation of the Universe.
Comment: 8 pages, 5 figures, 1 table
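The abstract says only that the network starts "from simple linear perturbation theory". Assuming that the linear input is the first-order Lagrangian (Zel'dovich) displacement field, the sketch below shows how such a field follows from a density contrast $\delta$ on a periodic grid: the displacement $\psi$ satisfies $\nabla\cdot\psi = -\delta$, which in Fourier space is $\psi(k) = i\,k\,\delta(k)/k^2$. It is a 1D toy with a naive DFT so that it needs only the standard library; the function names are hypothetical, and a real pipeline would work on 3D grids with an FFT.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) forward DFT, normalised by 1/n so coefficients are mode amplitudes."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]


def idft(coeffs):
    """Inverse of dft() above (no extra normalisation needed)."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
        for t in range(n)
    ]


def zeldovich_displacement_1d(delta, box_size):
    """Hypothetical helper: 1D Zel'dovich displacement psi with d(psi)/dx = -delta.

    Assumes `delta` is sampled on a uniform periodic grid spanning `box_size`.
    """
    n = len(delta)
    delta_k = dft(delta)
    psi_k = []
    for k in range(n):
        m = k if k <= n // 2 else k - n  # signed mode index
        if m == 0 or 2 * m == n:
            # The mean mode and the Nyquist mode carry no well-defined displacement.
            psi_k.append(0j)
        else:
            wavenumber = 2 * math.pi * m / box_size
            psi_k.append(1j * delta_k[k] / wavenumber)  # psi(k) = i * delta(k) / k
    return [value.real for value in idft(psi_k)]


n, box_size = 16, 1.0
delta = [math.cos(2 * math.pi * t / n) for t in range(n)]
psi = zeldovich_displacement_1d(delta, box_size)
```

For a single cosine mode $\delta(x) = \cos(2\pi x/L)$, the result should match $\psi(x) = -\tfrac{L}{2\pi}\sin(2\pi x/L)$, whose derivative is exactly $-\delta$. In 3D the same relation reads $\boldsymbol{\psi}(\mathbf{k}) = i\,\mathbf{k}\,\delta(\mathbf{k})/|\mathbf{k}|^2$.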
Detecting Galaxy-Filament Alignments in the Sloan Digital Sky Survey III
Previous studies have shown that filamentary structures in the cosmic web
influence the alignments of nearby galaxies. We study this effect in the LOWZ
sample of the Sloan Digital Sky Survey using the "Cosmic Web Reconstruction"
filament catalogue. We find that LOWZ galaxies exhibit a small but
statistically significant alignment in the direction parallel to the
orientation of nearby filaments. This effect is detectable even in the absence
of nearby galaxy clusters, which suggests that it arises from the matter
distribution in the filament. A nonparametric regression model suggests that
the alignment effect with filaments extends over separations of 30-40 Mpc. We
find that galaxies that are bright and early-forming align more strongly with
the directions of nearby filaments than those that are faint and late-forming;
however, trends with stellar mass are less statistically significant, within
the narrow range of stellar mass of this sample.
Comment: 14 pages, 13 figures. Accepted to the MNRAS
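The abstract does not specify the alignment statistic or the regression method, so the following is only an illustrative sketch of one common approach. It computes the acute angle between a galaxy's projected major axis and the nearest filament direction (both treated as headless axes), averages $\cos 2\theta$ (zero for random orientations, positive for parallel alignment), and smooths the signal against separation with a Nadaraya-Watson kernel regression as one possible nonparametric model. All function names and inputs are hypothetical.

```python
import math


def alignment_angle(galaxy_pa_deg, filament_pa_deg):
    """Acute angle (0-90 degrees) between two headless axes given as position angles."""
    diff = abs(galaxy_pa_deg - filament_pa_deg) % 180.0
    return min(diff, 180.0 - diff)


def mean_cos2theta(angles_deg):
    """<cos 2*theta>: 0 for random orientations, >0 parallel, <0 perpendicular."""
    return sum(math.cos(2 * math.radians(a)) for a in angles_deg) / len(angles_deg)


def kernel_regression(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel.

    Raises ZeroDivisionError if every weight underflows to zero (x0 far from all xs).
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy usage with made-up position angles (degrees) and separations (Mpc).
pairs = [(10.0, 170.0), (35.0, 40.0), (80.0, 75.0)]
angles = [alignment_angle(g, f) for g, f in pairs]
signal = [math.cos(2 * math.radians(a)) for a in angles]
separations = [5.0, 12.0, 30.0]
print(mean_cos2theta(angles), kernel_regression(separations, signal, 10.0, 5.0))
```

Averaging $\cos 2\theta$ rather than $\theta$ itself keeps the statistic centered on zero for isotropic orientations, which makes a small but significant parallel signal easy to read off. The kernel bandwidth controls how sharply the smoothed curve can fall off with separation.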