Building Morphological Chains for Agglutinative Languages
In this paper, we build morphological chains for agglutinative languages by
using a log-linear model for the morphological segmentation task. The model is
based on the unsupervised morphological segmentation system called
MorphoChains. We extend the MorphoChains log-linear model by expanding the
candidate space recursively to cover more split points for agglutinative
languages such as Turkish, whereas in the original model candidates are
generated by considering only binary segmentation of each word. The results
show that we improve the state-of-the-art Turkish scores by 12%, reaching an
F-measure of 72%, and the English scores by 3%, reaching an F-measure of 74%.
Eventually, the system outperforms both MorphoChains and other well-known
unsupervised morphological segmentation systems. The results indicate that
candidate generation plays an important role in such an unsupervised log-linear
model that is learned using contrastive estimation with negative samples.
Comment: 10 pages, accepted and presented at CICLing 2017 (18th
International Conference on Intelligent Text Processing and Computational
Linguistics)
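The difference between binary and recursive candidate generation described in this abstract can be illustrated with a minimal sketch. It only enumerates split points. It does not reproduce the MorphoChains features, scoring, or any pruning the authors may apply, and the function names and the `max_parts` cap are assumptions made for illustration.

```python
from typing import List, Tuple


def binary_candidates(word: str) -> List[Tuple[str, str]]:
    """Every way to cut the word once: (prefix, suffix) pairs."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]


def recursive_candidates(word: str, max_parts: int = 4) -> List[Tuple[str, ...]]:
    """Every segmentation into 2..max_parts non-empty pieces.

    max_parts is a hypothetical cap to keep the candidate space finite;
    it is not a parameter taken from the paper.
    """

    def split(rest: str, parts_left: int) -> List[Tuple[str, ...]]:
        # The unsplit remainder is always one valid option.
        options: List[Tuple[str, ...]] = [(rest,)]
        if parts_left == 1:
            return options
        # Otherwise cut off a first piece and recurse on what is left.
        for i in range(1, len(rest)):
            for tail in split(rest[i:], parts_left - 1):
                options.append((rest[:i],) + tail)
        return options

    # Drop the trivial "no split" candidate.
    return [seg for seg in split(word, max_parts) if len(seg) > 1]
```

For a Turkish word such as "evlerde" (ev + ler + de), the binary generator cannot propose the three-morpheme analysis in a single step, whereas the recursive generator includes it among its candidates.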
Signatures of Resonant Super-Partner Production with Charged-Current Decays
Hadron collider signatures of new physics are investigated in which a primary
resonance is produced that decays to a secondary resonance by emitting a
W-boson, with the secondary resonance decaying to two jets. This topology can
arise in supersymmetric theories with R-parity violation where the lightest
supersymmetric particles are either a pair of squarks or a slepton-sneutrino
pair. The resulting signal can have a cross section consistent with the Wjj
observation reported by the CDF collaboration, while remaining consistent with
earlier constraints. Other observables that can be used to confirm this
scenario include a significant charge asymmetry in the same channel at the LHC.
With strongly interacting resonances such as squarks, pair production
topologies additionally give rise to 4 jet and WW + 4 jet signatures, each with
two equal-mass dijet resonances within the 4 jets.
Comment: Note added for recent developments concerning the Wjj final state.
Version to appear in PRD. 21 pages, 12 figures
Partial Enumerative Sphere Shaping
The dependency between the Gaussianity of the input distribution for the
additive white Gaussian noise (AWGN) channel and the gap-to-capacity is
discussed. We show that a set of particular approximations to the
Maxwell-Boltzmann (MB) distribution virtually closes most of the shaping gap.
We relate these symbol-level distributions to bit-level distributions, and
demonstrate that they correspond to keeping some of the amplitude bit-levels
uniform and independent of the others. Then we propose partial enumerative
sphere shaping (P-ESS) to realize such distributions in the probabilistic
amplitude shaping (PAS) framework. Simulations over the AWGN channel show
that shaping 2 amplitude bits of 16-ASK has almost the same performance as
shaping 3 bits, which is 1.3 dB more power-efficient than uniform signaling at
a rate of 3 bit/symbol. In this way, the required storage and computational
complexity of shaping are reduced by factors of 6 and 3, respectively.
Comment: 6 pages, 6 figures
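As a rough illustration of the Maxwell-Boltzmann (MB) distribution mentioned in this abstract, the sketch below evaluates P(a) proportional to exp(-λa²) over the eight amplitudes of 16-ASK and reports its entropy and average energy. It does not implement P-ESS or reproduce the paper's approximations. The parameter name `lam` and the helper names are assumptions.

```python
import math

# Amplitudes of 16-ASK; the sign bit is handled separately in PAS.
AMPLITUDES = [1, 3, 5, 7, 9, 11, 13, 15]


def maxwell_boltzmann(lam, amplitudes=AMPLITUDES):
    """MB distribution P(a) proportional to exp(-lam * a^2)."""
    weights = [math.exp(-lam * a * a) for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(p):
    """Shannon entropy in bits per amplitude."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def average_energy(p, amplitudes=AMPLITUDES):
    """Mean squared amplitude under distribution p."""
    return sum(q * a * a for q, a in zip(p, amplitudes))
```

Setting `lam = 0` recovers uniform signaling (3 bits per amplitude, average energy 85). Increasing `lam` lowers the average energy at the cost of a lower entropy, which is the trade-off that shaping exploits.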
Probabilistic Shaping for Finite Blocklengths: Distribution Matching and Sphere Shaping
In this paper, we provide for the first time a systematic comparison of
distribution matching (DM) and sphere shaping (SpSh) algorithms for short
blocklength probabilistic amplitude shaping. For asymptotically large
blocklengths, constant composition distribution matching (CCDM) is known to
generate the target capacity-achieving distribution. As the blocklength
decreases, however, the resulting rate loss diminishes the efficiency of CCDM.
We claim that for such short blocklengths and over the additive white Gaussian
noise (AWGN) channel, the objective of shaping should be reformulated as obtaining
the most energy-efficient signal space for a given rate (rather than matching
distributions). In light of this interpretation, multiset-partition DM (MPDM),
enumerative sphere shaping (ESS) and shell mapping (SM) are reviewed as
energy-efficient shaping techniques. Numerical results show that MPDM and SpSh
have smaller rate losses than CCDM. SpSh, whose sole objective is to maximize
the energy efficiency, is shown to have the minimum rate loss of all. We
provide simulation results of the end-to-end decoding performance showing that
up to 1 dB improvement in power efficiency over uniform signaling can be
obtained with MPDM and SpSh at blocklengths around 200. Finally, we present a
discussion on the complexity of these algorithms from the perspective of
latency, storage and computations.
Comment: 18 pages, 10 figures
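The rate loss of CCDM at short blocklengths can be made concrete with a small sketch. It uses the standard definition: a constant-composition code of length n can address floor(log2(M)) input bits, where M is the number of sequences with the given composition, and the rate loss is the entropy of the composition minus that rate. The function name and the example compositions are assumptions, not values from the paper.

```python
import math


def ccdm_rate_loss(composition):
    """Rate loss (bits/symbol) of a constant-composition code.

    composition[i] is how many times symbol i occurs in each output
    sequence; the blocklength is n = sum(composition).
    """
    n = sum(composition)
    # Multinomial coefficient: number of sequences with this composition.
    count = math.factorial(n)
    for c in composition:
        count //= math.factorial(c)
    # floor(log2(count)) input bits can be mapped one-to-one.
    k = count.bit_length() - 1
    entropy = -sum((c / n) * math.log2(c / n) for c in composition if c > 0)
    return entropy - k / n
```

Scaling the same target distribution from n = 8 (`[4, 2, 1, 1]`) to n = 200 (`[100, 50, 25, 25]`) shrinks the rate loss, consistent with the abstract's remark that CCDM is efficient only for large blocklengths.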
Asymptotically distribution-free goodness-of-fit testing for tail copulas
Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be an i.i.d. sample from a bivariate
distribution function that lies in the max-domain of attraction of an extreme
value distribution. The asymptotic joint distribution of the standardized
component-wise maxima $\bigvee_{i=1}^n X_i$ and $\bigvee_{i=1}^n Y_i$ is then
characterized by the marginal extreme value indices and the tail copula $R$. We
propose a procedure for constructing asymptotically distribution-free
goodness-of-fit tests for the tail copula $R$. The procedure is based on a
transformation of a suitable empirical process derived from a semi-parametric
estimator of $R$. The transformed empirical process converges weakly to a
standard Wiener process, paving the way for a multitude of asymptotically
distribution-free goodness-of-fit tests. We also extend our results to the
$m$-variate ($m>2$) case. In a simulation study we show that the limit theorems
provide good approximations for finite samples and that tests based on the
transformed empirical process have high power.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1304 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)