Discriminative Segmental Cascades for Feature-Rich Phone Recognition
Discriminative segmental models, such as segmental conditional random fields
(SCRFs) and segmental structured support vector machines (SSVMs), have had
success in speech recognition via both lattice rescoring and first-pass
decoding. However, such models suffer from slow decoding, hampering the use of
computationally expensive features, such as segment neural networks or other
high-order features. A typical solution is to use approximate decoding, either
by beam pruning in a single pass or by beam pruning to generate a lattice
followed by a second pass. In this work, we study discriminative segmental
models trained with a hinge loss (i.e., segmental structured SVMs). We show
that beam search is not suitable for learning rescoring models in this
approach, though it gives good approximate decoding performance when the model
is already well-trained. Instead, we consider an approach inspired by
structured prediction cascades, which use max-marginal pruning to generate
lattices. We obtain a high-accuracy phonetic recognition system with several
expensive feature types: a segment neural network, a second-order language
model, and second-order phone boundary features.
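As a rough illustration of the max-marginal pruning idea mentioned in the abstract above, the sketch below works on a plain first-order label chain rather than on segments or lattices, so it is a simplification and not the authors' system. The threshold rule (a convex combination of the best path score and the mean max-marginal, controlled by `alpha`) follows the usual structured prediction cascades formulation; the function names and the toy scores are hypothetical.

```python
def max_marginals(node, trans):
    """Max-marginals for a label chain.

    node[t][y]  : score of label y at position t
    trans[p][y] : score of moving from label p to label y
    Returns mm[t][y] = score of the best full path that uses label y at t.
    """
    T, K = len(node), len(node[0])
    fwd = [[0.0] * K for _ in range(T)]
    bwd = [[0.0] * K for _ in range(T)]
    fwd[0] = list(node[0])
    # Forward Viterbi pass: best prefix ending in (t, y).
    for t in range(1, T):
        for y in range(K):
            fwd[t][y] = node[t][y] + max(
                fwd[t - 1][p] + trans[p][y] for p in range(K)
            )
    # Backward pass: best suffix starting after (t, y).
    for t in range(T - 2, -1, -1):
        for y in range(K):
            bwd[t][y] = max(
                trans[y][n] + node[t + 1][n] + bwd[t + 1][n] for n in range(K)
            )
    return [[fwd[t][y] + bwd[t][y] for y in range(K)] for t in range(T)]


def prune(node, trans, alpha=0.5):
    """Keep labels whose max-marginal reaches the cascade threshold."""
    mm = max_marginals(node, trans)
    best = max(mm[0])  # score of the overall best path
    flat = [v for row in mm for v in row]
    threshold = alpha * best + (1 - alpha) * sum(flat) / len(flat)
    return [[y for y, v in enumerate(row) if v >= threshold] for row in mm]


# Toy example (hypothetical scores): 3 positions, 2 labels, no transition cost.
node_scores = [[1, 0], [0, 2], [1, 0]]
trans_scores = [[0, 0], [0, 0]]
kept = prune(node_scores, trans_scores, alpha=0.5)
```

Because the mean max-marginal never exceeds the best path score, every label on the best path survives the pruning for any `alpha` between 0 and 1, which is the property that distinguishes this from beam pruning.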
Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Deep CCA is a recently proposed deep neural network extension to the
traditional canonical correlation analysis (CCA), and has been successful for
multi-view representation learning in several domains. However, stochastic
optimization of the deep CCA objective is not straightforward, because it does
not decouple over training examples. Previous optimizers for deep CCA are
either batch-based algorithms or stochastic optimization using large
minibatches, which can have high memory consumption. In this paper, we tackle
the problem of stochastic optimization for deep CCA with small minibatches,
based on an iterative solution to the CCA objective, and show that we can
achieve as good performance as previous optimizers and thus alleviate the
memory requirement.
Comment: in 2015 Annual Allerton Conference on Communication, Control and Computing
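To make the "does not decouple over training examples" remark in the abstract above concrete, the CCA objective on network outputs can be written as below. The notation is an assumption introduced here for illustration: $F = [f(x_1), \dots, f(x_N)]$ and $G = [g(y_1), \dots, g(y_N)]$ are the centered outputs of the two networks, and $U$, $V$ are the projection matrices.

```latex
\max_{U,\,V}\; \frac{1}{N}\,\operatorname{tr}\!\left(U^\top F G^\top V\right)
\quad \text{s.t.} \quad
U^\top \!\left(\tfrac{1}{N} F F^\top\right) U = I,
\qquad
V^\top \!\left(\tfrac{1}{N} G G^\top\right) V = I
```

The whitening constraints depend on covariance matrices computed over all $N$ examples, so the objective is not a sum of independent per-example terms, and a gradient estimated from a small minibatch is not an unbiased estimate of the full gradient.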
Revisiting the morphology and phylogeny of Lactifluus with three new lineages from southern China
As a recent group mainly defined by molecular data, the genus Lactifluus is in need of further study to provide insight into the morphological and molecular variation within the genus, species limits and relationships. Phylogenetic analyses of nuc rDNA ITS1-5.8S-ITS2 (ITS), D1 and D2 domains of nuc 28S rDNA (28S), and part of the second largest subunit of the RNA polymerase II (rpb2) (6-7 region) sequences of 28 samples from southern China revealed three new lineages of Lactifluus. Two of them are nested in a major clade that includes the type of Lactifluus and are treated here as two new sections: L. sect. Ambicystidiati and L. sect. Tenuicystidiati. Lactifluus ambicystidiatus, described here as a new species (= sect. Ambicystidiati), has both lamprocystidia and macrocystidia in the hymenium, a unique combination of features within Russulaceae. Furthermore, only remnants of lactiferous hyphae are present in L. ambicystidiatus and our results suggest that the ability to form a lactiferous system has been lost in this lineage. Lactifluus sect. Tenuicystidiati forms a strongly supported monophyletic group as a sister lineage to L. sect. Lactifluus. We recognize it based on the thin-walled macrocystidia and smaller ellipsoid spores with an incomplete reticulum compared with L. sect. Lactifluus. The former placement of L. tenuicystidiatus in the African L. sect. Pseudogymnocarpi is not supported. Using genealogical concordance we recognize five phylogenetic species within L. sect. Tenuicystidiati and describe two of these as new, L. subpruinosus and L. tropicosinicus. The third lineage, represented by L. leoninus, forms a sister group to L. subg. Lactariopsis sensu stricto. The three lineages provide further evidence for morphological features in Lactifluus being homoplasious. Some sections and species complexes are likely to be composed of more species and merit further investigations. Subtropical-tropical Asia is likely a key region for additional sampling.
Association Signals Unveiled by a Comprehensive Gene Set Enrichment Analysis of Dental Caries Genome-Wide Association Studies
Gene set-based analysis of genome-wide association study (GWAS) data has recently emerged as a useful approach to examine the joint effects of multiple risk loci in complex human diseases or phenotypes. Dental caries is a common, chronic, and complex disease leading to a decrease in quality of life worldwide. In this study, we applied the approaches of gene set enrichment analysis to a major dental caries GWAS dataset, which consists of 537 cases and 605 controls. Using four complementary gene set analysis methods, we analyzed 1331 Gene Ontology (GO) terms collected from the Molecular Signatures Database (MSigDB). Setting the false discovery rate (FDR) threshold at 0.05, we identified 13 significantly associated GO terms. Additionally, 17 terms were further included as marginally associated because they were top ranked by each method, although their FDR is higher than 0.05. In total, we identified 30 promising GO terms, including 'Sphingoid metabolic process,' 'Ubiquitin protein ligase activity,' 'Regulation of cytokine secretion,' and 'Ceramide metabolic process.' These GO terms encompass broad functions that potentially interact and contribute to the oral immune response related to caries development, which have not been reported in the standard single-marker-based analysis. Collectively, our gene set enrichment analysis provided complementary insights into the molecular mechanisms and polygenic interactions in dental caries, revealing promising association signals that could not be detected through single-marker analysis of GWAS data. © 2013 Wang et al.
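The abstract above applies an FDR threshold of 0.05 but does not say which FDR procedure was used. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure (an assumption, not something the abstract confirms), the selection of significant terms could look like this; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the largest
    rank k with p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five gene sets.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.6]
significant = benjamini_hochberg(p_vals, alpha=0.05)
```

In this toy input only the first two p-values pass, since 0.039 exceeds its rank threshold of 0.03 and no larger rank satisfies the rule.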
Costly Blackouts? Measuring Productivity and Environmental Effects of Electricity Shortages
In many countries, unreliable inputs, particularly those lacking storage, can significantly limit a firm's productivity. In the case of an increasing frequency of blackouts, a firm may change factor shares in a number of ways. It may decide to self-generate electricity, to purchase intermediate goods that it used to produce directly, or to improve its technical efficiency. We examine how industrial firms responded to China's severe power shortages in the early 2000s. Fast-growing demand coupled with regulated electricity prices led to blackouts that varied in degree over location and time. Our data consist of annual observations from 1999 to 2004 for approximately 32,000 energy-intensive enterprises from all industries. We estimate the losses in productivity due to factor-neutral and factor-biased effects of electricity scarcity. Our results suggest that enterprises re-optimize among factors in response to electricity scarcity by shifting from energy (both electric and non-electric sources) into materials, a shift from "make" to "buy." These effects are strongest for firms in textiles, timber, chemicals, and metals. Contrary to the literature, we do not find evidence of an increase in self-generation. Finally, we find that these productivity changes, while costly to firms, led to small reductions in carbon emissions.
Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint
In certain genetic studies, clinicians and genetic counselors are interested
in estimating the cumulative risk of a disease for individuals with and without
a rare deleterious mutation. Estimating the cumulative risk is difficult,
however, when the estimates are based on family history data. Often, the
genetic mutation status in many family members is unknown; instead, only
estimated probabilities of a patient having a certain mutation status are
available. Also, ages of disease-onset are subject to right censoring. Existing
methods to estimate the cumulative risk using such family-based data only
provide estimation at individual time points, and are not guaranteed to be
monotonic or nonnegative. In this paper, we develop a novel method that
combines Expectation-Maximization and isotonic regression to estimate the
cumulative risk across the entire support. Our estimator is monotonic,
satisfies self-consistent estimating equations and has high power in detecting
differences between the cumulative risks of different populations. Application
of our estimator to a Parkinson's disease (PD) study provides the age-at-onset
distribution of PD in PARK2 mutation carriers and noncarriers, and reveals a
significant difference between the distribution in compound heterozygous
carriers compared to noncarriers, but not between heterozygous carriers and
noncarriers.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS730 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
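The abstract above combines Expectation-Maximization with isotonic regression to obtain a monotone cumulative risk estimate. The sketch below illustrates only the isotonic regression ingredient, using the standard pool-adjacent-violators algorithm; it is not the authors' combined estimator, and the function name and sample values are hypothetical.

```python
def pava(values, weights=None):
    """Weighted least-squares nondecreasing fit (pool-adjacent-violators)."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while adjacent blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand the blocks back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical pointwise risk estimates at increasing ages; the third one
# dips below the second, which a cumulative risk curve cannot do.
raw_estimates = [0.1, 0.3, 0.2, 0.5]
monotone = pava(raw_estimates)
```

Here the violating pair (0.3, 0.2) is pooled into its average, giving a nondecreasing sequence while leaving the other points unchanged.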