564 research outputs found
Asynchronous Training of Word Embeddings for Large Text Corpora
Word embeddings are a powerful approach for analyzing language and have been
widely popular in numerous tasks in information retrieval and text mining.
Training embeddings over huge corpora is computationally expensive because the
input is typically sequentially processed and parameters are synchronously
updated. Distributed architectures for asynchronous training that have been
proposed either focus on scaling vocabulary sizes and dimensionality or suffer
from expensive synchronization latencies.
In this paper, we propose a scalable approach to train word embeddings by
partitioning the input space instead in order to scale to massive text corpora
while not sacrificing the performance of the embeddings. Our training procedure
does not involve any parameter synchronization except a final sub-model merge
phase that typically executes in a few minutes. Our distributed training scales
seamlessly to large corpus sizes, and we obtain comparable performance, and sometimes up to a
45% improvement, on a variety of NLP benchmarks using models trained
by our distributed procedure which requires of the time taken by the
baseline approach. Finally we also show that we are robust to missing words in
sub-models and are able to effectively reconstruct word representations.
Comment: This paper contains 9 pages and has been accepted in the WSDM201
V-I characteristics in the vicinity of order-disorder transition in vortex matter
The shape of the V-I characteristics leading to a peak in the differential
resistance r_d=dV/dI in the vicinity of the order-disorder transition in NbSe2
is investigated. r_d is large when measured by dc current. However, for a small
Iac on a dc bias r_d decreases rapidly with frequency, even at a few Hz, and
displays a large out-of-phase signal. In contrast, the ac response increases
with frequency in the absence of dc bias. These surprisingly opposite phenomena
and the peak in r_d are shown to result from a dynamic coexistence of two
vortex matter phases rather than from the commonly assumed plastic depinning.
Comment: 12 pages, 4 figures. Accepted for publication in PRB rapid
High Prevalence and Genetic Diversity of HCV among HIV-1 Infected People from Various High-Risk Groups in China
BACKGROUND: Co-infection with HIV-1 and HCV is a significant global public health problem and a major consideration for anti-HIV-1 treatment. HCV infection among HIV-1 positive people who are eligible for the newly launched nationwide anti-HIV-1 treatment program in China has not been well characterized. METHODOLOGY: A nationwide survey of HIV-1 positive injection drug users (IDU), former paid blood donors (FBD), and sexually transmitted cases from multiple provinces including the four most affected provinces in China was conducted. HCV prevalence and genetic diversity were determined. We found that IDU and FBD have extremely high rates of HCV infection (97% and 93%, respectively). Surprisingly, people who acquired HIV-1 through sexual contact also had a higher rate of HCV infection (20%) than the general population. HIV-1 subtype and HCV genotypes were amazingly similar among FBD from multiple provinces stretching from Central to Northeast China. However, although patterns of overland trafficking of heroin and distinct HIV-1 subtypes could be detected among IDU, HCV genotypes of IDU were more diverse and exhibited significant regional differences. CONCLUSION: Emerging HIV-1 and HCV co-infection and possible sexual transmission of HCV in China require urgent prevention measures and should be taken into consideration in the nationwide antiretroviral treatment program.
HCV 6a Prevalence in Guangdong Province Had the Origin from Vietnam and Recent Dissemination to Other Regions of China: Phylogeographic Analyses
Recently in China, HCV 6a infection has shown a fast increase among patients and blood donors, possibly due to IDU linked transmission. We recruited 210 drug users in Shanwei city, Guangdong province. Among them, HCV RNA was detected in 150 (71.4%), both E1 and NS5B genes were sequenced in 136, and 6a genotyped in 70. Of the 6a sequences, most were grouped into three clusters while 23% represent emerging strains. For coalescent analysis, additional 6a sequences were determined among 21 blood donors from Vietnam, 22 donors from 12 provinces of China, and 36 IDUs from Liuzhou City in Guangxi Province. Phylogeographic analyses indicated that Vietnam could be the origin of 6a in China. The Guangxi Province, which borders Vietnam, could be the first region to accept 6a for circulation. Migration from Yunnan, which also borders Vietnam, might be equally important, but it was only detected among IDUs in limited regions. From Guangxi, 6a could have further spread to Guangdong, Yunnan, Hainan, and Hubei provinces. However, evidence showed that only in Guangdong has 6a become a local epidemic, making Guangdong the second source region to disseminate 6a to the other 12 provinces. With a rate of 2.737×10⁻³ (95% CI: 1.792×10⁻³ to 3.745×10⁻³), a Bayesian Skyline Plot was portrayed. It revealed an exponential 6a growth during 1994-1998, while before and after 1994-1998 slow 6a growths were maintained. Concurrently, 1994-1998 corresponded to a period when contaminated blood transfusion was common, which caused many people to be infected with HIV and HCV, until the Chinese government outlawed the use of paid blood donations in 1998. With an origin from Vietnam, 6a has become a local epidemic in Guangdong Province, where an increasing prevalence has subsequently led to 6a spread to many other regions of China.
Electron correlation effects in electron-hole recombination in organic light-emitting diodes
We develop a general theory of electron-hole recombination in organic light
emitting diodes that leads to formation of emissive singlet excitons and
nonemissive triplet excitons. We briefly review other existing theories and
show how our approach is substantively different from these theories. Using an
exact time-dependent approach to the interchain/intermolecular charge-transfer
within a long-range interacting model we find that, (i) the relative yield of
the singlet exciton in polymers is considerably larger than the 25% predicted
from statistical considerations, (ii) the singlet exciton yield increases with
chain length in oligomers, and, (iii) in small molecules containing nitrogen
heteroatoms, the relative yield of the singlet exciton is considerably smaller
and may be even close to 25%. The above results are independent of whether or
not the bond-charge repulsion, X_perp, is included in the interchain part of
the Hamiltonian for the two-chain system. The larger (smaller) yield of the
singlet (triplet) exciton in carbon-based long-chain polymers is a consequence
of both its ionic (covalent) nature and smaller (larger) binding energy. In
nitrogen containing monomers, wavefunctions are closer to the noninteracting
limit, and this decreases (increases) the relative yield of the singlet
(triplet) exciton. Our results are in qualitative agreement with
electroluminescence experiments involving both molecular and polymeric light
emitters. The time-dependent approach developed here for describing
intermolecular charge-transfer processes is completely general and may be
applied to many other such processes.
Comment: 19 pages, 11 figures
Naturally occurring mutations in the PA gene are key contributors to increased virulence of pandemic H1N1/09 influenza virus in mice
We examined the molecular basis of virulence of pandemic H1N1/09 influenza viruses by reverse genetics based on two H1N1/09 virus isolates (A/California/04/2009 [CA04] and A/swine/Shandong/731/2009 [SD731]) with contrasting pathogenicities in mice. We found that four amino acid mutations (P224S in the PA protein [PA-P224S], PB2-T588I, NA-V106I, and NS1-I123V) contributed to the lethal phenotype of SD731. In particular, the PA-P224S mutation when combined with PA-A70V in CA04 drastically reduced the virus's 50% mouse lethal dose (LD50), by almost 1,000-fold.
Validating module network learning algorithms using simulated data
In recent years, several authors have used probabilistic graphical models to
learn expression modules and their regulatory programs from gene expression
data. Here, we demonstrate the use of the synthetic data generator SynTReN for
the purpose of testing and comparing module network learning algorithms. We
introduce a software package for learning module networks, called LeMoNe, which
incorporates a novel strategy for learning regulatory programs. Novelties
include the use of a bottom-up Bayesian hierarchical clustering to construct
the regulatory programs, and the use of a conditional entropy measure to assign
regulators to the regulation program nodes. Using SynTReN data, we test the
performance of LeMoNe in a completely controlled situation and assess the
effect of the methodological changes we made with respect to an existing
software package, namely Genomica. Additionally, we assess the effect of
various parameters, such as the size of the data set and the amount of noise,
on the inference performance. Overall, application of Genomica and LeMoNe to
simulated data sets gave comparable results. However, LeMoNe offers some
advantages, one of them being that the learning process is considerably faster
for larger data sets. Additionally, we show that the location of the regulators
in the LeMoNe regulation programs and their conditional entropy may be used to
prioritize regulators for functional validation, and that the combination of
the bottom-up clustering strategy with the conditional entropy-based assignment
of regulators improves the handling of missing or hidden regulators.
Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information
Cross-protection against European swine influenza viruses in the context of infection immunity against the 2009 pandemic H1N1 virus : studies in the pig model of influenza
Pigs are natural hosts for the same influenza virus subtypes as humans and are a valuable model for cross-protection studies with influenza. In this study, we have used the pig model to examine the extent of virological protection between a) the 2009 pandemic H1N1 (pH1N1) virus and three different European H1 swine influenza virus (SIV) lineages, and b) these H1 viruses and a European H3N2 SIV. Pigs were inoculated intranasally with representative strains of each virus lineage with 6- and 17-week intervals between H1 inoculations and between H1 and H3 inoculations, respectively. Virus titers in nasal swabs and/or tissues of the respiratory tract were determined after each inoculation. There was substantial though differing cross-protection between pH1N1 and other H1 viruses, which was directly correlated with the relatedness in the viral hemagglutinin (HA) and neuraminidase (NA) proteins. Cross-protection against H3N2 was almost complete in pigs with immunity against H1N2, but was weak in H1N1/pH1N1-immune pigs. In conclusion, infection with a live, wild type influenza virus may offer substantial cross-lineage protection against viruses of the same HA and/or NA subtype. True heterosubtypic protection, in contrast, appears to be minimal in natural influenza virus hosts. We discuss our findings in the light of the zoonotic and pandemic risks of SIVs
Sequence-based prediction for vaccine strain selection and identification of antigenic variability in foot-and-mouth disease virus
Identifying when past exposure to an infectious disease will protect against newly emerging strains is central to understanding the spread and the severity of epidemics, but the prediction of viral cross-protection remains an important unsolved problem. For foot-and-mouth disease virus (FMDV) research in particular, improved methods for predicting this cross-protection are critical for predicting the severity of outbreaks within endemic settings where multiple serotypes and subtypes commonly co-circulate, as well as for deciding whether appropriate vaccine(s) exist and how much they could mitigate the effects of any outbreak. To identify antigenic relationships and their predictors, we used linear mixed effects models to account for variation in pairwise cross-neutralization titres using only viral sequences and structural data. We identified those substitutions in surface-exposed structural proteins that are correlates of loss of cross-reactivity. These allowed prediction of both the best vaccine match for any single virus and the breadth of coverage of new vaccine candidates from their capsid sequences as effectively as or better than serology. Sub-sequences chosen by the model-building process all contained sites that are known epitopes on other serotypes. Furthermore, for the SAT1 serotype, for which epitopes have never previously been identified, we provide strong evidence - by controlling for phylogenetic structure - for the presence of three epitopes across a panel of viruses and quantify the relative significance of some individual residues in determining cross-neutralization. Identifying and quantifying the importance of sites that predict viral strain cross-reactivity not just for single viruses but across entire serotypes can help in the design of vaccines with better targeting and broader coverage. These techniques can be generalized to any infectious agents where cross-reactivity assays have been carried out. 
As the parameterization uses pre-existing datasets, this approach quickly and cheaply increases both our understanding of antigenic relationships and our power to control disease.