2kenize: Tying Subword Sequences for Chinese Script Conversion
Simplified Chinese to Traditional Chinese character conversion is a common
preprocessing step in Chinese NLP. Despite this, current approaches perform
poorly because they do not take into account that a simplified Chinese
character can correspond to multiple traditional characters. Here, we propose a
model that can disambiguate between mappings and convert between the two
scripts. The model is based on subword segmentation, two language models, and
a method for mapping between subword sequences. We further construct
benchmark datasets for topic classification and script conversion. Our proposed
method outperforms previous Chinese character conversion approaches by 6 points
in accuracy. These results are further confirmed in a downstream application,
where 2kenize is used to convert a pretraining dataset for topic classification.
An error analysis reveals that our method's particular strengths are in dealing
with code-mixing and named entities.
Comment: Accepted to ACL 202
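The one-to-many problem can be illustrated with a deliberately simplified, character-level sketch. It is not 2kenize: 2kenize works on subword sequences and uses two language models, whereas this sketch assumes a single toy bigram model over traditional characters and a Viterbi search over the candidate mappings. The table S2T, the scores in BIGRAM_LOGP, and the function names are hypothetical.

```python
# Hypothetical toy subset of a simplified -> traditional table; 发 is ambiguous.
S2T = {"头": ["頭"], "发": ["發", "髮"], "理": ["理"], "出": ["出"]}

# Hypothetical bigram log-probabilities over traditional characters.
BIGRAM_LOGP = {
    ("頭", "髮"): -0.5, ("頭", "發"): -4.0,  # 頭髮 "hair"
    ("理", "髮"): -0.5, ("理", "發"): -4.0,  # 理髮 "haircut"
    ("發", "出"): -0.5, ("髮", "出"): -4.0,  # 發出 "to emit"
}
UNSEEN_LOGP = -6.0  # back-off score for bigrams missing from the toy table


def bigram_logp(prev, cur):
    return BIGRAM_LOGP.get((prev, cur), UNSEEN_LOGP)


def convert(text):
    """Viterbi search for the highest-scoring traditional rendering of `text`."""
    # Each beam maps the last emitted traditional character to (score, output so far).
    beams = {"<s>": (0.0, "")}
    for ch in text:
        new_beams = {}
        for cand in S2T.get(ch, [ch]):  # unmapped characters pass through unchanged
            new_beams[cand] = max(
                (score + bigram_logp(prev, cand), out + cand)
                for prev, (score, out) in beams.items()
            )
        beams = new_beams
    return max(beams.values())[1]
```

With these toy scores, convert("头发") picks 頭髮 while convert("发出") picks 發出: the same simplified character 发 is resolved differently depending on its neighbour, which a one-to-one lookup table cannot do.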
Control Systems and Robotics Outreach to Middle-school Girls: Approach, Results, and Suggestions
We conducted a three-day outreach camp focused on
control systems and robotics for 8th-grade girls from
economically disadvantaged families. The overall objective
of the camp was to motivate the girls to consider
pursuing a career in engineering and the sciences. The main
focus of the camp was hands-on labs using the LEGO
Mindstorms EV3 kit. Students learned about programming,
sensors, and motors, and put their skills to the test by building a
mobile robot that took part in three contests: car racing, line
following, and parallel parking. A pre- and post-camp survey
indicated that although the program did not substantially
change the girls’ excitement about careers in engineering
and the sciences, it increased their knowledge of and
excitement about robotics and control systems. Our results
indicate that short camps help kindle the interests of young
girls but are not enough to sway them toward
engineering or science careers. For that goal, we
hypothesize that long-term STEM-based programs (e.g., a
quarter- or year-long robotics course) might be more
effective.
Cockrell School of Engineering
Alignment Analysis of Sequential Segmentation of Lexicons to Improve Automatic Cognate Detection
Ranking functions in information retrieval are often used in search engines
to recommend relevant answers to a query. This paper applies this notion of
information retrieval to the problem domain of cognate detection. The main
contributions of this paper are: (1) positional segmentation, which
incorporates the sequential notion; (2) graphical error modelling, which
deduces the transformations. Existing work focuses on the classification
problem, i.e., deciding whether a pair of words are cognates. This paper
also addresses a harder problem: predicting a possible cognate from a given
input word. Our study shows that language modelling smoothing methods, when
applied as retrieval functions in conjunction with positional segmentation and
error modelling, give better results than competing baselines in both the
classification and the prediction of cognates.
Source code is at: https://github.com/pranav-ust/cognates
Comment: Published at ACL-SRW 201
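A query-likelihood retrieval function with language-model smoothing can be sketched as follows. This rests on several assumptions: the positional segmentation is taken to be padded character bigrams tagged with their position (a guess at the paper's scheme, not its definition), Jelinek-Mercer smoothing with weight `lam` stands in for whichever smoothing methods the paper evaluates, and the graphical error model is omitted entirely. The function names are hypothetical.

```python
import math
from collections import Counter


def positional_segments(word):
    """Hypothetical positional segmentation: padded bigrams tagged with their index."""
    padded = f"#{word}#"
    return [f"{padded[i:i + 2]}@{i}" for i in range(len(padded) - 1)]


def rank_candidates(query, candidates, lam=0.5):
    """Rank candidates by Jelinek-Mercer smoothed query likelihood, best first."""
    docs = {c: Counter(positional_segments(c)) for c in candidates}
    collection = Counter()
    for counts in docs.values():
        collection.update(counts)
    coll_len = sum(collection.values())

    def score(counts):
        doc_len = sum(counts.values())
        total = 0.0
        for term in positional_segments(query):
            if collection[term] == 0:
                # Unseen in every candidate: it would add the same factor to
                # every score, so skipping it leaves the ranking unchanged.
                continue
            # Mix the candidate's own segment distribution with the collection's.
            p = (1 - lam) * counts[term] / doc_len + lam * collection[term] / coll_len
            total += math.log(p)
        return total

    return sorted(candidates, key=lambda c: score(docs[c]), reverse=True)
```

For the query "nacht", the candidate "night" shares the segments #n@0, ht@4 and t#@5 and ranks above "nuit", which in turn ranks above "table". Tagging segments with positions is what makes the sequential order matter: "nuit" ends in t#@4 rather than t#@5, so it does not get credit for the final t.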
Modeling highly pathogenic avian influenza transmission in wild birds and poultry in West Bengal, India.
Wild birds are suspected to have played a role in highly pathogenic avian influenza (HPAI) H5N1 outbreaks in West Bengal. Cluster analysis showed that H5N1 was introduced into West Bengal at least three times between 2008 and 2010. We simulated the introduction of H5N1 by wild birds and their contact with poultry through a stochastic continuous-time mathematical model. Results showed that reducing contact between wild birds and domestic poultry, and increasing the culling rate of infected domestic poultry communities, will reduce the probability of outbreaks. Poultry communities that shared habitat with wild birds or were in districts with previous outbreaks were more likely to suffer an outbreak. These results indicate that wild birds can introduce HPAI to domestic poultry and that limiting their contact at shared habitats, together with swift culling of infected domestic poultry, can greatly reduce the likelihood of HPAI outbreaks.
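Stochastic continuous-time models of this kind are commonly simulated with the Gillespie algorithm. The sketch below is not the authors' model. It assumes a hypothetical population of `n` poultry communities, each susceptible, infected, or culled, with wild-bird introductions at rate `contact` per susceptible community, poultry-to-poultry spread at rate `beta`, and culling of infected communities at rate `cull`. All parameter values and the outbreak threshold are illustrative only.

```python
import random


def simulate(rng, n=20, contact=0.01, beta=2.0, cull=0.5, t_max=10.0):
    """One Gillespie run; returns how many poultry communities were ever infected."""
    s, i, ever, t = n, 0, 0, 0.0  # susceptible, infected, cumulative infected, time
    while True:
        r_intro = contact * s          # wild bird -> susceptible community
        r_spread = beta * s * i / n    # infected community -> susceptible community
        r_cull = cull * i              # infected community is culled
        total = r_intro + r_spread + r_cull
        if total == 0.0:
            break  # no further event is possible
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_max:
            break
        # Pick which event fires, with probability proportional to its rate.
        if r_cull == 0.0 or rng.random() * total < r_intro + r_spread:
            s, i, ever = s - 1, i + 1, ever + 1  # a new community becomes infected
        else:
            i -= 1  # an infected community is culled
    return ever


def outbreak_probability(runs=400, threshold=10, seed=1, **params):
    """Fraction of runs in which at least `threshold` communities were ever infected."""
    rng = random.Random(seed)
    hits = sum(simulate(rng, **params) >= threshold for _ in range(runs))
    return hits / runs
```

Under these assumptions, comparing outbreak_probability(contact=0.01, cull=0.5) with runs that lower `contact` or raise `cull` reproduces the qualitative direction reported above. The absolute numbers say nothing about West Bengal.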
Hierarchical Learning in Euclidean Neural Networks
Equivariant machine learning methods have shown wide success in 3D learning
applications in recent years. These models explicitly build in the reflection,
translation and rotation symmetries of Euclidean space and have facilitated
large advances in accuracy and data efficiency for a range of applications in
the physical sciences. An outstanding question for equivariant models is why
they achieve such larger-than-expected advances in these applications. To probe
this question, we examine the role of higher-order (non-scalar) features in
Euclidean Neural Networks (\texttt{e3nn}). We focus on the previously studied
application of \texttt{e3nn} to the problem of electron density prediction,
which allows for a variety of non-scalar outputs, and examine whether the
nature of the output (scalar $l=0$, vector $l=1$, or higher order $l>1$) is
relevant to the effectiveness of non-scalar hidden features in the network.
Further, we examine the behavior of non-scalar features throughout training,
finding a natural hierarchy of features by $l$, reminiscent of a multipole
expansion. We aim for our work to ultimately inform design principles and
choices of domain applications for \texttt{e3nn} networks.
Comment: 9 pages, 3 figures
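The difference between scalar ($l=0$) and vector ($l=1$) features can be checked numerically. The pure-Python sketch below does not use the e3nn library. It assumes two hand-built features of a point cloud: a sum of pairwise distances, which should be invariant under rotation and translation, and a distance-weighted sum of displacements from one point, which should rotate along with the input. It also records that a feature of degree $l$ has $2l+1$ components, the same counting as the monopole, dipole, and quadrupole terms of a multipole expansion. All function names are hypothetical.

```python
import math


def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def irrep_dim(l):
    # A feature of degree l has 2l + 1 components: 1 (scalar), 3 (vector), 5, ...
    return 2 * l + 1


def scalar_feature(points):
    # l = 0: sum of pairwise distances, unchanged by rotation and translation.
    return sum(norm(sub(p, q)) for i, p in enumerate(points) for q in points[i + 1:])


def vector_feature(points, center=0):
    # l = 1: distance-weighted sum of displacements from one point; rotates with the input.
    c = points[center]
    out = [0.0, 0.0, 0.0]
    for p in points:
        d = sub(p, c)
        w = math.exp(-norm(d))  # any function of distance keeps the feature equivariant
        for k in range(3):
            out[k] += w * d[k]
    return tuple(out)
```

Rotating every input point by a matrix R leaves scalar_feature unchanged and sends vector_feature(points) to apply(R, vector_feature(points)); translating the points changes neither feature.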
Sphingomyelin and GM1 Influence Huntingtin Binding to, Disruption of, and Aggregation on Lipid Membranes
Huntington disease (HD) is an inherited neurodegenerative disease caused by the expansion beyond a critical threshold of a polyglutamine (polyQ) tract near the N-terminus of the huntingtin (htt) protein. Expanded polyQ promotes the formation of a variety of oligomeric and fibrillar aggregates of htt that accumulate into the hallmark proteinaceous inclusion bodies associated with HD. htt is also highly associated with numerous cellular and subcellular membranes that contain a variety of lipids. As lipid homeostasis and metabolism abnormalities are observed in HD patients, we investigated how varying both the sphingomyelin (SM) and ganglioside (GM1) contents modifies the interactions between htt and lipid membranes. SM composition is altered in HD, and GM1 has been shown to have protective effects in animal models of HD. A combination of Langmuir trough monolayer techniques, vesicle permeability and binding assays, and in situ atomic force microscopy (AFM) was used to directly monitor the interaction of a model, synthetic htt peptide and a full-length htt-exon1 recombinant protein with model membranes composed of total brain lipid extract (TBLE) and varying amounts of exogenously added SM or GM1. The addition of either SM or GM1 decreased htt insertion into the lipid monolayers. However, TBLE vesicles with an increased SM content were more susceptible to htt-induced permeabilization, whereas GM1 had no effect on permeabilization. Pure TBLE bilayers and TBLE bilayers enriched with GM1 developed regions of roughened, granular morphologies upon exposure to htt-exon1, but plateau-like domains with a smoother appearance formed in bilayers enriched with SM. Oligomeric aggregates were observed on all bilayer systems regardless of induced morphology. Collectively, these observations suggest that the lipid composition and its subsequent effects on membrane material properties strongly influence htt binding and aggregation on lipid membranes.
- …