
2kenize: Tying Subword Sequences for Chinese Script Conversion

Author: A Pranav
Augenstein Isabelle
Publication date: 01/01/2020

Simplified Chinese to Traditional Chinese character conversion is a common preprocessing step in Chinese NLP. Despite this, current approaches perform poorly because they do not take into account that a simplified Chinese character can correspond to multiple traditional characters. Here, we propose a model that can disambiguate between mappings and convert between the two scripts. The model is based on subword segmentation, two language models, and a method for mapping between subword sequences. We further construct benchmark datasets for topic classification and script conversion. Our proposed method outperforms previous Chinese character conversion approaches by 6 points in accuracy. These results are further confirmed in a downstream application, where 2kenize is used to convert a pretraining dataset for topic classification. An error analysis reveals that our method's particular strengths are in dealing with code-mixing and named entities. Comment: Accepted to ACL 202

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
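
The 2kenize abstract above notes that one simplified character can map to several traditional characters, so conversion needs context to pick the right one. The following is a minimal character-level sketch of that ambiguity only; it is not the 2kenize method (which the abstract says uses subword segmentation and two language models). The mapping table `CANDIDATES`, the bigram counts `BIGRAM_COUNTS`, and the function names are hypothetical, chosen purely for illustration.

```python
from itertools import product

# Toy one-to-many table: simplified character -> candidate traditional characters.
# Hypothetical and tiny; a real table covers thousands of characters.
CANDIDATES = {
    "发": ["發", "髮"],  # "emit/develop" vs. "hair"
    "头": ["頭"],
}

# Hypothetical bigram counts standing in for a traditional-Chinese language model.
BIGRAM_COUNTS = {
    ("頭", "髮"): 5,
    ("發", "展"): 8,
}


def score(seq):
    """Product of add-one smoothed bigram counts over adjacent characters."""
    total = 1
    for left, right in zip(seq, seq[1:]):
        total *= BIGRAM_COUNTS.get((left, right), 0) + 1
    return total


def convert(text):
    """Enumerate every candidate sequence and keep the highest-scoring one."""
    options = [CANDIDATES.get(ch, [ch]) for ch in text]
    best = max(product(*options), key=score)
    return "".join(best)


print(convert("头发"))  # the context "頭" favours "髮"
print(convert("发展"))  # the context "展" favours "發"
```

Exhaustive enumeration is only workable for very short strings; it is used here to keep the sketch short, not as a suggestion of how a practical converter searches.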

Control Systems and Robotics Outreach to Middle-school Girls: Approach, Results, and Suggestions

Author: Bhounsule Pranav A.
Nugruho Sebastian
Taha Ahmad
Publication venue: American Society for Engineering Education
Publication date: 01/04/2019

We conducted a three-day outreach camp focused on control systems and robotics for 8th-grade girls from economically disadvantaged families. The overall objective of the camp was to motivate the girls to consider pursuing a career in engineering and sciences. The main focus of the camp was hands-on labs using the LEGO Mindstorms EV3 kit. Students learned about programming, sensors, and motors, and put their skills to the test by creating a mobile robot that took part in three contests: car racing, line following, and parallel parking. A pre- and post-camp survey indicated that although the program did not predominantly change the girls’ excitement towards careers in engineering and sciences, it increased the girls’ knowledge of and excitement towards robotics and control systems. Our results indicate that short camps help kindle the interests of young girls, but are not able to sway them to take on engineering/science careers. In the latter case, we hypothesize that long-term STEM-based programs (e.g., a quarter or year-long robotics course) might be more effective. Cockrell School of Engineerin

Texas ScholarWorks

Alignment Analysis of Sequential Segmentation of Lexicons to Improve Automatic Cognate Detection

Author: A Pranav
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2018

Ranking functions in information retrieval are often used in search engines to recommend relevant answers to a query. This paper takes this notion from information retrieval and applies it to the problem of cognate detection. The main contributions of this paper are: (1) positional segmentation, which incorporates the sequential notion; (2) graphical error modelling, which deduces the transformations. Current research focuses on the classification problem, that is, distinguishing whether a pair of words are cognates. This paper focuses on a harder problem: whether we can predict a possible cognate from the given input. Our study shows that language modelling smoothing methods, applied as retrieval functions and used in conjunction with positional segmentation and error modelling, give better results than competing baselines in both classification and prediction of cognates. Source code is at: https://github.com/pranav-ust/cognates Comment: Published at ACL-SRW 201

arXiv.org e-Print Archive

Crossref

Modeling highly pathogenic avian influenza transmission in wild birds and poultry in West Bengal, India.

Author: Aly Sharif S
Bunn David A
Pande Satish A
Pandit Pranav S
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Wild birds are suspected to have played a role in highly pathogenic avian influenza (HPAI) H5N1 outbreaks in West Bengal. Cluster analysis showed that H5N1 was introduced in West Bengal at least 3 times between 2008 and 2010. We simulated the introduction of H5N1 by wild birds and their contact with poultry through a stochastic continuous-time mathematical model. Results showed that reducing contact between wild birds and domestic poultry, and increasing the culling rate of infected domestic poultry communities, will reduce the probability of outbreaks. Poultry communities that shared habitat with wild birds or those in districts with previous outbreaks were more likely to suffer an outbreak. These results indicate that wild birds can introduce HPAI to domestic poultry and that limiting their contact at shared habitats together with swift culling of infected domestic poultry can greatly reduce the likelihood of HPAI outbreaks

PubMed Central

eScholarship - University of California

Hierarchical Learning in Euclidean Neural Networks

Author: Rackers Joshua A.
Rao Pranav
Publication date: 10/10/2022

Equivariant machine learning methods have shown wide success at 3D learning applications in recent years. These models explicitly build in the reflection, translation and rotation symmetries of Euclidean space and have facilitated large advances in accuracy and data efficiency for a range of applications in the physical sciences. An outstanding question for equivariant models is why they achieve such larger-than-expected advances in these applications. To probe this question, we examine the role of higher order (non-scalar) features in Euclidean Neural Networks (\texttt{e3nn}). We focus on the previously studied application of \texttt{e3nn} to the problem of electron density prediction, which allows for a variety of non-scalar outputs, and examine whether the nature of the output (scalar $l=0$, vector $l=1$, or higher order $l>1$) is relevant to the effectiveness of non-scalar hidden features in the network. Further, we examine the behavior of non-scalar features throughout training, finding a natural hierarchy of features by $l$, reminiscent of a multipole expansion. We aim for our work to ultimately inform design principles and choices of domain applications for \texttt{e3nn} networks. Comment: 9 pages, 3 figure

arXiv.org e-Print Archive

Sphingomyelin and GM1 Influence Huntingtin Binding to, Disruption of, and Aggregation on Lipid Membranes

Author: Campbell Warren A.
Chaibva Maxmore
Frey Shelli L.
Gao Xiang
Jain Pranav
Legleiter Justin
Publication venue: The Cupola: Scholarship at Gettysburg College
Publication date: 01/01/2018

Huntington disease (HD) is an inherited neurodegenerative disease caused by the expansion beyond a critical threshold of a polyglutamine (polyQ) tract near the N-terminus of the huntingtin (htt) protein. Expanded polyQ promotes the formation of a variety of oligomeric and fibrillar aggregates of htt that accumulate into the hallmark proteinaceous inclusion bodies associated with HD. htt is also highly associated with numerous cellular and subcellular membranes that contain a variety of lipids. As lipid homeostasis and metabolism abnormalities are observed in HD patients, we investigated how varying both the sphingomyelin (SM) and ganglioside (GM1) contents modifies the interactions between htt and lipid membranes. SM composition is altered in HD, and GM1 has been shown to have protective effects in animal models of HD. A combination of Langmuir trough monolayer techniques, vesicle permeability and binding assays, and in situ atomic force microscopy (AFM) was used to directly monitor the interaction of a model, synthetic htt peptide and a full-length htt-exon1 recombinant protein with model membranes comprised of total brain lipid extract (TBLE) and varying amounts of exogenously added SM or GM1. The addition of either SM or GM1 decreased htt insertion into the lipid monolayers. However, TBLE vesicles with an increased SM content were more susceptible to htt-induced permeabilization, whereas GM1 had no effect on permeabilization. Pure TBLE bilayers and TBLE bilayers enriched with GM1 developed regions of roughened, granular morphologies upon exposure to htt-exon1, but plateau-like domains with a smoother appearance formed in bilayers enriched with SM. Oligomeric aggregates were observed on all bilayer systems regardless of induced morphology. Collectively, these observations suggest that the lipid composition and its subsequent effects on membrane material properties strongly influence htt binding and aggregation on lipid membranes

Directory of Open Access Journals

Gettysburg College

The Research Repository @ WVU (West Virginia University)