Resolving Regular Polysemy in Named Entities
Word sense disambiguation primarily addresses the lexical ambiguity of common
words based on a predefined sense inventory. Conversely, proper names are
usually considered to denote an ad-hoc real-world referent. Once the reference
is decided, the ambiguity is purportedly resolved. However, proper names also
exhibit ambiguities through appellativization, i.e., they act like common words
and may denote different aspects of their referents. We proposed to address the
ambiguities of proper names through the lens of regular polysemy, which we
formalized as dot objects. This paper introduces a combined word sense
disambiguation (WSD) model for disambiguating common words against Chinese
Wordnet (CWN) and proper names as dot objects. The model leverages the
flexibility of a gloss-based model architecture, which takes advantage of the
glosses and example sentences of CWN. We show that the model achieves
competitive results on both common and proper nouns, even on a relatively
sparse sense dataset. Aside from being a performant WSD tool, the model further
facilitates the future development of the lexical resource.
PDA: Pooled DNA analyzer
BACKGROUND: Association mapping using abundant single nucleotide polymorphisms (SNPs) is a powerful tool for identifying disease susceptibility genes for complex traits and exploring possible genetic diversity. Genotyping large numbers of SNPs individually is performed routinely but is cost-prohibitive for large-scale genetic studies. DNA pooling is a reliable and cost-saving alternative genotyping method. However, no software has been developed for complete pooled-DNA analyses, including data standardization, allele frequency estimation, and single/multipoint DNA pooling association tests. This motivated the development of the software, 'PDA' (Pooled DNA Analyzer), to analyze pooled DNA data. RESULTS: We developed the software, PDA, for the analysis of pooled-DNA data. PDA was originally implemented in the MATLAB® language, but it can also be executed on a Windows system without installing MATLAB®. PDA provides estimates of the coefficient of preferential amplification and allele frequency. PDA considers an extended single-point association test, which can compare allele frequencies between two DNA pools constructed under different experimental conditions. PDA also provides novel chromosome-wide multipoint association tests based on p-value combinations and a sliding-window concept. This new multipoint testing procedure overcomes a computational bottleneck of conventional haplotype-oriented multipoint methods in DNA pooling analyses and can handle data sets having a large pool size and/or large numbers of polymorphic markers. All of the PDA functions are illustrated in four bona fide examples. CONCLUSION: PDA is simple to operate and does not require that users have a strong statistical background. The software is available at
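The multipoint idea in the PDA abstract (combining single-marker p-values within a sliding window) can be illustrated with a minimal sketch. The abstract does not say which combination rule PDA uses, so Fisher's method is assumed here purely for illustration, and the function names `fisher_combined_p` and `sliding_window_p` are hypothetical, not part of PDA. Fisher's method also assumes independent tests, which neighboring markers in linkage disequilibrium may violate.

```python
import math


def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method (an assumed rule, not PDA's own).

    Each p-value must lie in (0, 1]; the tests are treated as independent.
    """
    k = len(p_values)
    # Half of Fisher's statistic; the full statistic -2 * sum(ln p) is
    # chi-square distributed with 2k degrees of freedom under the null.
    half_stat = -sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def sliding_window_p(p_values, window):
    """Return one combined p-value per window of consecutive markers."""
    if window < 1 or window > len(p_values):
        raise ValueError("window must be between 1 and len(p_values)")
    return [
        fisher_combined_p(p_values[start:start + window])
        for start in range(len(p_values) - window + 1)
    ]
```

Called as `sliding_window_p(single_marker_p, 3)`, this returns one combined p-value for each run of three adjacent markers along a chromosome. Because no haplotypes are reconstructed, the cost grows linearly with the number of markers.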
Lexical Retrieval Hypothesis in Multimodal Context
Multimodal corpora have become an essential language resource for language
science and grounded natural language processing (NLP) systems due to the
growing need to understand and interpret human communication across various
channels. In this paper, we first present our efforts in building the first
Multimodal Corpus for Languages in Taiwan (MultiMoco). Based on the corpus, we
conduct a case study investigating the Lexical Retrieval Hypothesis (LRH),
specifically examining whether the hand gestures co-occurring with speech
constants facilitate lexical retrieval or serve other discourse functions. With
detailed annotations on eight parliamentary interpellations in Taiwan Mandarin,
we explore the co-occurrence between speech constants and non-verbal features
(i.e., head movement, face movement, hand gesture, and function of hand
gesture). Our findings suggest that while hand gestures do serve as
facilitators for lexical retrieval in some cases, they also serve the purpose
of information emphasis. This study highlights the potential of the MultiMoco
Corpus to provide an important resource for in-depth analysis and further
research in multimodal communication studies.
Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis
This paper explores the grounding issue regarding multimodal semantic
representation from a computational cognitive-linguistic view. We annotate
images from the Flickr30k dataset with five perceptual properties: Affordance,
Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche
Association (ENA), and examine their association with textual elements in the
image captions. Our findings reveal that images with Gibsonian affordance show
a higher frequency of captions containing 'holding-verbs' and 'container-nouns'
compared to images displaying telic affordance. Perceptual Salience, Object
Number, and ENA are also associated with the choice of linguistic expressions.
Our study demonstrates that comprehensive understanding of objects or events
requires cognitive attention, semantic nuances in language, and integration
across multiple modalities. We highlight the vital importance of situated
meaning and affordance grounding in natural language understanding, with the
potential to advance human-like interpretation in various scenarios.
Comment: 10 pages, 9 figures