PersoNER: Persian named-entity recognition
© 1963-2018 ACL. Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent difficulty of training an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by over a hundred million people worldwide. We first present and provide ArmanPersoNERCorpus, the first manually-annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.
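The abstract reports CoNLL scores without describing the scoring procedure. As a rough illustration only, CoNLL-style NER evaluation is commonly computed as exact-match precision, recall, and F1 over entity spans. The sketch below assumes BIO-tagged sequences; the function names and the PER/LOC tag set are hypothetical and not taken from the paper, and MUC-style scoring, which credits partially correct entities, is not covered.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (hypothetical helper)."""
    spans: Set[Span] = set()
    start, label = None, None
    # The extra "O" is a sentinel that closes an entity ending the sequence.
    for i, tag in enumerate(tags + ["O"]):
        # An open entity continues only on an I- tag of the same type.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # A B- tag opens an entity; a stray I- tag is leniently treated
        # as an opening tag as well (an assumption of this sketch).
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC"]
    pred = ["B-PER", "I-PER", "O", "O"]
    print(entity_f1(gold, pred))
```

In this toy case the prediction finds one of the two gold entities and proposes no spurious ones, so precision is 1.0 and recall is 0.5. An entity counts as correct only when both its boundaries and its type match the gold annotation.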
Integrating Machine Learning Into Language Documentation and Description
At least 40% of the world’s 7000+ languages are believed to be in danger of disappearing from human use by the end of this century. Many languages will disappear with almost no record of their existence because efforts to document and describe these languages are encountering an “annotation bottleneck” at early stages of analysis and annotation. Current annotation methods are too slow and expensive to counteract the pace of language endangerment and loss. Annotation could be sped up and improved by machine learning. However, state-of-the-art supervised machine learning depends heavily on large amounts of annotated data.
This dissertation explores how to train supervised machine learning systems for morphological analysis during language documentation and description. The systems are applied to nine languages. The research investigates ways that linguists and NLP scientists may want to adjust their expectations and workflows so that both can achieve optimal results with endangered data.
New methods for tasks in morphological analysis are explored. First, various approaches to automating morpheme segmentation and glossing are compared. Second, a new task is presented for learning morphological paradigms and automatically generating new morphological resources: IGT-to-paradigms (IGT2P). Third, the impact of POS tags on segmentation, glossing, and paradigm induction is examined, showing that the presence or absence of POS tags does not have a significant bearing on the performance of machine learning systems. The results indicate that Natural Language Processing (NLP) systems could be successfully integrated into the documentary and descriptive workflow. At the same time, the relatively high accuracy achieved from noisy field data with little or no additional human annotation hints that NLP may benefit from limited documentary linguistic data, which may be the only or largest linguistically annotated resource available for some languages.
Deep Architectures for Visual Recognition and Description
In recent times, digital media content has become inherently multimedia, consisting of text, audio, image and video. Several outstanding Computer Vision (CV) problems are being successfully solved with the help of modern Machine Learning (ML) techniques. Plenty of research work has already been carried out in the fields of Automatic Image Annotation (AIA), Image Captioning and Video Tagging. Video Captioning, i.e., automatic description generation from digital video, however, is a different and complex problem altogether. This study compares various existing video captioning approaches and attempts their classification and analysis based on different parameters, viz., type of captioning method (generation/retrieval), type of learning model employed, the desired length of the generated output description, etc. This dissertation also attempts to critically analyze the existing benchmark datasets used in various video captioning models and the evaluation metrics for assessing the final quality of the generated video descriptions. A detailed study of important existing models, highlighting their comparative advantages as well as disadvantages, is also included.
In this study, a novel approach for video captioning on the Microsoft Video Description (MSVD) dataset and the Microsoft Video-to-Text (MSR-VTT) dataset is proposed, using supervised learning techniques to train a deep combinational framework that achieves better quality video captioning via predicting semantic tags. We develop simple shallow CNNs (2D and 3D) as feature extractors, Deep Neural Networks (DNNs) and Bidirectional LSTMs (BiLSTMs) as tag prediction models, and a Recurrent Neural Network (RNN) model (LSTM) as the language model. The aim of the work was to provide an alternative narrative to generating captions from videos via semantic tag predictions and to deploy simpler, shallower deep model architectures with lower memory requirements, so that the solution is not very memory intensive and the developed models prove to be stable and viable options when the scale of the data is increased.
This study also successfully employed deep architectures like the Convolutional Neural Network (CNN) for speeding up the automation of hand gesture recognition and classification of the sign languages of the Indian classical dance form, ‘Bharatnatyam’. This hand gesture classification is primarily aimed at 1) building a novel dataset of 2D single hand gestures belonging to 27 classes that were collected from (i) Google search engine (Google images), (ii) YouTube videos (dynamic and with background considered) and (iii) professional artists under staged environment constraints (plain backgrounds), 2) exploring the effectiveness of CNNs for identifying and classifying the single hand gestures by optimizing the hyperparameters, and 3) evaluating the impacts of transfer learning and double transfer learning, which is a novel concept explored for achieving higher classification accuracy.
African linguistics on the prairie
African Linguistics on the Prairie features select revised peer-reviewed papers from the 45th Annual Conference on African Linguistics, held at the University of Kansas. The articles in this volume reflect the enormous diversity of African languages, as they focus on languages from all of the major African language phyla. The articles here also reflect the many different research perspectives that frame the work of linguists in the Association for Contemporary African Linguistics. The diversity of views presented in this volume is thus indicative of the vitality of current African linguistics research. The work presented in this volume represents both descriptive and theoretical methodologies and covers fields ranging from phonetics, phonology, morphology, typology, syntax, and semantics to sociolinguistics, discourse analysis, language acquisition, computational linguistics and beyond. This broad scope and the quality of the articles contained within hold out the promise of continued advancement in linguistic research on African languages.
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.
Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing in the published version due to the publication policies. Please contact Prof. Erik Cambria for details.