Improved Chinese Language Processing for an Open Source Search Engine
Natural Language Processing (NLP) is the field concerned with computers analyzing human language. It spans many areas, including speech recognition, natural language understanding, and natural language generation.
Information retrieval and natural language processing for Asian languages pose a unique set of challenges not present for Indo-European languages, among them text segmentation, named entity recognition in unsegmented text, and part-of-speech tagging. In this report, we describe our implementation of and experiments with improving the Chinese language processing sub-component of an open source search engine, Yioop. In particular, we rewrote and improved the following sub-systems of Yioop to make them as close to state-of-the-art as possible: Chinese text segmentation, part-of-speech (POS) tagging, named entity recognition (NER), and question answering.
Compared to the previous system, we achieved a 9% improvement in Chinese word segmentation accuracy, built a POS tagger with 89% accuracy, and implemented an NER system with 76% accuracy.
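To make the segmentation task concrete: Chinese text has no spaces between words, so a segmenter must decide word boundaries itself. The sketch below is a classic forward maximum-matching baseline, shown for illustration only; it is not Yioop's actual algorithm, and the dictionary and `max_len` parameter are assumptions.

```python
def max_match(text, dictionary, max_len=4):
    """Greedily match the longest dictionary word starting at each position.

    Falls back to emitting a single character when no dictionary word matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

# Hypothetical mini-dictionary for the classic ambiguous example
# "南京市长江大桥" (Nanjing Yangtze River Bridge).
words = {"南京市", "长江", "大桥", "长江大桥"}
print(max_match("南京市长江大桥", words))  # ['南京市', '长江大桥']
```

Greedy matching like this is fast but brittle on ambiguous strings, which is why statistical and neural segmenters (as in the work above) outperform it.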
Do Multi-Sense Embeddings Improve Natural Language Understanding?
Learning a distinct representation for each sense of an ambiguous word could
lead to more powerful and fine-grained models of vector-space representations.
Yet while `multi-sense' methods have been proposed and tested on artificial
word-similarity tasks, we don't know if they improve real natural language
understanding tasks. In this paper we introduce a multi-sense embedding model
based on Chinese Restaurant Processes that achieves state of the art
performance on matching human word similarity judgments, and propose a
pipelined architecture for incorporating multi-sense embeddings into language
understanding.
We then test the performance of our model on part-of-speech tagging, named
entity recognition, sentiment analysis, semantic relation identification and
semantic relatedness, controlling for embedding dimensionality. We find that
multi-sense embeddings do improve performance on some tasks (part-of-speech
tagging, semantic relation identification, semantic relatedness) but not on
others (named entity recognition, various forms of sentiment analysis). We
discuss how these differences may be caused by the different role of word sense
information in each of the tasks. The results highlight the importance of
testing embedding models in real applications.
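The Chinese Restaurant Process the abstract refers to is a prior over how occurrences of a word cluster into senses: each new occurrence joins an existing sense with probability proportional to that sense's count, or opens a new sense with probability proportional to a concentration parameter. The sketch below shows that prior alone; the paper's full model additionally weights the choice by context similarity, and the function name and parameters here are illustrative assumptions.

```python
import random

def crp_assign(n_occurrences, alpha=1.0, rng=None):
    """Assign occurrences to senses under a Chinese Restaurant Process prior.

    counts[k] tracks how many occurrences sit at sense k; a new occurrence
    picks sense k with probability counts[k] / (n + alpha), or a brand-new
    sense with probability alpha / (n + alpha).
    """
    rng = rng or random.Random(0)
    counts = []
    assignments = []
    for _ in range(n_occurrences):
        total = sum(counts) + alpha
        r = rng.uniform(0, total)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            counts.append(1)  # open a new sense for this occurrence
            assignments.append(len(counts) - 1)
    return assignments

print(crp_assign(10))
```

Because popular senses attract more occurrences ("rich get richer"), the process naturally yields a few dominant senses per word plus a long tail, without fixing the number of senses in advance.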
Convo: What does conversational programming need? An exploration of machine learning interface design
Vast improvements in natural language understanding and speech recognition
have paved the way for conversational interaction with computers. While
conversational agents have often been used for short goal-oriented dialog, we
know little about agents for developing computer programs. To explore the
utility of natural language for programming, we conducted a study (n = 45)
comparing different input methods to a conversational programming system we
developed. Participants completed novice and advanced tasks using voice-based,
text-based, and voice-or-text-based systems. We found that users appreciated
aspects of each system (e.g., voice-input efficiency, text-input precision) and
that novice users were more optimistic about programming using voice-input than
advanced users. Our results show that future conversational programming tools
should be tailored to users' programming experience and allow users to choose
their preferred input mode. To reduce cognitive load, future interfaces can
incorporate visualizations and possess custom natural language understanding
and speech recognition models for programming.
Comment: 9 pages, 7 figures, submitted to VL/HCC 2020; associated user study video: https://youtu.be/TC5P3OO5ex