Constructing Financial Sentimental Factors in Chinese Market Using Natural Language Processing
In this paper, we design an integrated algorithm to evaluate the sentiment of
the Chinese market. First, using web browser automation, we automatically crawl
a large number of news articles and comments from several influential financial
websites. Second, we apply Natural Language Processing (NLP) techniques for the
Chinese context, including tokenization, Word2vec word embeddings, and the
semantic database WordNet, to compute Senti-scores for these news articles and
comments, and then construct the sentimental factor. We build a finance-specific
sentiment lexicon so that the factor reflects the sentiment of the financial
market rather than general sentiments such as happiness or sadness. Third, we
implement an adjustment of the standard sentimental factor. Our experiments
show a significant correlation between the standard sentimental factor and the
Chinese market, and the adjusted factor is even more informative, with a
stronger correlation. Our sentimental factors can therefore serve as important
references for investment decisions. In particular, during the Chinese market
crash in 2015, the Pearson correlation coefficient between the adjusted
sentimental factor and the SSE index is 0.5844, suggesting that our model
provides solid guidance, especially in periods when the market is strongly
influenced by public sentiment.
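The reported 0.5844 figure is a Pearson correlation coefficient. As a minimal sketch of how such a coefficient is computed (the series below are made-up illustrative numbers, not the paper's data), one can compare a daily sentiment factor against index levels:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical daily sentiment factor and SSE index levels (illustrative only).
sentiment = [0.2, 0.5, 0.1, -0.3, -0.6, -0.2, 0.4]
sse_index = [3050, 3120, 3080, 2950, 2800, 2900, 3010]
r = pearson(sentiment, sse_index)
```

A value near +1 would indicate the factor moves closely with the index, as the abstract claims for the 2015 crash period.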
Entity Suggestion by Example using a Conceptual Taxonomy
Entity suggestion by example (ESbE) refers to a type of entity acquisition
query in which a user provides a set of example entities as the query and
obtains in return some entities that best complete the concept underlying the
given query. Such entity acquisition queries can be useful in many applications
such as related-entity recommendation and query expansion. A number of ESbE
query processing solutions exist in the literature. However, they mostly rely
on entity co-occurrences in text or web lists, without taking advantage of the
many web-scale conceptual taxonomies that encode hierarchical isA
relationships between entities and concepts. This
paper provides a query processing method based on the relevance models between
entity sets and concepts. These relevance models can be used to obtain the
fine-grained concepts implied by the query entity set, and the entities that
belong to a given concept, thereby providing the entity suggestions. Extensive
evaluations with real data sets show that the accuracy of queries processed
with this new method is significantly higher than that of existing solutions.
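The abstract does not specify the relevance models themselves; as a loose illustration of the underlying idea only (a tiny hypothetical isA taxonomy, not the authors' method), candidate entities can be ranked by how many of the concepts shared by the query examples they also belong to:

```python
# Toy isA taxonomy: entity -> set of concepts (hypothetical data).
ISA = {
    "python": {"programming language", "scripting language"},
    "java": {"programming language"},
    "go": {"programming language", "board game"},
    "ruby": {"programming language", "gemstone", "scripting language"},
    "chess": {"board game"},
}

def suggest(query, k=2):
    """Suggest entities that best complete the concept behind the query set."""
    # Fine-grained concepts implied by the query: those shared by all examples.
    shared = set.intersection(*(ISA[e] for e in query))
    candidates = [e for e in ISA if e not in query]
    # Rank candidates by overlap with the implied concepts.
    scored = sorted(candidates,
                    key=lambda e: len(ISA[e] & shared), reverse=True)
    return scored[:k]
```

For example, `suggest({"python", "ruby"})` infers the shared concepts and prefers other programming languages over `"chess"`.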
Phonetic-enriched Text Representation for Chinese Sentiment Analysis with Reinforcement Learning
The Chinese pronunciation system offers two characteristics that distinguish
it from other languages: deep phonemic orthography and intonation variations.
We are the first to argue that these two important properties can play a major
role in Chinese sentiment analysis. In particular, we propose two effective
features to encode phonetic information, and develop a Disambiguate Intonation
for Sentiment Analysis (DISA) network based on reinforcement learning, which
disambiguates the intonation of each Chinese character (pinyin) so that a
precise phonetic representation of Chinese is learned. Furthermore, we
also fuse phonetic features with textual and visual features in order to mimic
the way humans read and understand Chinese text. Experimental results on five
different Chinese sentiment analysis datasets show that the inclusion of
phonetic features significantly and consistently improves the performance of
textual and visual representations, and outperforms state-of-the-art Chinese
character-level representations.
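The intonation property can be encoded very simply. As a hedged sketch (the character-to-pinyin table below is a tiny hypothetical stand-in, and these are not the paper's actual features), a tone feature can be split off from a numbered pinyin string:

```python
# Hypothetical character -> numbered-pinyin table; a real system would use a
# full pronunciation lexicon, and polyphonic characters would need
# disambiguation (the role played by the paper's DISA network).
PINYIN = {"好": "hao3", "不": "bu4", "中": "zhong1"}

def phonetic_features(char):
    """Split a numbered pinyin into (toneless syllable, tone id 1-5)."""
    py = PINYIN[char]
    if py[-1].isdigit():
        return py[:-1], int(py[-1])
    return py, 5  # neutral tone, by common convention
```

The syllable and tone could then be embedded separately and concatenated with character-level textual features.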
Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling
Previous traditional approaches to unsupervised Chinese word segmentation
(CWS) can be roughly classified into discriminative and generative models. The
former uses the carefully designed goodness measures for candidate
segmentation, while the latter focuses on finding the optimal segmentation of
the highest generative probability. However, while discriminative models can
be straightforwardly extended to neural versions using neural language models,
extending generative models is non-trivial. In this paper, we
propose the segmental language models (SLMs) for CWS. Our approach explicitly
focuses on the segmental nature of Chinese, as well as preserves several
properties of language models. In SLMs, a context encoder encodes the previous
context and a segment decoder generates each segment incrementally. To the
best of our knowledge, we are the first to propose a neural model for
unsupervised CWS, and we achieve performance competitive with state-of-the-art
statistical models on four different datasets from the SIGHAN 2005 bakeoff.
Comment: To appear in EMNLP 201
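The generative view can be made concrete. As a rough sketch (uniform toy segment probabilities, not the paper's neural context-conditioned decoder), the probability of a sentence marginalizes over all segmentations via dynamic programming:

```python
def sentence_prob(chars, seg_prob, max_len=4):
    """Marginal probability of a character sequence under a segmental model:
    alpha[j] = sum over last-segment start i of alpha[i] * P(segment i..j)."""
    n = len(chars)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            alpha[j] += alpha[i] * seg_prob(chars[i:j])
    return alpha[n]

# Hypothetical segment model: a tiny vocabulary with fixed probabilities.
VOCAB = {"北京": 0.2, "北": 0.05, "京": 0.05, "欢迎": 0.2, "你": 0.1}
prob = sentence_prob("北京欢迎你", lambda s: VOCAB.get(s, 0.0))
```

Here the marginal sums two segmentations, [北京][欢迎][你] and [北][京][欢迎][你]; in the paper the segment probabilities would come from a segment decoder conditioned on an encoded left context.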
Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features
Inferring implicit discourse relations in natural language text is the most
difficult subtask in discourse parsing. Surface features achieve good
performance, but they are not readily applicable to other languages without
semantic lexicons. Previous neural models require parses, surface features, or
a small label set to work well. Here, we propose neural network models based
on feedforward and long short-term memory (LSTM) architectures without any
surface features. To our surprise, our best-configured feedforward
architecture outperforms the LSTM-based model in most cases despite thorough
tuning. Under
various fine-grained label sets and a cross-linguistic setting, our feedforward
models perform consistently better or at least just as well as systems that
require hand-crafted surface features. Our models constitute the first neural
Chinese discourse parser in the style of the Chinese Discourse Treebank,
showing that our results hold cross-linguistically.
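The surface-feature-free feedforward setup can be pictured minimally. As a schematic sketch (random toy weights and embeddings, not the paper's trained models), each discourse argument is represented by averaging its word vectors before a linear feedforward scorer:

```python
import random

random.seed(0)
DIM, LABELS = 4, 3
# Hypothetical embeddings and weights; a real model learns these end to end.
EMB = {w: [random.uniform(-1, 1) for _ in range(DIM)]
       for w in "the rain stopped we went outside however".split()}
W = [[random.uniform(-1, 1) for _ in range(2 * DIM)] for _ in range(LABELS)]

def avg(words):
    """Average the word embeddings of one discourse argument."""
    vecs = [EMB[w] for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def classify(arg1, arg2):
    """Score discourse relations from averaged argument embeddings only."""
    x = avg(arg1) + avg(arg2)  # concatenate the two argument vectors
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in W]
    return max(range(LABELS), key=lambda k: scores[k])

label = classify(["the", "rain", "stopped"], ["we", "went", "outside"])
```

The point of the sketch is what is absent: no parses, connectives, or other surface features enter the input, matching the abstract's setting.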
Topic Memory Networks for Short Text Classification
Many classification models work poorly on short texts due to data sparsity.
To address this issue, we propose topic memory networks for short text
classification with a novel topic memory mechanism to encode latent topic
representations indicative of class labels. Different from most prior work that
focuses on extending features with external knowledge or pre-trained topics,
our model jointly explores topic inference and text classification with memory
networks in an end-to-end manner. Experimental results on four benchmark
datasets show that our model outperforms state-of-the-art models on short text
classification, while also generating coherent topics. Comment: EMNLP 201
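The topic memory mechanism is described only at a high level. One loose interpretation (toy vectors and a hypothetical setup, not the authors' architecture) is attention of a short-text representation over a memory of latent topic vectors:

```python
import math

def attend(query, memory):
    """Softmax attention of a query vector over topic memory rows."""
    scores = [sum(q * m for q, m in zip(query, row)) for row in memory]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of topic vectors augments the text representation.
    read = [sum(w * row[i] for w, row in zip(weights, memory))
            for i in range(len(query))]
    return weights, read

# Two latent topic vectors and one short-text query (illustrative numbers).
topics = [[1.0, 0.0], [0.0, 1.0]]
weights, read = attend([0.9, 0.1], topics)
```

The read vector, which emphasizes the topic most similar to the text, would then feed the classifier; in the paper, topic inference and classification are trained jointly.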
Revisiting Regex Generation for Modeling Industrial Applications by Incorporating Byte Pair Encoder
Regular expressions are important for many natural language processing tasks,
especially when dealing with unstructured and semi-structured data. This work
focuses on automatically generating regular expressions and proposes a novel
genetic algorithm for this problem. Unlike methods that generate regular
expressions at the character level, we first utilize a byte
pair encoder (BPE) to extract some frequent items, which are then used to
construct regular expressions. The fitness function of our genetic algorithm
contains multiple objectives and is optimized through an evolutionary
procedure with crossover and mutation operations. The fitness function takes
into account the length of the generated regular expression, maximizing the
matched characters and samples for positive training samples, and minimizing
the matched characters and samples for negative training samples. In addition,
to accelerate training, we apply exponential decay to the population size
of the genetic algorithm. Our method, together with a strong baseline, is
tested on 13 challenging datasets. The results demonstrate the effectiveness
of our method, which outperforms the baseline on 10 of the datasets and
achieves nearly 50 percent improvement on average. With exponential decay,
training is approximately 100 times faster than without it. In summary, our
method is both effective and efficient, and can be deployed in industrial
applications.
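The overall loop can be sketched roughly as follows. This is a deliberately tiny caricature (hypothetical samples, frequent items, and fitness weights, not the paper's multi-objective formulation): individuals are concatenations of BPE-like frequent items, fitness rewards matching positives and penalizes negatives and length, and the population size decays exponentially per generation:

```python
import random
import re

random.seed(1)
POS = ["cat1", "cat2", "cat9"]  # toy positive samples
NEG = ["dog1", "catx"]          # toy negative samples
# BPE-like frequent items (hypothetical; a real system extracts these).
ITEMS = ["cat", "dog", r"\d", "x", "1"]

def fitness(pattern):
    """Reward matched positives, penalize matched negatives and length."""
    try:
        rx = re.compile(pattern)
    except re.error:
        return float("-inf")
    pos = sum(bool(rx.fullmatch(s)) for s in POS)
    neg = sum(bool(rx.fullmatch(s)) for s in NEG)
    return pos - neg - 0.01 * len(pattern)

def pop_size(gen, initial=64, decay=0.5, floor=4):
    """Exponentially decaying population size to speed up training."""
    return max(floor, int(initial * decay ** gen))

def evolve(generations=6):
    pop = ["".join(random.choices(ITEMS, k=2)) for _ in range(pop_size(0))]
    for gen in range(1, generations + 1):
        pop.sort(key=fitness, reverse=True)
        pop = pop[: pop_size(gen)]        # keep the fittest, decayed count
        children = []
        for _ in range(len(pop)):
            a, b = random.sample(pop, 2)  # crossover: splice two parents
            child = a[: random.randrange(len(a) + 1)] \
                    + b[random.randrange(len(b) + 1):]
            if random.random() < 0.3:     # mutation: append a frequent item
                child += random.choice(ITEMS)
            children.append(child)
        pop += children
    return max(pop, key=fitness)

best = evolve()
```

Building candidates from multi-character frequent items rather than single characters shrinks the search space, which is the core motivation stated in the abstract.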
Structure Regularized Neural Network for Entity Relation Classification for Chinese Literature Text
Relation classification is an important semantic processing task in the field
of natural language processing. In this paper, we propose the task of relation
classification for Chinese literature text. A new dataset of Chinese literature
text is constructed to facilitate the study in this task. We present a novel
model, named Structure Regularized Bidirectional Recurrent Convolutional Neural
Network (SR-BRCNN), to identify the relation between entities. The proposed
model learns relation representations along the shortest dependency path (SDP)
extracted from the structure regularized dependency tree, which has the
benefits of reducing the complexity of the whole model. Experimental results
show that the proposed method significantly improves the F1 score by 10.3 and
outperforms state-of-the-art approaches on Chinese literature text. Comment:
Accepted at NAACL HLT 2018. arXiv admin note: substantial text overlap with
arXiv:1711.0250
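The shortest dependency path (SDP) ingredient is standard and easy to sketch. The toy parse below is hypothetical and independent of the SR-BRCNN model itself; the path between two entity tokens is found by breadth-first search over the undirected dependency tree:

```python
from collections import deque

def shortest_dep_path(heads, a, b):
    """BFS over an undirected dependency tree given head indices
    (heads[i] is the head of token i; -1 marks the root)."""
    adj = {i: set() for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].add(h)
            adj[h].add(i)
    prev = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:  # reconstruct the path back to a
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Toy sentence "Alice met Bob yesterday": all tokens head to "met" (index 1).
path = shortest_dep_path([1, -1, 1, 1], 0, 2)
```

In the paper, relation representations are learned along this path, extracted from a structure-regularized tree rather than the raw parse.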
Real-time Automatic Word Segmentation for User-generated Text
For readability, and possibly for disambiguation, appropriate word
segmentation is recommended for written text. In this paper, we propose a
real-time assistive technology that performs automatic segmentation. The
language investigated is Korean, a head-final language with various
morpho-syllabic blocks as characters. The training scheme is fully neural
network-based and straightforward. In addition, we show how the proposed
system can be used for web-based real-time revision of user-generated text.
Through qualitative and quantitative comparison with widely used text
processing toolkits, we demonstrate the reliability of the proposed system and
how well it fits conversational and non-canonical texts. The demonstration is
available online. Comment: 8 pages, 4 figures, 1 table
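Word segmentation here amounts to predicting where spaces belong in an unsegmented character stream. As a small sketch (the tags below are hand-written stand-ins for what the paper's neural tagger would predict), the revision step just re-inserts spaces from per-character labels:

```python
def apply_segmentation(chars, space_after):
    """Insert spaces after characters flagged by a per-character tag.
    A real system would produce `space_after` with a neural sequence tagger."""
    out = []
    for ch, flag in zip(chars, space_after):
        out.append(ch)
        if flag:
            out.append(" ")
    return "".join(out).strip()

# Toy example on the unsegmented Korean string "아버지가방에들어가신다".
text = "아버지가방에들어가신다"
tags = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
segmented = apply_segmentation(text, tags)
```

With these tags the output reads "아버지가 방에 들어가신다" ("Father enters the room"), the classic example of how segmentation resolves ambiguity in Korean.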
Generalize Symbolic Knowledge With Neural Rule Engine
As neural networks have come to dominate state-of-the-art results in a wide
range of NLP tasks, improving neural models by integrating symbolic knowledge
has attracted considerable attention. Different from
existing works, this paper investigates the combination of these two powerful
paradigms from the knowledge-driven side. We propose Neural Rule Engine (NRE),
which can learn knowledge explicitly from logic rules and then generalize them
implicitly with neural networks. NRE is implemented with neural module networks
in which each module represents an action of a logic rule. The experiments show
that NRE can greatly improve the generalization ability of logic rules, with a
significant increase in recall while precision is maintained at a high level.
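The module-per-action idea can be caricatured without any neural machinery. In this very loose, non-neural sketch (hypothetical modules, not the paper's implementation), a rule is compiled into a pipeline of composable actions; a neural version would replace the exact matches with learned, soft matching, which is what lets NRE generalize:

```python
# Each module implements one action of a logic rule.
def contains(word):
    """Module: the sentence must contain `word`."""
    return lambda tokens: word in tokens

def precedes(w1, w2):
    """Module: `w1` must appear before `w2`."""
    def act(tokens):
        return (w1 in tokens and w2 in tokens
                and tokens.index(w1) < tokens.index(w2))
    return act

def rule(*modules):
    """A rule fires only if every module action succeeds."""
    return lambda tokens: all(m(tokens) for m in modules)

# Toy causal-relation rule built from two modules.
cause_rule = rule(contains("causes"), precedes("smoking", "cancer"))
fired = cause_rule("smoking causes lung cancer".split())
```

Replacing each hard predicate with a trainable scorer is where the recall gains reported in the abstract would come from.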