
    Constructing Financial Sentimental Factors in Chinese Market Using Natural Language Processing

    In this paper, we design an integrated algorithm to evaluate the sentiment of the Chinese market. First, using web-browser automation, we automatically crawl large numbers of news articles and comments from several influential financial websites. Second, we apply Natural Language Processing (NLP) techniques for Chinese, including tokenization, Word2vec word embeddings, and the semantic database WordNet, to compute senti-scores for these news articles and comments, and then construct the sentimental factor. We build a finance-specific sentiment lexicon so that the factor reflects the sentiment of the financial market rather than general emotions such as happiness or sadness. Third, we implement an adjustment of the standard sentimental factor. Our experiments show a significant correlation between the standard sentimental factor and the Chinese market, and the adjusted factor is even more informative, exhibiting a stronger correlation. Our sentimental factors can therefore serve as important references for investment decisions. In particular, during the Chinese market crash in 2015, the Pearson correlation coefficient between the adjusted sentimental factor and the SSE index is 0.5844, suggesting that our model can provide solid guidance, especially in periods when the market is strongly influenced by public sentiment.
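
    A minimal sketch of the lexicon-based scoring and correlation steps described above, with a hypothetical finance-specific lexicon (FIN_LEXICON), pre-tokenized toy texts, and made-up index returns; the real pipeline uses Chinese tokenization, Word2vec, and WordNet:

        import numpy as np

        # Hypothetical finance-specific sentiment lexicon (token -> polarity).
        FIN_LEXICON = {"rally": 1.0, "surge": 1.0, "default": -1.0, "crash": -1.0}

        def senti_score(tokens):
            """Average polarity of lexicon hits; 0.0 if nothing is covered."""
            hits = [FIN_LEXICON[t] for t in tokens if t in FIN_LEXICON]
            return sum(hits) / len(hits) if hits else 0.0

        # One factor value per day: the mean senti-score over that day's texts.
        daily_texts = [[["stocks", "rally"]],
                       [["bonds", "surge"], ["stocks", "rally"]],
                       [["firm", "default"]],
                       [["market", "crash"], ["firm", "default"]]]
        factor = [np.mean([senti_score(t) for t in day]) for day in daily_texts]

        # Made-up daily index returns standing in for the SSE series.
        sse_returns = [0.5, 0.9, -0.7, -1.4]
        print(np.corrcoef(factor, sse_returns)[0, 1])  # Pearson correlation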

    Entity Suggestion by Example using a Conceptual Taxonomy

    Entity suggestion by example (ESbE) refers to a type of entity acquisition query in which a user provides a set of example entities and obtains in return entities that best complete the concept underlying the query. Such queries are useful in many applications, such as related-entity recommendation and query expansion. A number of ESbE query processing solutions exist in the literature, but they mostly build on entity co-occurrence in text or web lists, without taking advantage of the many web-scale conceptual taxonomies that consist of hierarchical isA relationships between entity-concept pairs. This paper provides a query processing method based on relevance models between entity sets and concepts. These relevance models are used to obtain the fine-grained concepts implied by the query entity set, and the entities that belong to a given concept, thereby producing the entity suggestions. Extensive evaluations on real data sets show that the accuracy of queries processed with this new method is significantly higher than that of existing solutions.
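
    A minimal sketch of the taxonomy-based idea, with a toy isA taxonomy and a crude coverage-count scorer standing in for the paper's relevance models:

        from collections import defaultdict

        # Toy isA taxonomy: entity -> set of concepts it belongs to.
        ISA = {"python": {"programming language"},
               "java": {"programming language", "island"},
               "go": {"programming language", "board game"},
               "bali": {"island"}}

        def concept_scores(query_entities):
            """Score concepts by how many query entities they cover (a crude
            stand-in for the paper's entity-set/concept relevance models)."""
            scores = defaultdict(int)
            for e in query_entities:
                for c in ISA.get(e, ()):
                    scores[c] += 1
            return sorted(scores.items(), key=lambda kv: -kv[1])

        def suggest(query_entities, k=3):
            best_concept = concept_scores(query_entities)[0][0]
            # Suggest entities of the best concept not already in the query.
            return [e for e, cs in ISA.items()
                    if best_concept in cs and e not in query_entities][:k]

        print(suggest({"python", "java"}))  # ['go']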

    Phonetic-enriched Text Representation for Chinese Sentiment Analysis with Reinforcement Learning

    The Chinese pronunciation system offers two characteristics that distinguish it from other languages: deep phonemic orthography and intonation variations. We are the first to argue that these two important properties can play a major role in Chinese sentiment analysis. In particular, we propose two effective features to encode phonetic information. We then develop a Disambiguate Intonation for Sentiment Analysis (DISA) network based on reinforcement learning, which disambiguates the intonation of each Chinese character (pinyin), so that a precise phonetic representation of Chinese is learned. Furthermore, we fuse phonetic features with textual and visual features to mimic the way humans read and understand Chinese text. Experimental results on five Chinese sentiment analysis datasets show that including phonetic features significantly and consistently improves the performance of textual and visual representations and outperforms state-of-the-art Chinese character-level representations.
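
    A minimal sketch of one way to expose tone (intonation) information as a feature, using the pypinyin library; the (syllable, tone) layout is an illustrative assumption, not the paper's exact encoding:

        from pypinyin import lazy_pinyin, Style

        def phonetic_features(text):
            """Split each character's pinyin into (syllable, tone) pairs.
            Tone 0 marks a neutral tone / missing tone number."""
            feats = []
            for syl in lazy_pinyin(text, style=Style.TONE3):
                if syl and syl[-1].isdigit():
                    feats.append((syl[:-1], int(syl[-1])))
                else:
                    feats.append((syl, 0))
            return feats

        print(phonetic_features("中国好"))  # [('zhong', 1), ('guo', 2), ('hao', 3)]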

    Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling

    Previous approaches to unsupervised Chinese word segmentation (CWS) can be roughly classified into discriminative and generative models. The former use carefully designed goodness measures for candidate segmentations, while the latter focus on finding the segmentation with the highest generative probability. However, while discriminative models can be trivially extended to neural versions using neural language models, doing so for generative models is non-trivial. In this paper, we propose segmental language models (SLMs) for CWS. Our approach explicitly focuses on the segmental nature of Chinese while preserving several properties of language models. In SLMs, a context encoder encodes the previous context and a segment decoder generates each segment incrementally. To the best of our knowledge, we are the first to propose a neural model for unsupervised CWS, and we achieve performance competitive with state-of-the-art statistical models on four different datasets from the SIGHAN 2005 bakeoff. Comment: To appear in EMNLP 2018.
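
    A minimal sketch of the decoding side of such a model: dynamic programming over all segmentations, where score_segment is a toy stand-in for the (context encoder + segment decoder) log-probability:

        import math

        def score_segment(seg, context):
            """Placeholder for log P(segment | context) from a neural SLM.
            Here: a toy lexicon score plus a per-segment penalty."""
            lexicon = {"中国": -1.0, "人民": -1.2, "中": -3.0, "国": -3.0,
                       "人": -2.5, "民": -3.0}
            return lexicon.get(seg, -8.0) - 0.5  # penalty discourages oversegmenting

        def segment(sentence, max_len=4):
            """Viterbi-style search for the highest-scoring segmentation."""
            n = len(sentence)
            best = [(-math.inf, None)] * (n + 1)
            best[0] = (0.0, None)
            for j in range(1, n + 1):
                for i in range(max(0, j - max_len), j):
                    s = best[i][0] + score_segment(sentence[i:j], sentence[:i])
                    if s > best[j][0]:
                        best[j] = (s, i)
            # Recover segment boundaries by walking back-pointers.
            segs, j = [], n
            while j > 0:
                i = best[j][1]
                segs.append(sentence[i:j])
                j = i
            return segs[::-1]

        print(segment("中国人民"))  # ['中国', '人民']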

    Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features

    Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Surface features achieve good performance, but they are not readily applicable to other languages without semantic lexicons. Previous neural models require parses, surface features, or a small label set to work well. Here, we propose neural network models based on feedforward and long short-term memory (LSTM) architectures, without any surface features. To our surprise, our best-configured feedforward architecture outperforms the LSTM-based model in most cases despite thorough tuning. Under various fine-grained label sets and in a cross-linguistic setting, our feedforward models perform consistently better than, or at least as well as, systems that require hand-crafted surface features. Our models constitute the first neural Chinese discourse parser in the style of the Chinese Discourse Treebank, showing that our results hold cross-linguistically.
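
    A minimal numpy sketch of the surface-feature-free setup described above: each discourse argument is an averaged word-embedding vector, and a small feedforward network classifies the pair. The dimensions, two-layer shape, and random (untrained) parameters are illustrative assumptions:

        import numpy as np

        rng = np.random.default_rng(0)
        DIM, HIDDEN, N_LABELS = 50, 64, 4  # illustrative sizes

        def encode(word_vectors):
            """Argument representation = mean of word embeddings (no surface features)."""
            return np.mean(word_vectors, axis=0)

        # Randomly initialized feedforward parameters (training loop omitted).
        W1 = rng.normal(0, 0.1, (2 * DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
        W2 = rng.normal(0, 0.1, (HIDDEN, N_LABELS)); b2 = np.zeros(N_LABELS)

        def predict(arg1_vecs, arg2_vecs):
            x = np.concatenate([encode(arg1_vecs), encode(arg2_vecs)])
            h = np.tanh(x @ W1 + b1)
            logits = h @ W2 + b2
            return int(np.argmax(logits))  # index of the predicted relation label

        arg1 = rng.normal(size=(7, DIM))  # stand-in embeddings for argument 1
        arg2 = rng.normal(size=(5, DIM))  # stand-in embeddings for argument 2
        print(predict(arg1, arg2))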

    Topic Memory Networks for Short Text Classification

    Many classification models work poorly on short texts due to data sparsity. To address this issue, we propose topic memory networks for short text classification, with a novel topic memory mechanism that encodes latent topic representations indicative of class labels. Unlike most prior work, which focuses on extending features with external knowledge or pre-trained topics, our model jointly explores topic inference and text classification with memory networks in an end-to-end manner. Experimental results on four benchmark datasets show that our model outperforms state-of-the-art models on short text classification while also generating coherent topics. Comment: EMNLP 2018.
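
    A minimal numpy sketch of the memory idea: a short text's representation attends over a matrix of latent topic vectors, and the attended topic summary augments the classifier input. The sizes and simple dot-product attention are illustrative assumptions:

        import numpy as np

        rng = np.random.default_rng(0)
        K, DIM = 10, 50                           # illustrative: 10 topics, 50-dim vectors
        topic_memory = rng.normal(size=(K, DIM))  # latent topic representations

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        def attend_topics(text_vec):
            """Match the text against topic memory and return an augmented input."""
            weights = softmax(topic_memory @ text_vec)  # relevance of each topic
            summary = weights @ topic_memory            # weighted topic mixture
            return np.concatenate([text_vec, summary])  # fed to the classifier

        text_vec = rng.normal(size=DIM)          # stand-in encoding of a short text
        print(attend_topics(text_vec).shape)     # (100,)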

    Revisiting Regex Generation for Modeling Industrial Applications by Incorporating Byte Pair Encoder

    Regular expressions are important for many natural language processing tasks, especially those dealing with unstructured and semi-structured data. This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm for the problem. Unlike methods that generate regular expressions at the character level, we first use a byte pair encoder (BPE) to extract frequent items, which are then used to construct regular expressions. The fitness function of our genetic algorithm contains multiple objectives and is optimized by an evolutionary procedure including crossover and mutation operations. The fitness function takes into account the length of the generated regular expression, the number of characters and samples matched among the positive training samples (to be maximized), and the number matched among the negative training samples (to be minimized). In addition, to accelerate training, we exponentially decay the population size of the genetic algorithm. Our method, together with a strong baseline, is tested on 13 challenging datasets. The results demonstrate the effectiveness of our method, which outperforms the baseline on 10 of the datasets and achieves nearly 50 percent improvement on average. With exponential decay, training is approximately 100 times faster than without it. In summary, our method is both effective and efficient, and can be deployed in industrial applications.
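
    A minimal sketch of the evolutionary loop with exponentially decaying population size; the token pool stands in for BPE-extracted frequent items, and the fitness weights and decay rate are illustrative assumptions:

        import random, re

        random.seed(0)
        TOKENS = ["foo", "bar", "[0-9]+", "[a-z]+"]  # stand-in for BPE frequent items
        POS = ["foo123", "bar7"]                     # positive training samples
        NEG = ["baz", "42"]                          # negative training samples

        def fitness(pattern):
            """Multi-objective: reward positive matches, punish negative matches
            and long patterns (a simplified version of the paper's objectives)."""
            try:
                rx = re.compile(pattern)
            except re.error:
                return float("-inf")  # crossover can produce invalid patterns
            pos_hits = sum(bool(rx.fullmatch(s)) for s in POS)
            neg_hits = sum(bool(rx.fullmatch(s)) for s in NEG)
            return 10 * pos_hits - 10 * neg_hits - 0.1 * len(pattern)

        def random_individual():
            return "".join(random.choices(TOKENS, k=random.randint(1, 3)))

        def crossover(a, b):
            return a[: len(a) // 2] + b[len(b) // 2 :]

        def mutate(p):
            return p + random.choice(TOKENS)

        pop = [random_individual() for _ in range(64)]
        for gen in range(10):
            size = max(4, int(64 * 0.7 ** gen))  # exponential decay of population size
            pop = sorted(pop, key=fitness, reverse=True)[:size]
            children = [crossover(*random.sample(pop, 2)) for _ in range(size // 2)]
            pop += children + [mutate(random.choice(pop)) for _ in range(size // 2)]

        print(max(pop, key=fitness))  # best evolved pattern, e.g. '[a-z]+[0-9]+'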

    Structure Regularized Neural Network for Entity Relation Classification for Chinese Literature Text

    Relation classification is an important semantic processing task in natural language processing. In this paper, we introduce the task of relation classification for Chinese literature text and construct a new dataset of Chinese literature text to facilitate research on this task. We present a novel model, the Structure Regularized Bidirectional Recurrent Convolutional Neural Network (SR-BRCNN), to identify relations between entities. The proposed model learns relation representations along the shortest dependency path (SDP) extracted from a structure-regularized dependency tree, which reduces the complexity of the overall model. Experimental results show that the proposed method significantly improves the F1 score by 10.3 and outperforms state-of-the-art approaches on Chinese literature text. Comment: Accepted at NAACL HLT 2018. arXiv admin note: substantial text overlap with arXiv:1711.0250
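
    A minimal sketch of extracting a shortest dependency path (SDP) between two entities, treating the dependency tree as an undirected graph; the toy parse and sentence are made-up examples:

        from collections import deque

        # Toy dependency parse: token index -> head index (-1 marks the root).
        # Made-up sentence: "Alice met Bob in Beijing"
        HEADS = {0: 1, 1: -1, 2: 1, 3: 1, 4: 3}
        TOKENS = ["Alice", "met", "Bob", "in", "Beijing"]

        def shortest_dependency_path(src, dst, heads):
            """BFS over the undirected tree formed by head->dependent arcs."""
            adj = {i: set() for i in heads}
            for child, head in heads.items():
                if head >= 0:
                    adj[child].add(head)
                    adj[head].add(child)
            prev, queue = {src: None}, deque([src])
            while queue:
                node = queue.popleft()
                if node == dst:
                    break
                for nxt in adj[node]:
                    if nxt not in prev:
                        prev[nxt] = node
                        queue.append(nxt)
            path, node = [], dst
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]

        path = shortest_dependency_path(0, 4, HEADS)  # between the two entities
        print([TOKENS[i] for i in path])  # ['Alice', 'met', 'in', 'Beijing']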

    Real-time Automatic Word Segmentation for User-generated Text

    For readability, and possibly for disambiguation, appropriate word segmentation is recommended for written text. In this paper, we propose a real-time assistive technology that utilizes automatic word segmentation. The language investigated is Korean, a head-final language whose characters are morpho-syllabic blocks. The training scheme is fully neural network-based and straightforward. We also show how the proposed system can be used for web-based real-time revision of user-generated text. Through qualitative and quantitative comparisons with widely used text processing toolkits, we demonstrate the reliability of the proposed system and how well it handles conversational and non-canonical texts. The demonstration is available online. Comment: 8 pages, 4 figures, 1 table.
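
    A minimal sketch of the real-time decoding step such a system needs: given a per-character probability of a space following that character (here faked by a stub in place of the neural model), insert spaces wherever the probability clears a threshold:

        def space_probs(chars):
            """Stub standing in for the neural model's per-character
            P(space after this character) predictions."""
            fake = {"요": 0.9, "다": 0.9}  # e.g. common sentence-final syllables
            return [fake.get(c, 0.1) for c in chars]

        def insert_spaces(text, threshold=0.5):
            """Rebuild the text, adding a space wherever the model is confident."""
            out = []
            for ch, p in zip(text, space_probs(text)):
                out.append(ch)
                if p >= threshold:
                    out.append(" ")
            return "".join(out).strip()

        print(insert_spaces("먹었어요감사합니다"))  # '먹었어요 감사합니다'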

    Generalize Symbolic Knowledge With Neural Rule Engine

    As neural networks have come to dominate state-of-the-art results across a wide range of NLP tasks, improving neural models by integrating symbolic knowledge has attracted considerable attention. Unlike existing work, this paper investigates the combination of these two powerful paradigms from the knowledge-driven side. We propose the Neural Rule Engine (NRE), which can learn knowledge explicitly from logic rules and then generalize it implicitly with neural networks. NRE is implemented with neural module networks in which each module represents an action of a logic rule. Experiments show that NRE greatly improves the generalization ability of logic rules, with a significant increase in recall, while precision is maintained at a high level.
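
    A minimal numpy sketch of the underlying intuition: a hard keyword-matching rule action is relaxed into a module that matches by embedding similarity, so paraphrases the literal rule would miss can still fire (raising recall). The embeddings, module design, and threshold are illustrative assumptions, not the paper's architecture:

        import numpy as np

        rng = np.random.default_rng(0)
        # Stand-in word embeddings; a real system would use pretrained vectors.
        EMB = {w: rng.normal(size=16) for w in ["purchase", "buy", "sell"]}
        EMB["buy"] = EMB["purchase"] + 0.05 * rng.normal(size=16)  # near-synonyms

        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        def soft_match_module(keyword, threshold=0.8):
            """Neural relaxation of the rule action 'match this keyword':
            fires on any token embedded close enough to the keyword."""
            def module(tokens):
                return any(cos(EMB[keyword], EMB[t]) >= threshold
                           for t in tokens if t in EMB)
            return module

        rule = soft_match_module("purchase")  # one module = one rule action
        print(rule(["buy"]))   # True -- generalizes beyond the literal keyword
        print(rule(["sell"]))  # expected False for an unrelated word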