
    CD-CNN: A Partially Supervised Cross-Domain Deep Learning Model for Urban Resident Recognition

    Driven by the wave of urbanization in recent decades, migrant behavior analysis has drawn great attention from both academia and government. Nevertheless, constrained by the cost of data collection and the lack of modeling methods, most existing studies rely on questionnaire surveys with sparse samples and non-individual-level statistical data, yielding only coarse-grained analyses of migrant behavior. In this paper, a partially supervised cross-domain deep learning model named CD-CNN is proposed for migrant/native recognition, using mobile phone signaling data as behavioral features and questionnaire survey data as incomplete labels. Specifically, CD-CNN decomposes the mobile data into a location domain and a communication domain, and adopts a joint learning framework that combines two convolutional neural networks with a feature balancing scheme. Moreover, CD-CNN employs a three-step training algorithm, in which the co-training step proves particularly valuable for partially supervised cross-domain learning. Comparative experiments on the city of Wuxi demonstrate the high predictive power of CD-CNN. Two interesting applications further highlight its ability to support in-depth analysis of migrant behavior. Comment: 8 pages, 5 figures, conference
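    The two-branch joint architecture lends itself to a short sketch. The following PyTorch snippet is a minimal illustration, not the authors' implementation: the layer sizes, input shapes, and the scalar gate used for feature balancing are all assumptions, since the abstract only states that two domain CNNs are combined with a feature balancing scheme before classification.

    import torch
    import torch.nn as nn

    class DomainCNN(nn.Module):
        """One convolutional branch over a single behavioral domain."""
        def __init__(self, in_channels, feat_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(in_channels, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # collapse the time axis
                nn.Flatten(),
                nn.Linear(32, feat_dim),
            )

        def forward(self, x):
            return self.net(x)

    class TwoBranchClassifier(nn.Module):
        """Joint model: two domain CNNs fused by a learned balancing weight."""
        def __init__(self, loc_channels=4, com_channels=4, feat_dim=64):
            super().__init__()
            self.loc_branch = DomainCNN(loc_channels, feat_dim)
            self.com_branch = DomainCNN(com_channels, feat_dim)
            # a single learned gate balancing the two domains (an assumption;
            # the paper's balancing scheme may be more elaborate)
            self.alpha = nn.Parameter(torch.tensor(0.5))
            self.head = nn.Linear(feat_dim, 2)  # migrant vs. native

        def forward(self, x_loc, x_com):
            fused = self.alpha * self.loc_branch(x_loc) \
                + (1 - self.alpha) * self.com_branch(x_com)
            return self.head(fused)

    # usage: batches of (channels, time) signaling features per domain
    model = TwoBranchClassifier()
    logits = model(torch.randn(8, 4, 24), torch.randn(8, 4, 24))
    print(logits.shape)  # torch.Size([8, 2])

    The co-training step described in the abstract would sit on top of this model, exchanging confident predictions between the two branches on unlabeled data; it is omitted here for brevity.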

    Weakly-Supervised Neural Text Classification

    Deep neural networks are gaining popularity for the classic text classification task, thanks to their strong expressive power and reduced need for feature engineering. Despite this appeal, neural text classification models suffer from a lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and support only limited types of supervision. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves strong performance without requiring excessive training data and significantly outperforms baseline methods. Comment: CIKM 2018 Full Paper
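    The two modules can be sketched end to end. The Python snippet below is a hedged illustration under stated assumptions: the seed keywords, the TF-IDF plus logistic-regression classifier (standing in for the paper's deep neural model), and the 0.9 confidence threshold are all placeholders chosen only to keep the sketch self-contained and runnable.

    import random
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # illustrative seed keywords for two hypothetical classes
    SEEDS = {0: ["game", "team", "score"], 1: ["stock", "market", "profit"]}

    def make_pseudo_docs(n_per_class=50, length=20, rng=random.Random(0)):
        """Module 1: sample seed words into pseudo-labeled documents."""
        docs, labels = [], []
        for label, words in SEEDS.items():
            for _ in range(n_per_class):
                docs.append(" ".join(rng.choices(words, k=length)))
                labels.append(label)
        return docs, labels

    def self_train(unlabeled, rounds=3, threshold=0.9):
        """Module 2: bootstrap on unlabeled text via confident pseudo-labels."""
        docs, labels = make_pseudo_docs()
        vec, clf = TfidfVectorizer(), None
        for _ in range(rounds):
            clf = LogisticRegression(max_iter=1000)
            clf.fit(vec.fit_transform(docs), labels)
            if not unlabeled:
                break
            proba = clf.predict_proba(vec.transform(unlabeled))
            confident = proba.max(axis=1) >= threshold
            # absorb only confident predictions into the training set
            docs += [d for d, ok in zip(unlabeled, confident) if ok]
            labels += list(proba[confident].argmax(axis=1))
            unlabeled = [d for d, ok in zip(unlabeled, confident) if not ok]
        return clf, vec

    clf, vec = self_train(["the team won the game", "profit beat the market"])
    print(clf.predict(vec.transform(["final score of the game"])))  # -> [0]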

    Topically Driven Neural Language Model

    Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context through a topic-model-like architecture, providing a succinct representation of context beyond the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and produces topics that are potentially more coherent than those of a standard LDA topic model. Our model can also generate related sentences for a topic, offering another way to interpret topics. Comment: 11 pages, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) (to appear)
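    Conditioning a sentence-level language model on a document-level topic vector can be sketched briefly. The PyTorch snippet below is an assumption-laden illustration, not the paper's architecture: the mean-pooled document encoder, the softmax topic mixture, and the concatenation of the topic vector at every timestep are choices made only to keep the example short and runnable.

    import torch
    import torch.nn as nn

    class TopicConditionedLM(nn.Module):
        def __init__(self, vocab=1000, n_topics=10, emb=64, hid=128):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            # document encoder producing a mixture over latent topics
            self.doc_to_topics = nn.Sequential(
                nn.Linear(emb, n_topics), nn.Softmax(dim=-1))
            self.topic_emb = nn.Embedding(n_topics, hid)
            self.lstm = nn.LSTM(emb, hid, batch_first=True)
            self.out = nn.Linear(2 * hid, vocab)

        def forward(self, doc_ids, sent_ids):
            # document vector = mean of its word embeddings (an assumption)
            doc_vec = self.embed(doc_ids).mean(dim=1)
            topics = self.doc_to_topics(doc_vec)        # (B, K) mixture
            topic_vec = topics @ self.topic_emb.weight  # (B, hid)
            h, _ = self.lstm(self.embed(sent_ids))      # (B, T, hid)
            # broadcast the document topic vector to every timestep
            t = topic_vec.unsqueeze(1).expand_as(h)
            return self.out(torch.cat([h, t], dim=-1))  # next-word logits

    model = TopicConditionedLM()
    logits = model(torch.randint(0, 1000, (2, 50)),  # document word ids
                   torch.randint(0, 1000, (2, 12)))  # current sentence ids
    print(logits.shape)  # torch.Size([2, 12, 1000])

    In this sketch both components share the embedding table and would be trained jointly end to end on the next-word objective, which is the spirit of the abstract's "topic model-like architecture" inside the language model.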