17,175 research outputs found

    Memory-Efficient Topic Modeling

    As one of the simplest probabilistic topic modeling techniques, latent Dirichlet allocation (LDA) has found many important applications in text mining, computer vision and computational biology. Recent training algorithms for LDA can be interpreted within a unified message passing framework. However, message passing requires storing previous messages, which takes a large amount of memory that grows linearly with the number of documents or the number of topics. High memory usage is therefore often a major problem for topic modeling of massive corpora containing a large number of topics. To reduce the space complexity, we propose a novel algorithm for training LDA that does not store previous messages: tiny belief propagation (TBP). The basic idea of TBP is to relate message passing algorithms to non-negative matrix factorization (NMF) algorithms, which absorb the message updating into the message passing process and thus avoid storing previous messages. Experimental results on four large data sets confirm that TBP performs comparably to, or even better than, current state-of-the-art training algorithms for LDA, but with much lower memory consumption. TBP can do topic modeling when massive corpora cannot fit in the computer memory, for example, extracting thematic topics from 7 GB PUBMED corpora on a common desktop computer with 2 GB memory. Comment: 20 pages, 7 figures

    A New Approach to Speeding Up Topic Modeling

    Latent Dirichlet allocation (LDA) is a widely used probabilistic topic modeling paradigm that has recently found many applications in computer vision and computational biology. In this paper, we propose a fast and accurate batch algorithm, active belief propagation (ABP), for training LDA. Batch LDA algorithms usually require repeated scanning of the entire corpus and searching of the complete topic space. For massive corpora with a large number of topics, each training iteration of a batch LDA algorithm is therefore often inefficient and time-consuming. To accelerate training, ABP actively scans only a subset of the corpus and searches only a subset of the topic space, and so saves an enormous amount of training time in each iteration. To ensure accuracy, ABP selects only those documents and topics that contribute the largest residuals within the residual belief propagation (RBP) framework. On four real-world corpora, ABP performs around 10 to 100 times faster than state-of-the-art batch LDA algorithms with comparable topic modeling accuracy. Comment: 14 pages, 12 figures
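The selection step described in this abstract (keep only the documents and topics with the largest residuals) might be sketched as follows. This is an illustration only, not the ABP algorithm from the paper: the function name `select_active`, the fraction parameters `doc_frac` and `topic_frac`, and the use of summed residuals as the ranking score are all assumptions.

```python
import math


def select_active(residuals, doc_frac=0.5, topic_frac=0.5):
    """Pick the documents and topics with the largest total residuals.

    residuals: list of rows, one per document, each holding one
    non-negative residual per topic (hypothetical input layout).
    Returns two sorted index lists: (active_docs, active_topics).
    """
    n_docs = len(residuals)
    n_topics = len(residuals[0])

    # Score each document and each topic by its summed residual (assumed rule).
    doc_scores = [sum(row) for row in residuals]
    topic_scores = [sum(row[k] for row in residuals) for k in range(n_topics)]

    # Keep at least one document and one topic.
    n_active_docs = max(1, math.ceil(doc_frac * n_docs))
    n_active_topics = max(1, math.ceil(topic_frac * n_topics))

    # Rank indices by descending score and keep the top entries.
    active_docs = sorted(range(n_docs), key=lambda d: doc_scores[d],
                         reverse=True)[:n_active_docs]
    active_topics = sorted(range(n_topics), key=lambda k: topic_scores[k],
                           reverse=True)[:n_active_topics]
    return sorted(active_docs), sorted(active_topics)


example = [
    [0.1, 0.9],
    [0.0, 0.05],
    [0.5, 0.5],
    [0.2, 0.1],
]
docs, topics = select_active(example, doc_frac=0.5, topic_frac=0.5)
```

In a scheme of this kind, only the selected documents and topics would be updated in the next iteration, which is where the per-iteration saving would come from.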

    Research on the Equilibrium Speed-Density Relationship Around Flyover Work Zone

    Increasing traffic demand has already reached the capacity of existing traffic facilities in most cities. To alleviate traffic pressure and expand the capacity of the road network, constructing flyovers has become a common measure in most cities in China. During flyover construction, work zones occupy road space and affect traffic flow characteristics and driver behaviour, which significantly reduces capacity. Research on traffic flow characteristics during flyover construction can improve traffic organization and traffic safety around work zones. This study analyses the traffic flow characteristics around a flyover work zone based on site data collected in Hohhot City, China. It shows that the traditional Logistic model for the equilibrium speed-density relationship is not applicable to the traffic flow around the flyover work zone. Based on an in-depth analysis of the traffic flow characteristics and specific driver behaviours, this paper proposes an improved Logistic model to describe the equilibrium speed-density relationship around the flyover work zone. To analyse the mathematical characteristics of the speed-density relationship, this paper also proposes a method of inserting virtual data points into the initial data, which makes the fitted curve continuous.
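For orientation, a generic logistic-type equilibrium speed-density curve can be sketched as below. The abstract does not give the equations of either the traditional or the improved model, so the functional form and all parameter names here (`v_f` free-flow speed, `v_b` stop-and-go speed, `k_t` transition density, `theta1` scale, `theta2` shape) are assumptions made for illustration.

```python
import math


def logistic_speed(k, v_f, v_b, k_t, theta1, theta2=1.0):
    """Assumed logistic-type speed-density relation.

    v(k) = v_b + (v_f - v_b) / (1 + exp((k - k_t) / theta1)) ** theta2

    k is the traffic density; the return value is the equilibrium speed.
    Speed falls from about v_f at low density towards v_b at high density.
    """
    return v_b + (v_f - v_b) / (1.0 + math.exp((k - k_t) / theta1)) ** theta2


# Hypothetical parameter values, chosen only to exercise the function.
speed_at_transition = logistic_speed(40, v_f=60, v_b=5, k_t=40, theta1=10)
speed_low_density = logistic_speed(10, v_f=60, v_b=5, k_t=40, theta1=10)
speed_high_density = logistic_speed(80, v_f=60, v_b=5, k_t=40, theta1=10)
```

With `theta2 = 1`, the speed at the transition density `k_t` is the midpoint between `v_b` and `v_f`; fitting such a curve to field data would mean estimating these parameters from observed speed-density pairs.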

    The sunflower moth, Homoeosoma nebulella (Denis et Schiffermüller) (Lepidoptera: Pyralidae): outbreaks and pest management in Linhe, Inner Mongolia 2007–2008

    The sunflower moth Homoeosoma nebulella is the most common pest of sunflowers (Helianthus annuus L.) in China. A large outbreak of H. nebulella was discovered in Linhe, in the Inner Mongolia Autonomous Region, in 2007. Different issues related to pest management were investigated in 2007–2008. Irrigation for overwintering could promote a pest outbreak in the following year. The safest practice is to sow from mid-May to mid-June, i.e. not too early. The number of larvae could be reduced by treatment with Bacillus thuringiensis. Effective pest management should include selecting a proper sowing date, avoiding irrigation and treating with B. thuringiensis. Sex pheromone trapping as a potential control measure requires further study.