Memory-Efficient Topic Modeling
As one of the simplest probabilistic topic modeling techniques, latent
Dirichlet allocation (LDA) has found many important applications in text
mining, computer vision and computational biology. Recent training algorithms
for LDA can be interpreted within a unified message passing framework. However,
message passing requires storing previous messages, which consumes a large amount
of memory that grows linearly with the number of documents or the number of
topics. High memory usage is therefore often a major problem for topic
modeling of massive corpora containing a large number of topics. To reduce the
space complexity, we propose a novel algorithm for training LDA that does not
store previous messages: tiny belief propagation (TBP). The basic idea of TBP
is to relate message passing algorithms to non-negative matrix
factorization (NMF) algorithms, which absorb the message updating into the
message passing process and thus avoid storing previous messages. Experimental
results on four large data sets confirm that TBP performs comparably to, or
even better than, current state-of-the-art training algorithms for LDA but with
much lower memory consumption. TBP can do topic modeling when massive corpora
cannot fit in computer memory, for example, extracting thematic topics from
a 7 GB PUBMED corpus on a common desktop computer with 2 GB of memory.
Comment: 20 pages, 7 figures
A New Approach to Speeding Up Topic Modeling
Latent Dirichlet allocation (LDA) is a widely used probabilistic topic
modeling paradigm that has recently found many applications in computer vision and
computational biology. In this paper, we propose a fast and accurate batch
algorithm, active belief propagation (ABP), for training LDA. Usually batch LDA
algorithms require repeated scanning of the entire corpus and searching the
complete topic space. For massive corpora with a large number of
topics, each training iteration of batch LDA algorithms is therefore often inefficient and
time-consuming. To accelerate training, ABP actively scans a subset
of the corpus and searches a subset of the topic space for topic modeling, thereby
saving enormous training time in each iteration. To ensure accuracy, ABP selects
only those documents and topics that contribute to the largest residuals within
the residual belief propagation (RBP) framework. On four real-world corpora,
ABP performs around to times faster than state-of-the-art batch LDA
algorithms with a comparable topic modeling accuracy.
Comment: 14 pages, 12 figures
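The selection step described in this abstract (keep only the documents and topics with the largest residuals) can be illustrated with a minimal sketch. It is not the authors' ABP implementation: the helper `select_active`, the `fraction` parameter, and the residual values are all hypothetical, and the computation of the residuals inside residual belief propagation is not shown.

```python
def select_active(residuals, fraction):
    """Return the sorted indices of the largest residuals (hypothetical helper).

    `fraction` is the share of items to keep; at least one item is kept.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, round(fraction * len(residuals)))
    # Rank indices by residual, largest first, then keep the top `keep`.
    ranked = sorted(range(len(residuals)), key=lambda i: residuals[i], reverse=True)
    return sorted(ranked[:keep])


# Made-up residuals for five documents and four topics (illustration only).
doc_residuals = [0.02, 0.90, 0.10, 0.45, 0.01]
topic_residuals = [0.30, 0.05, 0.60, 0.05]

active_docs = select_active(doc_residuals, fraction=0.4)
active_topics = select_active(topic_residuals, fraction=0.5)
```

In a scheme of this kind, only the selected documents and topics would be visited in the next iteration, which is where the per-iteration savings would come from.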
A low-bandgap dimeric porphyrin molecule for 10% efficiency solar cells with small photon energy loss
Dimeric porphyrin molecules have great potential as donor materials for high performance bulk heterojunction organic solar cells (OSCs). Recently reported dimeric porphyrins bridged by ethynylenes showed power conversion efficiencies (PCEs) of more than 8%. In this study, we design and synthesize a new conjugated dimeric D-A porphyrin ZnP2BT-RH, in which the two porphyrin units are linked by an electron accepting benzothiadiazole (BT) unit. The introduction of the BT unit enhances the electron delocalization, resulting in a lower highest occupied molecular orbital (HOMO) energy level and an increased molar extinction coefficient in the near-infrared (NIR) region. The bulk heterojunction solar cells with ZnP2BT-RH as the donor material exhibit a high PCE of up to 10% with a low energy loss (Eloss) of only 0.56 eV. The 10% PCE is the highest for porphyrin-based OSCs with a conventional structure, and this Eloss is also the smallest among those reported for small molecule-based OSCs with a PCE higher than 10% to date
Research on the Equilibrium Speed-Density Relationship Around Flyover Work Zone
Increasing traffic demand has already reached the capacity of existing traffic facilities in most cities. To alleviate traffic pressure and expand the capacity of the road network, constructing flyovers has become a common solution in most cities in China. During flyover construction, work zones occupy road space and affect traffic flow characteristics and driver behaviour, which causes a significant reduction in capacity. Research on traffic flow characteristics during flyover construction can improve traffic organization and traffic safety around work zones. This study analyses the traffic flow characteristics around the flyover work zone based on site data collected in Hohhot City, China. It shows that the traditional Logistic model for the equilibrium speed-density relationship is not applicable to the traffic flow around the flyover work zone. Based on an in-depth analysis of the traffic flow characteristics and specific driver behaviours, this paper proposes an improved Logistic model to depict the equilibrium speed-density relationship around the flyover work zone. To analyse the mathematical characteristics of the speed-density relationship, this paper proposes a method of inserting virtual data points into the initial data, which makes the fitted curve continuous.
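For readers unfamiliar with logistic speed-density curves, the sketch below evaluates one common four-parameter logistic form. The abstract does not give the equation of either the traditional or the improved model, so the functional form, the function name `logistic_speed`, and every parameter value here are assumptions chosen purely for illustration.

```python
import math


def logistic_speed(density, v_free, v_jam, k_turn, scale):
    """Equilibrium speed from an assumed logistic speed-density curve.

    v_free: free-flow speed (upper asymptote as density falls)
    v_jam:  speed under heavy congestion (lower asymptote)
    k_turn: density at which the curve passes its midpoint
    scale:  controls how sharply speed drops around k_turn
    """
    return v_jam + (v_free - v_jam) / (1.0 + math.exp((density - k_turn) / scale))


# Hypothetical parameters: speeds in km/h, densities in veh/km.
densities = [0.0, 20.0, 40.0, 60.0, 80.0]
speeds = [logistic_speed(k, v_free=60.0, v_jam=5.0, k_turn=40.0, scale=8.0)
          for k in densities]
```

With these values, speed stays near `v_free` at low density, passes the midpoint between `v_free` and `v_jam` at `k_turn`, and approaches `v_jam` as density grows.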
The sunflower moth, Homoeosoma nebulella (Denis et Schiffermüller) (Lepidoptera: Pyralidae): outbreaks and pest management in Linhe, Inner Mongolia 2007–2008
The sunflower moth Homoeosoma nebulella is the most common pest of sunflowers (Helianthus annuus L.) in China. A large outbreak of H. nebulella was discovered in Linhe, in the Inner Mongolia Autonomous Region, in 2007. Different issues related to pest management were investigated in 2007–2008. Irrigation for overwintering could promote a pest outbreak in the following year. The safest practice is to sow from mid-May to mid-June, i.e. not too early. The number of larvae could be reduced by treatment with Bacillus thuringiensis. Effective pest management should include selecting a proper sowing date, avoiding irrigation, and B. thuringiensis treatment. Sex pheromone trapping as a potential control measure requires further study.