
    Memory-Efficient Topic Modeling

    Full text link
    As one of the simplest probabilistic topic modeling techniques, latent Dirichlet allocation (LDA) has found many important applications in text mining, computer vision and computational biology. Recent training algorithms for LDA can be interpreted within a unified message passing framework. However, message passing requires storing previous messages, demanding memory that grows linearly with the number of documents or the number of topics. High memory usage is therefore often a major obstacle to topic modeling of massive corpora with large numbers of topics. To reduce the space complexity, we propose a novel algorithm for training LDA that does not store previous messages: tiny belief propagation (TBP). The basic idea of TBP is to relate the message passing algorithms to non-negative matrix factorization (NMF) algorithms, which absorb the message update into the message passing process and thus avoid storing previous messages. Experimental results on four large data sets confirm that TBP performs comparably to or even better than current state-of-the-art training algorithms for LDA, but with much less memory consumption. TBP can perform topic modeling when a massive corpus cannot fit in computer memory, for example, extracting thematic topics from a 7 GB PubMed corpus on a common desktop computer with 2 GB of memory. Comment: 20 pages, 7 figures
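
    The following is a minimal, hypothetical sketch (not the paper's exact TBP updates) of the core idea: when LDA training is written as NMF-style multiplicative updates on the document-word count matrix, the per-token messages are recomputed from the current factor matrices inside each update and never stored, so memory stays at O(DK + KV). The function name, smoothing terms and simplified update rules are assumptions for illustration.

    import numpy as np

    def tbp_like_lda(X, K, n_iter=100, alpha=0.1, beta=0.01, seed=0):
        # Illustrative sketch only (not the paper's exact TBP algorithm): LDA
        # training cast as KL-NMF-style multiplicative updates, so per-token
        # messages are recomputed from theta and phi on the fly and never stored.
        rng = np.random.default_rng(seed)
        D, V = X.shape                        # X: document-word count matrix
        theta = rng.random((D, K))            # document-topic factors, O(DK) memory
        phi = rng.random((K, V))              # topic-word factors, O(KV) memory
        for _ in range(n_iter):
            approx = theta @ phi + 1e-12      # current reconstruction of X
            ratio = X / approx                # messages are formed implicitly here
            theta = theta * (ratio @ phi.T) / phi.sum(axis=1) + alpha
            phi = phi * (theta.T @ ratio) / theta.sum(axis=0)[:, None] + beta
        theta /= theta.sum(axis=1, keepdims=True)   # topic proportions per document
        phi /= phi.sum(axis=1, keepdims=True)       # word distributions per topic
        return theta, phi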

    A New Approach to Speeding Up Topic Modeling

    Full text link
    Latent Dirichlet allocation (LDA) is a widely used probabilistic topic modeling paradigm and has recently found many applications in computer vision and computational biology. In this paper, we propose a fast and accurate batch algorithm, active belief propagation (ABP), for training LDA. Batch LDA algorithms usually require repeated scanning of the entire corpus and searching of the complete topic space, so for massive corpora with large numbers of topics each training iteration is often inefficient and time-consuming. To accelerate training, ABP actively scans a subset of the corpus and searches a subset of the topic space in each iteration, saving enormous training time. To ensure accuracy, ABP selects only those documents and topics that contribute the largest residuals within the residual belief propagation (RBP) framework. On four real-world corpora, ABP runs around 10 to 100 times faster than state-of-the-art batch LDA algorithms with comparable topic modeling accuracy. Comment: 14 pages, 12 figures
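
    As an illustration of the active scanning idea (a hypothetical helper, not the authors' released code), the sketch below takes the per-document and per-topic residuals left by the previous sweep and keeps only the documents and topics carrying the largest residual mass for updating in the next iteration; the scanned fractions are free parameters.

    import numpy as np

    def select_active_subsets(residuals, doc_frac=0.2, topic_frac=0.5):
        # Hypothetical helper illustrating ABP-style active scanning: given a
        # (documents x topics) matrix of residuals from the previous sweep,
        # return the documents and topics with the largest residual mass.
        D, K = residuals.shape
        doc_scores = residuals.sum(axis=1)      # residual mass per document
        topic_scores = residuals.sum(axis=0)    # residual mass per topic
        n_docs = max(1, int(doc_frac * D))
        n_topics = max(1, int(topic_frac * K))
        active_docs = np.argsort(doc_scores)[::-1][:n_docs]
        active_topics = np.argsort(topic_scores)[::-1][:n_topics]
        return active_docs, active_topics       # update only these next iteration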

    Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate

    Full text link
    Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues. Existing works primarily focus on inconsistency issues within a single LLM, while we complementarily explore the inter-consistency among multiple LLMs in collaboration. To examine whether LLMs can collaborate effectively to reach a consensus on a shared goal, we focus on commonsense reasoning and introduce a formal debate framework (FORD) to conduct a three-stage debate among LLMs aligned with real-world scenarios: fair debate, mismatched debate, and roundtable debate. Extensive experiments on various datasets show that LLMs can effectively collaborate to reach a consensus despite noticeable inter-inconsistencies, but imbalances in their abilities can lead to domination by superior LLMs. Leveraging a more advanced LLM such as GPT-4 as an authoritative judge can further boost collaboration performance. Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods. Codes and data are available at https://github.com/Waste-Wood/FORD. Comment: EMNLP 2023 Findings Camera Ready Version
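
    A minimal sketch of a multi-LLM debate loop in the spirit of FORD follows; it is an assumption-laden illustration, not the released implementation. The debaters and the judge are hypothetical callables mapping a prompt string to a response string, the three stages would roughly correspond to different choices of debaters, and consensus is detected naively by exact string agreement.

    def debate(question, debaters, judge, rounds=3):
        # Sketch only: `debaters` maps a name to a hypothetical callable
        # prompt -> response; `judge` is another such callable.
        stances = {name: llm(f"Answer and briefly justify: {question}")
                   for name, llm in debaters.items()}
        transcript = [dict(stances)]
        for _ in range(rounds):
            for name, llm in debaters.items():
                others = "\n".join(f"{n}: {s}" for n, s in stances.items() if n != name)
                stances[name] = llm(f"Question: {question}\n"
                                    f"Other debaters argued:\n{others}\n"
                                    f"Defend or revise your answer.")
            transcript.append(dict(stances))
            if len(set(stances.values())) == 1:   # naive consensus check
                return next(iter(stances.values())), transcript
        # no consensus: fall back to an authoritative judge (e.g. a stronger LLM)
        summary = "\n".join(f"{n}: {s}" for n, s in stances.items())
        verdict = judge(f"Question: {question}\nFinal positions:\n{summary}\n"
                        f"Give the most defensible final answer.")
        return verdict, transcript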

    (E)-2-(Cyclohexylmethylene)succinic acid

    Get PDF
    The title compound, C11H16O4, crystallizes with three molecules in the asymmetric unit. The cyclohexane ring adopts a chair conformation. Intermolecular O—H⋯O hydrogen bonds are observed and these help to establish the crystal packing.

    Hemiballism-hemichorea induced by ketotic hyperglycemia: case report with PET study and review of the literature

    Get PDF
    Hemiballism-hemichorea (HB-HC) commonly describes basal ganglia dysfunction in elderly patients with non-ketotic hyperglycemia. Here we report two elderly female patients with acute onset of involuntary movements induced by hyperglycemia with positive urine ketones. We describe the computed tomography and magnetic resonance imaging findings in these two patients, which are similar to those of non-ketotic hyperglycemic HB-HC patients. FDG-PET was performed, and the glucose metabolism in the corresponding lesions was contradictory between the two patients. We attempt to clarify the underlying mechanisms of HB-HC and to explain the contradictory FDG-PET findings as resulting from imaging performed at different clinical stages.

    Microstructure and texture evolutions in FeCrAl cladding tube during pilger processing

    Get PDF
    The microstructure of FeCrAl cladding tubes depends on the fabrication process history. In this study, the microstructural characteristics of wrought FeCrAl alloys during industrial pilger processing into thin-walled tubes were investigated. The hot-extruded tube showed ∼100 μm equiaxed grains with a weak α*-fiber {h11}<1/h 1 2> texture, while the pilger rolling process changed the microstructure to fragmented and elongated grains along the rolling direction. The pilgered textures could be predicted with the VPSC model. Inter-pass annealing at 800–850 °C for 1 h resulted in recovery and recrystallization of the ferritic matrix and restoration of ductility. The final finished tube shows fine recrystallized grains (∼11 μm) with a dominant γ-fiber in three dimensions. Pilger rolling enhanced the α-fiber, while annealing reduced the α-fiber and enhanced the γ-fiber. The morphology of the Laves precipitates evolved in the sequence faceted needle-like → spherical → faceted ellipsoidal. Thermomechanical processing resulted in cladding tubes with a Laves precipitate area fraction of ∼5% and a number density of ∼5.7 × 10¹¹ m⁻², half that of the first-pilgered tube. Laves precipitates pin the grain boundaries, controlling the microstructure and preventing grain coarsening.

    Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance

    Full text link
    Large language models (LLMs) have demonstrated impressive performance and strong explainability across various reasoning scenarios, marking a significant stride towards mimicking human-like intelligence. Despite this, when tasked with simple questions supported by a generic fact, LLMs often fail to provide consistent and precise answers, indicating a deficiency in abstract reasoning. This has sparked a vigorous debate about whether LLMs genuinely reason or merely memorize. In light of this, we design a preliminary study to quantify and examine the abstract reasoning abilities of existing LLMs. Our findings reveal a substantial discrepancy between their general reasoning and abstract reasoning performance. To alleviate this problem, we tailor an abstract reasoning dataset (AbsR) together with a meaningful learning paradigm to teach LLMs how to leverage generic facts for reasoning. The results show that our approach not only boosts the general reasoning performance of LLMs but also makes considerable strides towards their capacity for abstract reasoning, moving beyond simple memorization or imitation to a more nuanced understanding and application of generic facts.
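
    The snippet below illustrates one plausible way to pair a generic fact with a concrete question so that a model is trained to apply the fact rather than memorize the answer. The prompt template, field names and example contents are assumptions for illustration; the actual AbsR format may differ.

    def build_meaningful_learning_example(generic_fact, question, answer):
        # Hypothetical AbsR-style training instance: the generic fact is given
        # explicitly so the model learns to apply it, not to memorize the answer.
        prompt = (f"Generic fact: {generic_fact}\n"
                  f"Question: {question}\n"
                  f"Use the fact to reason, then answer.")
        return {"prompt": prompt, "target": answer}

    # Example usage (contents are illustrative, not drawn from AbsR):
    example = build_meaningful_learning_example(
        generic_fact="Metals conduct electricity.",
        question="Will a copper wire close an electric circuit?",
        answer="Yes. Copper is a metal, and metals conduct electricity.")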

    Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering

    Get PDF
    The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have drawbacks: they can be trapped in local optima and are sensitive to the initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering for some special datasets regardless of the selection of the initial centers. In this paper, we develop an integer linear programming (ILP) approach for k-modes clustering which is independent of the initial solution and directly obtains the optimal result for small datasets. We also develop a heuristic algorithm, IPO-ILP-VNS, that applies iterative partial optimization to the ILP model within a variable neighborhood search framework to find near-optimal results for medium- and large-sized datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI repository, were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The results outperformed the conventional and other existing enhanced k-modes algorithms in the literature and set new, improved results on 9 of the UCI benchmark datasets.
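
    The sketch below gives one generic integer linear programming formulation of k-modes clustering (binary assignment variables, binary mode-value variables, and linearized mismatch counting), written with the PuLP modeling library; it is an illustration of the idea, not necessarily the exact model proposed in the paper, and all names are hypothetical.

    import pulp

    def kmodes_ilp(data, k):
        # Generic ILP sketch of k-modes clustering. data: list of records,
        # each a list of categorical values; k: number of clusters.
        n, m = len(data), len(data[0])
        domains = [sorted({row[a] for row in data}) for a in range(m)]
        prob = pulp.LpProblem("kmodes", pulp.LpMinimize)
        # x[i][j] = 1 iff object i is assigned to cluster j
        x = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary") for j in range(k)]
             for i in range(n)]
        # y[j][a][v] = 1 iff the mode of cluster j takes value v on attribute a
        y = [[{v: pulp.LpVariable(f"y_{j}_{a}_{vi}", cat="Binary")
               for vi, v in enumerate(domains[a])}
              for a in range(m)] for j in range(k)]
        # d[i][j][a] = 1 iff object i sits in cluster j and mismatches its mode on a
        d = [[[pulp.LpVariable(f"d_{i}_{j}_{a}", cat="Binary") for a in range(m)]
              for j in range(k)] for i in range(n)]
        # objective: total number of attribute mismatches (the k-modes cost)
        prob += pulp.lpSum(d[i][j][a] for i in range(n) for j in range(k) for a in range(m))
        for i in range(n):
            prob += pulp.lpSum(x[i]) == 1                  # each object in one cluster
        for j in range(k):
            for a in range(m):
                prob += pulp.lpSum(y[j][a].values()) == 1  # one mode value per attribute
        for i in range(n):
            for j in range(k):
                for a in range(m):
                    # mismatch is forced when i is in j but the mode differs from data[i][a]
                    prob += d[i][j][a] >= x[i][j] - y[j][a][data[i][a]]
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return [[int(round(pulp.value(x[i][j]))) for j in range(k)] for i in range(n)]

    Solving such a model exactly is practical only for small instances, which is consistent with the paper's use of a VNS-based heuristic (IPO-ILP-VNS) for medium and large datasets.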