Memory-Efficient Topic Modeling
As one of the simplest probabilistic topic modeling techniques, latent
Dirichlet allocation (LDA) has found many important applications in text
mining, computer vision and computational biology. Recent training algorithms
for LDA can be interpreted within a unified message passing framework. However,
message passing requires storing previous messages, which consumes a large
amount of memory that grows linearly with the number of documents or the number
of topics. High memory usage is therefore often a major problem for topic
modeling of massive corpora containing a large number of topics. To reduce the
space complexity, we propose tiny belief propagation (TBP), a novel algorithm
for training LDA that does not store previous messages. The basic idea of TBP
is to relate the message passing algorithms to non-negative matrix
factorization (NMF) algorithms, absorbing the message update into the message
passing process and thus avoiding the storage of previous messages. Experimental
results on four large data sets confirm that TBP performs comparably well or
even better than current state-of-the-art training algorithms for LDA, but with
much lower memory consumption. TBP enables topic modeling when massive corpora
cannot fit in computer memory, for example, extracting thematic topics from a
7 GB PUBMED corpus on a common desktop computer with 2 GB of memory.
Comment: 20 pages, 7 figures
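To make the memory argument concrete: standard belief propagation for LDA keeps one K-dimensional message per (word, document) pair, which is exactly what grows with corpus size and topic number. The sketch below illustrates the NMF-style alternative the abstract describes, recomputing each per-token "message" on the fly and absorbing it directly into the factor matrices so nothing per-token is kept between iterations. It is a minimal illustration under assumed dense inputs, not the authors' TBP implementation.

```python
import numpy as np

def tbp_like_update(X, K, iters=100, alpha=0.01, beta=0.01, seed=0):
    """Minimal sketch of a message-free LDA update in the spirit of TBP.

    X : (W, D) word-document count matrix (dense here for brevity).
    The per-token message is recomputed from the current factor matrices
    and immediately absorbed into them, like a multiplicative NMF update,
    so no message is stored across iterations. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    W, D = X.shape
    phi = rng.random((W, K)) + beta      # topic-word factors
    theta = rng.random((D, K)) + alpha   # document-topic factors
    for _ in range(iters):
        phi_new = np.zeros_like(phi)
        theta_new = np.zeros_like(theta)
        for w, d in zip(*np.nonzero(X)):
            mu = phi[w] * theta[d]       # transient message, never stored
            mu /= mu.sum()
            phi_new[w] += X[w, d] * mu
            theta_new[d] += X[w, d] * mu
        phi = phi_new + beta
        theta = theta_new + alpha
    phi /= phi.sum(axis=0, keepdims=True)
    theta /= theta.sum(axis=1, keepdims=True)
    return phi, theta
```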
A New Approach to Speeding Up Topic Modeling
Latent Dirichlet allocation (LDA) is a widely-used probabilistic topic
modeling paradigm that has recently found many applications in computer vision and
computational biology. In this paper, we propose a fast and accurate batch
algorithm, active belief propagation (ABP), for training LDA. Batch LDA
algorithms usually require repeatedly scanning the entire corpus and searching
the complete topic space, so for massive corpora with a large number of topics
each training iteration is often inefficient and time-consuming. To accelerate
training, ABP actively scans a subset of the corpus and searches a subset of
the topic space, thereby saving substantial training time in each iteration. To ensure accuracy, ABP selects
only those documents and topics that contribute to the largest residuals within
the residual belief propagation (RBP) framework. On four real-world corpora,
ABP performs around to times faster than state-of-the-art batch LDA
algorithms with comparable topic modeling accuracy.
Comment: 14 pages, 12 figures
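One way to picture the "active" part of ABP: after each sweep, keep only the documents (or topics) whose residuals, i.e. how much their estimates changed, are largest, and re-scan just that subset next iteration. The snippet below is an illustrative sketch of such a residual-based selection step; the fraction kept and the residual definition are assumptions, not the paper's exact schedule.

```python
import numpy as np

def select_active(residuals, frac=0.2):
    """Keep the indices with the largest residuals (illustrative selection step)."""
    k = max(1, int(frac * len(residuals)))
    return np.argsort(residuals)[-k:]

# hypothetical usage inside a training loop:
# residuals[d] = np.abs(theta_new[d] - theta_old[d]).sum()   # per-document change
# active_docs = select_active(residuals)                     # only these are re-scanned
```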
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate
Large Language Models (LLMs) have shown impressive capabilities in various
applications, but they still face various inconsistency issues. Existing works
primarily focus on the inconsistency issues within a single LLM, while we
complementarily explore the inter-consistency among multiple LLMs for
collaboration. To examine whether LLMs can collaborate effectively to achieve a
consensus for a shared goal, we focus on commonsense reasoning, and introduce a
formal debate framework (FORD) to conduct a three-stage debate among LLMs,
aligned with real-world scenarios: fair debate, mismatched debate, and roundtable
debate. Through extensive experiments on various datasets, we find that LLMs can
effectively collaborate to reach a consensus despite noticeable
inter-inconsistencies, but imbalances in their abilities can lead to domination by superior LLMs.
Leveraging a more advanced LLM like GPT-4 as an authoritative judge can boost
collaboration performance. Our work contributes to understanding the
inter-consistency among LLMs and lays the foundation for developing future
collaboration methods. Codes and data are available at
https://github.com/Waste-Wood/FORD
Comment: EMNLP 2023 Findings Camera Ready Version
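As a rough picture of what a multi-round debate with an optional judge might look like in code: each debater model answers, sees the others' answers, and revises over several rounds, with a stronger model issuing the final verdict if disagreement remains. The sketch below is an illustrative assumption about the control flow only; the prompt wording, the naive consensus check, and the prompt-to-answer callables are hypothetical, not the released FORD code.

```python
from typing import Callable, List, Optional

LLM = Callable[[str], str]   # any prompt -> answer callable, e.g. an API wrapper

def debate(question: str, debaters: List[LLM], rounds: int = 3,
           judge: Optional[LLM] = None) -> str:
    """Run a simple multi-round debate among debaters; an optional stronger
    judge model gives the final verdict. Control flow is illustrative."""
    answers = [m(f"Question: {question}\nGive your answer with a brief argument.")
               for m in debaters]
    for _ in range(rounds):
        transcript = "\n".join(f"Debater {i}: {a}" for i, a in enumerate(answers))
        answers = [m(f"Question: {question}\nDebate so far:\n{transcript}\n"
                     "Keep or revise your answer, addressing the other debaters.")
                   for m in debaters]
        if len(set(answers)) == 1:        # naive consensus check on identical answers
            return answers[0]
    if judge is not None:                 # authoritative judge breaks remaining disagreement
        transcript = "\n".join(f"Debater {i}: {a}" for i, a in enumerate(answers))
        return judge(f"Question: {question}\nDebate:\n{transcript}\nGive the final answer.")
    return answers[0]
```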
(E)-2-(Cyclohexylmethylene)succinic acid
The title compound, C11H16O4, crystallizes with three molecules in the asymmetric unit. The cyclohexane ring adopts a chair conformation. Intermolecular O—H⋯O hydrogen bonds are observed and these help to establish the crystal packing.
Hemiballism-hemichorea induced by ketotic hyperglycemia: case report with PET study and review of the literature
Hemiballism-hemichorea (HB-HC) is commonly used to describe the basal ganglia dysfunction seen in non-ketotic hyperglycemic elderly patients. Here we report two elderly female patients with acute onset of involuntary movements induced by hyperglycemia with positive urine ketones. We describe the computed tomography and magnetic resonance imaging findings in these two patients, which are similar to those of non-ketotic hyperglycemic HB-HC patients. FDG-PET was performed, and the glucose metabolism in the corresponding lesion was contradictory between the two patients. We attempt to clarify the underlying mechanisms of HB-HC and to explain the contradictory FDG-PET findings as a result of imaging being performed at different clinical stages.
Microstructure and texture evolutions in FeCrAl cladding tube during pilger processing
The microstructure of FeCrAl cladding tubes depends on the fabrication process history. In this study, the microstructural characteristics of wrought FeCrAl alloys during industrial pilger processing into thin-walled tubes were investigated. The hot-extruded tube showed ∼100 μm equiaxed grains with a weak α*-fiber {h11}<1/h12> texture, while the pilger rolling process changed the microstructure to fragmented and elongated grains along the rolling direction. The pilgered textures could be predicted with the VPSC model. Inter-pass annealing at 800–850 °C for 1 h resulted in recovery and recrystallization of the ferritic matrix and restoration of ductility. The final finished tube showed fine recrystallized grains (∼11 μm) with a dominant γ-fiber in three dimensions. Pilger rolling enhanced the α-fiber, while annealing reduced the α-fiber and enhanced the γ-fiber. The morphology of the Laves precipitates evolved in the sequence faceted needle-like → spherical → faceted ellipsoidal. Thermomechanical processing resulted in cladding tubes with a Laves-precipitate area fraction of ∼5% and a number density of ∼5.7 × 10⁻¹¹ m⁻², which is half that of the first-pilgered tube. The Laves precipitates pin the grain boundaries, controlling the microstructure and preventing grain coarsening.
Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
Large language models (LLMs) have achieved impressive performance and strong
explainability across various reasoning scenarios, marking a significant stride
towards mimicking human-like intelligence. Despite this, when tasked with
simple questions supported by a generic fact, LLMs often fail to provide
consistent and precise answers, indicating a deficiency in abstract reasoning
abilities. This has sparked a vigorous debate about whether LLMs are genuinely
reasoning or merely memorizing. In light of this, we design a preliminary study
to quantify and delve into the abstract reasoning abilities of existing LLMs.
Our findings reveal a substantial discrepancy between their general reasoning
and abstract reasoning performance. To address this problem, we tailor an
abstract reasoning dataset (AbsR) together with a meaningful learning paradigm
to teach LLMs how to leverage generic facts for reasoning purposes. The results
show that our approach not only boosts the general reasoning performance of
LLMs but also makes considerable strides towards their capacity for abstract
reasoning, moving beyond simple memorization or imitation to a more nuanced
understanding and application of generic facts.
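The "meaningful learning" recipe, as the abstract describes it, amounts to pairing each question with the generic fact that licenses its answer so the model learns to apply the fact rather than memorize the instance. The sketch below shows one plausible way such a training instance could be assembled; the field names, prompt wording, and toy example are assumptions, not the released AbsR schema.

```python
def build_example(generic_fact: str, question: str, answer: str) -> dict:
    """Assemble one fact-guided training instance (hypothetical schema)."""
    prompt = (
        f"Generic fact: {generic_fact}\n"
        f"Question: {question}\n"
        "Answer by applying the generic fact."
    )
    return {"prompt": prompt, "completion": answer}

# toy instance (illustrative, not from AbsR):
example = build_example(
    "Metals conduct electricity.",
    "Will a copper wire close the circuit?",
    "Yes. Copper is a metal, and metals conduct electricity, so the circuit is closed.",
)
```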
Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering
The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks: for example, they can become trapped in local optima and are sensitive to the initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless of the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach to k-modes clustering, which is independent of the initial solution and can directly obtain optimal results for small-sized datasets. We also developed a heuristic algorithm, IPO-ILP-VNS, that implements iterative partial optimization of the ILP approach within a variable neighborhood search framework to find near-optimal results for medium and large datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI repository, were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The proposed methods outperformed the conventional and other existing enhanced k-modes algorithms in the literature and updated 9 of the UCI benchmark datasets with new and improved results.
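For context, the baseline both proposed methods are compared against is the classic k-modes procedure: alternate between assigning each record to the nearest mode under matching (Hamming) dissimilarity and recomputing each mode as the per-attribute most frequent category. The sketch below is a generic illustration of that baseline on assumed toy data, not the paper's ILP or IPO-ILP-VNS code; because the outcome depends on the random initial modes, it also makes the local-optimum sensitivity described above visible.

```python
import numpy as np

def k_modes(X, k, iters=50, seed=0):
    """Plain k-modes: matching dissimilarity + per-attribute mode update."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=object)
    n = X.shape[0]
    modes = X[rng.choice(n, size=k, replace=False)].copy()   # random initial modes
    labels = np.full(n, -1)
    for _ in range(iters):
        # matching dissimilarity: number of attributes on which record and mode disagree
        dist = np.array([[int(np.sum(x != m)) for m in modes] for x in X])
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                            # assignments stable -> converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                # new mode = most frequent category of each attribute in the cluster
                modes[j] = [max(set(col), key=list(col).count) for col in members.T]
    return labels, modes

# usage with toy categorical records (hypothetical data):
data = [["red", "small", "round"], ["red", "small", "oval"],
        ["blue", "large", "round"], ["blue", "large", "oval"]]
labels, modes = k_modes(data, k=2)
```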