Memory-Efficient Topic Modeling
As one of the simplest probabilistic topic modeling techniques, latent
Dirichlet allocation (LDA) has found many important applications in text
mining, computer vision and computational biology. Recent training algorithms
for LDA can be interpreted within a unified message passing framework. However,
message passing requires storing previous messages, which consumes a large
amount of memory that grows linearly with the number of documents or the number
of topics. High memory usage is therefore often a major problem for topic
modeling of massive corpora containing a large number of topics. To reduce the
space complexity, we propose tiny belief propagation (TBP), a novel algorithm
for training LDA that does not store previous messages. The basic idea of TBP
is to relate the message passing algorithms to non-negative matrix
factorization (NMF) algorithms, absorbing the message update into the message
passing process and thus avoiding the storage of previous messages. Experimental
results on four large data sets confirm that TBP performs comparably well or
even better than current state-of-the-art training algorithms for LDA, but with
much lower memory consumption. TBP enables topic modeling when massive corpora
cannot fit in computer memory, for example, extracting thematic topics from a
7 GB PUBMED corpus on a common desktop computer with 2 GB of memory.
Comment: 20 pages, 7 figures
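To make the memory argument concrete: standard belief propagation for LDA keeps one K-dimensional message per (word, document) pair, which is exactly what grows with corpus size and topic number. The sketch below illustrates the NMF-style alternative the abstract describes, recomputing each per-token "message" on the fly and absorbing it directly into the factor matrices so nothing per-token is kept between iterations. It is a minimal illustration under assumed dense inputs, not the authors' TBP implementation.

```python
import numpy as np

def tbp_like_update(X, K, iters=100, alpha=0.01, beta=0.01, seed=0):
    """Minimal sketch of a message-free LDA update in the spirit of TBP.

    X : (W, D) word-document count matrix (dense here for brevity).
    The per-token message is recomputed from the current factor matrices
    and immediately absorbed into them, like a multiplicative NMF update,
    so no message is stored across iterations. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    W, D = X.shape
    phi = rng.random((W, K)) + beta      # topic-word factors
    theta = rng.random((D, K)) + alpha   # document-topic factors
    for _ in range(iters):
        phi_new = np.zeros_like(phi)
        theta_new = np.zeros_like(theta)
        for w, d in zip(*np.nonzero(X)):
            mu = phi[w] * theta[d]       # transient message, never stored
            mu /= mu.sum()
            phi_new[w] += X[w, d] * mu
            theta_new[d] += X[w, d] * mu
        phi = phi_new + beta
        theta = theta_new + alpha
    phi /= phi.sum(axis=0, keepdims=True)
    theta /= theta.sum(axis=1, keepdims=True)
    return phi, theta
```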
A New Approach to Speeding Up Topic Modeling
Latent Dirichlet allocation (LDA) is a widely-used probabilistic topic
modeling paradigm that has recently found many applications in computer vision and
computational biology. In this paper, we propose a fast and accurate batch
algorithm, active belief propagation (ABP), for training LDA. Batch LDA
algorithms usually require repeatedly scanning the entire corpus and searching
the complete topic space, so for massive corpora with a large number of topics
each training iteration is often inefficient and time-consuming. To accelerate
training, ABP actively scans a subset of the corpus and searches a subset of
the topic space, thereby saving substantial training time in each iteration. To ensure accuracy, ABP selects
only those documents and topics that contribute to the largest residuals within
the residual belief propagation (RBP) framework. On four real-world corpora,
ABP performs around to times faster than state-of-the-art batch LDA
algorithms with comparable topic modeling accuracy.
Comment: 14 pages, 12 figures
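One way to picture the "active" part of ABP: after each sweep, keep only the documents (or topics) whose residuals, i.e. how much their estimates changed, are largest, and re-scan just that subset next iteration. The snippet below is an illustrative sketch of such a residual-based selection step; the fraction kept and the residual definition are assumptions, not the paper's exact schedule.

```python
import numpy as np

def select_active(residuals, frac=0.2):
    """Keep the indices with the largest residuals (illustrative selection step)."""
    k = max(1, int(frac * len(residuals)))
    return np.argsort(residuals)[-k:]

# hypothetical usage inside a training loop:
# residuals[d] = np.abs(theta_new[d] - theta_old[d]).sum()   # per-document change
# active_docs = select_active(residuals)                     # only these are re-scanned
```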
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate
Large Language Models (LLMs) have shown impressive capabilities in various
applications, but they still face various inconsistency issues. Existing works
primarily focus on the inconsistency issues within a single LLM, while we
complementarily explore the inter-consistency among multiple LLMs for
collaboration. To examine whether LLMs can collaborate effectively to achieve a
consensus for a shared goal, we focus on commonsense reasoning, and introduce a
formal debate framework (FORD) to conduct a three-stage debate among LLMs,
aligned with real-world scenarios: fair debate, mismatched debate, and roundtable
debate. Through extensive experiments on various datasets, we find that LLMs can
effectively collaborate to reach a consensus despite noticeable
inter-inconsistencies, but imbalances in their abilities can lead to domination by superior LLMs.
Leveraging a more advanced LLM like GPT-4 as an authoritative judge can boost
collaboration performance. Our work contributes to understanding the
inter-consistency among LLMs and lays the foundation for developing future
collaboration methods. Codes and data are available at
https://github.com/Waste-Wood/FORD
Comment: EMNLP 2023 Findings Camera Ready Version
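As a rough picture of what a multi-round debate with an optional judge might look like in code: each debater model answers, sees the others' answers, and revises over several rounds, with a stronger model issuing the final verdict if disagreement remains. The sketch below is an illustrative assumption about the control flow only; the prompt wording, the naive consensus check, and the prompt-to-answer callables are hypothetical, not the released FORD code.

```python
from typing import Callable, List, Optional

LLM = Callable[[str], str]   # any prompt -> answer callable, e.g. an API wrapper

def debate(question: str, debaters: List[LLM], rounds: int = 3,
           judge: Optional[LLM] = None) -> str:
    """Run a simple multi-round debate among debaters; an optional stronger
    judge model gives the final verdict. Control flow is illustrative."""
    answers = [m(f"Question: {question}\nGive your answer with a brief argument.")
               for m in debaters]
    for _ in range(rounds):
        transcript = "\n".join(f"Debater {i}: {a}" for i, a in enumerate(answers))
        answers = [m(f"Question: {question}\nDebate so far:\n{transcript}\n"
                     "Keep or revise your answer, addressing the other debaters.")
                   for m in debaters]
        if len(set(answers)) == 1:        # naive consensus check on identical answers
            return answers[0]
    if judge is not None:                 # authoritative judge breaks remaining disagreement
        transcript = "\n".join(f"Debater {i}: {a}" for i, a in enumerate(answers))
        return judge(f"Question: {question}\nDebate:\n{transcript}\nGive the final answer.")
    return answers[0]
```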
(E)-2-(Cyclohexylmethylene)succinic acid
The title compound, C11H16O4, crystallizes with three molecules in the asymmetric unit. The cyclohexane ring adopts a chair conformation. Intermolecular O—H⋯O hydrogen bonds are observed and these help to establish the crystal packing.
Hemiballism-hemichorea induced by ketotic hyperglycemia: case report with PET study and review of the literature
Hemiballism-hemichorea (HB-HC) is commonly used to describe the basal ganglia dysfunction seen in non-ketotic hyperglycemic elderly patients. Here we report two elderly female patients with acute onset of involuntary movements induced by hyperglycemia with positive urine ketones. We describe the computed tomography and magnetic resonance imaging findings in these two patients, which are similar to those of non-ketotic hyperglycemic HB-HC patients. FDG-PET was performed, and the glucose metabolism in the corresponding lesion was contradictory between the two patients. We attempt to clarify the underlying mechanisms of HB-HC and to explain the contradictory FDG-PET findings as a result of imaging being performed at different clinical stages.
Microstructure and texture evolutions in FeCrAl cladding tube during pilger processing
The microstructure of FeCrAl cladding tubes depends on the fabrication process history. In this study, the microstructural characteristics of wrought FeCrAl alloys during industrial pilger processing into thin-walled tubes were investigated. The hot-extruded tube showed ∼100 μm equiaxed grains with a weak α*-fiber {h11}<1/h12> texture, while the pilger rolling process changed the microstructure to fragmented and elongated grains along the rolling direction. The pilgered textures could be predicted with the VPSC model. Inter-pass annealing at 800–850 °C for 1 h resulted in recovery and recrystallization of the ferritic matrix and restoration of ductility. The final finished tube showed fine recrystallized grains (∼11 μm) with a dominant γ-fiber in three dimensions. Pilger rolling enhanced the α-fiber, while annealing reduced the α-fiber and enhanced the γ-fiber. The morphology of the Laves precipitates evolved in the sequence faceted needle-like → spherical → faceted ellipsoidal. Thermomechanical processing resulted in cladding tubes with a Laves-precipitate area fraction of ∼5% and a number density of ∼5.7 × 10⁻¹¹ m⁻², which is half that of the first-pilgered tube. The Laves precipitates pin the grain boundaries, controlling the microstructure and preventing grain coarsening.
Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
Large language models (LLMs) have achieved impressive performance and strong
explainability across various reasoning scenarios, marking a significant stride
towards mimicking human-like intelligence. Despite this, when tasked with
simple questions supported by a generic fact, LLMs often fail to provide
consistent and precise answers, indicating a deficiency in abstract reasoning
abilities. This has sparked a vigorous debate about whether LLMs are genuinely
reasoning or merely memorizing. In light of this, we design a preliminary study
to quantify and delve into the abstract reasoning abilities of existing LLMs.
Our findings reveal a substantial discrepancy between their general reasoning
and abstract reasoning performance. To address this problem, we tailor an
abstract reasoning dataset (AbsR) together with a meaningful learning paradigm
to teach LLMs how to leverage generic facts for reasoning purposes. The results
show that our approach not only boosts the general reasoning performance of
LLMs but also makes considerable strides towards their capacity for abstract
reasoning, moving beyond simple memorization or imitation to a more nuanced
understanding and application of generic facts.
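The "meaningful learning" recipe, as the abstract describes it, amounts to pairing each question with the generic fact that licenses its answer so the model learns to apply the fact rather than memorize the instance. The sketch below shows one plausible way such a training instance could be assembled; the field names, prompt wording, and toy example are assumptions, not the released AbsR schema.

```python
def build_example(generic_fact: str, question: str, answer: str) -> dict:
    """Assemble one fact-guided training instance (hypothetical schema)."""
    prompt = (
        f"Generic fact: {generic_fact}\n"
        f"Question: {question}\n"
        "Answer by applying the generic fact."
    )
    return {"prompt": prompt, "completion": answer}

# toy instance (illustrative, not from AbsR):
example = build_example(
    "Metals conduct electricity.",
    "Will a copper wire close the circuit?",
    "Yes. Copper is a metal, and metals conduct electricity, so the circuit is closed.",
)
```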
Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering
The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks: for example, they can become trapped in local optima and are sensitive to the initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless of the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach to k-modes clustering, which is independent of the initial solution and can directly obtain optimal results for small-sized datasets. We also developed a heuristic algorithm, IPO-ILP-VNS, that implements iterative partial optimization of the ILP approach within a variable neighborhood search framework to find near-optimal results for medium and large datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI repository, were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The proposed methods outperformed the conventional and other existing enhanced k-modes algorithms in the literature and updated 9 of the UCI benchmark datasets with new and improved results.
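For context, the baseline both proposed methods are compared against is the classic k-modes procedure: alternate between assigning each record to the nearest mode under matching (Hamming) dissimilarity and recomputing each mode as the per-attribute most frequent category. The sketch below is a generic illustration of that baseline on assumed toy data, not the paper's ILP or IPO-ILP-VNS code; because the outcome depends on the random initial modes, it also makes the local-optimum sensitivity described above visible.

```python
import numpy as np

def k_modes(X, k, iters=50, seed=0):
    """Plain k-modes: matching dissimilarity + per-attribute mode update."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=object)
    n = X.shape[0]
    modes = X[rng.choice(n, size=k, replace=False)].copy()   # random initial modes
    labels = np.full(n, -1)
    for _ in range(iters):
        # matching dissimilarity: number of attributes on which record and mode disagree
        dist = np.array([[int(np.sum(x != m)) for m in modes] for x in X])
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                            # assignments stable -> converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                # new mode = most frequent category of each attribute in the cluster
                modes[j] = [max(set(col), key=list(col).count) for col in members.T]
    return labels, modes

# usage with toy categorical records (hypothetical data):
data = [["red", "small", "round"], ["red", "small", "oval"],
        ["blue", "large", "round"], ["blue", "large", "oval"]]
labels, modes = k_modes(data, k=2)
```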