LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval
Many recent passage retrieval approaches use dense embeddings generated
by deep neural models, an approach called "dense passage retrieval".
State-of-the-art end-to-end dense passage retrieval systems normally deploy a
deep neural model followed by an approximate nearest neighbor (ANN) search
module. The model generates embeddings of the corpus and queries, which are
then indexed and searched by the high-performance ANN module. With the
increasing data scale, the ANN module unavoidably becomes the efficiency
bottleneck. An alternative is the learned index, which achieves significantly
higher search efficiency by learning the data distribution and predicting the
target data location. However, most existing learned indexes are designed for
low-dimensional data and are not suitable for dense passage retrieval with
high-dimensional dense embeddings. In this paper, we propose LIDER, an
efficient high-dimensional Learned Index for large-scale DEnse passage
Retrieval. LIDER has a clustering-based hierarchical architecture formed by two
layers of core models. As the basic unit of LIDER to index and search data, a
core model includes an adapted recursive model index (RMI) and a dimension
reduction component which consists of an extended SortingKeys-LSH (SK-LSH) and
a key re-scaling module. The dimension reduction component reduces the
high-dimensional dense embeddings into one-dimensional keys and sorts them in a
specific order, which are then used by the RMI to make fast predictions.
Experiments show that LIDER achieves higher search speed with high retrieval
quality compared to state-of-the-art ANN indexes on passage retrieval tasks;
e.g., on large-scale data it achieves 1.2x the search speed of, and
significantly higher retrieval quality than, the fastest baseline in our
evaluation. Furthermore, LIDER offers a better speed-quality trade-off.
Comment: Accepted by VLDB 202
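The core idea, collapsing high-dimensional embeddings into sorted one-dimensional keys and then predicting a position instead of traversing an index, can be sketched as follows. This is an illustrative toy, not the paper's actual SK-LSH/RMI implementation: a random projection stands in for the dimension-reduction component, and a single linear fit stands in for the recursive model index.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(2_000, 128))     # toy dense passage embeddings

# Stand-in for the dimension-reduction component: project each
# embedding onto one random direction to get a 1-D sortable key.
direction = rng.normal(size=128)
keys = corpus @ direction
order = np.argsort(keys)                   # passage ids in key order
sorted_keys = keys[order]

# Stand-in for the recursive model index: one linear model that
# predicts a key's position in the sorted key array.
slope, intercept = np.polyfit(sorted_keys, np.arange(len(order)), 1)

def lookup(query, window=500):
    """Predict where the query's key lands, then scan only a window
    of candidate passages around the predicted position."""
    pos = int(np.clip(slope * (query @ direction) + intercept,
                      0, len(order) - 1))
    cand = order[max(0, pos - window):pos + window]
    dists = np.linalg.norm(corpus[cand] - query, axis=1)
    return int(cand[np.argmin(dists)])
```

With `window=len(corpus)` this degrades to exact brute-force search; shrinking the window trades retrieval quality for speed, mirroring the speed-quality trade-off discussed above.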
Can Knowledge Graphs Simplify Text?
Knowledge Graph (KG)-to-Text Generation has seen recent improvements in
generating fluent and informative sentences that describe a given KG. Since KGs
are widespread across multiple domains and contain important entity-relation
information, and since text simplification aims to reduce a text's complexity
while preserving its meaning, we propose KGSimple, a novel approach to
unsupervised text simplification that infuses KG-established techniques to
construct a simplified KG path and generate a concise text that preserves the
original input's meaning. Through an iterative and
sampling KG-first approach, our model is capable of simplifying text when
starting from a KG by learning to keep important information while harnessing
KG-to-text generation to output fluent and descriptive sentences. We evaluate
various settings of the KGSimple model on currently available KG-to-text
datasets, demonstrating its effectiveness compared to unsupervised text
simplification models that start from a given complex text. Our code is
available on GitHub. Comment: Accepted as a Main Conference Long Paper at CIKM 202
Can a permutation be sorted by best short swaps?
A short swap switches two elements with at most one element caught between them. Sorting a permutation by short swaps asks for a shortest sequence of short swaps that transforms one permutation into another. A short swap can eliminate at most three inversions. It has remained open whether one can decide if a permutation can be sorted by short swaps each of which eliminates three inversions. In this paper, we present a polynomial-time algorithm for the problem: it decides whether a permutation can be sorted by short swaps each of which eliminates 3 inversions in O(n) time, and if so, sorts the permutation by such short swaps in O(n^2) time, where n is the number of elements in the permutation.
A short swap can cause the total length of two element vectors to decrease by at most 4. We further propose an algorithm to recognize, in O(n) time, a permutation that can be sorted by short swaps each of which causes the element vector length sum to decrease by 4, and if so, to sort the permutation by such short swaps in O(n^2) time. This improves upon the O(n^2) algorithm proposed by Heath and Vergara to decide whether a permutation is so-called lucky.
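The central quantity here, how many inversions a single short swap can remove, is easy to check exhaustively on small permutations. A minimal sketch (function names are mine, not the paper's):

```python
def inversions(p):
    """Count pairs (i, j) with i < j and p[i] > p[j]."""
    return sum(p[i] > p[j]
               for i in range(len(p))
               for j in range(i + 1, len(p)))

def short_swaps(p):
    """All permutations reachable from p by one short swap:
    exchange positions i and j with j - i <= 2."""
    out = []
    for i in range(len(p)):
        for j in (i + 1, i + 2):
            if j < len(p):
                q = list(p)
                q[i], q[j] = q[j], q[i]
                out.append(tuple(q))
    return out

# (3, 2, 1) has 3 inversions; swapping positions 0 and 2 (one element
# caught between them) yields (1, 2, 3), eliminating all 3 at once --
# the maximum a single short swap can achieve.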
Reasoning with Language Model is Planning with World Model
Large language models (LLMs) have shown remarkable reasoning capabilities,
especially when prompted to generate intermediate reasoning steps (e.g.,
Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are
easy for humans, such as generating action plans for executing tasks in a given
environment, or performing complex math, logical, and commonsense reasoning.
The deficiency stems from the key fact that LLMs lack an internal world model
to predict the world state (e.g., environment
status, intermediate variable values) and simulate long-term outcomes of
actions. This prevents LLMs from performing deliberate planning akin to human
brains, which involves exploring alternative reasoning paths, anticipating
future states and rewards, and iteratively refining existing reasoning steps.
To overcome these limitations, we propose a new LLM reasoning framework,
Reasoning via Planning (RAP). RAP repurposes the LLM as both a world model and
a reasoning agent, and incorporates a principled planning algorithm (based on Monte Carlo
Tree Search) for strategic exploration in the vast reasoning space. During
reasoning, the LLM (as agent) incrementally builds a reasoning tree under the
guidance of the LLM (as world model) and task-specific rewards, and obtains a
high-reward reasoning path efficiently with a proper balance between
exploration and exploitation. We apply RAP to a variety of
challenging reasoning problems including plan generation, math reasoning, and
logical inference. Empirical results on these tasks demonstrate the superiority
of RAP over various strong baselines, including CoT and least-to-most prompting
with self-consistency. RAP on LLaMA-33B surpasses CoT on GPT-4 with 33%
relative improvement in a plan generation setting.
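The exploration-exploitation balance comes from the UCT rule at the heart of Monte Carlo Tree Search. The sketch below is a generic MCTS skeleton over a toy reasoning space with a stubbed reward function; it is not RAP's actual implementation, in which an LLM scores states as the world model, and all names here are illustrative.

```python
import math

# A "state" is a partial reasoning trace (a tuple of steps); the
# reward oracle is a stub for LLM-as-world-model scoring.
ACTIONS = ["step_a", "step_b"]

def reward(state):
    # Stub: pretend traces ending in "step_a" reach a good final state.
    return 1.0 if state and state[-1] == "step_a" else 0.0

N, W = {}, {}                # visit count and total reward per state

def uct(parent, child, c=1.4):
    """Upper confidence bound: mean reward (exploitation) plus a
    bonus that shrinks as the child is visited more (exploration)."""
    if N.get(child, 0) == 0:
        return float("inf")  # always try unvisited children first
    return W[child] / N[child] + c * math.sqrt(math.log(N[parent]) / N[child])

def simulate(root=(), depth=3):
    path = [root]
    for _ in range(depth):   # selection / expansion down the tree
        parent = path[-1]
        children = [parent + (a,) for a in ACTIONS]
        path.append(max(children, key=lambda s: uct(parent, s)))
    r = reward(path[-1])     # evaluation by the stubbed world model
    for s in path:           # backpropagation along the chosen path
        N[s] = N.get(s, 0) + 1
        W[s] = W.get(s, 0.0) + r
    return r

for _ in range(100):
    simulate()
```

After the loop, comparing `W[s] / N[s]` across the root's children indicates which first reasoning step looks most promising.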
Focusing on the premature death of redeployed miners in China: an analysis of cause-of-death information from non-communicable diseases
Abstract Background Reducing premature deaths is an important step towards achieving the World Health Organization's sustainable development goals. Redeployed miners are more prone to disease and premature death due to their special occupational characteristics. Our aims were to describe the deaths of redeployed miners, assess the losses due to premature death and identify their main health problems. All individual records were obtained from the Fuxin Mining Area Social Security Administration Center. Years of life lost (YLL) and average years of life lost were used to assess the loss due to premature death. YLL rates per 1000 individuals were used to compare deaths across different populations. Results Circulatory system diseases contributed the most years of life lost among the causes of death, followed by neoplasms. However, the average years of life lost for neoplasms was 6.85, higher than that for circulatory system diseases (5.63). Cerebrovascular disease and ischemic heart disease were the main causes of death among circulatory system diseases, and their average years of life lost, 5.85 and 5.62 respectively, were higher than those of other circulatory system diseases. Lung cancer was the principal cause of death among neoplasms, while the average years of life lost for liver cancer, 7.92, was the highest among neoplasms. Conclusions For redeployed miners, YLL rates per 1000 individuals for cerebrovascular disease, ischemic heart disease and lung cancer were higher than those in other populations, especially among men. It is important to attend to the health of redeployed miners and take appropriate measures to reduce premature death so as to achieve the sustainable development goals. Our findings also provide a theoretical reference for other countries that face or will face the same problem.
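The YLL metrics used above are simple to compute: each death contributes the remaining standard life expectancy at the age of death, and the total is normalized either per death (average YLL) or per 1000 population (YLL rate). A sketch with made-up numbers, since the life-expectancy standard, ages, and cohort size below are hypothetical rather than the study's data:

```python
STANDARD_LIFE_EXPECTANCY = 77            # hypothetical standard, in years

ages_at_death = [68, 72, 75]             # hypothetical ages at death
cohort_size = 3_000                      # hypothetical population at risk

# Each death contributes its remaining standard life expectancy.
ylls = [max(0, STANDARD_LIFE_EXPECTANCY - age) for age in ages_at_death]

total_yll = sum(ylls)                    # 9 + 5 + 2 = 16 years
average_yll = total_yll / len(ylls)      # years lost per death
yll_rate_per_1000 = total_yll / cohort_size * 1000
```

A higher average YLL for a cause (as with neoplasms above) means each death from that cause occurs earlier relative to the standard, even if the cause contributes fewer total years lost.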
Plasma Metabolomic Profiling Reveals Preliminary Biomarkers of Pork Quality Based on pH Value
This study aimed to identify biomarkers for pork quality evaluation. First, the correlations between indicators of pork quality were investigated. The pH of pork meat at 45 min post-slaughter showed a significant negative correlation with meat color indicators (r: −0.4868 to −0.3040). Porcine plasma samples were then divided into low-pH (pH = 6.16 ± 0.22) and high-pH (pH = 6.75 ± 0.08) groups, and plasma metabolites in both groups were investigated using untargeted metabolomics. In total, 90 metabolites were recognized as differential metabolites using partial least squares discriminant analysis. Pathway enrichment analysis indicated that these differential metabolites were enriched in amino acid metabolism and energy metabolism. Correlation analysis revealed that creatinine, L-carnitine, D-sphingosine, citraconic acid, and other metabolites correlated with the pH value of pork meat and may constitute novel plasma biomarkers. The current study provides important insights into plasma biomarkers for predicting pork quality based on pH value.
Additional file 1 of The integration of multidisciplinary approaches revealed PTGES3 as a novel drug target for breast cancer treatment
Additional file 1: Figure S1. The flow chart for the whole-process analysis of this study. Figure S2. Kaplan–Meier analysis based on the ESTIMATE analysis and Venn diagram of the differentially expressed genes (DEGs). Kaplan–Meier curves and volcano plots based on (a) immune score, (b) stromal score and (c) ESTIMATE score; (d) Venn diagram of the up-regulated and down-regulated DEGs. The overlapping DEGs are used for further analysis. Figure S3. Kaplan–Meier and functional analyses. (a) Survival analyses according to the optimal cut-off expression value of each gene in the TCGA-BRCA cohort, all p < 0.05; (b) GO enrichment analysis; (c) KEGG analysis; (d) protein–protein interaction analysis of six genes. Figure S4. Single-sample gene set enrichment analysis (ssGSEA) in the TCGA-BRCA cohort. (a) The expression levels of different immune cells in low- and high-risk groups; (b) the distribution of immune cells; red font represents upregulation and blue font represents downregulation; *p < 0.05, **p < 0.01, ***p < 0.005, ****p < 0.001. Figure S5. Correlation analysis for the six genes in the TCGA-BRCA cohort. (a) The correlation between riskScore and immuneScore; (b) the correlation between riskScore and the six genes; (c) the correlation between immuneScore and the six genes.
Long non-coding RNA LHX1-DT regulates cardiomyocyte differentiation through H2A.Z-mediated LHX1 transcriptional activation
Summary: Long non-coding RNAs (lncRNAs) play widespread roles in various processes. However, there is still limited understanding of the precise mechanisms through which they regulate early-stage cardiomyocyte differentiation. In this study, we identified a specific lncRNA, LHX1-DT, which is transcribed from a bidirectional promoter of the LIM Homeobox 1 (LHX1) gene. Our findings demonstrated that LHX1-DT is nuclear-localized and transiently expressed at elevated levels, along with LHX1, during early differentiation of cardiomyocytes. The phenotype was rescued by overexpression of LHX1 in LHX1-DT−/− hESCs, indicating that LHX1 acts downstream of LHX1-DT. Mechanistically, we discovered that LHX1-DT physically interacts with the RNA/histone-binding protein PHF6 during mesoderm commitment and efficiently replaces conventional histone H2A with the histone variant H2A.Z at the promoter region of LHX1. In summary, our work uncovers a novel lncRNA, LHX1-DT, which plays a vital role in mediating the exchange of histone variants H2A.Z and H2A at the promoter region of LHX1.