12 research outputs found

    LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval

    Many recent approaches to passage retrieval use dense embeddings generated by deep neural models, an approach called "dense passage retrieval". State-of-the-art end-to-end dense passage retrieval systems typically deploy a deep neural model followed by an approximate nearest neighbor (ANN) search module. The model generates embeddings of the corpus and queries, which are then indexed and searched by the high-performance ANN module. As the data scale grows, the ANN module inevitably becomes the efficiency bottleneck. An alternative is the learned index, which achieves significantly higher search efficiency by learning the data distribution and predicting the location of the target data. However, most existing learned indexes are designed for low-dimensional data and are therefore unsuitable for dense passage retrieval with high-dimensional dense embeddings. In this paper, we propose LIDER, an efficient high-dimensional Learned Index for large-scale DEnse passage Retrieval. LIDER has a clustering-based hierarchical architecture formed by two layers of core models. As the basic unit of LIDER for indexing and searching data, a core model includes an adapted recursive model index (RMI) and a dimension-reduction component consisting of an extended SortingKeys-LSH (SK-LSH) and a key re-scaling module. The dimension-reduction component reduces the high-dimensional dense embeddings to one-dimensional keys and sorts them in a specific order, which the RMI then uses to make fast predictions. Experiments show that LIDER achieves higher search speed with high retrieval quality compared to state-of-the-art ANN indexes on passage retrieval tasks; e.g., on large-scale data it achieves 1.2x the search speed of the fastest baseline in our evaluation with significantly higher retrieval quality. Furthermore, LIDER offers a better speed-quality trade-off. Comment: Accepted by VLDB 202
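    The core-model idea can be sketched as follows: reduce each high-dimensional embedding to a one-dimensional sortable key, then learn a mapping from key to position in the sorted array. The snippet below is a minimal illustration under simplifying assumptions (a single random projection standing in for the extended SK-LSH, and a linear fit standing in for the RMI); it is not the paper's implementation.

```python
# Minimal sketch of a learned index over dense vectors: map each embedding
# to a 1-D sortable key, then learn key -> sorted position. Illustration
# only; LIDER's SK-LSH reduction, key re-scaling, and RMI are more elaborate.
import numpy as np

rng = np.random.default_rng(0)
dim, n = 768, 10_000
embeddings = rng.normal(size=(n, dim)).astype(np.float32)

# Dimension reduction: one random projection as a stand-in for SK-LSH.
projection = rng.normal(size=dim).astype(np.float32)
keys = embeddings @ projection

# Sort by key so that position can be predicted from the key value.
order = np.argsort(keys)          # original row ids in key order
sorted_keys = keys[order]
positions = np.arange(n)

# "Learned index": a linear fit key -> position, standing in for an RMI.
slope, intercept = np.polyfit(sorted_keys, positions, deg=1)

def lookup(query_vec, window=200):
    """Predict a position from the query's key and rerank a small window."""
    q_key = query_vec @ projection
    center = int(np.clip(slope * q_key + intercept, 0, n - 1))
    lo, hi = max(0, center - window), min(n, center + window)
    candidates = order[lo:hi]
    dists = np.linalg.norm(embeddings[candidates] - query_vec, axis=1)
    return candidates[np.argsort(dists)[:10]]   # top-10 within the window

print(lookup(rng.normal(size=dim).astype(np.float32)))
```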

    Can Knowledge Graphs Simplify Text?

    Knowledge Graph (KG)-to-Text Generation has seen recent improvements in generating fluent and informative sentences that describe a given KG. Because KGs are widespread across multiple domains and contain important entity-relation information, and because text simplification aims to reduce the complexity of a text while preserving its meaning, we propose KGSimple, a novel approach to unsupervised text simplification that infuses KG-established techniques to construct a simplified KG path and generate a concise text that preserves the meaning of the original input. Through an iterative, sampling-based KG-first approach, our model can simplify text when starting from a KG by learning to keep important information while harnessing KG-to-text generation to output fluent and descriptive sentences. We evaluate various settings of the KGSimple model on currently available KG-to-text datasets, demonstrating its effectiveness compared to unsupervised text simplification models that start from a given complex text. Our code is available on GitHub. Comment: Accepted as a Main Conference Long Paper at CIKM 202
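    As a rough illustration of an iterative, sampling-based KG-first loop, the sketch below repeatedly samples a simplification of the KG path, regenerates text, and keeps changes that improve a combined simplicity/meaning score. The operations, generator, and scorer are hypothetical placeholders, not the actual KGSimple components.

```python
# Illustrative greedy sampling loop over KG triples; all callables are
# placeholders (any KG-to-text model and any scoring function will do).
import random

def simplify_kg(triples, generate_text, score_text, steps=50, seed=0):
    """Sample triple removals, keeping those that improve the text score.

    triples: list of (head, relation, tail) tuples forming the KG path.
    generate_text: callable KG -> text.
    score_text: callable text -> float (placeholder for a combined
                simplicity / meaning-preservation score).
    """
    rng = random.Random(seed)
    current = list(triples)
    best_score = score_text(generate_text(current))
    for _ in range(steps):
        if len(current) <= 1:
            break
        candidate = list(current)
        candidate.pop(rng.randrange(len(candidate)))   # sample a removal
        score = score_text(generate_text(candidate))
        if score > best_score:                          # keep improvements
            current, best_score = candidate, score
    return generate_text(current), current
```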

    Can a permutation be sorted by best short swaps?

    A short swap switches two elements with at most one element caught between them. Sorting a permutation by short swaps asks for a shortest sequence of short swaps that transforms one permutation into another. A short swap can eliminate at most three inversions, and it has remained open whether a permutation can be sorted by short swaps each of which eliminates three inversions. In this paper, we present a polynomial-time algorithm for this problem: it decides in O(n) time whether a permutation can be sorted by short swaps each of which eliminates three inversions, and if so, sorts the permutation by such short swaps in O(n^2) time, where n is the number of elements in the permutation. A short swap can decrease the total length of the two element vectors by at most 4. We further propose an algorithm that recognizes in O(n) time a permutation which can be sorted by short swaps each of which decreases the element vector length sum by 4, and if so, sorts the permutation by such short swaps in O(n^2) time. This improves upon the O(n^2) algorithm proposed by Heath and Vergara to decide whether a permutation is so-called lucky.
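    The definitions can be made concrete with a small sketch: a short swap exchanges elements at positions i and j with |i − j| ≤ 2, and in the best case it removes three inversions, as in [3, 2, 1] → [1, 2, 3]. The code below only illustrates these definitions; it is not the recognition or sorting algorithm from the paper.

```python
# Short swaps and inversions, as defined in the abstract (illustration only).

def inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j]."""
    return sum(perm[i] > perm[j]
               for i in range(len(perm)) for j in range(i + 1, len(perm)))

def short_swap(perm, i, j):
    """Swap positions i and j, allowed only if at most one element lies between."""
    assert abs(i - j) <= 2, "not a short swap"
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out

p = [3, 2, 1]
q = short_swap(p, 0, 2)               # -> [1, 2, 3]
print(inversions(p) - inversions(q))  # 3: this short swap removes 3 inversions
```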

    Reasoning with Language Model is Planning with World Model

    Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are easy for humans, such as generating action plans for executing tasks in a given environment, or performing complex math, logical, and commonsense reasoning. The deficiency stems from the key fact that LLMs lack an internal world model to predict the world state (e.g., environment status, intermediate variable values) and simulate long-term outcomes of actions. This prevents LLMs from performing deliberate planning akin to human brains, which involves exploring alternative reasoning paths, anticipating future states and rewards, and iteratively refining existing reasoning steps. To overcome these limitations, we propose a new LLM reasoning framework, Reasoning via Planning (RAP). RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm (based on Monte Carlo Tree Search) for strategic exploration in the vast reasoning space. During reasoning, the LLM (as agent) incrementally builds a reasoning tree under the guidance of the LLM (as world model) and task-specific rewards, and efficiently obtains a high-reward reasoning path with a proper balance between exploration and exploitation. We apply RAP to a variety of challenging reasoning problems including plan generation, math reasoning, and logical inference. Empirical results on these tasks demonstrate the superiority of RAP over various strong baselines, including CoT and least-to-most prompting with self-consistency. RAP on LLaMA-33B surpasses CoT on GPT-4 with a 33% relative improvement in a plan generation setting.
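    The planning loop described above can be sketched as a standard Monte Carlo Tree Search in which the same LLM plays two roles: as agent it proposes candidate actions, and as world model it predicts the state each action leads to. The functions propose_actions, predict_next_state, and reward below are placeholders for prompted LLM calls and task-specific rewards; this is an illustrative sketch, not the authors' implementation.

```python
# MCTS skeleton for the RAP idea: the LLM proposes actions (agent) and
# predicts resulting states (world model); rewards guide the search.
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def select(node, c=1.4):
    """Descend by UCT until reaching a node with no children."""
    while node.children:
        node = max(node.children,
                   key=lambda ch: ch.value / (ch.visits + 1e-9)
                   + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
    return node

def rap_search(root_state, propose_actions, predict_next_state, reward,
               iterations=100):
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)
        # Expansion: the LLM-as-agent proposes actions; the LLM-as-world-model
        # predicts the state each action leads to.
        for action in propose_actions(leaf.state):
            leaf.children.append(Node(predict_next_state(leaf.state, action), leaf))
        # Evaluation: score one expanded child with the task-specific reward.
        child = random.choice(leaf.children) if leaf.children else leaf
        r = reward(child.state)
        # Backpropagation: update values along the path back to the root.
        node = child
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return (max(root.children, key=lambda ch: ch.visits).state
            if root.children else root_state)
```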

    Focusing on the premature death of redeployed miners in China: an analysis of cause-of-death information from non-communicable diseases

    Background: Reducing premature deaths is an important step towards achieving the World Health Organization's sustainable development goal. Redeployed miners are more prone to disease and premature death because of their special occupational characteristics. Our aims were to describe the deaths of redeployed miners, assess the losses due to premature death, and identify their main health problems. All individual records were obtained from the Fuxin Mining Area Social Security Administration Center. Years of life lost (YLL) and average years of life lost were used to assess the loss due to premature death, and YLL rates per 1000 individuals were used to compare deaths across populations. Results: Circulatory system diseases contributed the most years of life lost among the causes of death, followed by neoplasms. However, the average years of life lost for neoplasms was 6.85, higher than the 5.63 for circulatory system diseases. Cerebrovascular disease and ischemic heart disease were the main causes of death among circulatory system diseases, with average years of life lost of 5.85 and 5.62 respectively, higher than those of other circulatory system diseases. Lung cancer was the principal cause of death among neoplasms, while the average years of life lost for liver cancer, 7.92, was the highest among neoplasms. Conclusions: For redeployed miners, YLL rates per 1000 individuals for cerebrovascular disease, ischemic heart disease and lung cancer were higher than those in other populations, especially among men. It is important to attend to the health of redeployed miners, take appropriate measures to reduce premature death, and thereby advance the sustainable development goal. Our findings also provide a theoretical reference for other countries that face or will face the same problem.
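    The summary measures used here have simple definitions: YLL sums the remaining standard life expectancy at each age of death, the average YLL divides that sum by the number of deaths, and the YLL rate scales YLL to the population size. A toy sketch is given below; the life-expectancy function is a made-up placeholder, not the study's life table.

```python
# Toy YLL calculations; real analyses use a standard reference life table.

def years_of_life_lost(ages_at_death, life_expectancy):
    """YLL = sum of remaining life expectancy at each age of death."""
    return sum(life_expectancy(age) for age in ages_at_death)

def average_yll(ages_at_death, life_expectancy):
    return years_of_life_lost(ages_at_death, life_expectancy) / len(ages_at_death)

def yll_rate_per_1000(ages_at_death, life_expectancy, population_size):
    return 1000 * years_of_life_lost(ages_at_death, life_expectancy) / population_size

# Crude placeholder expectancy of "80 minus age at death".
toy_expectancy = lambda age: max(80 - age, 0)
deaths = [62, 70, 55]
print(average_yll(deaths, toy_expectancy))                 # (18 + 10 + 25) / 3
print(yll_rate_per_1000(deaths, toy_expectancy, 10_000))   # 53 / 10000 * 1000 = 5.3
```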

    Plasma Metabolomic Profiling Reveals Preliminary Biomarkers of Pork Quality Based on pH Value

    This study aimed to identify biomarkers for pork quality evaluation. First, the correlations among pork quality evaluation indicators were investigated. The pH of pork meat at 45 min post slaughter showed a significant negative correlation with meat color indicators (r = −0.4868 to −0.3040). Porcine plasma samples were then divided into a low-pH group (pH = 6.16 ± 0.22) and a high-pH group (pH = 6.75 ± 0.08), and plasma metabolites in both groups were investigated using untargeted metabolomics. In total, 90 metabolites were identified as differential metabolites using partial least squares discriminant analysis. Pathway enrichment analysis indicated that these differential metabolites were enriched in amino acid metabolism and energy metabolism. Correlation analysis revealed that creatinine, L-carnitine, D-sphingosine, citraconic acid, and other metabolites may constitute novel plasma biomarkers associated with the pH value of pork meat. This study provides important insights into plasma biomarkers for predicting pork quality based on pH value.
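    For readers unfamiliar with partial least squares discriminant analysis (PLS-DA), the sketch below shows the general pattern using scikit-learn's PLSRegression on a binary group label with random placeholder data. This is not the study's analysis code; real metabolomics workflows typically add VIP scores, permutation tests, and cross-validation.

```python
# Minimal PLS-DA pattern: fit PLS regression against a binary group label,
# then inspect which features load most heavily on the first component.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_samples, n_metabolites = 40, 200
X = rng.normal(size=(n_samples, n_metabolites))   # placeholder intensities
y = np.array([0] * 20 + [1] * 20)                 # low-pH vs. high-pH group

plsda = PLSRegression(n_components=2)
plsda.fit(X, y)
scores = plsda.transform(X)                       # latent-variable scores

# Rank metabolites by |weight| on the first component as a rough proxy
# for discriminative importance (VIP scores are the usual choice).
importance = np.abs(plsda.x_weights_[:, 0])
top10 = np.argsort(importance)[::-1][:10]
print("Most discriminative (toy) metabolites:", top10)
```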

    Additional file 1 of The integration of multidisciplinary approaches revealed PTGES3 as a novel drug target for breast cancer treatment

    Additional file 1: Figure S1. Flow chart of the whole analysis process of this study. Figure S2. Kaplan–Meier curves based on the ESTIMATE analysis and Venn diagram of the differentially expressed genes (DEGs): Kaplan–Meier curve and volcano plot based on (a) immune score, (b) stromal score and (c) ESTIMATE score; (d) Venn diagram of the up-regulated and down-regulated DEGs; the overlapping DEGs were used for further analysis. Figure S3. Kaplan–Meier and functional analyses: (a) survival analyses according to the optimal cut-off expression value of each gene in the TCGA-BRCA cohort, all p < 0.05; (b) GO enrichment analysis; (c) KEGG analysis; (d) protein–protein interaction analysis of the six genes. Figure S4. Single-sample gene set enrichment analysis (ssGSEA) in the TCGA-BRCA cohort: (a) expression levels of different immune cells in the low- and high-risk groups; (b) distribution of immune cells; red font represents upregulation and blue font represents downregulation; *p < 0.05, **p < 0.01, ***p < 0.005, ****p < 0.001. Figure S5. Correlation analysis for the six genes in the TCGA-BRCA cohort: (a) correlation between riskScore and immuneScore; (b) correlation between riskScore and the six genes; (c) correlation between immuneScore and the six genes.

    Long non-coding RNA LHX1-DT regulates cardiomyocyte differentiation through H2A.Z-mediated LHX1 transcriptional activation

    Summary: Long non-coding RNAs (lncRNAs) play widespread roles in various processes; however, the precise mechanisms through which they regulate early-stage cardiomyocyte differentiation remain poorly understood. In this study, we identified a specific lncRNA, LHX1-DT, which is transcribed from a bidirectional promoter of the LIM Homeobox 1 (LHX1) gene. Our findings demonstrate that LHX1-DT is nuclear-localized and, along with LHX1, shows transiently elevated expression during early cardiomyocyte differentiation. The phenotype of LHX1-DT−/− hESCs was rescued by overexpression of LHX1, indicating that LHX1 acts downstream of LHX1-DT. Mechanistically, we discovered that LHX1-DT physically interacts with the RNA/histone-binding protein PHF6 during mesoderm commitment and efficiently replaces conventional histone H2A with the histone variant H2A.Z at the promoter region of LHX1. In summary, our work uncovers a novel lncRNA, LHX1-DT, which plays a vital role in mediating the exchange of histone H2A for the variant H2A.Z at the promoter region of LHX1.