76 research outputs found

    RORS: Enhanced Rule-based OWL Reasoning on Spark

    Full text link
    The rule-based OWL reasoning is to compute the deductive closure of an ontology by applying RDF/RDFS and OWL entailment rules. The performance of the rule-based OWL reasoning is often sensitive to the rule execution order. In this paper, we present an approach to enhancing the performance of the rule-based OWL reasoning on Spark based on a locally optimal executable strategy. Firstly, we divide all rules (27 in total) into four main classes, namely, SPO rules (5 rules), type rules (7 rules), sameAs rules (7 rules), and schema rules (8 rules) since, as we investigated, those triples corresponding to the first three classes of rules are overwhelming (e.g., over 99% in the LUBM dataset) in our practical world. Secondly, based on the interdependence among those entailment rules in each class, we pick out an optimal rule executable order of each class and then combine them into a new rule execution order of all rules. Finally, we implement the new rule execution order on Spark in a prototype called RORS. The experimental results show that the running time of RORS is improved by about 30% as compared to Kim & Park's algorithm (2015) using the LUBM200 (27.6 million triples).Comment: 12 page

    Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection

    Full text link
    Stance detection classifies stance relations (namely, Favor, Against, or Neither) between comments and targets. Pretrained language models (PLMs) are widely used to mine the stance relation to improve the performance of stance detection through pretrained knowledge. However, PLMs also embed ``bad'' pretrained knowledge concerning stance into the extracted stance relation semantics, resulting in pretrained stance bias. It is not trivial to measure pretrained stance bias due to its weak quantifiability. In this paper, we propose Relative Counterfactual Contrastive Learning (RCCL), in which pretrained stance bias is mitigated as relative stance bias instead of absolute stance bias to overtake the difficulty of measuring bias. Firstly, we present a new structural causal model for characterizing complicated relationships among context, PLMs and stance relations to locate pretrained stance bias. Then, based on masked language model prediction, we present a target-aware relative stance sample generation method for obtaining relative bias. Finally, we use contrastive learning based on counterfactual theory to mitigate pretrained stance bias and preserve context stance relation. Experiments show that the proposed method is superior to stance detection and debiasing baselines

    Attribute Simulation for Item Embedding Enhancement in Multi-interest Recommendation

    Full text link
    Although multi-interest recommenders have achieved significant progress in the matching stage, our research reveals that existing models tend to exhibit an under-clustered item embedding space, which leads to a low discernibility between items and hampers item retrieval. This highlights the necessity for item embedding enhancement. However, item attributes, which serve as effective and straightforward side information for enhancement, are either unavailable or incomplete in many public datasets due to the labor-intensive nature of manual annotation tasks. This dilemma raises two meaningful questions: 1. Can we bypass manual annotation and directly simulate complete attribute information from the interaction data? And 2. If feasible, how to simulate attributes with high accuracy and low complexity in the matching stage? In this paper, we first establish an inspiring theoretical feasibility that the item-attribute correlation matrix can be approximated through elementary transformations on the item co-occurrence matrix. Then based on formula derivation, we propose a simple yet effective module, SimEmb (Item Embedding Enhancement via Simulated Attribute), in the multi-interest recommendation of the matching stage to implement our findings. By simulating attributes with the co-occurrence matrix, SimEmb discards the item ID-based embedding and employs the attribute-weighted summation for item embedding enhancement. Comprehensive experiments on four benchmark datasets demonstrate that our approach notably enhances the clustering of item embedding and significantly outperforms SOTA models with an average improvement of 25.59% on [email protected]: This paper has been accepted by the 17th ACM International Conference on Web Search and Data Mining (WSDM 2024). The camera-ready version will be available in the conference proceeding

    BiSup: Bidirectional Quantization Error Suppression for Large Language Models

    Full text link
    As the size and context length of Large Language Models (LLMs) grow, weight-activation quantization has emerged as a crucial technique for efficient deployment of LLMs. Compared to weight-only quantization, weight-activation quantization presents greater challenges due to the presence of outliers in activations. Existing methods have made significant progress by exploring mixed-precision quantization and outlier suppression. However, these methods primarily focus on optimizing the results of single matrix multiplication, neglecting the bidirectional propagation of quantization errors in LLMs. Specifically, errors accumulate vertically within the same token through layers, and diffuse horizontally across different tokens due to self-attention mechanisms. To address this issue, we introduce BiSup, a Bidirectional quantization error Suppression method. By constructing appropriate optimizable parameter spaces, BiSup utilizes a small amount of data for quantization-aware parameter-efficient fine-tuning to suppress the error vertical accumulation. Besides, BiSup employs prompt mixed-precision quantization strategy, which preserves high precision for the key-value cache of system prompts, to mitigate the error horizontal diffusion. Extensive experiments on Llama and Qwen families demonstrate that BiSup can improve performance over two state-of-the-art methods (the average WikiText2 perplexity decreases from 13.26 to 9.41 for Atom and from 14.33 to 7.85 for QuaRot under the W3A3-g128 configuration), further facilitating the practical applications of low-bit weight-activation quantization

    pSPARQL: A Querying Language for Probabilistic RDF (Extended Abstract)

    Get PDF
    Abstract. In this paper, we present a querying language for probabilistic RDF databases, where each triple has a probability, called pSRARQL, built on SPAR-QL, recommended by W3C as a querying language for RDF databases. Firstly, we present the syntax and semantics of pSPARQL. Secondly, we define the query problem of pSPARQL corresponding to probabilities of solutions. Finally, we show that the query evaluation of general pSPARQL patterns is PSPACEcomplete
    • …
    corecore