
    MAPS-KB: A Million-scale Probabilistic Simile Knowledge Base

    The ability to understand and generate similes is an important step towards realizing human-level AI. However, there is still a considerable gap between machine intelligence and human cognition regarding similes, since deep models based on statistical distributions tend to favour high-frequency similes. Hence, a large-scale symbolic knowledge base of similes is required, as it supports the modeling of diverse yet infrequent similes while facilitating additional evaluation and reasoning. To bridge the gap, we propose a novel framework for large-scale simile knowledge base construction, as well as two probabilistic metrics which enable an improved understanding of simile phenomena in natural language. Overall, we construct MAPS-KB, a million-scale probabilistic simile knowledge base, covering 4.3 million triplets over 0.4 million terms from 70 GB of corpora. We conduct extensive experiments to justify the effectiveness and necessity of the methods in our framework. We also apply MAPS-KB to three downstream tasks, achieving state-of-the-art performance and further demonstrating the value of MAPS-KB. Comment: Accepted to AAAI 202
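
    The abstract does not specify a data model, but a probabilistic simile knowledge base of this kind can be pictured as a store of scored triplets. The sketch below is a minimal illustration assuming (topic, property, vehicle) triplets and two per-triplet scores named plausibility and typicality; these field and score names, and the ranking helper, are assumptions rather than the released MAPS-KB schema.

        # Illustrative sketch of a probabilistic simile triplet store; the field names,
        # the two scores, and the ranking rule are assumptions, not the MAPS-KB schema.
        from dataclasses import dataclass
        from collections import defaultdict

        @dataclass(frozen=True)
        class SimileTriplet:
            topic: str           # e.g. "love"
            property: str        # e.g. "sweet"
            vehicle: str         # e.g. "honey"
            plausibility: float  # assumed probabilistic metric
            typicality: float    # assumed probabilistic metric

        class SimileKB:
            def __init__(self):
                self._by_topic = defaultdict(list)

            def add(self, triplet: SimileTriplet) -> None:
                self._by_topic[triplet.topic].append(triplet)

            def top_vehicles(self, topic: str, k: int = 3):
                """Return the k highest-scoring triplets for a topic."""
                candidates = self._by_topic.get(topic, [])
                return sorted(candidates,
                              key=lambda t: t.plausibility * t.typicality,
                              reverse=True)[:k]

        kb = SimileKB()
        kb.add(SimileTriplet("love", "sweet", "honey", 0.92, 0.81))
        kb.add(SimileTriplet("love", "deep", "the ocean", 0.88, 0.74))
        print([t.vehicle for t in kb.top_vehicles("love", k=1)])  # ['honey']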

    Language Models as Knowledge Embeddings

    Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs, but they cannot well represent the abundant long-tail entities in real-world KGs that have limited structural information. Description-based methods leverage textual information and language models; prior approaches in this direction barely outperform structure-based ones and suffer from problems such as expensive negative sampling and restrictive demands on descriptions. In this paper, we propose LMKE, which adopts Language Models to derive Knowledge Embeddings, aiming both to enrich the representations of long-tail entities and to solve the problems of prior description-based methods. We formulate description-based KE learning within a contrastive learning framework to improve efficiency in training and evaluation. Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks for link prediction and triple classification, especially for long-tail entities. Comment: This revision corrects some text after fixing a data leakage issue
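
    As a rough illustration of description-based, contrastive knowledge embedding (a sketch in the spirit of the abstract, not the actual LMKE architecture), the code below encodes entity and relation descriptions with a stand-in encoder, scores a triple by similarity between the head-plus-relation vector and the tail vector, and applies an in-batch contrastive (InfoNCE-style) loss; the encoder, dimensions, and scoring rule are all assumptions.

        # Minimal contrastive description-based KE sketch (assumed design, not LMKE itself).
        # An EmbeddingBag over token ids stands in for a pretrained language model encoder.
        import torch
        import torch.nn.functional as F

        torch.manual_seed(0)
        VOCAB, DIM = 1000, 64
        encoder = torch.nn.EmbeddingBag(VOCAB, DIM)  # stand-in for an LM text encoder

        def encode(desc_token_ids):
            """Encode a textual description (a list of token ids) into one vector."""
            return encoder(torch.tensor([desc_token_ids])).squeeze(0)

        def info_nce_loss(hr_vecs, tail_vecs, temperature=0.05):
            """In-batch contrastive loss: each (head, relation) pair should match its own tail."""
            logits = (F.normalize(hr_vecs, dim=-1) @ F.normalize(tail_vecs, dim=-1).T) / temperature
            targets = torch.arange(logits.size(0))
            return F.cross_entropy(logits, targets)

        # Toy batch of (head, relation, tail) descriptions given as token-id lists.
        heads = [[1, 2, 3], [4, 5]]
        rels  = [[10, 11], [12]]
        tails = [[20, 21], [22, 23]]
        hr = torch.stack([encode(h) + encode(r) for h, r in zip(heads, rels)])
        tl = torch.stack([encode(t) for t in tails])
        print(float(info_nce_loss(hr, tl)))  # would be minimized during training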

    An Efficient Robust Eye Localization by Learning the Convolution Distribution Using Eye Template

    Eye localization is a fundamental step in many facial analyses. In practical use, it is often challenged by illumination, head pose, facial expression, occlusion, and other factors, and it remains difficult to achieve high accuracy, short prediction time, and low training cost simultaneously. This paper presents a novel eye localization approach that uses only a one-layer convolution map, computed with an eye template, as the input to a BP (back-propagation) network. Results show that the proposed method is robust in many difficult situations. In experiments, accuracies of 98% and 96% on the BioID and LFPW test sets, respectively, were achieved at a prediction rate of 10 fps with only 15 minutes of training. Compared with other robust models, the proposed method obtains similarly strong results with greatly reduced training time and high prediction speed.
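
    The pipeline described (a single convolution map produced with an eye template, fed to a BP network) can be sketched generically as follows; this is not the authors' implementation, and the template, image, and network sizes are placeholder assumptions.

        # Generic sketch: correlate an eye template with a face image to obtain one
        # convolution/response map, then let a BP-trained network map it to eye coordinates.
        # The template, image, and network sizes are illustrative assumptions.
        import numpy as np
        from scipy.signal import correlate2d
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        face = rng.random((64, 64))          # stand-in grayscale face image
        eye_template = rng.random((11, 11))  # stand-in eye template

        # One-layer "convolution" map: zero-mean cross-correlation of image and template.
        response = correlate2d(face - face.mean(),
                               eye_template - eye_template.mean(), mode="same")

        # BP network (an MLP trained with back-propagation) regresses eye coordinates.
        X = response.reshape(1, -1)
        y = np.array([[20.0, 24.0]])         # dummy (row, col) eye position for the sketch
        mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
        mlp.fit(X, y)
        print(mlp.predict(X))                # predicted eye location on the training example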

    Can Large Language Models Understand Real-World Complex Instructions?

    Large language models (LLMs) can understand human instructions, showing their potential for pragmatic applications beyond traditional NLP tasks. However, they still struggle with complex instructions, which can be either complex task descriptions that involve multiple tasks and constraints, or complex inputs that contain long context, noise, heterogeneous information, and multi-turn formats. Due to these features, LLMs often ignore semantic constraints from task descriptions, generate incorrect formats, violate length or sample-count constraints, and are unfaithful to the input text. Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions, as they are closed-ended and simple. To bridge this gap, we propose CELLO, a benchmark for systematically evaluating LLMs' ability to follow complex instructions. We design eight features of complex instructions and construct a comprehensive evaluation dataset from real-world scenarios. We also establish four criteria and develop corresponding metrics, as current ones are inadequate, biased, or too strict and coarse-grained. We compare the performance of representative Chinese-oriented and English-oriented models in following complex instructions through extensive experiments. Resources for CELLO are publicly available at https://github.com/Abbey4799/CELLO
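
    As a purely hypothetical illustration of the kind of constraint checking such an evaluation involves (not CELLO's actual criteria or metrics), the snippet below verifies a format constraint and a sample-count constraint on a model response.

        # Hypothetical constraint check on a model response; NOT CELLO's metrics, just an
        # illustration of verifying a format constraint and a sample-count constraint.
        import json

        def check_response(response: str, expected_items: int) -> dict:
            """Check whether a response is a JSON list with the required number of items."""
            result = {"json_format": False, "count_ok": False}
            try:
                data = json.loads(response)
                result["json_format"] = isinstance(data, list)
                result["count_ok"] = isinstance(data, list) and len(data) == expected_items
            except json.JSONDecodeError:
                pass
            return result

        print(check_response('["a", "b", "c"]', expected_items=3))  # both constraints satisfied
        print(check_response('a, b, c', expected_items=3))          # fails the format constraint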

    Genetic Properties of a Nested Association Mapping Population Constructed With Semi-Winter and Spring Oilseed Rapes

    Nested association mapping (NAM) populations have been widely applied to dissect the genetic basis of complex quantitative traits in a variety of crops. In this study, we developed a Brassica napus NAM (BN-NAM) population consisting of 15 recombinant inbred line (RIL) families with 2,425 immortal genotypes. Fifteen high-density genetic linkage maps were constructed by genotyping by sequencing (GBS) of all RIL families and further integrated into a joint linkage map (JLM) containing 30,209 unique markers shared among multiple linkage maps. Furthermore, an ultra-dense whole-genome variation map was constructed by projecting 4,444,309 high-quality variants onto the JLM. The NAM population captured a total of 88,542 recombination events (REs). The uneven distribution of recombination rate along chromosomes is positively correlated with the densities of genes and markers, but negatively correlated with the density of transposable elements and with linkage disequilibrium (LD). Analyses of population structure and principal components revealed that the BN-NAM population can be divided into three groups with weak stratification. The LD decay distance across the genome varied between 170 and 2,400 kb, with more rapid LD decay in the A sub-genome than in the C sub-genome. The pericentromeric regions contained large LD blocks, especially in the C sub-genome. This NAM population provides a valuable resource for dissecting the genetic basis of important traits in rapeseed, especially in semi-winter oilseed rape.
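
    The LD decay figures reported above come from pairwise linkage disequilibrium analysis; a generic sketch of estimating r^2 between biallelic markers as a function of physical distance (not the study's actual pipeline, using simulated genotypes and an arbitrary decay criterion) might look like this:

        # Generic LD (r^2) decay sketch with simulated 0/1/2 genotype dosages; the data,
        # coding, and decay criterion are assumptions, not the BN-NAM analysis pipeline.
        import numpy as np

        rng = np.random.default_rng(1)
        n_ind, n_snp = 200, 50
        positions = np.sort(rng.integers(0, 5_000_000, size=n_snp))        # bp positions
        genotypes = rng.integers(0, 3, size=(n_ind, n_snp)).astype(float)  # allele dosages

        def pairwise_r2(g1, g2):
            """Squared Pearson correlation between two genotype vectors (a common r^2 proxy)."""
            return np.corrcoef(g1, g2)[0, 1] ** 2

        # r^2 against physical distance for all marker pairs.
        dists, r2s = [], []
        for i in range(n_snp):
            for j in range(i + 1, n_snp):
                dists.append(positions[j] - positions[i])
                r2s.append(pairwise_r2(genotypes[:, i], genotypes[:, j]))
        dists, r2s = np.array(dists), np.array(r2s)

        # Crude decay summary: distances of pairs whose r^2 falls below half the maximum.
        below = dists[r2s < r2s.max() / 2]
        print(f"{below.size} pairs below half-max r^2, mean distance {below.mean():.0f} bp")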

    Assouad Dimensions and Lower Dimensions of Some Moran Sets

    We prove that the lower dimensions of a class of Moran sets coincide with their Hausdorff dimensions and obtain a formula for the lower dimensions. Subsequently, we consider some homogeneous Cantor sets, which belong to the Moran sets, and give counterexamples in which the Assouad dimension is not equal to the upper box dimension or the packing dimension when the condition on the smallest compression ratio, c* > 0, is not satisfied.
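
    For reference, the two notions the abstract contrasts can be stated in standard notation (this is the usual textbook formulation, not anything specific to the Moran sets studied). Writing N_r(E) for the smallest number of balls of radius r needed to cover E:

        % Standard definitions of the Assouad and lower dimensions of a set F.
        \[
        \dim_A F = \inf\Bigl\{\alpha \ge 0 : \exists\, C>0 \text{ such that } \forall\, 0<r<R,\
                   \sup_{x\in F} N_r\bigl(B(x,R)\cap F\bigr) \le C\,(R/r)^{\alpha}\Bigr\},
        \]
        \[
        \dim_L F = \sup\Bigl\{\alpha \ge 0 : \exists\, C>0 \text{ such that } \forall\, 0<r<R \le \operatorname{diam} F,\
                   \inf_{x\in F} N_r\bigl(B(x,R)\cap F\bigr) \ge C\,(R/r)^{\alpha}\Bigr\}.
        \]

    For compact sets these satisfy dim_L F \le dim_H F \le \overline{\dim}_B F \le \dim_A F, which is why coincidence of the lower and Hausdorff dimensions, and counterexamples to coincidence at the Assouad end, are the natural questions here.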

    Multifractal Structure of the Divergence Points of Some Homogeneous Moran Measures

    A point x for which the limit lim_{r→0} (log μ(B(x,r)) / log r) does not exist is called a divergence point. Recently, the multifractal structure of the divergence points of self-similar measures has been investigated by many authors. This paper is devoted to the study of some Moran measures supported on homogeneous Moran fractals associated with sequences for which the frequency of each letter exists; the Moran measures associated with this kind of structure are neither Gibbs nor self-similar, and are therefore more complex. Such measures possess singular features because of the existence of so-called divergence points. Using the box-counting principle, we analyze the multifractal structure of the divergence points of some homogeneous Moran measures and show that the Hausdorff dimension of the set of divergence points is the same as the dimension of the whole Moran set.
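
    In the standard notation behind the abstract, the local dimension of μ at x and the set of divergence points are (generic definitions, not paper-specific):

        % Local dimension of a measure and its set of divergence points.
        \[
        d_{\mu}(x) = \lim_{r\to 0} \frac{\log \mu\bigl(B(x,r)\bigr)}{\log r},
        \qquad
        D(\mu) = \Bigl\{\, x \in \operatorname{supp}\mu :\ \lim_{r\to 0} \frac{\log \mu\bigl(B(x,r)\bigr)}{\log r}\ \text{does not exist} \Bigr\}.
        \]

    The result stated above is then that dim_H D(μ) = dim_H E, where E is the underlying homogeneous Moran set supporting μ.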