65 research outputs found
MAPS-KB: A Million-scale Probabilistic Simile Knowledge Base
The ability to understand and generate similes is an imperative step to
realize human-level AI. However, there is still a considerable gap between
machine intelligence and human cognition in similes, since deep models based on
statistical distribution tend to favour high-frequency similes. Hence, a
large-scale symbolic knowledge base of similes is required, as it contributes
to the modeling of diverse yet unpopular similes while facilitating additional
evaluation and reasoning. To bridge the gap, we propose a novel framework for
large-scale simile knowledge base construction, as well as two probabilistic
metrics which enable an improved understanding of simile phenomena in natural
language. Overall, we construct MAPS-KB, a million-scale probabilistic simile
knowledge base, covering 4.3 million triplets over 0.4 million terms from 70 GB
corpora. We conduct sufficient experiments to justify the effectiveness and
necessity of the methods of our framework. We also apply MAPS-KB on three
downstream tasks to achieve state-of-the-art performance, further demonstrating
the value of MAPS-KB.
Comment: Accepted to AAAI 202
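To illustrate what probabilistic scores over simile triplets might look like, here is a minimal sketch. The triplets and the two conditional-probability metrics below are invented for illustration and are not the paper's actual data or definitions.

```python
from collections import Counter

# Hypothetical simile triplets (topic, property, vehicle); illustrative
# only -- not taken from MAPS-KB.
triples = [
    ("cheeks", "red", "apple"),
    ("cheeks", "red", "rose"),
    ("face",   "red", "apple"),
    ("heart",  "cold", "ice"),
]

def conditional_probs(triples):
    """Two probabilistic views of a (topic, vehicle) pair: how often the
    topic evokes this vehicle, and how often the vehicle serves this topic."""
    pair_counts = Counter((t, v) for t, _, v in triples)
    topic_counts = Counter(t for t, _, v in triples)
    vehicle_counts = Counter(v for _, _, v in triples)
    p_vehicle_given_topic = {k: c / topic_counts[k[0]] for k, c in pair_counts.items()}
    p_topic_given_vehicle = {k: c / vehicle_counts[k[1]] for k, c in pair_counts.items()}
    return p_vehicle_given_topic, p_topic_given_vehicle

fwd, rev = conditional_probs(triples)
print(fwd[("cheeks", "apple")])  # 0.5: "cheeks" pairs with "apple" in 1 of its 2 similes
```

Scores of this kind make rare but valid pairings visible even when their raw corpus frequency is low, which is the motivation the abstract gives for a probabilistic rather than purely frequency-ranked resource.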
Language Models as Knowledge Embeddings
Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding
entities and relations into continuous vector spaces. Existing methods are
mainly structure-based or description-based. Structure-based methods learn
representations that preserve the inherent structure of KGs. They cannot well
represent abundant long-tail entities in real-world KGs with limited structural
information. Description-based methods leverage textual information and
language models. Prior approaches in this direction barely outperform
structure-based ones, and suffer from problems like expensive negative sampling
and restrictive description demand. In this paper, we propose LMKE, which
adopts Language Models to derive Knowledge Embeddings, aiming at both enriching
representations of long-tail entities and solving problems of prior
description-based methods. We formulate description-based KE learning with a
contrastive learning framework to improve efficiency in training and
evaluation. Experimental results show that LMKE achieves state-of-the-art
performance on KE benchmarks of link prediction and triple classification,
especially for long-tail entities.
Comment: This revision corrects some texts after fixing a data leakage issue
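The contrastive formulation described above can be sketched with an in-batch InfoNCE loss, where each (head, relation) description is matched against its tail entity's description and the other batch members serve as negatives. The toy hash-based encoder below only stands in for the pretrained language model, and the example descriptions are hypothetical.

```python
import hashlib
import numpy as np

def encode(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic text encoder standing in for a pretrained LM:
    a seeded random unit vector per string (assumption for this sketch)."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.05) -> float:
    """In-batch contrastive loss: each anchor's positive is the
    same-index row; all other rows serve as negatives."""
    logits = anchors @ positives.T / temperature   # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# A triple (h, r, t) is scored by matching the (h, r) description pair
# against the tail entity's description (names here are illustrative).
heads = np.stack([encode("aspirin [SEP] treats"), encode("paris [SEP] capital_of")])
tails = np.stack([encode("headache"), encode("france")])
loss = info_nce_loss(heads, tails)
print(round(loss, 4))
```

Because every batch element acts as a negative for every other, this setup avoids the expensive explicit negative sampling the abstract criticizes in prior description-based methods.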
An Efficient Robust Eye Localization by Learning the Convolution Distribution Using Eye Template
Eye localization is a fundamental process in many facial analyses. In practical use, it is often challenged by illumination, head pose, facial expression, occlusion, and other factors. It remains difficult to achieve high accuracy with short prediction time and low training cost at the same time. This paper presents a novel eye localization approach which explores only a one-layer convolution map produced by an eye template and refined with a BP network. Results showed that the proposed method is robust in many difficult situations. In experiments, accuracies of 98% and 96% on the BioID and LFPW test sets, respectively, were achieved at a 10 fps prediction rate with only 15 minutes of training. In comparison with other robust models, the proposed method obtains similar best results with greatly reduced training time and high prediction speed.
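The template-matching stage behind the "one-layer convolution map" can be sketched as a normalized cross-correlation of the image with an eye template, taking the peak as the candidate location. The normalization and the synthetic data below are assumptions; the paper additionally refines this map with a BP network, which is omitted here.

```python
import numpy as np

def correlate2d(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Valid-mode normalized cross-correlation map of image vs. template."""
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-8)
    out = np.empty((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + th, j:j + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            out[i, j] = (p * t).mean()
    return out

def localize_eye(image: np.ndarray, template: np.ndarray) -> tuple:
    """Return (row, col) of the best template match in the correlation map."""
    score = correlate2d(image, template)
    return np.unravel_index(np.argmax(score), score.shape)

# Synthetic demo: plant the template inside a noisy image.
rng = np.random.default_rng(0)
template = rng.normal(size=(5, 5))
image = rng.normal(scale=0.1, size=(40, 40))
image[12:17, 20:25] += template
print(localize_eye(image, template))   # best match near (12, 20)
```

A single correlation map like this is cheap to compute, which is consistent with the low training cost and high prediction speed the abstract reports.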
Can Large Language Models Understand Real-World Complex Instructions?
Large language models (LLMs) can understand human instructions, showing their
potential for pragmatic applications beyond traditional NLP tasks. However,
they still struggle with complex instructions, which can be either complex task
descriptions that require multiple tasks and constraints, or complex input that
contains long context, noise, heterogeneous information and multi-turn format.
Due to these features, LLMs often ignore semantic constraints from task
descriptions, generate incorrect formats, violate length or sample count
constraints, and produce output unfaithful to the input text. Existing benchmarks are
insufficient to assess LLMs' ability to understand complex instructions, as
they are closed-ended and simple. To bridge this gap, we propose CELLO, a
benchmark for evaluating LLMs' ability to follow complex instructions
systematically. We design eight features for complex instructions and construct
a comprehensive evaluation dataset from real-world scenarios. We also establish
four criteria and develop corresponding metrics, as current ones are
inadequate, biased or too strict and coarse-grained. We compare the performance
of representative Chinese-oriented and English-oriented models in following
complex instructions through extensive experiments. Resources of CELLO are
publicly available at https://github.com/Abbey4799/CELLO.
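Rule-based checks in the spirit of the constraint types listed above (format, length, sample count) can be sketched as follows; these are illustrative checks, not CELLO's actual criteria or metrics.

```python
import json

def check_constraints(output: str, max_len: int, n_items: int) -> dict:
    """Toy checks on a model response that was asked to return a JSON
    list of n_items entries within max_len characters (hypothetical task)."""
    results = {
        "valid_json": False,                 # format constraint
        "length_ok": len(output) <= max_len, # length constraint
        "count_ok": False,                   # sample-count constraint
    }
    try:
        data = json.loads(output)
        results["valid_json"] = isinstance(data, list)
        results["count_ok"] = isinstance(data, list) and len(data) == n_items
    except json.JSONDecodeError:
        pass
    return results

print(check_constraints('["a", "b", "c"]', max_len=50, n_items=3))
```

Even a checker this simple separates the failure modes the abstract names: a response can be well-formed JSON yet violate the count constraint, or faithful in content yet overrun the length budget.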
Genetic Properties of a Nested Association Mapping Population Constructed With Semi-Winter and Spring Oilseed Rapes
Nested association mapping (NAM) populations have been widely applied to dissect the genetic basis of complex quantitative traits in a variety of crops. In this study, we developed a Brassica napus NAM (BN-NAM) population consisting of 15 recombination inbred line (RIL) families with 2,425 immortal genotypes. Fifteen high-density genetic linkage maps were constructed by genotyping by sequencing (GBS) based on all RIL families, with further integration into a joint linkage map (JLM) having 30,209 unique markers in common with multiple linkage maps. Furthermore, an ultra-density whole-genome variation map was constructed by projecting 4,444,309 high-quality variants onto the JLM. The NAM population captured a total of 88,542 recombination events (REs). The uneven distribution of recombination rate along chromosomes is positively correlated with the densities of genes and markers, but negatively correlated with the density of transposable elements and linkage disequilibrium (LD). Analyses of population structure and principal components revealed that the BN-NAM population could be divided into three groups with weak stratification. The LD decay distance across the genome varied between 170 and 2,400 Kb, with LD decay more rapid in the A than in the C sub-genome. The pericentromeric regions contained large LD blocks, especially in the C sub-genome. This NAM population provides a valuable resource for dissecting the genetic basis of important traits in rapeseed, especially in semi-winter oilseed rape.
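The LD decay curve behind figures like the 170-2,400 Kb range above is typically computed as mean pairwise r^2 binned by physical distance. The sketch below shows one common way to do this on a tiny synthetic genotype matrix; the data and bin choices are illustrative, not the study's pipeline.

```python
import numpy as np

def ld_r2(g1: np.ndarray, g2: np.ndarray) -> float:
    """Squared Pearson correlation (r^2) between two markers coded 0/1/2,
    a standard genotype-based linkage-disequilibrium statistic."""
    c = np.corrcoef(g1, g2)[0, 1]
    return c * c

def ld_decay(genotypes, positions, max_dist=2_400_000, bin_size=100_000):
    """Mean r^2 per physical-distance bin; the distance at which this
    curve falls to half its initial value is one common definition of
    the LD decay distance."""
    n_markers = genotypes.shape[1]
    bins = [[] for _ in range(max_dist // bin_size)]
    for i in range(n_markers):
        for j in range(i + 1, n_markers):
            d = abs(positions[j] - positions[i])
            if d < max_dist:
                bins[d // bin_size].append(ld_r2(genotypes[:, i], genotypes[:, j]))
    return [float(np.mean(b)) if b else float("nan") for b in bins]

# Tiny synthetic example: 5 individuals, 3 markers; markers 1 and 2 are
# identical (perfect LD), marker 3 sits 1 Mb away and is only loosely linked.
g = np.array([[0, 0, 1],
              [1, 1, 2],
              [2, 2, 0],
              [0, 0, 2],
              [1, 1, 1]])
pos = [0, 50_000, 1_000_000]
curve = ld_decay(g, pos)
print(round(curve[0], 3))   # r^2 = 1.0 for the identical marker pair
```

Comparing such curves between the A and C sub-genome marker sets is what supports statements like "LD decay more rapid in the A than in the C sub-genome".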
Assouad Dimensions and Lower Dimensions of Some Moran Sets
We prove that the lower dimensions of a class of Moran sets coincide with their Hausdorff dimensions and obtain a formula for the lower dimensions. Subsequently, we consider some homogeneous Cantor sets, which belong to the class of Moran sets, and give counterexamples in which the Assouad dimension is not equal to the upper box dimension and the packing dimension when the condition that the smallest compression ratio c_* > 0 is not satisfied.
Multifractal Structure of the Divergence Points of Some Homogeneous Moran Measures
A point x for which the limit lim_{r→0} log μ(B(x, r)) / log r does not exist is called a divergence point. Recently, the multifractal structure of the divergence points of self-similar measures has been investigated by many authors. This paper is devoted to the study of some Moran measures supported on homogeneous Moran fractals associated with sequences for which the frequency of each letter exists; the Moran measures associated with this kind of structure are neither Gibbs nor self-similar, and are therefore more complex. Such measures possess singular features because of the existence of so-called divergence points. By the box-counting principle, we analyze the multifractal structure of the divergence points of some homogeneous Moran measures and show that the Hausdorff dimension of the set of divergence points is the same as the dimension of the whole Moran set.
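In standard multifractal notation (not quoted from the paper), the divergence points described above are the points where the lower and upper local dimensions of the measure μ disagree:

```latex
\[
  \underline{d}_\mu(x)=\liminf_{r\to 0}\frac{\log\mu(B(x,r))}{\log r},
  \qquad
  \overline{d}_\mu(x)=\limsup_{r\to 0}\frac{\log\mu(B(x,r))}{\log r},
\]
\[
  D(\mu)=\Bigl\{x\in\operatorname{supp}\mu \;:\;
    \underline{d}_\mu(x)<\overline{d}_\mu(x)\Bigr\}.
\]
```

The abstract's result can then be read as dim_H D(μ) equaling the Hausdorff dimension of the whole Moran set, i.e. the divergence set is dimensionally as large as the support itself.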