16 research outputs found

    SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity

    This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High-quality datasets were manually curated for the five languages with high inter-annotator agreement (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, and external knowledge from lexical resources are the best performers in both subtasks. More information can be found on the task website: http://alt.qcri.org/semeval2017/task2/

    Towards a seamless integration of word senses into downstream NLP applications

    Lexical ambiguity can prevent NLP systems from accurately understanding semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine granularity of the underlying sense inventory is reduced and the document is sufficiently large. Our results also point to the need for sense representation research to focus more on in vivo evaluations which target the performance in downstream NLP applications rather than artificial benchmarks.

    Police districting problem: literature review and annotated bibliography

    The police districting problem concerns the efficient and effective design of patrol sectors in terms of performance attributes. Effectiveness is particularly important as it directly influences the ability of police agencies to stop and prevent crime. However, in this problem, a homogeneous distribution of workload is also desirable to guarantee fairness to the police agents and an increase in their satisfaction. This chapter provides a systematic review of the literature related to the police districting problem, whose history dates back almost 50 years. Contributions are categorized in terms of attributes and solution methodology adopted. Also, an annotated bibliography that presents the most relevant elements of each work is given.

    AnaLog: Testing Analytical and Deductive Logic Learnability in Language Models

    We investigate the extent to which pre-trained language models acquire analytical and deductive logical reasoning capabilities as a side effect of learning word prediction. We present AnaLog, a natural language inference task designed to probe models for these capabilities, controlling for different invalid heuristics the models may adopt instead of learning the desired generalisations. We test four language models on AnaLog, finding that they have all learned, to differing extents, to encode information that is predictive of entailment beyond shallow heuristics such as lexical overlap and grammaticality. We closely analyse the best performing language model and show that while it performs more consistently than other language models across logical connectives and reasoning domains, it is still sensitive to lexical and syntactic variations in the realisation of logical statements.