3,955 research outputs found

    Multi-Source Spatial Entity Linkage

    Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities, describes them with different attributes, and sometimes provides contradicting information. Hence, we introduce the spatial entity linkage problem, which determines which pairs of spatial entities refer to the same physical spatial entity. Our proposed solution (QuadSky) starts with a time-efficient spatial blocking technique (QuadFlex), compares the spatial entities in the same block pairwise, ranks the pairs using Pareto optimality with the SkyRank algorithm, and finally classifies the pairs with our novel SkyEx-* family of algorithms, which yield 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of 777,452 pairs. Moreover, we provide a theoretical guarantee and formalize the SkyEx-FES algorithm, which explores only 27% of the skylines without any loss in F-measure. Furthermore, our fully unsupervised algorithm SkyEx-D approximates the optimal result with an F-measure loss of just 0.01. Finally, QuadSky provides the best trade-off between precision and recall and the best F-measure compared to the existing baselines and clustering techniques, and it approximates the results of supervised learning solutions.
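
    The ranking step above relies on Pareto optimality. As a rough illustration only, the sketch below peels a set of candidate pairs into successive skyline layers, where each pair is scored on several similarity dimensions (higher is better). It is a generic Pareto layering and not the authors' SkyRank or SkyEx-* algorithms; the function names and the two-dimensional similarity scores are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points: List[Point]) -> List[List[Point]]:
    """Split points into successive Pareto fronts (layer 0 is the skyline)."""
    remaining = list(points)
    layers: List[List[Point]] = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(front)
        remaining = [p for p in remaining if p not in front]
    return layers


# Hypothetical (name similarity, address similarity) scores for five candidate pairs.
pairs = [(0.9, 0.8), (0.7, 0.9), (0.5, 0.5), (0.6, 0.4), (0.1, 0.2)]
layers = skyline_layers(pairs)
for rank, layer in enumerate(layers):
    print(rank, layer)
```

    Pairs in earlier layers are stronger match candidates; a classifier in this style would then choose a cut-off layer that separates matches from non-matches.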

    Analyzing the behavioral profiles of sets of near-synonyms in American English from a diachronic perspective

    It is a rather generalized assumption that synonymy is relatively straightforward and unproblematic, being the semantic relation which is familiar to most people, including nonlinguists. However, despite being a common linguistic phenomenon, synonymy is also a particularly complex one (Cruse, 2000; Liu, 2010). Funding: Consellería de Cultura, Educación e Ordenación Universitaria, Xunta de Galicia, ref. ED481A-2016/168; Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia, Ministerio de Economía y Competitividad, ref. FFI2017-86884-

    Non-Compositional Term Dependence for Information Retrieval

    Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. E.g. "red tape" might be overall less frequent than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR

    Similarity of Semantic Relations

    There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM
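
    To make the VSM idea above concrete, the sketch below represents each word pair as a vector of pattern frequencies and compares two pairs. Using cosine as the comparison measure is an assumption here, since the abstract does not name one; the patterns and counts are invented for illustration and are not taken from the paper. The SVD smoothing and synonym expansion that LRA adds are not shown.

```python
import math
from typing import Dict

PatternVector = Dict[str, float]


def cosine(u: PatternVector, v: PatternVector) -> float:
    """Cosine similarity between two sparse pattern-frequency vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # An empty vector carries no relational evidence.
    return dot / (norm_u * norm_v)


# Hypothetical corpus counts of joining patterns for each word pair (X:Y).
mason_stone = {"X cuts Y": 12, "X works with Y": 30, "X carves Y": 8}
carpenter_wood = {"X cuts Y": 20, "X works with Y": 25, "X carves Y": 5}
dog_bone = {"X eats Y": 40, "X works with Y": 1}

print(cosine(mason_stone, carpenter_wood))
print(cosine(mason_stone, dog_bone))
```

    With these made-up counts, mason:stone scores closer to carpenter:wood than to dog:bone, which is the behaviour expected of an analogy measure.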

    The polysemy of the Spanish verb sentir: a behavioral profile analysis

    This study investigates the intricate polysemy of the Spanish perception verb sentir (‘feel’) which, analogous to the more-studied visual perception verbs ver (‘see’) and mirar (‘look’), also displays an ample gamut of semantic uses in various syntactic environments. The investigation is based on a corpus-based behavioral profile (BP) analysis. Besides its methodological merits as a quantitative, systematic and verifiable approach to the study of meaning and to polysemy in particular, the BP analysis offers qualitative usage-based evidence for cognitive linguistic theorizing. With regard to the polysemy of sentir, the following questions were addressed: (1) What is the prototype of each cluster of senses? (2) How are the different senses structured: how many senses should be distinguished – i.e. which senses cluster together and which senses should be kept separately? (3) Which senses are more related to each other and which are highly distinguishable? (4) What morphosyntactic variables make them more or less distinguishable? The results show that two significant meaning clusters can be distinguished, which coincide with the division between the middle voice uses (sentirse) and the other uses (sentir). Within these clusters, a number of meaningful subclusters emerge, which seem to coincide largely with the more general semantic categories of physical, cognitive and emotional perception

    A Study Of The Discriminative Function Of Six Variables In 9-12-Year-Old Males With Learning Disabilities

    Problem: Greater discriminative power is needed to clarify the diagnostic category of learning disabilities. Research identifies many types of learning disabled populations, but no prior study appears to have combined the six variables used in this project. Using the measures Sentence Repeat, Synonyms, Digits Forward/Backwards, Design Copy, Nonsense Words, and Visual Pattern Matching, this project studied the responses of an LD sample to these subtests and their ability to discriminate among a verbally impaired sample, a spatially impaired sample, and a control group.
    Method: Six subtests were developed, which, according to the literature, measured auditory discrimination and memory (Sentence Repeat); auditory and verbal comprehension and general verbal background (Synonyms); immediate auditory memory, attention, concentration, double tracking, and reversal of mental operations (Digits Forward/Backwards); visual perceptual-motor functioning (Design Copy); lexical processing (Nonsense Words); and visual memory and visual-perceptual learning (Visual Pattern Matching). The basic null hypothesis was that there is no linear combination of the six variables which significantly discriminates among the three groups. The instrument was subjected to a pilot study before the final data collection took place. The data were analyzed using one-way analysis of variance, multivariate analysis of variance, and discriminant function analysis.
    Results: Two subtests, Synonyms and Digits Forward/Backwards, dominated in their ability to discriminate among the groups. Both the verbal and spatial groups were found to have shared deficits, but differed significantly from the control group on most of the measures. The null hypothesis was rejected.
    Conclusion: The 9-12-year-old males in this sample with learning disabilities expressed deficits only in verbally related areas, specifically auditory/verbal comprehension and short-term auditory memory, attention, and concentration. Based on the literature and the data-gathering experience, the study also concluded that students should not be placed in LD programs based on one test, and that a home visit should take place.

    Debugging Relational Declarative Models with Discriminating Examples

    Models, especially those with mathematical or logical foundations, have proven valuable to engineering practice in a wide range of disciplines, including software engineering. Models, sometimes also referred to as logical specifications in this context, enable software engineers to focus on essential abstractions, while eliding less important details of their software design. Like any human-created artifact, a model might have imperfections at certain stages of the design process: it might have internal inconsistencies, or it might not properly express the engineer’s design intentions. Validating that the model is a true expression of the engineer’s intent is an important and difficult problem. One of the key challenges is that there is typically no other written artifact to compare the model to: the engineer’s intention is a mental object. One successful approach to this challenge has been automated example-generation tools, such as the Alloy Analyzer. These tools produce examples (satisfying valuations of the model) for the engineer to accept or reject. These examples, along with the engineer’s judgment of them, serve as crucial written artifacts of the engineer’s true intentions. Examples, like test-cases for programs, are more valuable if they reveal a discrepancy between the expressed model and the engineer’s design intentions. We propose the idea of discriminating examples for this purpose. A discriminating example is synthesized from a combination of the engineer’s expressed model and a machine-generated hypothesis of the engineer’s true intentions. A discriminating example either satisfies the model but not the hypothesis, or satisfies the hypothesis but not the model. It shows the difference between the model and the hypothesized alternative. The key to producing high-quality discriminating examples is to generate high-quality hypotheses. 
This dissertation explores three general forms of such hypotheses: mistakes that happen near borders; the expressed model is stronger than the engineer intends; or the expressed model is weaker than the engineer intends. We additionally propose a number of heuristics to guide the hypothesis-generation process. We demonstrate the usefulness of discriminating examples and our hypothesis-generation techniques through a case study of an Alloy model of Dijkstra’s Dining Philosophers problem. This model was written by Alloy experts and shipped with the Alloy Analyzer for several years. Previous researchers discovered the existence of a bug, but there has been no prior published account explaining how to fix it, nor has any prior tool been shown effective for assisting an engineer with this task. Generating high-quality discriminating examples and their underlying hypotheses is computationally demanding. This dissertation shows how to make it feasible
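
    The definition of a discriminating example given above (it satisfies exactly one of the expressed model and the hypothesis) can be sketched as a brute-force search over a small finite domain. This is only an illustration under stated assumptions: the Alloy Analyzer works by constraint solving rather than enumeration, and the predicates, variable name, and domain below are hypothetical. The example mimics a "mistake near a border" hypothesis, where the engineer wrote `x < 10` but may have meant `x <= 10`.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence

Valuation = Dict[str, int]
Predicate = Callable[[Valuation], bool]


def discriminating_examples(
    model: Predicate,
    hypothesis: Predicate,
    variables: Sequence[str],
    domain: Sequence[int],
) -> Iterator[Valuation]:
    """Yield every valuation that satisfies exactly one of model and hypothesis."""
    for values in product(domain, repeat=len(variables)):
        valuation = dict(zip(variables, values))
        if model(valuation) != hypothesis(valuation):
            yield valuation


def expressed_model(v: Valuation) -> bool:
    # Hypothetical constraint as the engineer wrote it.
    return v["x"] < 10


def border_hypothesis(v: Valuation) -> bool:
    # Hypothetical machine-generated alternative that moves the border by one.
    return v["x"] <= 10


examples = list(
    discriminating_examples(expressed_model, border_hypothesis, ["x"], range(0, 15))
)
print(examples)
```

    Each yielded valuation is a question for the engineer: accepting or rejecting it shows whether the model or the hypothesis better reflects the intended design.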

    A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues

    The goal of the present study is to understand the role orthographic and semantic information play in the behaviour of skilled readers. Reading latencies from a self-paced sentence reading experiment in which Russian near-synonymous verbs were manipulated appear well-predicted by a combination of bottom-up sub-lexical letter triplets (trigraphs) and top-down semantic generalizations, modelled using the Naive Discrimination Learner. The results reveal a complex interplay of bottom-up and top-down support from orthography and semantics to the target verbs, whereby activations from orthography only are modulated by individual differences. Using performance on a serial reaction time task for a novel operationalization of the mental speed hypothesis, we explain the observed individual differences in reading behaviour in terms of the exploration/exploitation hypothesis from Reinforcement Learning, where initially slower and more variable behaviour leads to better performance overall

    How does music training predict cognitive abilities? A bifactor approach to musical expertise and intelligence

    Many studies have found that variation in music training is associated with intellectual abilities, but research disagrees over whether music education should primarily correlate with general intelligence (g) or with specific lower-level cognitive abilities (e.g., fluid reasoning, verbal ability, or spatial reasoning). Past research, however, has not modeled the data in ways that can separate general abilities like g from specific abilities. To examine whether the associations between music training and intelligence are general, specific, or both, a bifactor modeling approach was applied to data from a sample of 237 young adults who varied substantially in musical expertise. Participants completed a range of tasks that measured several lower-order abilities: fluid intelligence, crystallized intelligence (vocabulary knowledge), verbal fluency, and auditory discrimination ability. Simple correlations showed that music training correlated with all 4 lower-order abilities. A bifactor model, however, found that music training had both general (a strong association with g: β = .74 [.50, .98]) and specific (a moderate association with auditory ability: β = .37 [.08, .67]) relationships. The findings reconcile past research on the breadth of music training’s relationships and illustrate a fruitful method for identifying its links with cognitive abilities.