20,785 research outputs found

    Improving approximation of domain-focused, corpus-based, lexical semantic relatedness

    Get PDF
    Semantic relatedness is a measure that quantifies the strength of a semantic link between two concepts. Often, it can be efficiently approximated with methods that operate on words, which represent these concepts. Approximating semantic relatedness between texts and concepts represented by these texts is an important part of many text and knowledge processing tasks of crucial importance in many domain-specific scenarios. The problem of most state-of-the-art methods for calculating domain-specific semantic relatedness is their dependence on highly specialized, structured knowledge resources, which makes these methods poorly adaptable for many usage scenarios. On the other hand, the domain knowledge in the fields such as Life Sciences has become more and more accessible, but mostly in its unstructured form - as texts in large document collections, which makes its use more challenging for automated processing. In this dissertation, three new corpus-based methods for approximating domain-specific textual semantic relatedness are presented and evaluated with a set of standard benchmarks focused on the field of biomedicine. Nonetheless, the proposed measures are general enough to be adapted to other domain-focused scenarios. The evaluation involves comparisons with other relevant state-of-the-art measures for calculating semantic relatedness and the results suggest that the methods presented here perform comparably or better than other approaches. Additionally, the dissertation also presents an experiment, in which one of the proposed methods is applied within an ontology matching system, DisMatch. The performance of the system was evaluated externally on a biomedically themed ‘Phenotype’ track of the Ontology Alignment Evaluation Initiative 2016 campaign. The results of the track indicate, that the use distributional semantic relatedness for ontology matching is promising, as the system presented in this thesis did stand out in detecting correct mappings that were not detected by any other systems participating in the track. The work presented in the dissertation indicates an improvement achieved w.r.t. the stat-of-the-art through the domain adapted use of the distributional principle (i.e. the presented methods are corpus-based and do not require additional resources). The ontology matching experiment showcases practical implications of the presented theoretical body of work

    Learning Relatedness Measures for Entity Linking

    Get PDF
    Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowl- edge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant enti- ties selected for annotation, since this minimizes errors in disambiguating entity-linking. The definition of an e↵ective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of dif- ferent state-of-the-art entity-linking algorithms

    Optical tomography: Image improvement using mixed projection of parallel and fan beam modes

    Get PDF
    Mixed parallel and fan beam projection is a technique used to increase the quality images. This research focuses on enhancing the image quality in optical tomography. Image quality can be defined by measuring the Peak Signal to Noise Ratio (PSNR) and Normalized Mean Square Error (NMSE) parameters. The findings of this research prove that by combining parallel and fan beam projection, the image quality can be increased by more than 10%in terms of its PSNR value and more than 100% in terms of its NMSE value compared to a single parallel beam

    The MeSH-gram Neural Network Model: Extending Word Embedding Vectors with MeSH Concepts for UMLS Semantic Similarity and Relatedness in the Biomedical Domain

    Full text link
    Eliciting semantic similarity between concepts in the biomedical domain remains a challenging task. Recent approaches founded on embedding vectors have gained in popularity as they risen to efficiently capture semantic relationships The underlying idea is that two words that have close meaning gather similar contexts. In this study, we propose a new neural network model named MeSH-gram which relies on a straighforward approach that extends the skip-gram neural network model by considering MeSH (Medical Subject Headings) descriptors instead words. Trained on publicly available corpus PubMed MEDLINE, MeSH-gram is evaluated on reference standards manually annotated for semantic similarity. MeSH-gram is first compared to skip-gram with vectors of size 300 and at several windows contexts. A deeper comparison is performed with tewenty existing models. All the obtained results of Spearman's rank correlations between human scores and computed similarities show that MeSH-gram outperforms the skip-gram model, and is comparable to the best methods but that need more computation and external resources.Comment: 6 pages, 2 table

    An affect-based video retrieval system with open vocabulary querying

    Get PDF
    Content-based video retrieval systems (CBVR) are creating new search and browse capabilities using metadata describing significant features of the data. An often overlooked aspect of human interpretation of multimedia data is the affective dimension. Incorporating affective information into multimedia metadata can potentially enable search using this alternative interpretation of multimedia content. Recent work has described methods to automatically assign affective labels to multimedia data using various approaches. However, the subjective and imprecise nature of affective labels makes it difficult to bridge the semantic gap between system-detected labels and user expression of information requirements in multimedia retrieval. We present a novel affect-based video retrieval system incorporating an open-vocabulary query stage based on WordNet enabling search using an unrestricted query vocabulary. The system performs automatic annotation of video data with labels of well defined affective terms. In retrieval annotated documents are ranked using the standard Okapi retrieval model based on open-vocabulary text queries. We present experimental results examining the behaviour of the system for retrieval of a collection of automatically annotated feature films of different genres. Our results indicate that affective annotation can potentially provide useful augmentation to more traditional objective content description in multimedia retrieval

    Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia

    Full text link
    Hyperlinks are an essential feature of the World Wide Web. They are especially important for online encyclopedias such as Wikipedia: an article can often only be understood in the context of related articles, and hyperlinks make it easy to explore this context. But important links are often missing, and several methods have been proposed to alleviate this problem by learning a linking model based on the structure of the existing links. Here we propose a novel approach to identifying missing links in Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia's navigability. We leverage data sets of navigation paths collected through a Wikipedia-based human-computation game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness human navigational traces to identify a set of candidates for missing links and then rank these candidates. Experiments show that our procedure identifies missing links of high quality
    corecore