268,879 research outputs found

    Human information processing based information retrieval

    Get PDF
    This work focused on the investigation of the question how the concept of relevance in Information Retrieval can be validated. The work is motivated by the consistent difficulties of defining the meaning of the concept, and by advances in the field of cognitive science. Analytical and empirical investigations are carried out with the aim of devising a principled approach to the validation of the concept. The foundation for this work was set by interpreting relevance as a phenomenon occurring within the context of two systems: An IR system and the cognitive processing system of the user. In light of the cognitive interpretation of relevance, an analysis of the learnt lessons in cognitive science with regard to the validation of cognitive phenomena was conducted. It identified that construct validity constitutes the dominant approach to the validation of constructs in cognitive science. Construct validity constitutes a proposal for the conduction of validation in scenarios, where no direct observation of a phenomenon is possible. With regard to the limitations on direct observation of a construct (i.e. a postulated theoretic concept), it bases validation on the evaluation of its relations to other constructs. Based on the interpretation of relevance as a product of cognitive processing it was concluded, that the limitations with regard to direct observation apply to its investigation. The evaluation of its applicability to an IR context, focused on the exploration of the nomological network methodology. A nomological network constitutes an analytically constructed set of constructs and their relations. The construction of such a network forms the basis for establishing construct validity through investigation of the relations between constructs. An analysis focused on contemporary insights to the nomological network methodology identified two important aspects with regard to its application in IR. The first aspect is given by a choice of context and the identification of a pool of candidate constructs for the inclusion in the network. The second consists of identifying criteria for the selection of a set of constructs from the candidate pool. The identification of the pertinent constructs for the network was based on a review of the principles of cognitive exploration, and an analysis of the state of the art in text based discourse processing and reasoning. On that basis, a listing of known sub-processes contributing to the pertinent cognitive processing was presented. Based on the identification of a large number of potential candidates, the next step consisted of the inference of criteria for the selection of an initial set of constructs for the network. The investigation of these criteria focused on the consideration of pragmatic and meta-theoretical aspects. Based on a survey of experimental means in cognitive science and IR, five pragmatic criteria for the selection of constructs were presented. Consideration of meta-theoretically motivated criteria required to investigate what the specific challenges with regard to the validation of highly abstract constructs are. This question was explored based on the underlying considerations of the Information Processing paradigm and Newell’s (1994) cognitive bands. This led to the identification of a set of three meta-theoretical criteria for the selection of constructs. Based on the criteria and the demarcated candidate pool, an IR focused nomological network was defined. The network consists of the constructs of relevance and type and grade of word relatedness. A necessary prerequisite for making inferences based on a nomological network consists of the availability of validated measurement instruments for the constructs. To that cause, two validation studies targeting the measurement of the type and grade of relations between words were conducted. The clarification of the question of the validity of the measurement instruments enabled the application of the nomological network. A first step of the application consisted of testing if the constructs in the network are related to each other. Based on the alignment of measurements of relevance and the word related constructs it was concluded to be true. The relation between the constructs was characterized by varying the word related constructs over a large parameter space and observing the effect of this variation on relevance. Three hypotheses relating to different aspects of the relations between the word related constructs and relevance. It was concluded, that the conclusive confirmation of the hypotheses requires an extension of the experimental means underlying the study. Based on converging observations from the empirical investigation of the three hypotheses it was concluded, that semantic and associative relations distinctly differ with regard to their impact on relevance estimation

    Language Modeling Approaches to Information Retrieval

    Get PDF
    This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal is to model that process via a generative statistical model. In this article, we discuss current research in the application of language modeling to information retrieval, the role of semantics in the language modeling framework, cluster-based language models, use of language modeling for XML retrieval and future trends

    Semantic Interaction in Web-based Retrieval Systems : Adopting Semantic Web Technologies and Social Networking Paradigms for Interacting with Semi-structured Web Data

    Get PDF
    Existing web retrieval models for exploration and interaction with web data do not take into account semantic information, nor do they allow for new forms of interaction by employing meaningful interaction and navigation metaphors in 2D/3D. This thesis researches means for introducing a semantic dimension into the search and exploration process of web content to enable a significantly positive user experience. Therefore, an inherently dynamic view beyond single concepts and models from semantic information processing, information extraction and human-machine interaction is adopted. Essential tasks for semantic interaction such as semantic annotation, semantic mediation and semantic human-computer interaction were identified and elaborated for two general application scenarios in web retrieval: Web-based Question Answering in a knowledge-based dialogue system and semantic exploration of information spaces in 2D/3D

    Multi modal multi-semantic image retrieval

    Get PDF
    PhDThe rapid growth in the volume of visual information, e.g. image, and video can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted into in order to attempt to extract knowledge from these images, enhancing the retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain specific image collection, e.g. sports, and is able to disambiguate and assign high level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, have been deployed in the ‘Bag of Visual Words’ model (BVW) as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of an unstructured visual word and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, through exploiting local conceptual structures and their relationships. The key contributions of this framework in using local features for image representation include: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm which takes term weight and spatial locations of keypoints into account. Consequently, the semantic information is preserved. Second a technique is used to detect the domain specific ‘non-informative visual words’ which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with xi a visual word model to resolve synonym (visual heterogeneity) and polysemy problems, is proposed. The experimental results show that this approach can discover semantically meaningful visual content descriptions and recognise specific events, e.g., sports events, depicted in images efficiently. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image, as a cue to predict the meaning of an image, by transforming this textual information into a structured annotation for an image e.g. using XML, RDF, OWL or MPEG-7. Although, text and image are distinct types of information representation and modality, there are some strong, invariant, implicit, connections between images and any accompanying text information. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, a Natural Language Processing (NLP) is exploited firstly in order to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed. First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and leads to narrowing of the semantic gap between lower level machinederived and higher level human-understandable conceptualisation

    Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data

    Full text link
    Usage of online textual media is steadily increasing. Daily, more and more news stories, blog posts and scientific articles are added to the online volumes. These are all freely accessible and have been employed extensively in multiple research areas, e.g. automatic text summarization, information retrieval, information extraction, etc. Meanwhile, online debate forums have recently become popular, but have remained largely unexplored. For this reason, there are no sufficient resources of annotated debate data available for conducting research in this genre. In this paper, we collected and annotated debate data for an automatic summarization task. Similar to extractive gold standard summary generation our data contains sentences worthy to include into a summary. Five human annotators performed this task. Inter-annotator agreement, based on semantic similarity, is 36% for Cohen's kappa and 48% for Krippendorff's alpha. Moreover, we also implement an extractive summarization system for online debates and discuss prominent features for the task of summarizing online debate data automatically.Comment: accepted and presented at the CICLING 2017 - 18th International Conference on Intelligent Text Processing and Computational Linguistic

    Beyond procedure's content: Cognitive subjective experiences in procedural justice judgments

    Get PDF
    Procedural justice concerns play a critical role in economic settings, politics, and other domains of human life. Despite the vast evidence corroborating their relevance, considerably less is known about how procedural justice judgments are formed. Whereas earlier theorizing focused on the systematic integration of content information, the present contribution provides a new perspective on the formation of justice judgments by examining the influence of accessibility experiences. Specifically, we hypothesize that procedural justice judgments may be formed based on the ease or difficulty with which justice-relevant information comes to mind. Three experiments corroborate this prediction in that procedures were evaluated less positively when the retrieval of associated unfair aspects was easy compared to difficult. Presumably this is because when it feels easy (difficult) to retrieve unfair aspects, these are perceived as frequent (infrequent), and hence the procedure as unjust (just). In addition to demonstrating that ease-of-retrieval may influence justice judgments, the studies further revealed that reliance on accessibility experiences is high in conditions of personal certainty. We suggest that this is because personal uncertainty fosters systematic processing of content information, whereas personal certainty may invite less taxing judgmental strategies such as reliance on ease-of-retrieval
    corecore