2,834 research outputs found

    Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings

    Get PDF
    In this paper we present a novel interactive multimodal learning system, which facilitates search and exploration in large networks of social multimedia users. It allows the analyst to identify and select users of interest, and to find similar users in an interactive learning setting. Our approach is based on novel multimodal representations of users, words and concepts, which we simultaneously learn by deploying a general-purpose neural embedding model. We show these representations to be useful not only for categorizing users, but also for automatically generating user and community profiles. Inspired by traditional summarization approaches, we create the profiles by selecting diverse and representative content from all available modalities, i.e. the text, image and user modality. The usefulness of the approach is evaluated using artificial actors, which simulate user behavior in a relevance feedback scenario. Multiple experiments were conducted in order to evaluate the quality of our multimodal representations, to compare different embedding strategies, and to determine the importance of different modalities. We demonstrate the capabilities of the proposed approach on two different multimedia collections originating from the violent online extremism forum Stormfront and the microblogging platform Twitter, which are particularly interesting due to the high semantic level of the discussions they feature

    A model for information retrieval driven by conceptual spaces

    Get PDF
    A retrieval model describes the transformation of a query into a set of documents. The question is: what drives this transformation? For semantic information retrieval type of models this transformation is driven by the content and structure of the semantic models. In this case, Knowledge Organization Systems (KOSs) are the semantic models that encode the meaning employed for monolingual and cross-language retrieval. The focus of this research is the relationship between these meanings’ representations and their role and potential in augmenting existing retrieval models effectiveness. The proposed approach is unique in explicitly interpreting a semantic reference as a pointer to a concept in the semantic model that activates all its linked neighboring concepts. It is in fact the formalization of the information retrieval model and the integration of knowledge resources from the Linguistic Linked Open Data cloud that is distinctive from other approaches. The preprocessing of the semantic model using Formal Concept Analysis enables the extraction of conceptual spaces (formal contexts)that are based on sub-graphs from the original structure of the semantic model. The types of conceptual spaces built in this case are limited by the KOSs structural relations relevant to retrieval: exact match, broader, narrower, and related. They capture the definitional and relational aspects of the concepts in the semantic model. Also, each formal context is assigned an operational role in the flow of processes of the retrieval system enabling a clear path towards the implementations of monolingual and cross-lingual systems. By following this model’s theoretical description in constructing a retrieval system, evaluation results have shown statistically significant results in both monolingual and bilingual settings when no methods for query expansion were used. The test suite was run on the Cross-Language Evaluation Forum Domain Specific 2004-2006 collection with additional extensions to match the specifics of this model

    Combining a co-occurrence-based and a semantic measure for entity linking

    Get PDF
    One key feature of the Semantic Web lies in the ability to link related Web resources. However, while relations within particular datasets are often well-defined, links between disparate datasets and corpora of Web resources are rare. The increasingly widespread use of cross-domain reference datasets, such as Freebase and DBpedia for annotating and enriching datasets as well as documents, opens up opportunities to exploit their inherent semantic relationships to align disparate Web resources. In this paper, we present a combined approach to uncover relationships between disparate entities which exploits (a) graph analysis of reference datasets together with (b) entity co-occurrence on the Web with the help of search engines. In (a), we introduce a novel approach adopted and applied from social network theory to measure the connectivity between given entities in reference datasets. The connectivity measures are used to identify connected Web resources. Finally, we present a thorough evaluation of our approach using a publicly available dataset and introduce a comparison with established measures in the field. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-38288-8_37

    Expertise Profiling in Evolving Knowledgecuration Platforms

    Get PDF
    Expertise modeling has been the subject of extensiveresearch in two main disciplines: Information Retrieval (IR) andSocial Network Analysis (SNA). Both IR and SNA approachesbuild the expertise model through a document-centric approachproviding a macro-perspective on the knowledge emerging fromlarge corpus of static documents. With the emergence of the Webof Data there has been a significant shift from static to evolvingdocuments, through micro-contributions. Thus, the existingmacro-perspective is no longer sufficient to track the evolution ofboth knowledge and expertise. In this paper we present acomprehensive, domain-agnostic model for expertise profiling inthe context of dynamic, living documents and evolving knowledgebases. We showcase its application in the biomedical domain andanalyze its performance using two manually created datasets

    Ontology driven information retrieval.

    Get PDF
    Ontology-driven information retrieval deals with the use of entities specified in domain ontologies to enhance search and browse. The entities or concepts of lightweight ontological resources are traditionally used to index resources in specialised domains. Indexing with concepts is often achieved manually and reusing them to enhance search remains a challenge. Other challenges range from the difficulty in merging multiple ontologies for use in retrieval to the problem of integrating concept-based search into existing search systems. We mainly encounter these challenges in enterprise search environments, which have not kept pace with Web search engines and mostly rely on full-text search systems. Full-text search systems are keyword-based and suffer from well-known vocabulary mismatch problems. Ontologies model domain knowledge and have the potential for use in understanding the unstructured content of documents. In this thesis, we investigate the challenges of using domain ontologies for enhancing search in enterprise systems. Firstly, we investigate methods for annotating documents by identifying the best concepts that represent their contents. We explore ways to overcome the challenges of insufficient textual features in lightweight ontologies and introduce an unsupervised method for annotating documents based on generating concept descriptors from external resources. Specifically, we augment concepts with descriptive textual content by exploiting the taxonomic structure of an ontology to ensure that we generate useful descriptors. Secondly, the need often arises for cross-ontology reasoning when using multiple ontologies in ontology-driven search. Once again, we attempt to overcome the absence of rich features in lightweight ontologies by exploring the use of background knowledge for the alignment process. We propose novel ontology alignment techniques which integrate string metrics, semantic features, and term weights for discovering diverse correspondence types in supervised and unsupervised ontology alignment. Thirdly, we investigate different representational schemes for queries and documents and explore semantic ranking models using conceptual representations. Accordingly, we propose a semantic ranking model that incorporates the knowledge of concept relatedness and a predictive model to apply semantic ranking only when it is deemed beneficial for retrieval. Finally, we conduct comprehensive evaluations of the proposed methods and discuss our findings

    A customized semantic service retrieval methodology for the digital ecosystems environment

    Get PDF
    With the emergence of the Web and its pervasive intrusion on individuals, organizations, businesses etc., people now realize that they are living in a digital environment analogous to the ecological ecosystem. Consequently, no individual or organization can ignore the huge impact of the Web on social well-being, growth and prosperity, or the changes that it has brought about to the world economy, transforming it from a self-contained, isolated, and static environment to an open, connected, dynamic environment. Recently, the European Union initiated a research vision in relation to this ubiquitous digital environment, known as Digital (Business) Ecosystems. In the Digital Ecosystems environment, there exist ubiquitous and heterogeneous species, and ubiquitous, heterogeneous, context-dependent and dynamic services provided or requested by species. Nevertheless, existing commercial search engines lack sufficient semantic supports, which cannot be employed to disambiguate user queries and cannot provide trustworthy and reliable service retrieval. Furthermore, current semantic service retrieval research focuses on service retrieval in the Web service field, which cannot provide requested service retrieval functions that take into account the features of Digital Ecosystem services. Hence, in this thesis, we propose a customized semantic service retrieval methodology, enabling trustworthy and reliable service retrieval in the Digital Ecosystems environment, by considering the heterogeneous, context-dependent and dynamic nature of services and the heterogeneous and dynamic nature of service providers and service requesters in Digital Ecosystems.The customized semantic service retrieval methodology comprises: 1) a service information discovery, annotation and classification methodology; 2) a service retrieval methodology; 3) a service concept recommendation methodology; 4) a quality of service (QoS) evaluation and service ranking methodology; and 5) a service domain knowledge updating, and service-provider-based Service Description Entity (SDE) metadata publishing, maintenance and classification methodology.The service information discovery, annotation and classification methodology is designed for discovering ubiquitous service information from the Web, annotating the discovered service information with ontology mark-up languages, and classifying the annotated service information by means of specific service domain knowledge, taking into account the heterogeneous and context-dependent nature of Digital Ecosystem services and the heterogeneous nature of service providers. The methodology is realized by the prototype of a Semantic Crawler, the aim of which is to discover service advertisements and service provider profiles from webpages, and annotating the information with service domain ontologies.The service retrieval methodology enables service requesters to precisely retrieve the annotated service information, taking into account the heterogeneous nature of Digital Ecosystem service requesters. The methodology is presented by the prototype of a Service Search Engine. Since service requesters can be divided according to the group which has relevant knowledge with regard to their service requests, and the group which does not have relevant knowledge with regard to their service requests, we respectively provide two different service retrieval modules. The module for the first group enables service requesters to directly retrieve service information by querying its attributes. The module for the second group enables service requesters to interact with the search engine to denote their queries by means of service domain knowledge, and then retrieve service information based on the denoted queries.The service concept recommendation methodology concerns the issue of incomplete or incorrect queries. The methodology enables the search engine to recommend relevant concepts to service requesters, once they find that the service concepts eventually selected cannot be used to denote their service requests. We premise that there is some extent of overlap between the selected concepts and the concepts denoting service requests, as a result of the impact of service requesters’ understandings of service requests on the selected concepts by a series of human-computer interactions. Therefore, a semantic similarity model is designed that seeks semantically similar concepts based on selected concepts.The QoS evaluation and service ranking methodology is proposed to allow service requesters to evaluate the trustworthiness of a service advertisement and rank retrieved service advertisements based on their QoS values, taking into account the contextdependent nature of services in Digital Ecosystems. The core of this methodology is an extended CCCI (Correlation of Interaction, Correlation of Criterion, Clarity of Criterion, and Importance of Criterion) metrics, which allows a service requester to evaluate the performance of a service provider in a service transaction based on QoS evaluation criteria in a specific service domain. The evaluation result is then incorporated with the previous results to produce the eventual QoS value of the service advertisement in a service domain. Service requesters can rank service advertisements by considering their QoS values under each criterion in a service domain.The methodology for service domain knowledge updating, service-provider-based SDE metadata publishing, maintenance, and classification is initiated to allow: 1) knowledge users to update service domain ontologies employed in the service retrieval methodology, taking into account the dynamic nature of services in Digital Ecosystems; and 2) service providers to update their service profiles and manually annotate their published service advertisements by means of service domain knowledge, taking into account the dynamic nature of service providers in Digital Ecosystems. The methodology for service domain knowledge updating is realized by a voting system for any proposals for changes in service domain knowledge, and by assigning different weights to the votes of domain experts and normal users.In order to validate the customized semantic service retrieval methodology, we build a prototype – a Customized Semantic Service Search Engine. Based on the prototype, we test the mathematical algorithms involved in the methodology by a simulation approach and validate the proposed functions of the methodology by a functional testing approach

    Use of Semantic Technology to Create Curated Data Albums

    Get PDF
    One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out non-relevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science

    Use of Semantic Technology to Create Curated Data Albums

    Get PDF
    One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discover tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out nonrelevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science

    Multimedia ontology matching by using visual and textual modalities

    Get PDF
    International audienceOntologies have been intensively applied for improving multimedia search and retrieval by providing explicit meaning to visual content. Several multimedia ontologies have been recently proposed as knowledge models suitable for narrowing the well known semantic gap and for enabling the semantic interpretation of images. Since these ontologies have been created in different application contexts, establishing links between them, a task known as ontology matching, promises to fully unlock their potential in support of multimedia search and retrieval. This paper proposes and compares empirically two extensional ontology matching techniques applied to an important semantic image retrieval issue: automatically associating common-sense knowledge to multimedia concepts. First, we extend a previously introduced textual concept matching approach to use both textual and visual representation of images. In addition, a novel matching technique based on a multi-modal graph is proposed. We argue that the textual and visual modalities have to be seen as complementary rather than as exclusive sources of extensional information in order to improve the efficiency of the application of an ontology matching approach in the multimedia domain. An experimental evaluation is included in the paper
    • …
    corecore