171,422 research outputs found

    Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web

    Get PDF
    With its sheer amount of information, the Web is clearly an important frontier for data mining. While Web mining must start with content on the Web, there is no effective ``search-based'' mechanism to help sifting through the information on the Web. Our goal is to provide a such online search-based facility for supporting query primitives, upon which Web mining applications can be built. As a first step, this paper aims at entity-relation discovery, or E-R discovery, as a useful function-- to weave scattered entities on the Web into coherent relations. To begin with, as our proposal, we formalize the concept of E-R discovery. Further, to realize E-R discovery, as our main thesis, we abstract tuple ranking-- the essential challenge of E-R discovery-- as pattern-based cooccurrence analysis. Finally, as our key insight, we observe that such relation mining shares the same core functions as traditional page-retrieval systems, which enables us to build the new E-R discovery upon today's search engines, almost for free. We report our system prototype and testbed, WISDM-ER, with real Web corpus. Our case studies have demonstrated a high promise, achieving 83%-91% accuracy for real benchmark queries-- and thus the real possibilities of enabling ad-hoc Web mining tasks with online E-R discovery

    mARC: Memory by Association and Reinforcement of Contexts

    Full text link
    This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information clas-sification and retrieval problems like e-Discovery or contextual navigation. It can also for-mulated in the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast to Conway approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC we have built a mARC-based Internet search en-gine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

    Report of ECol Workshop Report on the First International Workshop on the Evaluation on Collaborative Information Seeking and Retrieval (ECol'2015)

    Get PDF
    Report of the ECol Workshop @ CIKM 2015The workshop on the evaluation of collaborative information retrieval and seeking (ECol) was held in conjunction with the 24 th Conference on Information and Knowledge Management (CIKM) in Melbourne, Australia. The workshop featured three main elements. First, a keynote on the main dimensions, challenges, and opportunities in collaborative information retrieval and seeking by Chirag Shah. Second, an oral presentation session in which four papers were presented. Third, a discussion based on three seed research questions: (1) In what ways is collaborative search evaluation more challenging than individual interactive information retrieval (IIIR) evaluation? (2) Would it be possible and/or useful to standardise experimental designs and data for collaborative search evaluation? and (3) For evaluating collaborative search, can we leverage ideas from other tasks such as diversified search, subtopic mining and/or e-discovery? The discussion was intense and raised many points and issues, leading to the proposition that a new evaluation track focused on collaborative information retrieval/seeking tasks, would be worthwhile

    A Linked Data representation of the Nomenclature of Territorial Units for Statistics

    No full text
    The recent publication of public sector information (PSI) data sets has brought to the attention of the scientific community the redundant presence of location based context. At the same time it stresses the inadequacy of current Linked Data services for exploiting the semantics of such contextual dimensions for easing entity retrieval and browsing. In this paper describes our approach for supporting the publication of geographical subdivisions in Linked Data format for supporting the e-government and public sector in publishing their data sets. The topological knowledge published can be reused in order to enrich the geographical context of other data sets, in particular we propose an exploitation scenario using statistical data sets described with the SCOVO ontology. The topological knowledge is then exploited within a service that supports the navigation and retrieval of statistical geographical entities for the EU territory. Geographical entities, in the extent of this paper, are linked data resources that describe objects that have a geographical extension. The data and services presented in this paper allows the discovery of resources that contain or are contained by a given entity URI and their representation within map widgets. We present an approach for a geography based service that helps in querying qualitative spatial relations for the EU statistical geography (proper containment so far). We also provide a rationale for publishing geographical information in Linked Data format based on our experience, within the EnAKTing project, in publishing UK PSI data

    Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance

    Get PDF
    Recent advancements in healthcare practices and the increasing use of information technology in the medical domain has lead to the rapid generation of free-text data in forms of scientific articles, e-health records, patents, and document inventories. This has urged the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information carrying units such as the concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects) which denote an important category of concepts in the medical text. Two methodologies were investigated in this regard and they are dictionary-based and machine learning-based approaches. Futhermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system facilitates conventional text search as well as semantic and ontological searches. Performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (i.e. TRECMED and TRECCHEM respectively) wherein the system was best rated in comparison to several other competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as an expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in the support of healthcare and pharmacovigilance. The applied strategies have potential to enhance the literature-searches performed by biomedical, healthcare, and patent professionals. The applied strategies have potential to enhance the literature-searches performed by biomedical, healthcare, and patent professionals. This can promote the literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive the research and development in medical and healthcare arena

    Closing the loop: assisting archival appraisal and information retrieval in one sweep

    Get PDF
    In this article, we examine the similarities between the concept of appraisal, a process that takes place within the archives, and the concept of relevance judgement, a process fundamental to the evaluation of information retrieval systems. More specifically, we revisit selection criteria proposed as result of archival research, and work within the digital curation communities, and, compare them to relevance criteria as discussed within information retrieval's literature based discovery. We illustrate how closely these criteria relate to each other and discuss how understanding the relationships between the these disciplines could form a basis for proposing automated selection for archival processes and initiating multi-objective learning with respect to information retrieval

    Business Process Retrieval Based on Behavioral Semantics

    Get PDF
    This paper develops a framework for retrieving business processes considering search requirements based on behavioral semantics properties; it presents a framework called "BeMantics" for retrieving business processes based on structural, linguistics, and behavioral semantics properties. The relevance of the framework is evaluated retrieving business processes from a repository, and collecting a set of relevant business processes manually issued by human judges. The "BeMantics" framework scored high precision values (0.717) but low recall values (0.558), which implies that even when the framework avoided false negatives, it prone to false positives. The highest pre- cision value was scored in the linguistic criterion showing that using semantic inference in the tasks comparison allowed to reduce around 23.6 % the number of false positives. Using semantic inference to compare tasks of business processes can improve the precision; but if the ontologies are from narrow and specific domains, they limit the semantic expressiveness obtained with ontologies from more general domains. Regarding the perform- ance, it can be improved by using a filter phase which indexes business processes taking into account behavioral semantics propertie
    • …
    corecore