420 research outputs found

    Treatment of Semantic Heterogeneity in Information Retrieval

    "Nowadays, users of information services are faced with highly decentralised, heterogeneous document sources with different content analysis. Semantic heterogeneity occurs e.g. when resources using different systems for content description are searched using a single query system. This report describes several approaches of handling semantic heterogeneity used in projects of the German Social Science Information Centre." (author's abstract)

    CRIS-IR 2006

    The recognition of entities and their relationships in document collections is an important step towards the discovery of latent knowledge as well as towards supporting knowledge management applications. The challenge lies in how to extract and correlate entities in order to answer key knowledge management questions, such as: who works with whom, on which projects, with which customers and in which research areas. The present work proposes a knowledge mining approach supported by information retrieval and text mining tasks, whose core is the correlation of textual elements through the LRD (Latent Relation Discovery) method. Our experiments show that LRD outperforms other correlation methods. We also present an application that demonstrates the approach in knowledge management scenarios. Fundação para a Ciência e a Tecnologia (FCT); Denmark's Electronic Research Library

    Text mining of biomedical literature: discovering new knowledge

    Biomedical literature is increasing day by day, and the volume of literature on “coronavirus” has expanded at a high rate. In this study, a text mining technique is employed to discover new knowledge from the published literature. The main objectives of this study are to show the growth of the literature (Jan-Jun, 2020), extract document sections, identify latent topics, find the most frequent word, represent the bag of words, and compute a hierarchical clustering. We collected 16500 documents from PubMed. The study finds that most of the documents (11499) belong to May and June. We identify “betacoronavirus” as the leading document section (3837), “covid” (29890) as the most frequent word in the abstracts, and the positive and negative weights of topics. Further, we measure the term frequency (TF) of a document title in the bag of words model. We then compute a hierarchical clustering of document titles, which reveals that the lowest distance for the selected cluster (C133) is 0.30. We also discuss future prospects and note that this paper can be useful to researchers and library professionals for knowledge management
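The term-frequency and clustering steps described in this abstract can be illustrated with a minimal sketch. It assumes a simple regular-expression tokenizer and cosine distance between titles; the abstract does not state which tokenizer or distance measure the authors used, and the example titles are hypothetical, not drawn from the study's PubMed collection.

```python
import math
import re
from collections import Counter


def term_frequency(title):
    """Count how often each lowercase word occurs in one title (bag of words)."""
    return Counter(re.findall(r"[a-z0-9]+", title.lower()))


def cosine_distance(tf_a, tf_b):
    """Return 1 - cosine similarity between two term-frequency vectors."""
    # Counter returns 0 for terms missing from tf_b, so no key check is needed.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty title is maximally distant from any other
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical titles used only for illustration.
titles = [
    "Coronavirus transmission in hospitals",
    "Transmission of coronavirus in hospital wards",
    "Text mining of biomedical literature",
]
vectors = [term_frequency(t) for t in titles]

# Pairwise distances; a hierarchical clustering would repeatedly merge
# the closest pair of titles (or clusters) based on values like these.
for i in range(len(titles)):
    for j in range(i + 1, len(titles)):
        print(i, j, round(cosine_distance(vectors[i], vectors[j]), 2))
```

A pairwise distance matrix of this kind is the usual input to an agglomerative clustering routine; which linkage rule the study applied is not stated in the abstract.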

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research

    Extending the 5S Framework of Digital Libraries to support Complex Objects, Superimposed Information, and Content-Based Image Retrieval Services

    Advanced services in digital libraries (DLs) have been developed and widely used to address the required capabilities of an assortment of systems as DLs expand into diverse application domains. These systems may require support for images (e.g., Content-Based Image Retrieval), Complex (information) Objects, and use of content at fine grain (e.g., Superimposed Information). Due to the lack of consensus on precise theoretical definitions for those services, implementation efforts often involve ad hoc development, leading to duplication and interoperability problems. This article presents a methodology to address those problems by extending a precisely specified minimal digital library (in the 5S framework) with formal definitions of the aforementioned services. The theoretical extensions of digital library functionality presented here are reinforced with practical case studies as well as scenarios for the individual and integrative use of services to balance theory and practice. The methodology implies that other advanced services can be integrated into our current extended framework as they are identified. The theoretical definitions and case study we present may impact future development efforts and a wide range of digital library researchers, designers, and developers

    Literary texts in an electronic age: Scholarly implications and library services [papers presented at the 1994 Clinic on Library Applications of Data Processing, April 10-12, 1994]

    Authors and readers in an age of electronic texts / Jay David Bolter -- Electronic texts in the humanities : a coming of age / Susan Hockey -- The Text Encoding Initiative : electronic text markup for research / C.M. Sperberg-McQueen -- Electronic texts and multimedia in the academic library : a view from the front line / Anita K. Lowry -- Humanizing information technology : cultural evolution and the institutionalization of electronic text processing / Mark Tyler Day -- Cohabiting with copyright on the nets / Mary Brandt Jensen -- The role of the scholarly publisher in an electronic environment / Lorrie LeJeune -- The feasibility of wide-area textual analysis systems in libraries : a practical analysis / John Price-Wilkin -- The scholar and his library in the computer age / James W. Marchand -- The challenges of electronic texts in the library : bibliographic control and access / Rebecca S. Guenther -- Durkheim's imperative : the role of humanities faculty in the information technologies revolution / Robert Alun Jones -- The materiality of the book : another turn of the screw / Terry Belanger. Published or submitted for publication

    Metalexicography as Knowledge Graph

    This short paper presents preliminary considerations regarding LexBib, a corpus, bibliography, and domain ontology of Lexicography and Dictionary Research, which is currently being developed at the University of Hildesheim. The LexBib project is intended to provide a bibliographic metadata collection made available through an online reference platform. The corresponding full texts are processed with text mining methods to generate additional metadata, such as term candidates, topic models, and citations. All LexBib content is represented and publicly accessible as RDF Linked Open Data. We discuss a data model that includes metadata for publication details and for the text mining results, and that considers relevant standards for integration into the LOD cloud

    From Index Locorum to Citation Network: an Approach to the Automatic Extraction of Canonical References and its Applications to the Study of Classical Texts

    My research focusses on the automatic extraction of canonical references from publications in Classics. Such references are the standard way of citing classical texts and are found in great numbers throughout monographs, journal articles and commentaries. In chapters 1 and 2 I argue for the importance of canonical citations and for the need to capture them automatically. Their function is to signal text passages that are studied and discussed, often in relation to one another, as can be seen in the parallel passages found in modern commentaries. Scholars in the field have long been exploiting this kind of information by manually creating indexes of cited passages, the so-called indices locorum. However, the challenge we now face is to find new ways of indexing and retrieving information contained in the growing volume of digital archives and libraries. Chapters 3 and 4 look at how this problem can be tackled by translating the extraction of canonical citations into a computationally solvable problem. The approach I developed consists of treating the extraction of such citations as a problem of named entity extraction. This problem can be solved with some degree of accuracy by applying and adapting methods of Natural Language Processing. In this part of the dissertation I discuss the implementation of this approach as a working prototype and an evaluation of its performance. Once canonical references have been extracted from texts, the web of relations between documents that they create can be represented as a network. This network can then be searched, manipulated, visualised and analysed in various ways. In chapter 5 I focus specifically on how this network can be leveraged to search through bodies of secondary literature. Finally, in chapter 6 I discuss how my work opens up new research perspectives in terms of visualisation, analysis and the application of such automatically extracted citation networks

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval. Published or submitted for publication