2,162 research outputs found

    The Art of Mathematics Retrieval

    Get PDF
    The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval is presented, and design decisions are discussed. We argue for an approach based on Presentation MathML using a similarity of math subformulae. The system was implemented as a math-aware search engine based on the state-of-the-art system Apache Lucene. Scalability issues were checked against more than 400,000 arXiv documents with 158 million mathematical formulae. Almost three billion MathML subformulae were indexed using a Solr-compatible Lucene.V člĂĄnku je navrĆŸena architektura novĂ©ho systĂ©mu, MIaS (Math Indexer and Searcher), a nĂĄvrh je zdĆŻvodněn. Byl zvolen pƙístup zaloĆŸenĂœ na podobnosti matematickĂœch formulĂ­ v prezentačnĂ­m MathML. SystĂ©m byl implementovĂĄn a nĂĄvrh verifikovĂĄn na ĆĄiroce pouĆŸĂ­vanĂ©m indexačnĂ­m systĂ©mu Apache Lucene. Ć kĂĄlovatelnost byla ověƙena na vĂ­ce neĆŸ 400,000 odbornĂœch matematickĂœch člĂĄncĂ­ch z archivu arXiv s 158 miliony matematickĂœmi formulemi. To pƙedstavovalo indexovĂĄnĂ­ tĂ©měƙ tƙí bilionĆŻ matematickĂœch podformulĂ­ v MathML pomocĂ­ Solr-kompatibilnĂ­ho rozơíƙenĂ­ Lucene

    Large Scale Distributed Knowledge Infrastructures

    Get PDF

    Making Math Searchable in Wikipedia

    Get PDF
    Wikipedia, the world largest encyclopedia contains a lot of knowledge that is expressed as formulae exclusively. Unfortunately, this knowledge is currently not fully accessible by intelligent information retrieval systems. This immense body of knowledge is hidden form value-added services, such as search. In this paper, we present our MathSearch implementation for Wikipedia that enables users to perform a combined text and fully unlock the potential benefits.Comment: 7 pages, 2 figures, Conference on Intelligent Computer Mathematics, July 9-14 2012, Bremen, Germany. To be published in Lecture Notes, Artificial Intelligence, Springe

    Digital Image Access & Retrieval

    Get PDF
    The 33th Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation with the bulk of the conference focusing on indexing and retrieval.published or submitted for publicatio

    Federating Heterogeneous Digital Libraries by Metadata Harvesting

    Get PDF
    This dissertation studies the challenges and issues faced in federating heterogeneous digital libraries (DLs) by metadata harvesting. The objective of federation is to provide high-level services (e.g. transparent search across all DLs) on the collective metadata from different digital libraries. There are two main approaches to federate DLs: distributed searching approach and harvesting approach. As the distributed searching approach replies on executing queries to digital libraries in real time, it has problems with scalability. The difficulty of creating a distributed searching service for a large federation is the motivation behind Open Archives Initiatives Protocols for Metadata Harvesting (OAI-PMH). OAI-PMH supports both data providers (repositories, archives) and service providers. Service providers develop value-added services based on the information collected from data providers. Data providers are simply collections of harvestable metadata. This dissertation examines the application of the metadata harvesting approach in DL federations. It addresses the following problems: (1) Whether or not metadata harvesting provides a realistic and scalable solution for DL federation. (2) What is the status of and problems with current data provider implementations, and how to solve these problems. (3) How to synchronize data providers and service providers. (4) How to build different types of federation services over harvested metadata. (5) How to create a scalable and reliable infrastructure to support federation services. The work done in this dissertation is based on OAI-PMH, and the results have influenced the evolution of OAI-PMH. However, the results are not limited to the scope of OAI-PMH. Our approach is to design and build key services for metadata harvesting and to deploy them on the Web. Implementing a publicly available service allows us to demonstrate how these approaches are practical. The problems posed above are evaluated by performing experiments over these services. To summarize the results of this thesis, we conclude that the metadata harvesting approach is a realistic and scalable approach to federate heterogeneous DLs. We present two models of building federation services: a centralized model and a replicated model. Our experiments also demonstrate that the repository synchronization problem can be addressed by push, pull, and hybrid push/pull models; each model has its strengths and weaknesses and fits a specific scenario. Finally, we present a scalable and reliable infrastructure to support the applications of metadata harvesting

    Self-organizing distributed digital library supporting audio-video

    Get PDF
    The StreamOnTheFly network combines peer-to-peer networking and open-archive principles for community radio channels and TV stations in Europe. StreamOnTheFly demonstrates new methods of archive management and personalization technologies for both audio and video. It also provides a collaboration platform for community purposes that suits the flexible activity patterns of these kinds of broadcaster communities

    MIaS: Math-Aware Retrieval in Digital Mathematical Libraries

    Get PDF
    Digital mathematical libraries (DMLs) such as arXiv, Numdam, and EuDML contain mainly documents from STEM fields, where mathematical formulae are often more important than text for understanding. Conventional information retrieval (IR) systems are unable to represent formulae and they are therefore ill-suited for math information retrieval (MIR). To fill the gap, we have developed, and open-sourced the MIaS MIR system. MIaS is based on the full-text search engine Apache Lucene. On top of text retrieval, MIaS also incorporates a set of tools for preprocessing mathematical formulae. We describe the design of the system and present speed, and quality evaluation results. We show that MIaS is both efficient, and effective, as evidenced by our victory in the NTCIR-11 Math-2 task

    Core Services in the Architecture of the National Digital Library for Science Education (NSDL)

    Full text link
    We describe the core components of the architecture for the (NSDL) National Science, Mathematics, Engineering, and Technology Education Digital Library. Over time the NSDL will include heterogeneous users, content, and services. To accommodate this, a design for a technical and organization infrastructure has been formulated based on the notion of a spectrum of interoperability. This paper describes the first phase of the interoperability infrastructure including the metadata repository, search and discovery services, rights management services, and user interface portal facilities

    Multimedia search without visual analysis: the value of linguistic and contextual information

    Get PDF
    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features
    • 

    corecore