2,162 research outputs found
The Art of Mathematics Retrieval
The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval, is presented, and design decisions are discussed. We argue for an approach based on Presentation MathML using a similarity of math subformulae. The system was implemented as a math-aware search engine based on the state-of-the-art system Apache Lucene. Scalability issues were checked against more than 400,000 arXiv documents with 158 million mathematical formulae. Almost three billion MathML subformulae were indexed using a Solr-compatible Lucene extension.
Making Math Searchable in Wikipedia
Wikipedia, the world's largest encyclopedia, contains a lot of knowledge that is
expressed exclusively as formulae. Unfortunately, this knowledge is currently
not fully accessible to intelligent information retrieval systems. This immense
body of knowledge is hidden from value-added services, such as search. In this
paper, we present our MathSearch implementation for Wikipedia that enables
users to perform a combined text and fully unlock the potential benefits.
Comment: 7 pages, 2 figures, Conference on Intelligent Computer Mathematics,
July 9-14, 2012, Bremen, Germany. To be published in Lecture Notes, Artificial
Intelligence, Springer
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
Published or submitted for publication
Federating Heterogeneous Digital Libraries by Metadata Harvesting
This dissertation studies the challenges and issues faced in federating heterogeneous digital libraries (DLs) by metadata harvesting. The objective of federation is to provide high-level services (e.g. transparent search across all DLs) on the collective metadata from different digital libraries. There are two main approaches to federating DLs: the distributed searching approach and the harvesting approach. As the distributed searching approach relies on executing queries to digital libraries in real time, it has problems with scalability. The difficulty of creating a distributed searching service for a large federation is the motivation behind the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI-PMH supports both data providers (repositories, archives) and service providers. Service providers develop value-added services based on the information collected from data providers. Data providers are simply collections of harvestable metadata. This dissertation examines the application of the metadata harvesting approach in DL federations. It addresses the following problems: (1) Whether or not metadata harvesting provides a realistic and scalable solution for DL federation. (2) What is the status of and problems with current data provider implementations, and how to solve these problems. (3) How to synchronize data providers and service providers. (4) How to build different types of federation services over harvested metadata. (5) How to create a scalable and reliable infrastructure to support federation services. The work done in this dissertation is based on OAI-PMH, and the results have influenced the evolution of OAI-PMH. However, the results are not limited to the scope of OAI-PMH. Our approach is to design and build key services for metadata harvesting and to deploy them on the Web. Implementing a publicly available service allows us to demonstrate how these approaches are practical.
The problems posed above are evaluated by performing experiments over these services.
To summarize the results of this thesis, we conclude that the metadata harvesting approach is a realistic and scalable approach to federating heterogeneous DLs. We present two models of building federation services: a centralized model and a replicated model. Our experiments also demonstrate that the repository synchronization problem can be addressed by push, pull, and hybrid push/pull models; each model has its strengths and weaknesses and fits a specific scenario. Finally, we present a scalable and reliable infrastructure to support the applications of metadata harvesting.
Self-organizing distributed digital library supporting audio-video
The StreamOnTheFly network combines peer-to-peer networking and open-archive principles for community radio channels and TV stations in Europe. StreamOnTheFly demonstrates new methods of archive management and personalization technologies for both audio and video. It also provides a collaboration platform for community purposes that suits the flexible activity patterns of these kinds of broadcaster communities.
MIaS: Math-Aware Retrieval in Digital Mathematical Libraries
Digital mathematical libraries (DMLs) such as arXiv, Numdam, and EuDML contain mainly documents from STEM fields, where mathematical formulae are often more important than text for understanding. Conventional information retrieval (IR) systems are unable to represent formulae, and they are therefore ill-suited for math information retrieval (MIR). To fill the gap, we have developed and open-sourced the MIaS MIR system. MIaS is based on the full-text search engine Apache Lucene. On top of text retrieval, MIaS also incorporates a set of tools for preprocessing mathematical formulae. We describe the design of the system and present speed and quality evaluation results. We show that MIaS is both efficient and effective, as evidenced by our victory in the NTCIR-11 Math-2 task.
Core Services in the Architecture of the National Digital Library for Science Education (NSDL)
We describe the core components of the architecture for the National
Science, Mathematics, Engineering, and Technology Education Digital Library
(NSDL). Over time the NSDL will include heterogeneous users, content, and
services. To accommodate this, a design for a technical and organizational
infrastructure has been formulated based on the notion of a spectrum of
interoperability. This paper describes the first phase of the
interoperability infrastructure, including the metadata repository, search
and discovery services, rights management services, and user interface
portal facilities.
Multimedia search without visual analysis: the value of linguistic and contextual information
This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.