A Survey on Retrieval of Mathematical Knowledge
We present a short survey of the literature on indexing and retrieval of
mathematical knowledge, with pointers to 72 papers and tentative taxonomies of
both retrieval problems and recurring techniques. Comment: CICM 2015, 20 page
Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval
We summarize math search engines and search interfaces produced by the
Document and Pattern Recognition Lab in recent years, and in particular the min
math search interface and the Tangent search engine. Source code for both
systems is publicly available. "The Masses" refers to our emphasis on creating
systems for mathematical non-experts, who may be looking to define unfamiliar
notation, or browse documents based on the visual appearance of formulae rather
than their mathematical semantics. Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer
Mathematics (July, Washington DC
Symbolic and Visual Retrieval of Mathematical Notation using Formula Graph Symbol Pair Matching and Structural Alignment
Large data collections containing millions of math formulae in different formats are available online. Retrieving math expressions from these collections is challenging. We propose a framework for retrieval of mathematical notation using symbol pairs extracted from visual and semantic representations of mathematical expressions on the symbolic domain for retrieval of text documents. We further adapt our model for retrieval of mathematical notation on images and lecture videos. Graph-based representations are used on each modality to describe math formulas. For symbolic formula retrieval, where the structure is known, we use symbol layout trees and operator trees. For image-based formula retrieval, since the structure is unknown, we use a more general Line of Sight graph representation. Paths of these graphs define symbol-pair tuples that are used as the entries for our inverted index of mathematical notation. Our retrieval framework uses a three-stage approach: a fast selection of candidates in the first stage, a more detailed matching algorithm with similarity metric computation in the second stage, and, when relevance assessments are available, an optional third stage that uses linear regression over multiple similarity scores to estimate relevance for final re-ranking. Our model has been evaluated using large collections of documents, and preliminary results are presented for videos and cross-modal search. The proposed framework can be adapted for other domains like chemistry or technical diagrams where two visually similar elements from a collection are usually related to each other
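The first two stages described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tree encoding, the edge labels (`n` for next, `a` for above), the `max_dist` window, the function names, and the Dice-style score are all assumptions chosen for illustration. It extracts symbol-pair tuples from a toy symbol layout tree, stores them in an inverted index, selects candidates that share at least one pair with the query, and ranks them by pair overlap. The optional regression-based re-ranking stage is omitted.

```python
from collections import defaultdict

# Hypothetical tree encoding: (symbol, [(edge_label, child_tree), ...]).
# Edge labels are single characters: "n" = next symbol, "a" = above (superscript).
FORMULAS = {
    "f1": ("x", [("a", ("2", [])), ("n", ("+", [("n", ("y", []))]))]),  # x^2 + y
    "f2": ("x", [("n", ("+", [("n", ("y", []))]))]),                    # x + y
    "f3": ("a", [("a", ("2", []))]),                                    # a^2
}


def symbol_pairs(tree, max_dist=2):
    """Return (ancestor, descendant, path) tuples for paths of at most max_dist edges."""
    pairs = set()

    def walk(node, ancestors):
        symbol, children = node
        for anc, path in ancestors:
            pairs.add((anc, symbol, path))
        for label, child in children:
            # Keep only ancestors whose extended path stays within max_dist edges.
            nxt = [(anc, path + label) for anc, path in ancestors if len(path) < max_dist]
            nxt.append((symbol, label))
            walk(child, nxt)

    walk(tree, [])
    return pairs


def build_index(formulas):
    """Build an inverted index from symbol-pair tuple to the formula ids containing it."""
    index = defaultdict(set)
    pair_counts = {}
    for fid, tree in formulas.items():
        pairs = symbol_pairs(tree)
        pair_counts[fid] = len(pairs)
        for pair in pairs:
            index[pair].add(fid)
    return index, pair_counts


def search(query_tree, index, pair_counts, top_k=10):
    """Stage 1: collect candidates sharing a pair. Stage 2: rank by a Dice-style overlap."""
    query_pairs = symbol_pairs(query_tree)
    matches = defaultdict(int)
    for pair in query_pairs:
        for fid in index.get(pair, ()):
            matches[fid] += 1
    scored = [
        (fid, 2 * m / (len(query_pairs) + pair_counts[fid]))
        for fid, m in matches.items()
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


INDEX, PAIR_COUNTS = build_index(FORMULAS)
RESULTS = search(FORMULAS["f2"], INDEX, PAIR_COUNTS)
```

Querying with `x + y` ranks the identical formula first and the partially matching `x^2 + y` second; `a^2` shares no symbol pair with the query and never becomes a candidate, which is the point of the fast first-stage selection.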
Which one is better: presentation-based or content-based math search?
Mathematical content is a valuable information source and retrieving this
content has become an important issue. This paper compares two searching
strategies for math expressions: presentation-based and content-based
approaches. Presentation-based search uses a state-of-the-art math search system,
while content-based search uses semantic enrichment of math expressions to
convert math expressions into their content forms and searching is done using
these content-based expressions. By considering the meaning of math
expressions, the quality of the search system is improved over presentation-based
systems
Using Multiple Choice Questions to Assist Learning for Information Retrieval
A key issue in teaching and learning in information retrieval, particularly for library and information science students, is the gap between students' prior knowledge and the mathematics needed to conduct and evaluate searches. In this chapter, we examine the use of online Multiple Choice Questions (MCQs) to support these types of students and narrow this gap between experience and knowledge. We provide some background in terms of related work and the use of MCQs for assessment. The key areas of search which can be supported by this form of assessment are defined, and these are used to outline a proposed strategy for defining a series of questions to support learning
01 Text Processing 1 - Data Mining - Ingegneria e Scienze Informatiche, Cesena
structured, semi-structured, and unstructured data, information retrieval and text mining, document representation, Boolean retrieval models, the document indexing process, tokenization, normalization, lemmatization, stemming algorithms, index-based search, other search optimizations