Making Math Searchable in Wikipedia
Wikipedia, the world's largest encyclopedia, contains a lot of knowledge that is
expressed as formulae exclusively. Unfortunately, this knowledge is currently
not fully accessible by intelligent information retrieval systems. This immense
body of knowledge is hidden from value-added services, such as search. In this
paper, we present our MathSearch implementation for Wikipedia that enables
users to perform a combined text and fully unlock the potential benefits.
Comment: 7 pages, 2 figures, Conference on Intelligent Computer Mathematics,
July 9-14 2012, Bremen, Germany. To be published in Lecture Notes, Artificial
Intelligence, Springer
Information Compression, Intelligence, Computing, and Mathematics
This paper presents evidence for the idea that much of artificial
intelligence, human perception and cognition, mainstream computing, and
mathematics, may be understood as compression of information via the matching
and unification of patterns. This is the basis for the "SP theory of
intelligence", outlined in the paper and fully described elsewhere. Relevant
evidence may be seen: in empirical support for the SP theory; in some
advantages of information compression (IC) in terms of biology and engineering;
in our use of shorthands and ordinary words in language; in how we merge
successive views of any one thing; in visual recognition; in binocular vision;
in visual adaptation; in how we learn lexical and grammatical structures in
language; and in perceptual constancies. IC via the matching and unification of
patterns may be seen in both computing and mathematics: in IC via equations; in
the matching and unification of names; in the reduction or removal of
redundancy from unary numbers; in the workings of Post's Canonical System and
the transition function in the Universal Turing Machine; in the way computers
retrieve information from memory; in systems like Prolog; and in the
query-by-example technique for information retrieval. The chunking-with-codes
technique for IC may be seen in the use of named functions to avoid repetition
of computer code. The schema-plus-correction technique may be seen in functions
with parameters and in the use of classes in object-oriented programming. And
the run-length coding technique may be seen in multiplication, in division, and
in several other devices in mathematics and computing. The SP theory resolves
the apparent paradox of "decompression by compression". And computing and
cognition as IC is compatible with the uses of redundancy in such things as
backup copies to safeguard data and understanding speech in a noisy
environment.
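The link the abstract draws between run-length coding and multiplication can be illustrated with a small sketch. It is not taken from the paper: the function names are hypothetical, and the example only assumes that a repeated addition such as 3 + 3 + 3 + 3 is treated as a run of identical symbols.

```python
from itertools import groupby


def run_length_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


def run_length_decode(pairs):
    """Expand (symbol, count) pairs back into the original sequence."""
    return [sym for sym, count in pairs for _ in range(count)]


# Repeated addition written out term by term: 3 + 3 + 3 + 3
terms = [3, 3, 3, 3]

# The run collapses to a single pair (3, 4), which reads as the product 3 * 4.
encoded = run_length_encode(terms)
value, count = encoded[0]
product = value * count
```

Under this reading, the pair (3, 4) is a compressed form of the four-term sum, and decoding the pair restores the redundant original.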
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Mathematical formulae represent complex semantic information in a concise
form. Especially in Science, Technology, Engineering, and Mathematics,
mathematical formulae are crucial to communicate information, e.g., in
scientific papers, and to perform computations using computer algebra systems.
Enabling computers to access the information encoded in mathematical formulae
requires machine-readable formats that can represent both the presentation and
content, i.e., the semantics, of formulae. Exchanging such information between
systems additionally requires conversion methods for mathematical
representation formats. We analyze how the semantic enrichment of formulae
improves the format conversion process and show that considering the textual
context of formulae reduces the error rate of such conversions. Our main
contributions are: (1) providing an openly available benchmark dataset for the
mathematical format conversion task consisting of a newly created test
collection, an extensive, manually curated gold standard and task-specific
evaluation metrics; (2) performing a quantitative evaluation of
state-of-the-art tools for mathematical format conversions; (3) presenting a
new approach that considers the textual context of formulae to reduce the error
rate for mathematical format conversions. Our benchmark dataset facilitates
future research on mathematical format conversions as well as research on many
problems in mathematical information retrieval. Because we annotated and linked
all components of formulae, e.g., identifiers, operators and other entities, to
Wikidata entries, the gold standard can, for instance, be used to train methods
for formula concept discovery and recognition. Such methods can then be applied
to improve mathematical information retrieval systems, e.g., for semantic
formula search, recommendation of mathematical content, or detection of
mathematical plagiarism.
Comment: 10 pages, 4 figures
Pedagogic challenges in Information Retrieval – teaching mathematics to Postgraduate Information Science students
An understanding of mathematics is needed to underpin the process of search, either explicitly with Exact Match (Boolean logic, adjacency) or implicitly with Best Match natural language search. In this paper I outline some pedagogical challenges in teaching mathematics for information retrieval to postgraduate information science students. The aim is to take these challenges, found either by experience or in the literature, and to identify both theoretical and practical ideas that improve the delivery of the material and positively affect the learning of the target audience. Some ideas are put forward to resolve these issues and to promote discussion.
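The distinction between Exact Match and Best Match search mentioned in the abstract can be sketched as follows. This is a toy illustration only: the three documents and the function names are hypothetical, and the Best Match score is a plain count of shared terms, not any particular ranking model.

```python
# Hypothetical three-document collection, used only for illustration.
docs = {
    1: "boolean logic for search",
    2: "natural language search engines",
    3: "teaching logic to students",
}


def tokens(text):
    """Lower-case the text and split it into a set of terms."""
    return set(text.lower().split())


index = {doc_id: tokens(text) for doc_id, text in docs.items()}


def exact_match_and(terms):
    """Boolean AND: return only documents containing every query term."""
    wanted = set(terms)
    return sorted(d for d, toks in index.items() if wanted <= toks)


def best_match(terms):
    """Rank documents by number of shared query terms (ties by doc id)."""
    wanted = set(terms)
    scored = [(len(wanted & toks), d) for d, toks in index.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [d for score, d in ranked if score > 0]


exact_hits = exact_match_and(["logic", "search"])
ranked_hits = best_match(["logic", "search"])
```

The Boolean query keeps only the document that contains both terms, while the ranked query also returns partial matches in order of overlap.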
VMEXT: A Visualization Tool for Mathematical Expression Trees
Mathematical expressions can be represented as a tree consisting of terminal
symbols, such as identifiers or numbers (leaf nodes), and functions or
operators (non-leaf nodes). Expression trees are an important mechanism for
storing and processing mathematical expressions as well as the most frequently
used visualization of the structure of mathematical expressions. Typically,
researchers and practitioners manually visualize expression trees using
general-purpose tools. This approach is laborious, redundant, and error-prone.
Manual visualizations represent a user's notion of what the markup of an
expression should be, but not necessarily what the actual markup is. This paper
presents VMEXT - a free and open source tool to directly visualize expression
trees from parallel MathML. VMEXT simultaneously visualizes the presentation
elements and the semantic structure of mathematical expressions to enable users
to quickly spot deficiencies in the Content MathML markup that do not affect
the presentation of the expression. Identifying such discrepancies previously
required reading the verbose and complex MathML markup. VMEXT also allows one
to visualize similar and identical elements of two expressions. Visualizing
expression similarity can support developers in designing retrieval
approaches and enable improved interaction concepts for users of mathematical
information retrieval systems. We demonstrate VMEXT's visualizations in two
web-based applications. The first application presents the visualizations
alone. The second application shows a possible integration of the
visualizations in systems for mathematical knowledge management and
mathematical information retrieval. The application converts LaTeX input to
parallel MathML, computes basic similarity measures for mathematical
expressions, and visualizes the results using VMEXT.
Comment: 15 pages, 4 figures, Intelligent Computer Mathematics - 10th
International Conference CICM 2017, Edinburgh, UK, July 17-21, 2017,
Proceedings
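The expression-tree structure described in the abstract (identifiers and numbers as leaf nodes, functions and operators as non-leaf nodes) can be sketched in a few lines. This is a minimal illustration, not VMEXT's implementation or API: the Node class and both helper functions are hypothetical, and the tree is built by hand rather than parsed from MathML.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an expression tree: an operator, identifier, or number."""
    label: str
    children: list = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as indented text lines, one node per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def leaves(node):
    """Collect the terminal symbols (identifiers and numbers), left to right."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in leaves(child)]


# Hand-built tree for the expression a + b * 2
tree = Node("+", [Node("a"), Node("*", [Node("b"), Node("2")])])
outline = render(tree)
print("\n".join(outline))
```

Printing the outline shows the operator "+" at the root with "a" and the subtree for "b * 2" indented beneath it, which is the kind of structure such a visualization makes explicit.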
A Graph theoretical approach to study the organization of the cortical networks during different mathematical tasks.
The two core systems of mathematical processing (subitizing and retrieval), as well as their functionality, are already known and published. In this study we used graph theory to compare the brain network organization of these two core systems in the cortical layer during difficult calculations. We examined all the EEG frequency bands separately in healthy young individuals and found that the network organization at rest, as well as during mathematical tasks, has the characteristics of small-world networks for all bands, which is the optimum organization required for efficient information processing. The different mathematical stimuli provoked changes in the graph parameters of different frequency bands, especially the low-frequency bands. More specifically, in the delta band the induced network increases its local and global efficiency during the transition from the subitizing to the retrieval system, while the results suggest that difficult mathematics provokes networks with a more cliquish organization due to more specific demands. The network of the theta band follows the same pattern, showing high nodal and remote organization during difficult mathematics. The spatial distribution of the network's weights also revealed more prominent connections in frontoparietal regions, reflecting the working memory load due to the engagement of the retrieval system. The cortical networks of the alpha band were also more efficient, both locally and globally, during difficult mathematics, and the fact that the alpha network was also denser in the frontoparietal regions again reveals the engagement of the retrieval system. In conclusion, this study provides further evidence regarding the interaction of the two core systems, exploiting the functional networks produced in the cerebral cortex, especially for difficult mathematics.
Extended Combinatorial Constructions for Peer-to-peer User-Private Information Retrieval
We consider user-private information retrieval (UPIR), an interesting
alternative to private information retrieval (PIR) introduced by Domingo-Ferrer
et al. In UPIR, the database knows which records have been retrieved, but does
not know the identity of the query issuer. The goal of UPIR is to disguise user
profiles from the database. Domingo-Ferrer et al. focus on using a
peer-to-peer community to construct a UPIR scheme, which we term P2P UPIR. In
this paper, we establish a strengthened model for P2P UPIR and clarify the
privacy goals of such schemes using standard terminology from the field of
privacy research. In particular, we argue that any solution providing privacy
against the database should attempt to minimize any corresponding loss of
privacy against other users. We give an analysis of existing schemes, including
a new attack by the database. Finally, we introduce and analyze two new
protocols. Whereas previous work focuses on a special type of combinatorial
design known as a configuration, our protocols make use of more general
designs. This allows for flexibility in protocol set-up, allowing for a choice
between having a dynamic scheme (in which users are permitted to enter and
leave the system), or providing increased privacy against other users.
Comment: Updated version, which reflects reviewer comments and includes
expanded explanations throughout. Paper is accepted for publication by
Advances in Mathematics of Communications
Teaching mathematics for search using a tutorial style of delivery
An understanding of mathematics is needed to underpin the process of search, either explicitly with Exact Match (Boolean logic, adjacency) or implicitly with Best Match natural language search. In this paper we outline some pedagogical challenges in teaching mathematics for information retrieval (IR) to postgraduate information science students. The aim is to take these challenges, found either by experience or in the literature, and to identify both theoretical and practical ideas that improve the delivery of the material and positively affect the learning of the target audience by using a tutorial style of teaching. Results show that there is evidence to support the notion that a more pro-active style of teaching using tutorials yields benefits both in terms of assessment results and student satisfaction.