
    Making Math Searchable in Wikipedia

    Wikipedia, the world's largest encyclopedia, contains a great deal of knowledge that is expressed exclusively as formulae. Unfortunately, this knowledge is currently not fully accessible to intelligent information retrieval systems, and this immense body of knowledge remains hidden from value-added services such as search. In this paper, we present our MathSearch implementation for Wikipedia that enables users to perform a combined text and […] fully unlock the potential benefits.
    Comment: 7 pages, 2 figures, Conference on Intelligent Computer Mathematics, July 9-14 2012, Bremen, Germany. To be published in Lecture Notes in Artificial Intelligence, Springer

    Information Compression, Intelligence, Computing, and Mathematics

    This paper presents evidence for the idea that much of artificial intelligence, human perception and cognition, mainstream computing, and mathematics may be understood as compression of information via the matching and unification of patterns. This is the basis for the "SP theory of intelligence", outlined in the paper and fully described elsewhere. Relevant evidence may be seen: in empirical support for the SP theory; in some advantages of information compression (IC) in terms of biology and engineering; in our use of shorthands and ordinary words in language; in how we merge successive views of any one thing; in visual recognition; in binocular vision; in visual adaptation; in how we learn lexical and grammatical structures in language; and in perceptual constancies. IC via the matching and unification of patterns may be seen in both computing and mathematics: in IC via equations; in the matching and unification of names; in the reduction or removal of redundancy from unary numbers; in the workings of Post's Canonical System and the transition function in the Universal Turing Machine; in the way computers retrieve information from memory; in systems like Prolog; and in the query-by-example technique for information retrieval. The chunking-with-codes technique for IC may be seen in the use of named functions to avoid repetition of computer code. The schema-plus-correction technique may be seen in functions with parameters and in the use of classes in object-oriented programming. And the run-length coding technique may be seen in multiplication, in division, and in several other devices in mathematics and computing. The SP theory resolves the apparent paradox of "decompression by compression". And computing and cognition as IC is compatible with the uses of redundancy in such things as backup copies to safeguard data and understanding speech in a noisy environment.
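    To make the run-length coding idea concrete, here is a minimal Python sketch, written for illustration and not taken from the SP theory papers: a run of identical symbols is replaced by one copy plus a count, much as `3 * 4` stands for the repeated addition `4 + 4 + 4`. A unary number such as "1111" is the extreme case, since it is nothing but one symbol repeated. All function names here are hypothetical.

```python
from itertools import groupby


def run_length_encode(seq):
    """Compress runs of identical items into (item, count) pairs."""
    return [(item, len(list(group))) for item, group in groupby(seq)]


def run_length_decode(pairs):
    """Expand (item, count) pairs back into the original sequence (as a list)."""
    return [item for item, count in pairs for _ in range(count)]


def multiply_by_repeated_addition(n, times):
    """Spell out the 'compressed' expression n * times as repeated addition."""
    total = 0
    for _ in range(times):
        total += n
    return total


# A unary number is maximally redundant: "1111" is four repetitions of "1".
print(run_length_encode("1111"))               # [('1', 4)]
print(multiply_by_repeated_addition(4, 3))     # 12, the same value as 4 * 3
```

    Decoding restores the original sequence exactly, which mirrors the lossless "decompression" the abstract refers to.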

    Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial for communicating information, e.g., in scientific papers, and for performing computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and the content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task, consisting of a newly created test collection, an extensive, manually curated gold standard, and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators, and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.
    Comment: 10 pages, 4 figures

    Pedagogic challenges in Information Retrieval – teaching mathematics to Postgraduate Information Science students

    An understanding of mathematics is needed to underpin the process of search, either explicitly with Exact Match (Boolean logic, adjacency) or implicitly with Best Match natural-language search. In this paper I outline some pedagogical challenges in teaching mathematics for information retrieval to postgraduate information science students. The aim is to take these challenges, found either through experience or in the literature, and to identify both theoretical and practical ideas that improve the delivery of the material and positively affect the learning of the target audience. Some ideas are put forward to resolve these issues and to promote discussion.
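    The Exact Match operations named above (Boolean logic and adjacency) can be illustrated with a small positional inverted index. The following is a minimal Python sketch written for this listing, not material from the paper; the function names and toy documents are hypothetical, and Best Match ranking is deliberately left out.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def boolean_and(index, *terms):
    """Exact Match AND: documents containing every term (set intersection)."""
    sets = [set(index.get(t, {})) for t in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, *terms):
    """Exact Match OR: documents containing at least one term (set union)."""
    return set().union(*(set(index.get(t, {})) for t in terms))


def adjacent(index, first, second):
    """Adjacency: documents where `second` immediately follows `first`."""
    hits = set()
    for doc_id in boolean_and(index, first, second):
        following = set(index[second][doc_id])
        if any(p + 1 in following for p in index[first][doc_id]):
            hits.add(doc_id)
    return hits


docs = {
    1: "information retrieval systems",
    2: "retrieval of information",
    3: "mathematics for information retrieval",
}
index = build_positional_index(docs)
print(boolean_and(index, "information", "retrieval"))  # {1, 2, 3}
print(adjacent(index, "information", "retrieval"))     # {1, 3}
```

    The adjacency query is stricter than the plain AND query: document 2 contains both terms but not in sequence, so it is excluded.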

    VMEXT: A Visualization Tool for Mathematical Expression Trees

    Mathematical expressions can be represented as a tree consisting of terminal symbols, such as identifiers or numbers (leaf nodes), and functions or operators (non-leaf nodes). Expression trees are an important mechanism for storing and processing mathematical expressions, as well as the most frequently used visualization of their structure. Typically, researchers and practitioners visualize expression trees manually using general-purpose tools. This approach is laborious, redundant, and error-prone. Manual visualizations represent a user's notion of what the markup of an expression should be, but not necessarily what the actual markup is. This paper presents VMEXT, a free and open-source tool to directly visualize expression trees from parallel MathML. VMEXT simultaneously visualizes the presentation elements and the semantic structure of mathematical expressions to enable users to quickly spot deficiencies in the Content MathML markup that do not affect the presentation of the expression. Identifying such discrepancies previously required reading the verbose and complex MathML markup. VMEXT also allows one to visualize similar and identical elements of two expressions. Visualizing expression similarity can support developers in designing retrieval approaches and enable improved interaction concepts for users of mathematical information retrieval systems. We demonstrate VMEXT's visualizations in two web-based applications. The first application presents the visualizations alone. The second application shows a possible integration of the visualizations in systems for mathematical knowledge management and mathematical information retrieval; it converts LaTeX input to parallel MathML, computes basic similarity measures for mathematical expressions, and visualizes the results using VMEXT.
    Comment: 15 pages, 4 figures, Intelligent Computer Mathematics - 10th International Conference CICM 2017, Edinburgh, UK, July 17-21, 2017, Proceedings
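    The tree structure described above, with operators as inner nodes and identifiers or numbers as leaves, and the idea of finding identical elements in two expressions can be sketched in a few lines of Python. This is an illustrative assumption, not VMEXT's code. The node labels are loosely modeled on Content MathML operator names (`plus`, `power`). The Jaccard overlap of subtrees is a hypothetical stand-in for the paper's "basic similarity measures", which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Expression-tree node: operators are inner nodes; identifiers and numbers are leaves."""
    label: str
    children: tuple = ()


def subtrees(node):
    """Yield every subtree rooted at `node`, including `node` itself."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def identical_subtrees(a, b):
    """Subtrees that occur (structurally identical) in both expressions."""
    return set(subtrees(a)) & set(subtrees(b))


def subtree_jaccard(a, b):
    """Hypothetical similarity: shared distinct subtrees / all distinct subtrees."""
    sa, sb = set(subtrees(a)), set(subtrees(b))
    return len(sa & sb) / len(sa | sb)


x_squared = Node("power", (Node("x"), Node("2")))
expr_a = Node("plus", (x_squared, Node("y")))  # x^2 + y
expr_b = Node("plus", (x_squared, Node("z")))  # x^2 + z

print(subtree_jaccard(expr_a, expr_b))  # 3/7: x^2, x and 2 are shared
```

    Because the dataclass is frozen, nodes are hashable and compare structurally, so identical subtrees collapse to one set element without any explicit tree-matching code.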

    A Graph theoretical approach to study the organization of the cortical networks during different mathematical tasks.

    The two core systems of mathematical processing (subitizing and retrieval) and their functionality have already been described in the literature. In this study we used graph theory to compare the brain network organization of these two core systems in the cortical layer during difficult calculations. We examined all EEG frequency bands separately in healthy young individuals and found that the network organization at rest, as well as during mathematical tasks, has the characteristics of Small World Networks for all bands, which is the optimal organization for efficient information processing. The different mathematical stimuli provoked changes in the graph parameters of different frequency bands, especially the low-frequency bands. More specifically, in the Delta band the induced network increases its local and global efficiency during the transition from the subitizing to the retrieval system, while the results suggest that difficult mathematics provokes networks with a more cliquish organization due to more specific demands. The network of the Theta band follows the same pattern, showing high nodal and remote organization during difficult mathematics. The spatial distribution of the network's weights also revealed more prominent connections in frontoparietal regions, reflecting the working-memory load due to the engagement of the retrieval system. The cortical networks of the alpha brainwaves were also more efficient, both locally and globally, during difficult mathematics, and the alpha network was likewise denser in the frontoparietal regions, again revealing the engagement of the retrieval system. In conclusion, this study provides further evidence regarding the interaction of the two core systems, based on the functional networks of the cerebral cortex, especially for difficult mathematics.
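    The graph parameters mentioned above can be made concrete. Global efficiency is the average inverse shortest-path length over all ordered node pairs. Local efficiency averages, over nodes, the global efficiency of the subgraph induced by each node's neighbours. The clustering coefficient measures how cliquish a neighbourhood is. The following Python sketch uses the standard definitions for an unweighted, undirected graph given as a dict of neighbour sets. It is an illustration only: the abstract mentions network weights, which this sketch ignores, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Mean of 1/d(i, j) over ordered pairs i != j; unreachable pairs contribute 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        dist = shortest_path_lengths(adj, u)
        total += sum(1 / d for v, d in dist.items() if v != u)
    return total / (n * (n - 1))


def local_efficiency(adj):
    """Mean over nodes of the global efficiency of each node's neighbourhood subgraph."""
    values = []
    for u in adj:
        neighbours = adj[u]
        subgraph = {v: adj[v] & neighbours for v in neighbours}
        values.append(global_efficiency(subgraph))
    return sum(values) / len(values) if values else 0.0


def clustering_coefficient(adj, u):
    """Fraction of pairs of u's neighbours that are themselves connected."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


# Toy graph: a triangle 0-1-2 with a tail node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(global_efficiency(graph))          # 10/12, about 0.833
print(local_efficiency(graph))           # 7/12, about 0.583
print(clustering_coefficient(graph, 2))  # 1/3
```

    Small-world networks combine high local efficiency (clustered neighbourhoods) with high global efficiency (short paths), which is why the study reports both parameters.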

    Extended Combinatorial Constructions for Peer-to-peer User-Private Information Retrieval

    We consider user-private information retrieval (UPIR), an interesting alternative to private information retrieval (PIR) introduced by Domingo-Ferrer et al. In UPIR, the database knows which records have been retrieved, but does not know the identity of the query issuer. The goal of UPIR is to disguise user profiles from the database. Domingo-Ferrer et al. focus on using a peer-to-peer community to construct a UPIR scheme, which we term P2P UPIR. In this paper, we establish a strengthened model for P2P UPIR and clarify the privacy goals of such schemes using standard terminology from the field of privacy research. In particular, we argue that any solution providing privacy against the database should attempt to minimize any corresponding loss of privacy against other users. We give an analysis of existing schemes, including a new attack by the database. Finally, we introduce and analyze two new protocols. Whereas previous work focuses on a special type of combinatorial design known as a configuration, our protocols make use of more general designs. This allows for flexibility in protocol set-up, allowing a choice between a dynamic scheme (in which users are permitted to enter and leave the system) and increased privacy against other users.
    Comment: Updated version, which reflects reviewer comments and includes expanded explanations throughout. Paper is accepted for publication by Advances in Mathematics of Communication
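    For readers unfamiliar with the term, a configuration is a set of points and lines (blocks) in which every line has k points, every point lies on r lines, and any two points share at most one line. The following Python sketch, which is illustrative and not from the paper, checks these properties for the Fano plane, a classic 7-point, 7-line configuration with k = r = 3. Reading points as users and lines as groups of users sharing a communication space is an assumption made here to connect the design to P2P UPIR. The abstract does not spell out the protocol details, and the function names are hypothetical.

```python
from itertools import combinations

# Fano plane: 7 points (users 0..6) and 7 lines, each line containing 3 points.
FANO_LINES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]


def is_configuration(points, lines, k, r):
    """Check that every line has k points, every point is on r lines,
    and any two points lie on at most one common line."""
    points = list(points)
    if any(len(line) != k for line in lines):
        return False
    if any(sum(p in line for line in lines) != r for p in points):
        return False
    for a, b in combinations(points, 2):
        if sum(1 for line in lines if a in line and b in line) > 1:
            return False
    return True


def lines_of(lines, user):
    """Lines (hypothetically: shared communication spaces) that contain `user`."""
    return [line for line in lines if user in line]


print(is_configuration(range(7), FANO_LINES, k=3, r=3))  # True
print(lines_of(FANO_LINES, 0))  # the three lines through point 0
```

    In the Fano plane every pair of points actually shares exactly one line, which is stronger than the "at most one" condition a configuration requires. The paper's more general designs relax this structure further.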