Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations
Mathematical notation, i.e., the writing system used to communicate concepts
in mathematics, encodes valuable information for a variety of information
search and retrieval systems. Yet, mathematical notations remain mostly
unutilized by today's systems. In this paper, we present the first in-depth
study on the distributions of mathematical notation in two large scientific
corpora: the open access arXiv (2.5B mathematical objects) and the mathematical
reviewing service for pure and applied mathematics zbMATH (61M mathematical
objects). Our study lays a foundation for future research projects on
mathematical information retrieval for large scientific corpora. Further, we
demonstrate the relevance of our results to a variety of use cases: for
example, assisting semantic extraction systems, improving scientific search
engines, and facilitating specialized math recommendation systems. The
contributions of our presented research are as follows: (1) we present the
first distributional analysis of mathematical formulae on arXiv and zbMATH; (2)
we retrieve relevant mathematical objects for given textual search queries
(e.g., linking formulae with the query `Jacobi
polynomial'); (3) we extend zbMATH's search engine by providing relevant
mathematical formulae; and (4) we exemplify the applicability of the results by
presenting auto-completion for math inputs as the first contribution to math
recommendation systems. To expedite future research projects, we have made
available our source code and data.
Comment: Proceedings of The Web Conference 2020 (WWW'20), April 20--24, 2020,
Taipei, Taiwan
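The distributional analysis described above amounts to counting the frequency of mathematical objects (tokens) across a corpus of formulae and ranking them. The following is a minimal sketch; the toy corpus and tokenization are illustrative stand-ins, not the paper's arXiv/zbMATH pipeline.

```python
from collections import Counter

# Toy corpus of formulae as token lists (illustrative stand-ins for the
# tokenized mathematical objects extracted from a real corpus).
formulae = [
    ["P", "n", "alpha", "beta", "x"],  # e.g., tokens of a Jacobi polynomial
    ["x", "y", "+"],
    ["P", "x", "n"],
]

# Distributional analysis: frequency of each mathematical token.
counts = Counter(tok for f in formulae for tok in f)
ranked = counts.most_common()
print(ranked)  # tokens ordered by corpus frequency
```

A search system can then favor frequent, query-relevant objects when linking text queries to formulae.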
Fisher's exact test explains a popular metric in information retrieval
Term frequency-inverse document frequency, or tf-idf for short, is a
numerical measure that is widely used in information retrieval to quantify the
importance of a term of interest in one out of many documents. While tf-idf was
originally proposed as a heuristic, much work has been devoted over the years
to placing it on a solid theoretical foundation. Following in this tradition,
we here advance the first justification for tf-idf that is grounded in
statistical hypothesis testing. More precisely, we first show that the
one-tailed version of Fisher's exact test, also known as the hypergeometric
test, corresponds well with a common tf-idf variant on selected real-data
information retrieval tasks. We then set forth a mathematical argument that
suggests the tf-idf variant approximates the negative logarithm of the
one-tailed Fisher's exact test P-value (i.e., a hypergeometric distribution
tail probability). The Fisher's exact test interpretation of this common tf-idf
variant furnishes the working statistician with a ready explanation of tf-idf's
long-established effectiveness.
Comment: 26 pages, 4 figures, 1 table, minor revision
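The correspondence sketched in the abstract can be explored numerically: compute the one-tailed hypergeometric tail probability and compare its negative logarithm against a tf-idf-style score. The corpus numbers and the tf-idf analogue below are illustrative assumptions, not the paper's exact variant or data.

```python
from math import comb, log

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): drawing n tokens from a
    corpus of N tokens, K of which are occurrences of the term."""
    denom = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / denom

# Toy numbers (illustrative): corpus of N = 1000 tokens, the term occurs
# K = 10 times overall and k = 5 times in a document of n = 100 tokens.
N, K, n, k = 1000, 10, 100, 5

# Negative log of the one-tailed Fisher's exact test P-value.
neg_log_p = -log(hypergeom_sf(k, N, K, n))

# A rough tf-idf analogue on token counts: tf * log(N / K).
tf_idf = k * log(N / K)
print(neg_log_p, tf_idf)
```

Both scores grow as the term is more over-represented in the document than chance would predict, which is the qualitative link the paper makes precise.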
Node discovery in a networked organization
In this paper, I present a method to solve a node discovery problem in a
networked organization. Covert nodes are nodes that cannot be observed
directly. They affect social interactions but do not appear in the
surveillance logs that record the participants of those interactions.
Discovering the covert nodes is defined as identifying the suspicious logs
where they would appear if they became overt. A
mathematical model is developed for the maximal likelihood estimation of the
network behind the social interactions and for the identification of the
suspicious logs. Precision, recall, and F measure characteristics are
demonstrated with the dataset generated from a real organization and the
computationally synthesized datasets. The performance is close to the
theoretical limit for covert nodes in networks of any topology and size,
provided the ratio of the number of observations to the number of possible
communication patterns is large.
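The evaluation metrics mentioned above (precision, recall, and F measure) can be computed from the sets of flagged and ground-truth suspicious logs; the log identifiers below are hypothetical placeholders.

```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall, and F measure for a set of flagged logs
    against the ground-truth suspicious logs."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # correctly flagged logs
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1({"log1", "log2", "log3"},
                              {"log2", "log3", "log4"})
print(p, r, f)  # 2 of 3 flagged are correct, 2 of 3 true positives found
```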
Making Math Searchable in Wikipedia
Wikipedia, the world's largest encyclopedia, contains a great deal of knowledge
that is expressed exclusively as formulae. Unfortunately, this knowledge is
currently not fully accessible to intelligent information retrieval systems.
This immense body of knowledge is hidden from value-added services, such as
search. In this paper, we present our MathSearch implementation for Wikipedia,
which enables users to perform a combined text and formula search and thus
fully unlock the potential benefits of this knowledge.
Comment: 7 pages, 2 figures, Conference on Intelligent Computer Mathematics,
July 9-14, 2012, Bremen, Germany. To be published in Lecture Notes in
Artificial Intelligence, Springer
VMEXT: A Visualization Tool for Mathematical Expression Trees
Mathematical expressions can be represented as a tree consisting of terminal
symbols, such as identifiers or numbers (leaf nodes), and functions or
operators (non-leaf nodes). Expression trees are an important mechanism for
storing and processing mathematical expressions as well as the most frequently
used visualization of the structure of mathematical expressions. Typically,
researchers and practitioners manually visualize expression trees using
general-purpose tools. This approach is laborious, redundant, and error-prone.
Manual visualizations represent a user's notion of what the markup of an
expression should be, but not necessarily what the actual markup is. This paper
presents VMEXT - a free and open source tool to directly visualize expression
trees from parallel MathML. VMEXT simultaneously visualizes the presentation
elements and the semantic structure of mathematical expressions to enable users
to quickly spot deficiencies in the Content MathML markup that do not affect
the presentation of the expression. Identifying such discrepancies previously
required reading the verbose and complex MathML markup. VMEXT also allows one
to visualize similar and identical elements of two expressions. Visualizing
expression similarity can support developers in designing retrieval
approaches and enable improved interaction concepts for users of mathematical
information retrieval systems. We demonstrate VMEXT's visualizations in two
web-based applications. The first application presents the visualizations
alone. The second application shows a possible integration of the
visualizations in systems for mathematical knowledge management and
mathematical information retrieval. The application converts LaTeX input to
parallel MathML, computes basic similarity measures for mathematical
expressions, and visualizes the results using VMEXT.
Comment: 15 pages, 4 figures, Intelligent Computer Mathematics - 10th
International Conference, CICM 2017, Edinburgh, UK, July 17-21, 2017,
Proceedings
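The expression-tree model described above, with identifiers and numbers as leaves and operators or functions as inner nodes, can be sketched as a minimal data structure. This is an illustrative sketch, not VMEXT's actual implementation.

```python
class ExprNode:
    """Node of a mathematical expression tree: operators and functions
    are inner nodes; identifiers and numbers are leaves."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def is_leaf(self):
        return not self.children

    def to_infix(self):
        # Render binary operators infix; other inner nodes as function calls.
        if self.is_leaf():
            return self.label
        if len(self.children) == 2:
            left, right = self.children
            return f"({left.to_infix()} {self.label} {right.to_infix()})"
        args = ", ".join(c.to_infix() for c in self.children)
        return f"{self.label}({args})"

# (a + b) * 2 as an expression tree
tree = ExprNode("*", [ExprNode("+", [ExprNode("a"), ExprNode("b")]),
                      ExprNode("2")])
print(tree.to_infix())  # ((a + b) * 2)
```

A tool like VMEXT renders such trees graphically, so mismatches between the tree (content) and the rendered formula (presentation) become visible at a glance.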
An algebra and conceptual model for semantic tagging of collaborative digital libraries
Cost-effective semantic description and annotation of shared knowledge resources has always been of great importance for digital libraries and large-scale information systems in general. With the emergence of the Social Web and Web 2.0 technologies, more effective semantic description and annotation of digital library contents, e.g., via folksonomies, is envisioned to take place in collaborative and personalised environments. However, there is a lack of foundation and mathematical rigour for coping with the contextualised management and retrieval of semantic annotations throughout their evolution, as well as with diversity in users and user communities. In this paper, we propose an ontological foundation for semantic annotations of digital libraries in terms of flexonomies. The proposed theoretical model relies on a high-dimensional space with algebraic operators for contextualised access to semantic tags and annotations. The set of proposed algebraic operators is an adaptation of the set-theoretic operators selection, projection, difference, intersection, and union from database theory. To this extent, the proposed model is meant to lay the ontological foundation for a Digital Library 2.0 project in terms of geometric spaces rather than (description) logic-based formalisms, as a more efficient and scalable solution to the semantic annotation problem at large scale.
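If annotations are modeled as plain sets of (resource, tag, user) triples, the adapted set-theoretic operators mentioned above reduce to ordinary set operations. The schema and data below are an illustrative simplification, not the paper's formal flexonomy model.

```python
# Annotations as sets of (resource, tag, user) triples (illustrative schema).
ann_a = {("doc1", "math", "alice"), ("doc1", "search", "bob")}
ann_b = {("doc1", "math", "alice"), ("doc2", "ir", "carol")}

union = ann_a | ann_b          # union of two annotation sets
intersection = ann_a & ann_b   # annotations shared by both
difference = ann_a - ann_b     # annotations only in the first set

# Selection: keep annotations matching a predicate (here, a given user).
select_alice = {t for t in union if t[2] == "alice"}
# Projection: keep only the tag component of each annotation.
tags = {t[1] for t in union}
print(sorted(tags))
```

Contextualised access in the paper's model adds dimensions (e.g., time, community) to these triples, but the operator semantics stay set-theoretic.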
On the selection of secondary indices in relational databases
An important problem in the physical design of databases is the selection of secondary indices. In general, this problem cannot be solved optimally due to the complexity of the selection process. Heuristics such as the well-known ADD and DROP algorithms are therefore often used. In this paper it is shown that frequently used cost functions can be classified as super- or submodular functions. For these functions, several mathematical properties have been derived that reduce the complexity of the index selection problem. These properties are used to develop a tool for physical database design and also give a mathematical foundation for the success of the aforementioned ADD and DROP algorithms.
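The ADD heuristic mentioned above can be sketched as a greedy loop over candidate indices, repeatedly adding whichever index lowers the workload cost the most. The cost model and index names below are illustrative assumptions, not the paper's cost functions.

```python
def add_heuristic(candidates, cost, max_indices):
    """Greedy ADD heuristic sketch: starting from no indices, repeatedly
    add the candidate index that most reduces the cost function, and stop
    when no candidate improves the cost (or the index budget is reached)."""
    chosen = frozenset()
    while len(chosen) < max_indices:
        best, best_cost = None, cost(chosen)
        for c in candidates - chosen:
            new_cost = cost(chosen | {c})
            if new_cost < best_cost:
                best, best_cost = c, new_cost
        if best is None:  # no remaining index improves the cost
            break
        chosen = chosen | {best}
    return chosen

# Toy cost model: query savings per index minus a fixed maintenance charge.
benefit = {"idx_a": 40, "idx_b": 30, "idx_c": 5}

def cost(indices):
    return 100 - sum(benefit[i] for i in indices) + 10 * len(indices)

print(add_heuristic(set(benefit), cost, max_indices=3))
```

With this toy cost, idx_a and idx_b each pay for their maintenance charge while idx_c does not, so the greedy loop selects exactly the first two. The paper's super-/submodularity results explain why such greedy stopping behaves well for realistic cost functions.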