Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval
We summarize math search engines and search interfaces produced by the
Document and Pattern Recognition Lab in recent years, and in particular the min
math search interface and the Tangent search engine. Source code for both
systems is publicly available. "The Masses" refers to our emphasis on creating
systems for mathematical non-experts, who may be looking to define unfamiliar
notation, or browse documents based on the visual appearance of formulae rather
than their mathematical semantics.
Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer
Mathematics (July, Washington DC)
Painting Analysis Using Wavelets and Probabilistic Topic Models
In this paper, computer-based techniques for stylistic analysis of paintings
are applied to the five panels of the 14th century Peruzzi Altarpiece by Giotto
di Bondone. Features are extracted by combining a dual-tree complex wavelet
transform with a hidden Markov tree (HMT) model. Hierarchical clustering is
used to identify stylistic keywords in image patches, and keyword frequencies
are calculated for sub-images that each contain many patches. A generative
hierarchical Bayesian model learns stylistic patterns of keywords; these
patterns are then used to characterize the styles of the sub-images, which in
turn makes it possible to discriminate between paintings. Results suggest that such
unsupervised probabilistic topic models can be useful to distill characteristic
elements of style.
Comment: 5 pages, 4 figures, ICIP 201
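The keyword-frequency step described in this abstract can be illustrated with a minimal sketch. It assumes each patch has already been assigned an integer keyword label by the clustering stage. The function name `keyword_frequencies` and the sample labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def keyword_frequencies(patch_keywords, num_keywords):
    """Return the normalized keyword histogram for one sub-image.

    patch_keywords: list of integer keyword labels, one per patch
                    (assumed to come from a prior clustering step).
    num_keywords:   size of the keyword vocabulary.
    """
    total = len(patch_keywords)
    if total == 0:
        # An empty sub-image has no keyword mass at all.
        return [0.0] * num_keywords
    counts = Counter(patch_keywords)
    return [counts.get(k, 0) / total for k in range(num_keywords)]


# Hypothetical sub-image with 8 patches and a 3-keyword vocabulary.
sub_image = [0, 2, 2, 1, 2, 0, 2, 2]
freqs = keyword_frequencies(sub_image, num_keywords=3)
print(freqs)
```

Histograms of this kind are the sort of input a topic model would consume, with one histogram per sub-image playing the role of a document's word counts.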
Automatic Palaeographic Exploration of Genizah Manuscripts
The Cairo Genizah is a collection of hand-written documents containing approximately
350,000 fragments of mainly Jewish texts discovered in the late 19th century. The
fragments are today spread out in some 75 libraries and private collections worldwide,
but there is an ongoing effort to document and catalogue all extant fragments.
Palaeographic information plays a key role in the study of the Genizah collection.
Script style, and more specifically handwriting, can be used to identify fragments that
might originate from the same original work. Such matched fragments, commonly
referred to as “joins”, are currently identified manually by experts, and presumably only
a small fraction of existing joins have been discovered to date. In this work, we show
that automatic handwriting matching functions, obtained from non-specific features
using a corpus of writing samples, can perform this task quite reliably. In addition, we
explore the problem of grouping various Genizah documents by script style, without
being provided any prior information about the relevant styles. The automatically
obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases
where the method fails, the failure is due to apparent similarities between related scripts.
A Taxonomy of Hyperlink Hiding Techniques
Hidden links are designed solely for search engines rather than visitors. To
get high search engine rankings, link hiding techniques are usually used to
increase the profits of black industries, such as illicit game servers, false
medical services, illegal gambling, and other less attractive high-profit
industries. This paper investigates hyperlink hiding techniques on the Web, and gives
a detailed taxonomy. We believe the taxonomy can help develop appropriate
countermeasures. A study of the home pages of 5,583,451 Chinese sites indicates that
link hiding techniques are very prevalent on the Web. We also explored
the attitude of Google towards link hiding spam by analyzing the PageRank
values of relative links. The results show that more should be done to punish
hidden link spam.
Comment: 12 pages, 2 figures
Detecting Large Concept Extensions for Conceptual Analysis
When performing a conceptual analysis of a concept, philosophers are
interested in all forms of expression of a concept in a text---be it direct or
indirect, explicit or implicit. In this paper, we experiment with topic-based
methods of automating the detection of concept expressions in order to
facilitate philosophical conceptual analysis. We propose six methods based on
LDA, and evaluate them on a new corpus of court decisions that we had annotated
by experts and non-experts. Our results indicate that these methods can yield
important improvements over the keyword heuristic, which is often used as a
concept detection heuristic in many contexts. While more work remains to be
done, this indicates that detecting concepts through topics can serve as a
general-purpose method for at least some forms of concept expression that are
not captured using naive keyword approaches.
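The keyword heuristic that this abstract uses as its baseline can be sketched as follows. This is only an illustration under stated assumptions: the function name `keyword_heuristic`, the stem `libert`, and the sample sentences are hypothetical and do not come from the paper's corpus.

```python
import re


def keyword_heuristic(segments, keyword_stem):
    """Flag each text segment that contains a word starting with the stem.

    This is the naive baseline: a concept counts as expressed only when
    a matching word literally appears in the segment.
    """
    pattern = re.compile(r"\b" + re.escape(keyword_stem) + r"\w*", re.IGNORECASE)
    return [bool(pattern.search(segment)) for segment in segments]


# Hypothetical segments; the second one expresses the concept only implicitly.
segments = [
    "The court considered the defendant's liberty interest.",
    "He was held without any chance of release.",
    "Liberties of this kind are protected.",
]
flags = keyword_heuristic(segments, "libert")
print(flags)
```

The second segment is missed because it contains no matching word, which is the kind of implicit expression that the topic-based methods in the abstract are meant to capture.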