Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval
We summarize math search engines and search interfaces produced by the
Document and Pattern Recognition Lab in recent years, and in particular the min
math search interface and the Tangent search engine. Source code for both
systems is publicly available. "The Masses" refers to our emphasis on creating
systems for mathematical non-experts, who may be looking to define unfamiliar
notation, or browse documents based on the visual appearance of formulae rather
than their mathematical semantics.
Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer
Mathematics (July, Washington DC)
Turkish handwritten text recognition: a case of agglutinative languages
We describe a system for recognizing unconstrained Turkish handwritten text. Turkish has agglutinative morphology, so a theoretically infinite number of words can be generated by adding more suffixes to a word. This makes lexicon-based recognition approaches, where the most likely word is selected among all the alternatives in a lexicon, unsuitable for Turkish. We describe our approach to the problem using a Turkish prefix recognizer. First results of the system demonstrate the promise of this approach, with a top-10 word recognition rate of about 40% on a small test set of mixed handprint and cursive writing. The lexicon-based approach with a 17,000-word lexicon (with test words added) achieves a 56% top-10 word recognition rate.
Shape-Based Plagiarism Detection for Flowchart Figures in Texts
Plagiarism is a well-known phenomenon in the academic arena. Copying
other people's work is considered a serious offence that needs to be checked. There
are many plagiarism detection systems, such as Turnitin, that have been
developed to provide these checks. Most, if not all, discard the figures and
charts before checking for plagiarism. Discarding the figures and charts
results in loopholes that people can take advantage of. That means people can
plagiarize figures and charts easily without the current plagiarism systems
detecting it. There are very few papers that discuss flowchart plagiarism
detection. Therefore, there is a need to develop a system that will detect
plagiarism in figures and charts. This paper presents a method for detecting
flowchart figure plagiarism based on shape-based image processing and
multimedia retrieval. The method managed to retrieve flowcharts with ranked
similarity according to different matching sets.
Comment: 12 pages
Implementation of a Human-Computer Interface for Computer Assisted Translation and Handwritten Text Recognition
A human-computer interface is developed to provide services of computer assisted machine translation (CAT) and computer assisted transcription of handwritten text images (CATTI). The back-end machine translation (MT) and handwritten text recognition (HTR) systems are provided by the Pattern Recognition and Human Language Technology (PRHLT) research group. The idea is to provide users with easy-to-use tools that make interactive translation and transcription feasible tasks. The assisted service is provided by remote servers with CAT or CATTI capabilities. The interface supplies the user with tools for efficient local editing: deletion, insertion and substitution.
Ocampo Sepúlveda, JC. (2009). Implementation of a Human-Computer Interface for Computer Assisted Translation and Handwritten Text Recognition. http://hdl.handle.net/10251/14318
A Survey on Retrieval of Mathematical Knowledge
We present a short survey of the literature on indexing and retrieval of
mathematical knowledge, with pointers to 72 papers and tentative taxonomies of
both retrieval problems and recurring techniques.
Comment: CICM 2015, 20 pages