Search CORE

74 research outputs found

Probabilistic management of OCR data using an RDBMS

Author: Allauzen C.
Baeza-Yates R. A.
Bishop C. M.
Cho J.
Cowell R. G.
Gupta R.
Hopcroft J. E.
Jordan M. I.
Kimura H.
Lafferty J.
Mori S.
Widom J.
Yen J. Y.
Zobel J.
Publication venue: 'VLDB Endowment'

Crossref

Gaining insight into clinical pathway with process discovery techniques.

Author: Poelmans Jonas

Research Papers in Economics

Strange bedfellows? Keyword and conceptual search unite to make sense of relevant ESI in electronic discovery

Author: Baron D.
Black I.
Publication date: 25/06/2008

In the brief history of electronic discovery, the latter part of the twentieth century witnessed the demise of paper at the hands of a digital hero that emancipated the content of paper documents with OCR and TIFF. This technology added a third dimension to the realm of 2D paper document review and production, which led to a sea change in discovery methods. By many accounts, what we have before us is a three-stage evolution from paper to digital to clustering, in order to overcome the problems of volume and complexity of ESI. The intent of this position paper is to describe the development of the digital hero and methodology that is emancipating the content and context of ESI – conceptual search that spans file formats, languages and technique, and includes keyword search on a common, shared index.

UCL Discovery

The TXM Portal Software giving access to Old French Manuscripts Online

Author: Heiden Serge
Lavrentiev Alexei
Publication venue: HAL CCSD
Publication date: 21/05/2012

Full text online: http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf (international audience). This paper presents the new TXM software platform, which gives online access to Old French manuscript images and tagged transcriptions for concordancing and text mining. The platform can import medieval sources encoded in XML according to the TEI Guidelines, linking manuscript images to transcriptions and encoding several diplomatic levels of transcription, including abbreviations and word-level corrections. It includes a sophisticated tokenizer able to deal with TEI tags at different levels of the linguistic hierarchy. Words are tagged on the fly during the import process using the IMS TreeTagger tool with a specific language model. Synoptic editions displaying manuscript images and text transcriptions side by side are produced automatically during the import process. Texts are organized in a corpus with their own metadata (title, author, date, genre, etc.), and several word-property indexes are produced for the CQP search engine to allow efficient word-pattern searches that build different types of frequency lists or concordances. For syntactically annotated texts, special indexes are produced for the Tiger Search engine to allow efficient building of syntactic concordances. The platform has also been tested on classical Latin, ancient Greek, Old Slavonic and Old Hieroglyphic Egyptian corpora (including various types of encoding and annotations).

HAL-ENS-LYON

RFID Middleware Design and Architecture

Author: Ajana El Khaddar Mehdia
Boulmalf Mohammed
Elkoutbi Mohammed
Harroud Hamid
Publication venue: 'IntechOpen'
Publication date: 15/06/2011

IntechOpen

Crossref

Improving search engines with open Web-based SKOS vocabularies

Author: Martins Flávio Nuno Fernandes
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2012

Dissertation submitted for the degree of Master in Computer Engineering (Engenharia Informática). The volume of digital information keeps growing, and even though organizations are making more of this information available, users without the proper tools have great difficulty retrieving documents about subjects of interest. Good information retrieval mechanisms are crucial for answering user information needs. Nowadays, search engines are unavoidable: they are an essential feature in document management systems. However, achieving good relevancy is a difficult problem, particularly when dealing with specific technical domains where vocabulary mismatch problems can be detrimental. Numerous research works have found that exploiting the lexical or semantic relations of terms in a collection attenuates this problem. In this dissertation, we aim to improve search results and user experience by investigating the use of potentially connected Web vocabularies in information retrieval engines. In the context of open Web-based SKOS vocabularies, we propose a query expansion framework implemented in a widely used IR system (Lucene/Solr) and evaluated using standard IR evaluation datasets. The components described in this thesis were applied in the development of a new search system that was integrated with a rapid applications development tool in the context of an internship at Quidgest S.A. Fundação para a Ciência e Tecnologia - ImTV research project, in the context of the UTAustin-Portugal collaboration (UTA-Est/MAI/0010/2009); QSearch project (FCT/Quidgest
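As a rough illustration of the vocabulary-based query expansion this abstract describes, the sketch below rewrites each query term into a group of alternatives joined with the Boolean operators that Lucene-style query syntax accepts. It is not the thesis's framework: the tiny vocabulary, the field names `alt` and `narrower` (standing in for skos:altLabel and skos:narrower), and the functions `expand_term` and `expand_query` are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical SKOS-like vocabulary (an assumption, not data from the thesis).
# "alt" stands in for skos:altLabel, "narrower" for skos:narrower.
VOCAB: Dict[str, Dict[str, List[str]]] = {
    "car": {"alt": ["automobile"], "narrower": ["taxi"]},
    "taxi": {"alt": ["cab"], "narrower": []},
}


def expand_term(term: str, vocab: Dict[str, Dict[str, List[str]]] = VOCAB) -> List[str]:
    """Return the term followed by its alternative and narrower labels."""
    entry = vocab.get(term.lower())
    if entry is None:
        return [term]  # unknown terms are passed through unchanged
    expanded: List[str] = []
    for label in [term] + entry["alt"] + entry["narrower"]:
        if label not in expanded:  # drop duplicates, keep first-seen order
            expanded.append(label)
    return expanded


def expand_query(query: str, vocab: Dict[str, Dict[str, List[str]]] = VOCAB) -> str:
    """Rewrite a whitespace-separated query into an expanded Boolean query."""
    groups = []
    for term in query.split():
        variants = expand_term(term, vocab)
        if len(variants) == 1:
            groups.append(variants[0])
        else:
            groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


if __name__ == "__main__":
    print(expand_query("car rental"))
```

Expanding with OR inside each group widens recall for terms the vocabulary knows, while the AND between groups keeps the original query's conjunction. A real system would also weight the added labels lower than the user's own terms.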

Repositório da Universidade Nova de Lisboa

Information retrieval (Part I): Introduction

Author: Paijmans J.J.
Publication venue: Institute for Language Technology and Artificial Intelligence, Tilburg University
Publication date: 01/01/1992

Tilburg University Repository

Online Deception Detection Using BDI Agents

Author: Merritts Richard Alan
Publication venue: NSUWorks
Publication date: 01/01/2013

This research has two facets within separate research areas. The research area of Belief, Desire and Intention (BDI) agent capability development was extended. Deception detection research has been advanced with the development of automation using BDI agents. BDI agents performed tasks automatically and autonomously. This study used these characteristics to automate deception detection with limited intervention of human users. This was a useful research area resulting in a capability general enough to have practical application by private individuals, investigators, organizations and others. The need for this research is grounded in the fact that humans are not very effective at detecting deception whether in written or spoken form. This research extends the deception detection capability research in that typical deception detection tools are labor intensive and require extraction of the text in question following ingestion into a deception detection tool. A neural network capability module was incorporated to lend the resulting prototype Machine Learning attributes. The prototype developed as a result of this research was able to classify online data as either deceptive or not deceptive with 85% accuracy. The false discovery rate for deceptive online data entries was 20% while the false discovery rate for not deceptive was 10%. The system showed stability during test runs. No computer crashes or other anomalous system behavior were observed during the testing phase. The prototype successfully interacted with an online data communications server database and processed data using Neural Network input vector generation algorithms within second
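To make the reported metrics concrete, the sketch below computes a false discovery rate (the share of items assigned to a class that do not belong to it) and overall accuracy from a confusion matrix. The counts are hypothetical, chosen only because they reproduce the percentages quoted in the abstract; they are not the study's data, and the function names are assumptions.

```python
def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """FDR = FP / (FP + TP): fraction of predictions for a class that are wrong."""
    predicted = true_pos + false_pos
    if predicted == 0:
        raise ValueError("no items were predicted for this class")
    return false_pos / predicted


def accuracy(correct: int, total: int) -> float:
    """Fraction of all items that were classified correctly."""
    if total == 0:
        raise ValueError("no items were classified")
    return correct / total


# Hypothetical counts (NOT from the study): 100 entries flagged deceptive,
# of which 80 really are; 100 entries flagged not deceptive, of which 90 really are.
fdr_deceptive = false_discovery_rate(true_pos=80, false_pos=20)
fdr_not_deceptive = false_discovery_rate(true_pos=90, false_pos=10)
overall_accuracy = accuracy(correct=80 + 90, total=200)

if __name__ == "__main__":
    print(fdr_deceptive, fdr_not_deceptive, overall_accuracy)
```

With equal numbers of items assigned to each class, false discovery rates of 20% and 10% imply an overall accuracy of 85%, so the three figures in the abstract are mutually consistent under that assumption.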

NSU Works