
    Human-competitive automatic topic indexing

    Topic indexing is the task of identifying the main topics covered by a document. These are useful for many purposes: as subject headings in libraries, as keywords in academic publications, and as tags on the web. Knowing a document's topics helps people judge its relevance quickly. However, assigning topics manually is labor-intensive. This thesis shows how to generate them automatically in a way that competes with human performance. Three kinds of indexing are investigated: term assignment, a task commonly performed by librarians, who select topics from a controlled vocabulary; tagging, a popular activity of web users, who choose topics freely; and a new method of keyphrase extraction, where topics are equated to Wikipedia article names. A general two-stage algorithm is introduced that first selects candidate topics and then ranks them by significance based on their properties. These properties draw on statistical, semantic, domain-specific and encyclopedic knowledge. They are combined using a machine learning algorithm that models human indexing behavior from examples. This approach is evaluated by comparing automatically generated topics to those assigned by professional indexers and by amateurs. We claim that the algorithm is human-competitive because it chooses topics that are as consistent with those assigned by humans as their topics are with each other. The approach is generalizable, requires little training data, and applies across different domains and languages.
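The two-stage design the abstract describes (select candidate topics, then rank them by their properties) can be sketched as follows. The hand-set scoring here is only an illustrative stand-in: the thesis combines many statistical, semantic and encyclopedic features with a learned model, none of which are reproduced here.

```python
import re
from collections import Counter

def candidate_phrases(text, max_len=3):
    """Stage 1: collect word n-grams (up to max_len words) as candidate topics."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]

def rank_topics(text, top_k=3):
    """Stage 2: rank candidates by simple statistical properties
    (frequency, position of first occurrence, phrase length)."""
    counts = Counter(candidate_phrases(text))
    lowered = text.lower()
    def score(phrase):
        pos = lowered.find(phrase)
        first_pos = pos / len(lowered) if pos >= 0 else 1.0
        length_bonus = len(phrase.split())   # favour multi-word topics
        return counts[phrase] * length_bonus * (1.0 - first_pos)
    return sorted(counts, key=score, reverse=True)[:top_k]
```

In a trained system the `score` function would be replaced by a classifier or regressor fitted on example documents with human-assigned topics, which is what makes the approach portable across domains and languages.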

    Focused image search in the social Web

    Recently, social multimedia-sharing websites, which allow users to upload, annotate, and share online photo or video collections, have become increasingly popular. The user tags or annotations constitute the new multimedia metadata. We present an image search system that exploits both the textual and the visual information of images. First, we use focused crawling and DOM-tree-based web data extraction methods to extract image textual features from social networking image collections. Second, we propose the concept of visual words to handle the image's visual content for fast indexing and searching. We also develop several user-friendly search options that allow users to query the index using words and image feature descriptions (visual words). The developed image search system tries to bridge the gap between scalable industrial image search engines, which are based on keyword search, and the slower content-based image retrieval systems developed mostly in the academic field and designed to search on image content only. We have implemented a working prototype by crawling and indexing over 16,056 images from flickr.com, one of the most popular image-sharing websites. Our experimental results on the working prototype confirm the efficiency and effectiveness of the methods we propose.
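The visual-words idea quantizes each image descriptor to its nearest cluster centre, so images can be indexed and searched like text documents via an inverted index. A minimal sketch, assuming a precomputed vocabulary of centroids (a real system would learn these, e.g. with k-means over many descriptors, and use far higher-dimensional features):

```python
def nearest_word(vec, vocabulary):
    """Quantize a feature vector to its nearest 'visual word' (centroid index)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: dist2(vec, vocabulary[i]))

def build_index(images, vocabulary):
    """Inverted index: visual word id -> set of image ids containing it."""
    index = {}
    for image_id, descriptors in images.items():
        for d in descriptors:
            index.setdefault(nearest_word(d, vocabulary), set()).add(image_id)
    return index

def search(query_descriptors, index, vocabulary):
    """Rank image ids by how many query visual words they share."""
    votes = {}
    for d in query_descriptors:
        for image_id in index.get(nearest_word(d, vocabulary), ()):
            votes[image_id] = votes.get(image_id, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)
```

Because lookups touch only the posting lists of the query's visual words, search cost scales with the vocabulary hits rather than with the whole collection, which is what makes the approach faster than pairwise content-based matching.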

    Information extraction from the web using a search engine


    Identifying Synonymous Terms in Preparation for Technology Mining

    In this research, the development of a 'concept-clumping algorithm' designed to improve the clustering of technical concepts is demonstrated. The algorithm first identifies a list of technically relevant noun phrases from a cleaned extracted list and then applies a rule-based algorithm that identifies synonymous terms based on shared words in each term. An assessment found that the algorithm has an 89–91% precision rate, successfully moves technically important terms higher in the term frequency list, and improves the technical specificity of term clusters.
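The shared-word rule can be sketched as a union-find clumping over extracted noun phrases. This is a deliberately crude version: merging on any shared word over-merges when terms share only generic words, and the paper's rule-based algorithm is presumably more selective than this.

```python
def clump_terms(terms):
    """Group noun phrases that share at least one word, using union-find.
    A simplified stand-in for the paper's rule-based synonym matching."""
    parent = list(range(len(terms)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    def union(i, j):
        parent[find(i)] = find(j)
    word_sets = [set(t.lower().split()) for t in terms]
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            if word_sets[i] & word_sets[j]:   # shared-word rule
                union(i, j)
    groups = {}
    for i, t in enumerate(terms):
        groups.setdefault(find(i), []).append(t)
    return list(groups.values())
```

Summing the frequencies within each clump is what moves a concept's combined count, and hence its rank, higher in the term frequency list.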

    Source Code Retrieval using Case Based Reasoning

    Formal verification of source code has been extensively used in the past few years in order to create dependable software systems. However, although formal languages like Spec# or JML are getting more and more popular, the set of verified implementations is very small and only growing slowly. Our work aims to automate some of the steps involved in writing specifications and their implementations by reusing existing verified programs. That is, for a given implementation we seek to retrieve similar verified code and then reapply the specification that accompanies that code. In this thesis, I present the retrieval system that is part of the Arís (Analogical Reasoning for reuse of Implementation & Specification) project. The overall methodology of the Arís project is very similar to Case-Based Reasoning (CBR) and its parent discipline of Analogical Reasoning (AR), centered on the activities of solution retrieval and reuse. CBR's retrieval phase is achieved using semantic and structural characteristics of source code: API calls are used as semantic anchors, and characteristics of conceptual graphs are used to express the structure of implementations. Finally, we transfer the knowledge (i.e. the formal specification) between the input implementation and the retrieved code artefacts to produce a specification for the given implementation. The evaluation results are promising, and our experiments show that the proposed approach has real potential for generating formal specifications by reusing past solutions.
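The semantic-anchor half of this retrieval (API calls as features) can be sketched as a set-similarity ranking; the structural matching over conceptual graphs, which Arís also uses, is omitted here, and the Jaccard measure is an assumed stand-in rather than the thesis's actual similarity function.

```python
def jaccard(a, b):
    """Similarity of two sets of API calls (the 'semantic anchors')."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def retrieve(query_calls, corpus):
    """Rank verified programs by API-call overlap with the query program.
    corpus: mapping of program name -> set of API calls it makes."""
    return sorted(corpus,
                  key=lambda name: jaccard(query_calls, corpus[name]),
                  reverse=True)
```

The top-ranked verified program is then the candidate whose accompanying formal specification gets adapted and transferred to the unspecified input implementation.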
