Automatic extraction of knowledge from web documents
A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper
Exploiting Image-trained CNN Architectures for Unconstrained Video Classification
We conduct an in-depth exploration of different strategies for doing event
detection in videos using convolutional neural networks (CNNs) trained for
image classification. We study different ways of performing spatial and
temporal pooling, feature normalization, choice of CNN layers as well as choice
of classifiers. Making judicious choices along these dimensions led to a very
significant increase in performance over more naive approaches that have been
used till now. We evaluate our approach on the challenging TRECVID MED'14
dataset with two popular CNN architectures pretrained on ImageNet. On this
MED'14 dataset, our methods, based entirely on image-trained CNN features, can
outperform several state-of-the-art non-CNN models. Our proposed late fusion of
CNN- and motion-based features can further increase the mean average precision
(mAP) on MED'14 from 34.95% to 38.74%. The fusion approach achieves the
state-of-the-art classification performance on the challenging UCF-101 dataset
A reproducible approach with R markdown to automatic classification of medical certificates in French
In this paper, we report the ongoing developments of our first participation in the Cross-Language Evaluation Forum (CLEF) eHealth Task 1: “Multilingual Information Extraction - ICD10 coding” (Névéol et al., 2017). The task consists of labelling death certificates in French with international standard codes. In particular, we wanted to accomplish the goal of the “Replication track” of this task, which promotes the sharing of tools and the dissemination of solid, reproducible results
Web based knowledge extraction and consolidation for automatic ontology instantiation
The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation
Vocal Access to a Newspaper Archive: Design Issues and Preliminary Investigation
This paper presents the design and the current prototype implementation of an
interactive vocal Information Retrieval system that can be used to access
articles of a large newspaper archive using a telephone. The results of
preliminary investigation into the feasibility of such a system are also
presented
The effects of topic familiarity on user search behavior in question answering systems
This paper reports on experiments that attempt
to characterize the relationship between users
and their knowledge of the search topic in a
Question Answering (QA) system. It also
investigates user search behavior with respect
to the length of answers presented by a QA
system. Two lengths of answers were
compared: snippets (one to two sentences of
text) and exact answers. A user test, in which
44 participants judged 92 factoid questions,
was conducted to explore the participants'
preferences, feelings and opinions about QA
system tasks. The conclusions drawn from the
results were that participants preferred and
obtained higher accuracy in finding answers
from the snippets set. However, accuracy
varied according to users' topic familiarity:
users were only substantially helped by the
wider context of a snippet if they were already
familiar with the topic of the question; without
such familiarity, users were about as accurate
at locating answers from the snippets as they
were from the exact answers set