
Thematic Annotation: extracting concepts out of documents

Author: Andrews Pierre
Rajman Martin
Publication date: 29/12/2004

Contrary to standard approaches to topic annotation, the technique used in this work does not rely centrally on some sort of (possibly statistical) keyword extraction. Rather, the proposed annotation algorithm uses a large-scale semantic database, the EDR Electronic Dictionary, which provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts that best preserves the document's content. This extraction technique takes a previously unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the latter is processed to extract the part of the conceptual hierarchy relevant to the document content. This conceptual hierarchy is then searched to extract the most relevant set of concepts to represent the topics discussed in the document. Notice that this algorithm is able to extract generic concepts that are not directly present in the document.
Comment: Technical report EPFL/LIA. 81 pages, 16 figures
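A much-simplified sketch of the aggregation idea, assuming a hypothetical toy hypernym tree in place of the EDR dictionary and segments that have already been split into topically homogeneous word lists. For each segment it returns the deepest concept that subsumes every known word (the lowest common hypernym). This rule is a swapped-in simplification, not the report's selection procedure, which searches the extracted sub-hierarchy for the best set of concepts; `HYPERNYM`, `lowest_common_hypernym` and `annotate` are illustrative names only.

```python
from __future__ import annotations

# Hypothetical toy hierarchy mapping each concept to its hypernym (parent).
# It stands in for a real resource such as EDR and is NOT taken from it.
HYPERNYM = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}
KNOWN = set(HYPERNYM) | set(HYPERNYM.values())


def ancestors(concept: str) -> list[str]:
    """Return the concept followed by its chain of hypernyms up to the root."""
    chain = [concept]
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def depth(concept: str) -> int:
    """Distance from the root; a larger value means a more specific concept."""
    return len(ancestors(concept)) - 1


def lowest_common_hypernym(words: list[str]) -> str | None:
    """Deepest concept subsuming every known word; unknown words are ignored."""
    known = [w for w in words if w in KNOWN]
    if not known:
        return None
    common = set(ancestors(known[0]))
    for w in known[1:]:
        common &= set(ancestors(w))
    # The toy hierarchy is a single tree rooted at "entity", so `common` is
    # never empty and its members all have distinct depths.
    return max(common, key=depth)


def annotate(segments: list[list[str]]) -> list[str | None]:
    """One concept per (already segmented) topically homogeneous segment."""
    return [lowest_common_hypernym(seg) for seg in segments]


print(annotate([["dog", "cat"], ["sparrow", "dog"], ["car"]]))
# ['mammal', 'animal', 'car']
```

The second segment is annotated with "animal", a concept that does not occur in the segment itself, which mirrors the abstract's point that generic concepts can be extracted even when they are not directly present in the text.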

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Topic Map Generation Using Text Mining

Author: Böhm Karsten
Heyer Gerhard
Quasthoff Uwe
Wolff Christian
Publication venue: Springer Verlag
Publication date: 28/06/2002

Starting from text corpus analysis with linguistic and statistical algorithms, this paper describes an infrastructure for text mining that uses collocation analysis as a central tool. This text mining method can be applied to different domains and languages. Examples taken from large reference databases illustrate its applicability to knowledge management using declarative standards for structuring and describing information. The ISO/IEC Topic Map standard is introduced as a candidate for rich metadata description of information resources, and it is shown how text mining can be used for automatic topic map generation.
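The abstract names collocation analysis as the central tool but does not state the measure used. As an illustration only, the sketch below scores sentence-level word co-occurrences with pointwise mutual information (PMI), a common collocation measure swapped in here; it is not necessarily the significance measure the authors use, and the toy sentences are invented.

```python
from __future__ import annotations

import math
from collections import Counter
from itertools import combinations


def sentence_pmi(sentences: list[list[str]], min_count: int = 2) -> dict[tuple[str, str], float]:
    """PMI (base 2) for word pairs co-occurring in at least `min_count`
    sentences; probabilities are estimated from sentence frequencies."""
    n = len(sentences)
    word_freq = Counter()  # number of sentences containing each word
    pair_freq = Counter()  # number of sentences containing each (sorted) pair
    for sentence in sentences:
        words = sorted(set(sentence))
        word_freq.update(words)
        pair_freq.update(combinations(words, 2))
    scores = {}
    for (a, b), joint in pair_freq.items():
        if joint >= min_count:
            # PMI = log2(P(a,b) / (P(a) * P(b))) = log2(joint * n / (f(a) * f(b)))
            scores[(a, b)] = math.log2(joint * n / (word_freq[a] * word_freq[b]))
    return scores


# Invented toy corpus, one tokenized sentence per entry.
sentences = [s.split() for s in [
    "topic map standard",
    "topic map generation",
    "text mining tool",
    "text mining topic",
]]
scores = sentence_pmi(sentences)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
# ('mining', 'text') 1.0
# ('map', 'topic') 0.415
```

"mining" and "text" always appear together, so they score higher than "map" and "topic", where "topic" also occurs without "map". PMI is known to overrate rare pairs, which is why a minimum co-occurrence count is applied.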

University of Regensburg Publication Server

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Data Mining

Author: Parker Julian
Sloan Terence
Yau Hon
Publication date: 01/01/1998

Edinburgh Research Explorer

A Survey on Deep Learning in Medical Image Analysis

Author: Bejnordi Babak Ehteshami
Ciompi Francesco
Ghafoorian Mohsen
Kooi Thijs
Litjens Geert
Setio Arnaud Arindra Adiyoso
Sánchez Clara I.
van der Laak Jeroen A. W. M.
van Ginneken Bram
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.
Comment: Revised survey includes expanded discussion section and reworked introductory section on common deep architectures. Added missed papers from before Feb 1st 201

arXiv.org e-Print Archive

Radboud Repository

Clear Visual Separation of Temporal Event Sequences

Author: Grønbæk Kaj
Mathisen Andreas
Publication date: 17/10/2017

Extracting and visualizing informative insights from temporal event sequences becomes increasingly difficult as data volume and variety increase. Besides dealing with high event-type cardinality and many distinct sequences, it can be difficult to tell whether it is appropriate to combine multiple events into one or to use additional information about event attributes. Existing approaches often use frequent sequential patterns extracted from the dataset; however, these patterns are limited in terms of interpretability and utility. In addition, it is difficult to assess the role of absolute and relative time when using pattern mining techniques. In this paper, we present methods that address these challenges by automatically learning composite events, which enables better aggregation of multiple event sequences. By leveraging event sequence outcomes, we present appropriate linked visualizations that allow domain experts to identify critical flows, to assess validity and to understand the role of time. Furthermore, we explore information gain and visual complexity metrics to identify the most relevant visual patterns. We compare composite event learning with two approaches for extracting event patterns using real-world company event data from an ongoing project with the Danish Business Authority.
Comment: In Proceedings of the 3rd IEEE Symposium on Visualization in Data Science (VDS), 201
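The abstract mentions information gain as one metric for ranking visual patterns against event sequence outcomes. A minimal sketch of that metric, assuming a pattern is an ordered subsequence of event types (gaps allowed) and outcomes are discrete labels; the event names and outcomes below are invented, and the paper's exact definition may differ.

```python
from __future__ import annotations

import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def contains_pattern(sequence: list[str], pattern: list[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    events = iter(sequence)  # shared iterator enforces the ordering
    return all(any(e == p for e in events) for p in pattern)


def information_gain(sequences: list[list[str]], outcomes: list[str], pattern: list[str]) -> float:
    """Reduction in outcome entropy from knowing whether each sequence contains `pattern`."""
    has = [contains_pattern(s, pattern) for s in sequences]
    with_p = [o for h, o in zip(has, outcomes) if h]
    without_p = [o for h, o in zip(has, outcomes) if not h]
    n = len(outcomes)
    conditional = (len(with_p) / n) * entropy(with_p) + (len(without_p) / n) * entropy(without_p)
    return entropy(outcomes) - conditional


# Hypothetical company event sequences and outcomes (invented for illustration).
sequences = [
    ["registered", "late_filing", "warning"],
    ["registered", "late_filing", "warning"],
    ["registered", "filing"],
    ["registered", "filing"],
]
outcomes = ["dissolved", "dissolved", "active", "active"]
print(information_gain(sequences, outcomes, ["late_filing", "warning"]))  # 1.0
print(information_gain(sequences, outcomes, ["registered"]))  # 0.0
```

A pattern that perfectly separates the outcomes yields the full outcome entropy (1 bit here), while a pattern shared by every sequence carries no information about the outcome.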

arXiv.org e-Print Archive

Crossref

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

Author: Adda G.
Adda-Decker M.
Benjumea J.
Besacier L.
Cooper-Leavitt J.
Godard P.
Kouarata G-N.
Lamel L.
Maynard H.
Mueller M.
Rialland A.
Stueker S.
Yvon F.
Zanon-Boito M.
Publication date: 15/02/2018

Most speech and language technologies are trained with massive amounts of speech and text data. However, most of the world's languages have neither such resources nor a stable orthography. Systems built under these almost zero-resource conditions are promising not only for speech technology but also for computational language documentation. The goal of computational language documentation is to help field linguists (semi-)automatically analyze and annotate audio recordings of endangered and unwritten languages. Example tasks are automatic phoneme discovery or lexicon discovery from the speech signal. This paper presents a speech corpus collected during a realistic language documentation process. It consists of 5k speech utterances in Mboshi (Bantu C25) aligned to French text translations. Speech transcriptions are also made available: they correspond to a non-standard graphemic form close to the language phonology. We present how the data was collected, cleaned and processed, and we illustrate its use through a zero-resource task: spoken term discovery. The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation.
Comment: accepted to LREC 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Fourteenth Biennial Status Report: March 2017 - February 2019

Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2019

MPG.PuRe