Search CORE

OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content

Authors: Anastasiou Lucas
Eckart de Castilho Richard
Galanis Dimitrios
Georgantopoulos Byron
Greenwood Mark
Gkirtzou Katerina
Knoth Petr
Labropoulou Penny
Lempesis Antonis
Manola Natalia
Martziou Stefania
Piperidis Stelios
Sachtouris Stavros
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/06/2018

The OpenMinTeD platform aims to bring full-text Open Access scholarly content from a wide range of providers together with Text and Data Mining (TDM) tools from various Natural Language Processing frameworks and TDM developers in a single integrated environment. In this way, it gives users who want to mine scientific literature easy access to relevant content and lets them run scalable TDM workflows in the cloud.

Available from: TUbiblio; Open Research Online (The Open University)

Variation of word frequencies across genre classification tasks

Authors: Kim Y.
Ross S.
Publication venue: GEIE-ERCIM
Publication date: 01/01/2007

This paper examines automated genre classification of text documents and its role in enabling digital libraries and other repositories to manage digital documents effectively. Genre classification, which narrows down the possible structure of a document, is a valuable step towards general automatic extraction of the semantic metadata that is essential to the efficient management and use of digital objects. In this report, we analyse word frequencies in different genre classes in an effort to understand the distinction between independent classification tasks. In particular, we report automated experiments on thirty-one genre classes to determine the relationship between word frequency metrics and their significance for classification in varying environments.
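The abstract does not say which word frequency metrics were used. As a rough illustration only, the sketch below assumes relative (per-class normalised) word frequency and ranks words by the absolute difference in that frequency between two genre classes. The function names, the tokeniser, and the toy genre classes are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def class_frequencies(docs_by_genre: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Relative frequency of each word within each genre class (hypothetical metric)."""
    freqs: dict[str, dict[str, float]] = {}
    for genre, docs in docs_by_genre.items():
        counts: Counter = Counter()
        for doc in docs:
            counts.update(tokenize(doc))
        total = sum(counts.values())
        # Normalise raw counts by the total number of tokens in the class.
        freqs[genre] = {w: c / total for w, c in counts.items()} if total else {}
    return freqs


def most_distinctive(freqs: dict[str, dict[str, float]], genre_a: str, genre_b: str, k: int = 3) -> list[str]:
    """Words whose relative frequency differs most between two genre classes."""
    vocab = set(freqs[genre_a]) | set(freqs[genre_b])
    diffs = {w: freqs[genre_a].get(w, 0.0) - freqs[genre_b].get(w, 0.0) for w in vocab}
    # Largest absolute difference first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-abs(diffs[w]), w))[:k]


# Toy example with two invented genre classes.
docs_by_genre = {
    "minutes": ["the committee agreed the motion", "motion carried"],
    "recipe": ["stir the sugar", "add sugar and stir"],
}
freqs = class_frequencies(docs_by_genre)
print(most_distinctive(freqs, "minutes", "recipe"))
```

In this toy data the top-ranked words are "motion", "stir" and "sugar", each of which makes up two of the seven tokens in one class and is absent from the other. A real study would compare many classes and likely use a statistical test rather than a raw difference.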

Available from: Enlighten

Searching for Ground Truth: a stepping stone in automating genre classification

Authors: A. Finn
D. Biber
G. Giuffrida
H.I. Witten
J. Karlgren
L. Breiman
M.P. Marcus
S.W. Ke
Y. Kim
Publication date: 01/01/2007

This paper examines genre classification of documents and its role in enabling digital libraries and other repositories to manage digital documents effectively and automatically. We have previously presented genre classification as a valuable step towards automated extraction of descriptive metadata for digital material. Here, we present results from experiments with human labellers, conducted to help characterise genres, to anticipate obstacles that an automated system needs to overcome, and to contribute to building a solid testbed corpus for extending automated genre classification and testing metadata extraction tools across genres. We also describe the performance of two classifiers, based on image features and stylistic modelling features, in labelling the data on which three human labellers agreed, across fifteen genre classes.

Available from: Crossref; Enlighten

Multimedia search without visual analysis: the value of linguistic and contextual information

Authors: Jong Franciska M.G. de
Vries Arjen P. de
Westerveld Thijs
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2007

This paper addresses the theme of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis helps enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual media access tools. The techniques presented include time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.

Available from: CiteSeerX; CWI's Institutional Repository; University of Twente Research Information