Text Mining Infrastructure in R
During the last decade, text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package, which provides a framework for text mining applications within R. We give a survey of text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels.
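The tm package itself is an R framework; as a rough, language-agnostic illustration of the count-based analysis it supports, the sketch below builds a small document-term matrix from a toy corpus in Python. The corpus, tokenizer, and function names are illustrative assumptions and are not part of the tm API.

```python
import re
from collections import Counter

def tokenize(text):
    # Minimal tokenizer: lowercase and split on non-alphabetic characters.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def document_term_matrix(corpus):
    # Count term frequencies per document: the core of count-based analysis.
    counts = [Counter(tokenize(doc)) for doc in corpus]
    vocabulary = sorted({term for c in counts for term in c})
    matrix = [[c.get(term, 0) for term in vocabulary] for c in counts]
    return vocabulary, matrix

if __name__ == "__main__":
    corpus = [
        "Text mining extracts patterns from text.",
        "Clustering groups similar documents together.",
    ]
    vocab, dtm = document_term_matrix(corpus)
    print(vocab)
    for row in dtm:
        print(row)
```

Such a matrix is the usual starting point for the clustering and classification steps the abstract mentions; the R framework wraps the same idea in corpus and term-document matrix objects.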
Enhanced Integrated Scoring for Cleaning Dirty Texts
An increasing number of approaches for ontology engineering from text are gearing towards the use of online sources such as company intranets and the World Wide Web. Despite this rise, not much work can be found on preprocessing and cleaning dirty texts from online sources. This paper presents an enhancement of an Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration (ISSAC). ISSAC is implemented as part of a text preprocessing phase in an ontology engineering system. New evaluations performed on the enhanced ISSAC using 700 chat records reveal an improved accuracy of 98%, compared to 96.5% and 71% achieved using only the basic ISSAC and Aspell, respectively.
Comment: More information is available at http://explorer.csse.uwa.edu.au/reference
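ISSAC's integrated scoring is not reproduced in the abstract; as a minimal sketch of the three cleaning tasks it addresses (spelling error correction, abbreviation expansion, case restoration), the following Python snippet chains simple dictionary lookups with an edit-distance-style match. The lexicons, abbreviation table, and thresholds are invented for illustration only.

```python
import difflib

# Illustrative resources; ISSAC's actual lexicons and scoring are not reproduced here.
ABBREVIATIONS = {"u": "you", "pls": "please", "msg": "message"}
LEXICON = {"please", "send", "you", "the", "message", "today", "monday"}
PROPER_NOUNS = {"monday": "Monday"}

def clean_token(token):
    word = token.lower()
    if word in ABBREVIATIONS:
        # Abbreviation expansion by dictionary lookup.
        word = ABBREVIATIONS[word]
    elif word not in LEXICON:
        # Spelling correction: pick the closest lexicon entry, if any.
        matches = difflib.get_close_matches(word, LEXICON, n=1, cutoff=0.8)
        if matches:
            word = matches[0]
    # Case restoration for known proper nouns.
    return PROPER_NOUNS.get(word, word)

def clean_text(text):
    tokens = [clean_token(t) for t in text.split()]
    if tokens:
        tokens[0] = tokens[0].capitalize()  # restore sentence-initial case
    return " ".join(tokens)

if __name__ == "__main__":
    print(clean_text("pls send u the mesage on monday"))
    # -> "Please send you the message on Monday"
```

A real system would score competing corrections jointly rather than applying the three steps independently, which is the point of ISSAC's integrated scoring.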
Matching Subsequences in Trees
Given two rooted, labeled trees P and T, the tree path subsequence problem is to determine which paths in P are subsequences of which paths in T. Here a path begins at the root and ends at a leaf. In this paper we propose this problem as a useful query primitive for XML data, and provide new algorithms improving the previously best known time and space bounds.
Comment: Minor correction of typos, etc.
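The paper's contribution is improved time and space bounds; as a baseline for what the problem asks, the following Python sketch solves it naively by enumerating root-to-leaf label paths and checking subsequence containment pairwise. The tuple-based tree representation and function names are assumptions for illustration, not the paper's algorithms.

```python
def root_to_leaf_paths(tree):
    # tree is (label, [children]); collect label sequences along root-to-leaf paths.
    label, children = tree
    if not children:
        return [[label]]
    return [[label] + path for child in children for path in root_to_leaf_paths(child)]

def is_subsequence(p, q):
    # True if p occurs in q as a (not necessarily contiguous) subsequence.
    it = iter(q)
    return all(label in it for label in p)

def path_subsequence_pairs(P, T):
    # Report which root-to-leaf paths of P are subsequences of which paths of T.
    return [(i, j)
            for i, p in enumerate(root_to_leaf_paths(P))
            for j, q in enumerate(root_to_leaf_paths(T))
            if is_subsequence(p, q)]

if __name__ == "__main__":
    P = ("a", [("b", []), ("c", [])])
    T = ("a", [("x", [("b", [])]), ("c", [])])
    print(path_subsequence_pairs(P, T))  # [(0, 0), (1, 1)]
```

This naive approach costs time proportional to the product of total path lengths; the paper's algorithms improve on exactly that kind of bound.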
Preparing Laboratory and Real-World EEG Data for Large-Scale Analysis: A Containerized Approach.
Large-scale analysis of EEG and other physiological measures promises new insights into brain processes and more accurate and robust brain-computer interface models. However, the absence of standardized vocabularies for annotating events in a machine-understandable manner, the welter of collection-specific data organizations, the difficulty of moving data across processing platforms, and the unavailability of agreed-upon standards for preprocessing have prevented large-scale analyses of EEG. Here we describe a "containerized" approach and freely available tools we have developed to facilitate the process of annotating, packaging, and preprocessing EEG data collections to enable data sharing, archiving, large-scale machine learning/data mining and (meta-)analysis. The EEG Study Schema (ESS) comprises three data "Levels," each with its own XML-document schema and file/folder convention, plus a standardized (PREP) pipeline to move raw (Data Level 1) data to a basic preprocessed state (Data Level 2) suitable for application of a large class of EEG analysis methods. Researchers can ship a study as a single unit and operate on its data using a standardized interface. ESS does not require a central database and provides all the metadata necessary to execute a wide variety of EEG processing pipelines. The primary focus of ESS is automated in-depth analysis and meta-analysis of EEG studies. However, ESS can also encapsulate meta-information for other modalities, such as eye tracking, that are increasingly used in both laboratory and real-world neuroimaging. ESS schema and tools are freely available at www.eegstudy.org, and a central catalog of over 850 GB of existing data in ESS format is available at studycatalog.org. These tools and resources are part of a larger effort to enable data sharing at sufficient scale for researchers to engage in truly large-scale EEG analysis and data mining (BigEEG.org).
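Because each ESS Level is described by an XML-document schema, a containerized study can be driven from its manifest. The Python sketch below parses a hypothetical ESS-style study description; the element and attribute names are invented for illustration and do not reproduce the actual ESS schema published at www.eegstudy.org.

```python
import xml.etree.ElementTree as ET

# Hypothetical ESS-style manifest; element names are illustrative assumptions only.
STUDY_XML = """
<study level="1">
  <title>Example EEG study</title>
  <recordings>
    <recording file="session01/eeg.set" subject="S01" task="oddball"/>
    <recording file="session02/eeg.set" subject="S02" task="oddball"/>
  </recordings>
</study>
"""

def list_recordings(xml_text):
    # Parse the manifest and return (subject, task, file) tuples as pipeline input.
    root = ET.fromstring(xml_text)
    return [(r.get("subject"), r.get("task"), r.get("file"))
            for r in root.iter("recording")]

if __name__ == "__main__":
    for subject, task, path in list_recordings(STUDY_XML):
        print(subject, task, path)
```

The point of such a manifest is the one the abstract makes: a study travels as a single unit, and downstream tools discover its recordings through a standardized interface rather than a central database.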