
    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys.
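The inductive process the abstract describes, learning category characteristics from preclassified documents, can be sketched with a minimal naive Bayes classifier. This is a toy illustration, not the survey's own method; the document format (text, category) pairs and the word-level tokenization are assumptions for the example.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Learn per-category word statistics from preclassified documents.
    docs: list of (text, category) pairs (toy format assumed here)."""
    word_counts = defaultdict(Counter)
    cat_counts = Counter()
    for text, cat in docs:
        cat_counts[cat] += 1
        word_counts[cat].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, cat_counts, vocab

def classify(text, word_counts, cat_counts, vocab):
    """Assign text to the category with the highest smoothed log-probability."""
    total_docs = sum(cat_counts.values())
    best_cat, best_lp = None, float("-inf")
    for cat in cat_counts:
        lp = math.log(cat_counts[cat] / total_docs)  # category prior
        n = sum(word_counts[cat].values())
        for w in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score
            lp += math.log((word_counts[cat][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best_cat, best_lp = cat, lp
    return best_cat
```

The key point the abstract makes is visible here: nothing category-specific is hand-coded; everything the classifier knows comes from the preclassified training sample.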

    A probabilistic threshold model: Analyzing semantic categorization data with the Rasch model

    According to the Threshold Theory (Hampton, 1995, 2007), semantic categorization decisions come about through the placement of a threshold criterion along a dimension that represents items' similarity to the category representation. The adequacy of this theory is assessed by applying a formalization of the theory, known as the Rasch model (Rasch, 1960; Thissen & Steinberg, 1986), to categorization data for eight natural language categories and subjecting it to a formal test. In validating the model, special care is given to its ability to account for inter- and intra-individual differences in categorization and their relationship with item typicality. Extensions of the Rasch model that can be used to uncover the nature of category representations and the sources of categorization differences are discussed.
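The Rasch model formalization the abstract refers to has a standard logistic form: the probability of a "yes" categorization grows with the distance between the respondent's threshold parameter and the item's position on the latent similarity dimension. A minimal sketch (parameter names theta and beta are the conventional Rasch symbols, not taken from this paper):

```python
import math

def rasch_prob(theta, beta):
    """Standard Rasch form: probability of endorsing category membership
    for a respondent with parameter theta and an item at position beta
    on the latent similarity dimension."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))
```

When an item sits exactly at the threshold (theta == beta) the model predicts a 50% endorsement rate; items well above the threshold approach certainty, which is how the model accommodates graded typicality effects.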

    Optimal Feature Subset Selection Based on Combining Document Frequency and Term Frequency for Text Classification

    Feature selection plays a vital role in reducing the high dimension of the feature space in the text document classification problem. The dimension reduction of the feature space reduces the computation cost and improves the text classification system's accuracy. Hence, the identification of a proper subset of the significant features of the text corpus is needed to classify the data in less computational time with higher accuracy. In this research, a novel feature selection method that combines document frequency and term frequency (FS-DFTF) is used to measure the significance of a term. The optimal feature subset selected by the proposed method is evaluated using Naive Bayes and Support Vector Machine classifiers on various popular benchmark text corpora. The experimental outcome confirms that the proposed method has better classification accuracy when compared with other feature selection techniques.
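The abstract does not reproduce the FS-DFTF formula, so the scoring function below is only an illustrative stand-in for the general idea of combining the two signals: document frequency (in how many documents a term appears) and total term frequency. The product used here is an assumption, not the paper's actual criterion.

```python
from collections import Counter

def score_terms(docs):
    """Illustrative term scoring combining document frequency (df) and
    term frequency (tf). The tf*df product is a placeholder, not the
    actual FS-DFTF formula from the paper."""
    tf, df = Counter(), Counter()
    for doc in docs:
        words = doc.lower().split()
        tf.update(words)          # every occurrence counts
        df.update(set(words))     # at most once per document
    return {w: tf[w] * df[w] for w in tf}

def select_top_k(docs, k):
    """Keep the k highest-scoring terms as the reduced feature subset."""
    scores = score_terms(docs)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]
```

Keeping only the top-k terms is what yields the computational savings the abstract mentions: downstream classifiers see a much smaller feature space.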

    Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness

    We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated with textual units of a large background knowledge corpus. We present an efficient algorithm for learning such semantic models from a training sample of relatedness preferences. Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts. Moreover, the approach facilitates the fitting of semantic models for specific users or groups of users. We present the results of an extensive range of experiments, from small to large scale, indicating that the proposed method is effective and competitive with the state-of-the-art. Comment: 37 pages, 8 figures. A short version of this paper was already published at ECML/PKDD 201

    A pattern mining approach for information filtering systems

    It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues in using negative relevance feedback to improve the effectiveness of information filtering. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to decide which noisy extracted features should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance, which is also consistent for adaptive filtering.
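The three-way split of terms the abstract describes can be sketched as follows. The decision rule here (membership in positive-only, negative-only, or both document sets) is a simplified illustration; the paper's actual criteria and any frequency thresholds are not reproduced.

```python
def categorize_terms(pos_docs, neg_docs):
    """Illustrative three-way term split: terms seen only in relevant
    (positive) documents are 'positive specific', terms seen only in
    irrelevant (negative) documents are 'negative specific', and terms
    seen in both are 'general'. Set membership is a stand-in for the
    paper's actual pattern-mining criteria."""
    pos_terms = {w for d in pos_docs for w in d.lower().split()}
    neg_terms = {w for d in neg_docs for w in d.lower().split()}
    categories = {}
    for term in pos_terms | neg_terms:
        if term in pos_terms and term not in neg_terms:
            categories[term] = "positive specific"
        elif term in neg_terms and term not in pos_terms:
            categories[term] = "negative specific"
        else:
            categories[term] = "general"
    return categories
```

Once terms are labeled this way, each class can receive a different revising strategy, e.g., boosting positive specific terms while discounting negative specific ones.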

    The role of the frontal cortex in memory: an investigation of the Von Restorff effect

    Evidence from neuropsychology and neuroimaging indicates that the pre-frontal cortex (PFC) plays an important role in human memory. Although frontal patients are able to form new memories, these memories appear qualitatively different from those of controls in lacking distinctiveness. Neuroimaging studies of memory indicate activation in the PFC under deep encoding conditions and under conditions of semantic elaboration. Based on these results, we hypothesize that the PFC enhances memory by extracting differences and commonalities in the studied material. To test this hypothesis, we experimentally investigated the relationship between PFC-dependent factors and semantic factors associated with common and specific features of words. These experiments were performed using free recall of word lists with healthy adults, exploiting the correlation between PFC function and fluid intelligence. As predicted, a correlation was found between fluid intelligence and the Von Restorff effect (better memory for semantic isolates, e.g., the isolate "cat" within a list of "fruit" category members). Moreover, memory for the semantic isolate was found to depend on the isolate's serial position. The isolate item tends to be recalled first, in comparison to non-isolates, suggesting that the process interacts with short term memory. These results are captured within a computational model of free recall, which includes a PFC mechanism that is sensitive to both commonality and distinctiveness, sustaining a trade-off between the two.