48 research outputs found

    Automatic classification of news articles in Spanish

    We apply machine learning techniques to the automatic classification of news articles from the local newspaper La Capital of Rosario, Argentina. The corpus (LCC) is an archive of approximately 75,000 manually categorized articles in Spanish published in 1991. We benchmark three widely used supervised learning methods on LCC: k-Nearest Neighbors, Naïve Bayes and Artificial Neural Networks, illustrating the corpus properties. Track: V - Workshop de agentes y sistemas inteligentes. Red de Universidades con Carreras en Informática.
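
    A minimal sketch of this kind of benchmark using scikit-learn, with a handful of placeholder documents standing in for the ~75,000-article LCC corpus (not reproduced here); the vectorizer, split and hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Benchmark sketch: k-Nearest Neighbors, Naive Bayes and a small neural network
# on a toy stand-in for the LCC corpus. Replace `texts`/`labels` with real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

texts = ["economia regional en crisis", "el equipo gano el clasico",
         "nuevas tasas de interes", "el delantero marco dos goles",
         "inflacion y precios al consumidor", "la final del torneo local",
         "bancos y creditos hipotecarios", "resultado del partido de ayer"]
labels = ["economy", "sports", "economy", "sports",
          "economy", "sports", "economy", "sports"]

X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25,
                                          random_state=0, stratify=labels)

for clf in (KNeighborsClassifier(n_neighbors=3),
            MultinomialNB(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, f1_score(y_te, clf.predict(X_te), average="macro"))
```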

    Classification of Web Contents with Artificial Neural Networks

    Recent developments and the widespread use of the Internet have made business processes faster and easier to complete in electronic media. The growing volume of stored, transferred and processed data brings many problems that affect access to information on the Web. Because users need to access information in electronic environments quickly, correctly and appropriately, different methods for classifying and categorizing data are strictly needed. Search engines must be supported with new approaches every day so that users can access relevant information quickly. In this study, a Multilayer Perceptron (MLP) artificial neural network model is used to classify web sites according to specified subjects. Software was developed to select the feature vector, to train the neural network and, finally, to categorize the web sites correctly. This intelligent approach is expected to provide Internet users with a more accurate and secure platform for classifying web contents precisely.
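
    As a hedged illustration of the pipeline this abstract outlines (feature-vector selection, MLP training, then categorization), the scikit-learn sketch below chains TF-IDF features, a chi-squared feature selector and an MLP; the layer size, selector and sample pages are assumptions, not the study's implementation.

```python
# Illustrative web-content classification pipeline: bag-of-words features,
# feature selection, then a multilayer perceptron, as the abstract describes.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neural_network import MLPClassifier

pages = ["stock market rallies on earnings reports",
         "the striker scored twice in the derby",
         "central bank raises interest rates again",
         "coach praises the young midfielder"]
topics = ["finance", "sports", "finance", "sports"]

model = make_pipeline(
    TfidfVectorizer(),                 # page text -> feature vector
    SelectKBest(chi2, k="all"),        # on a real corpus, set k to e.g. 1000
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
model.fit(pages, topics)
print(model.predict(["bond yields fall as markets open"]))
```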

    Using online linear classifiers to filter spam emails

    The performance of two online linear classifiers, the Perceptron and Littlestone's Winnow, is explored on two anti-spam filtering benchmark corpora: PU1 and Ling-Spam. We study performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better with IG or DF than with Odds Ratio. It is further demonstrated that, when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron. Both online classifiers are also demonstrated to perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers is very low, and they are easily updated adaptively. They outperform most published results while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
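
    scikit-learn ships a Perceptron but no Winnow, so the sketch below is a compact numpy implementation of Littlestone's promotion/demotion scheme over binary bag-of-words features; the toy data, alpha and threshold choice are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal Winnow: multiplicative weight updates on mistakes only.
import numpy as np

class Winnow:
    def __init__(self, n_features, alpha=2.0):
        self.w = np.ones(n_features)     # all weights start at 1
        self.alpha = alpha
        self.theta = n_features          # common threshold choice: feature count

    def predict(self, x):                # x: binary feature vector
        return int(self.w @ x > self.theta)

    def fit(self, X, y, epochs=5):       # abstract: few iterations suffice
        for _ in range(epochs):
            for x, label in zip(X, y):
                pred = self.predict(x)
                if pred == 0 and label == 1:
                    self.w[x > 0] *= self.alpha      # promote active features
                elif pred == 1 and label == 0:
                    self.w[x > 0] /= self.alpha      # demote active features

# Toy usage: 4 messages over 3 binary term features, label 1 = spam.
X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]])
y = np.array([1, 1, 0, 0])
clf = Winnow(n_features=3)
clf.fit(X, y)
print([clf.predict(x) for x in X])       # -> [1, 1, 0, 0]
```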

    Comparing Extant Story Classifiers: Results & New Directions

    Having access to a large set of stories is a necessary first step for robust and wide-ranging computational narrative modeling; happily, language data, including stories, are increasingly available in electronic form. Unhappily, the process of automatically separating stories from other forms of written discourse is not straightforward, and has resulted in a data collection bottleneck. Researchers have therefore sought to develop reliable, robust automatic algorithms for identifying story text mixed with other non-story text. In this paper we report on the reimplementation and experimental comparison of the two approaches to this task: Gordon's unigram classifier and Corman's semantic triplet classifier. We cross-analyze their performance on both Gordon's and Corman's corpora, discuss similarities, differences, and gaps in the performance of these classifiers, and point the way forward to improving these approaches.
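
    The paper's two classifiers are not reproduced here, but a generic unigram story/non-story classifier in the spirit of the approach attributed to Gordon might look like the sketch below; the learner, sample texts and labels are assumptions for illustration.

```python
# Unigram (bag-of-words) story classifier sketch: counts of single words only.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["once upon a time a girl set out into the forest",
        "quarterly revenue grew four percent year over year"]
labels = [1, 0]                            # 1 = story, 0 = non-story

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),   # unigram counts only
    LogisticRegression(max_iter=1000),
)
clf.fit(docs, labels)
print(clf.predict(["and then the dragon turned toward the village"]))
```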

    Text Classification in an Under-Resourced Language via Lexical Normalization and Feature Pooling

    Automatic classification of textual content in an under-resourced language is challenging, since lexical resources and preprocessing tools are not available for such languages. The bag-of-words (BoW) representation of such text is usually highly sparse and noisy, and text classification built on it yields poor performance. In this paper, we explore the effectiveness of lexical normalization of terms and statistical feature pooling for improving text classification in an under-resourced language. We focus on classifying citizen feedback on government services provided through SMS texts, which are written predominantly in Roman Urdu (an informal forward-transliterated version of the Urdu language). Our proposed methodology normalizes lexical variations of terms using phonetic and string similarity. It subsequently employs a supervised feature extraction technique to obtain category-specific, highly discriminating features. Our experiments reveal that lexical normalization plus feature pooling achieves a significant improvement in classification performance over standard representations.
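
    As a rough illustration of the normalization idea (the paper's exact phonetic encoding and matching rules are not given here), the sketch below folds spelling variants of Roman Urdu terms into one canonical form using plain string similarity; the threshold and the greedy first-seen canonical choice are assumptions.

```python
# Lexical normalization sketch: cluster variant spellings onto a canonical term.
from difflib import SequenceMatcher

def normalize(tokens, threshold=0.7):
    canon = []                                   # canonical forms seen so far
    mapping = {}
    for tok in tokens:
        match = next((c for c in canon
                      if SequenceMatcher(None, tok, c).ratio() >= threshold), None)
        if match is None:
            canon.append(tok)                    # first spelling becomes canonical
            mapping[tok] = tok
        else:
            mapping[tok] = match                 # fold the variant into its cluster
    return mapping

# Variants of "bijli" (electricity) and "pani" (water) collapse together.
print(normalize(["bijli", "bijlee", "bijly", "pani", "paani"]))
```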

    Classification of medical prescriptions in Spanish

    This work describes the problem of classifying free-text medical prescriptions in Spanish, and proposes a solution based on two text classification algorithms, Multinomial Naïve Bayes (NBM) and Support Vector Machines (SVMs), justifying these choices and showing the results obtained with both methods. Track: XV Workshop de Agentes y Sistemas Inteligentes. Red de Universidades con Carreras de Informática (RedUNCI).
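
    A minimal sketch of the comparison the abstract describes, Multinomial Naïve Bayes against a linear SVM over free-text prescriptions; the sample texts and category labels are placeholders, not the paper's data.

```python
# NBM vs. SVM comparison sketch on placeholder prescription texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

texts = ["ibuprofeno 400 mg cada 8 horas",
         "amoxicilina 500 mg por 7 dias",
         "control de presion arterial en un mes",
         "radiografia de torax frente y perfil"]
labels = ["medication", "medication", "follow-up", "imaging"]

X = TfidfVectorizer().fit_transform(texts)
for clf in (MultinomialNB(), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))
```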

    Classification of different datasets used to evaluate knowledge extraction methods created for the Web

    Several articles have used different test texts as input data to measure the performance of methods for extracting semantic relations from the Web (OIE): ReVerb and ClausIE. However, these texts have never been analyzed to understand whether or not they share similarities, whether they have a common language, or whether they belong to the same domain. The aim of this work is to analyze these texts using different classification algorithms and to determine whether they can be grouped coherently, so that one can identify a priori which texts work best with ClausIE and which with ReVerb. XIII Workshop Bases de datos y Minería de Datos (WBDMD). Red de Universidades con Carreras en Informática (RedUNCI).
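
    One hedged way to run the grouping experiment this abstract proposes: represent each benchmark text collection as a TF-IDF vector and cluster the collections, then inspect whether the groups align with the extractor each collection favors; the corpora and cluster count below are illustrative assumptions.

```python
# Grouping sketch: one TF-IDF vector per test corpus, then k-means clustering.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

corpora = ["encyclopedia sentences about science and history",
           "newswire sentences about politics and economics",
           "web forum sentences about sports and games"]   # one string per test set

X = TfidfVectorizer().fit_transform(corpora)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)        # corpora sharing a label form one candidate group
```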