10 research outputs found

Evolving rules for document classification

Author: A. Bergström
C. Apté
C.M. Tan
D. Montana
D.R. Tauritz
F. Sebastiani
G. Salton
H. Lodhi
J.R. Koza
K. Bennet
M. Damashek
T. Joachims
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

We describe a novel method for using genetic programming to create compact classification rules based on combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers, in terms of precision and recall, when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters-21578 dataset. We also suggest that, because the induced rules are meaningful to a human analyst, they may have a number of other uses beyond classification and provide a basis for text mining applications.
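The fitness idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the combinators `contains`, `both` and `either`, the use of F1 to combine precision and recall, and the four toy documents are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A rule maps a document to True (in category) or False.
Rule = Callable[[str], bool]

def contains(ngram: str) -> Rule:
    """Hypothetical terminal: true when the character n-gram occurs in the document."""
    return lambda doc: ngram in doc.lower()

def both(a: Rule, b: Rule) -> Rule:
    """Hypothetical function node: logical AND of two sub-rules."""
    return lambda doc: a(doc) and b(doc)

def either(a: Rule, b: Rule) -> Rule:
    """Hypothetical function node: logical OR of two sub-rules."""
    return lambda doc: a(doc) or b(doc)

def fitness(rule: Rule, training: List[Tuple[str, bool]]) -> float:
    """F1 of the rule on labelled documents (F1 is an assumed way to
    combine precision and recall; the abstract does not name one)."""
    tp = sum(1 for doc, label in training if rule(doc) and label)
    fp = sum(1 for doc, label in training if rule(doc) and not label)
    fn = sum(1 for doc, label in training if not rule(doc) and label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy training set (invented): True marks documents in the target category.
TRAINING = [
    ("Crude oil prices rose sharply", True),
    ("OPEC agrees to cut oil output", True),
    ("Wheat harvest exceeds forecasts", False),
    ("Olive oil exports rise in Spain", False),
]

# A candidate rule such as an evolutionary search might produce.
rule = either(both(contains("oil"), contains("crud")), contains("opec"))
score = fitness(rule, TRAINING)
```

On this toy data the single n-gram rule `contains("oil")` also matches the olive-oil document, so it scores lower than the combined rule. That difference is the kind of pressure a precision-and-recall fitness would apply during evolution.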

CiteSeerX

Crossref

Sheffield Hallam University Research Archive

UCL Discovery

Evolving Lucene search queries for text classification

Author: Hirsch Laurence
Hirsch R
Saeedi M
Publication date: 01/01/2007

We describe a method for generating accurate, compact, human-understandable text classifiers. Text datasets are indexed using Apache Lucene, and genetic programs are used to construct Lucene search queries. Genetic programs acquire fitness by producing queries that are effective binary classifiers for a particular category when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from classification tasks.
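To make the idea of a query acting as a classifier concrete, here is a small sketch of an expression tree that renders to a boolean query string. The node classes and both functions are hypothetical, and the code does not call Lucene: `matches` is a local stand-in that checks the tree against a set of tokens.

```python
from dataclasses import dataclass
from typing import Set, Union

@dataclass(frozen=True)
class Term:
    word: str

@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"

@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"

@dataclass(frozen=True)
class Not:
    child: "Node"

Node = Union[Term, And, Or, Not]

def to_query(node: Node) -> str:
    """Render the tree as a boolean query string using AND / OR / NOT."""
    if isinstance(node, Term):
        return node.word
    if isinstance(node, Not):
        return f"NOT {to_query(node.child)}"
    op = "AND" if isinstance(node, And) else "OR"
    return f"({to_query(node.left)} {op} {to_query(node.right)})"

def matches(node: Node, tokens: Set[str]) -> bool:
    """Local stand-in for running the query: does the document match?"""
    if isinstance(node, Term):
        return node.word in tokens
    if isinstance(node, Not):
        return not matches(node.child, tokens)
    if isinstance(node, And):
        return matches(node.left, tokens) and matches(node.right, tokens)
    return matches(node.left, tokens) or matches(node.right, tokens)

# An invented candidate for an "oil" category.
tree = Or(And(Term("crude"), Term("oil")), Term("opec"))
query = to_query(tree)
is_match = matches(tree, {"opec", "cuts", "output"})
```

A real system would submit the rendered string to the search index and score the returned documents against the training labels. Negation in a search engine may also behave differently from the plain boolean `not` used in this stand-in, so the two should not be assumed equivalent.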

CiteSeerX

Crossref

Sheffield Hallam University Research Archive

Evolving text classification rules with genetic programming

Author: Anthony N.
Ebert D.
Hirsch L.
Joachims T.
Karanikas H.
Koza J. R.
Langdon W.B.
Laurence Hirsch
Lodhi H.
Masoud Saeedi
Montana D
Robin Hirsch
Salton G.
Van Rijsbergen C. J.
Publication venue: Informa UK Limited
Publication date: 07/09/2005

We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers, in terms of precision and recall, when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters-21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications.

Crossref

Sheffield Hallam University Research Archive

Document clustering with evolved search queries

Author: Di Nuovo Alessandro
Hirsch Laurence
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/07/2017

Search queries define a set of documents located in a collection and can be used to rank the documents by assigning each document a score according to its closeness to the query in the multidimensional space of weighted terms. In this paper, we describe a system whereby an island model genetic algorithm (GA) creates individuals which can generate a set of Apache Lucene search queries for the purpose of text document clustering. A cluster is specified by the documents returned by a single query in the set. Each document that is included in only one of the clusters adds to the fitness of the individual, and each document that is included in more than one cluster reduces the fitness. The method can be refined by using the ranking score of each document in the fitness test. The system has a number of advantages; in particular, the final search queries are easily understood and offer a simple explanation of the clusters, meaning that an extra cluster labelling stage is not required. We describe how the GA can be used to build queries and show results for clustering on various data sets and with different query sizes. Results are also compared with clusters built using the widely used k-means algorithm.
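The fitness rule stated in this abstract (reward documents that fall in exactly one cluster, penalise documents that fall in several) can be sketched as follows. The unit reward and unit penalty are assumptions, since the abstract gives no weights, and the function name `cluster_fitness` is hypothetical.

```python
from collections import Counter
from typing import List, Set

def cluster_fitness(query_hits: List[Set[int]]) -> int:
    """Score one individual from the document ids returned by each query.

    Assumed weights: +1 for each document returned by exactly one query,
    -1 for each document returned by more than one query.
    """
    counts = Counter(doc for hits in query_hits for doc in hits)
    unique = sum(1 for c in counts.values() if c == 1)
    shared = sum(1 for c in counts.values() if c > 1)
    return unique - shared

# Three queries; document 3 is returned by two of them.
hits = [{1, 2, 3}, {3, 4}, {5}]
fitness_value = cluster_fitness(hits)
```

The refinement mentioned in the abstract would replace the unit weights with each document's ranking score. That variant is not shown here because the abstract does not specify how the scores are combined.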

Crossref

Sheffield Hallam University Research Archive

SPAM detection: Naïve bayesian classification and RPN expression-based LGP approaches compared

Author: A Guven
A Khorsi
AH Gandomi
AW Burks
C Sangeetha
Carlton Downey
CL Hamblin
E Stamatatos
GV Cormack
I Kononenko
J Pearl
L Hirsch
Lorrie Faith Cranor
M Basavaraju
M Brameier
M Matsumoto
M Zhang
PE Bennett
S Mukkamala
VA Yatsko
Publication venue: Springer Science and Business Media LLC
Publication date: 26/07/2016

An investigation is performed of a machine learning algorithm and the Bayesian classifier in the spam-filtering context. The paper shows the advantage of the use of Reverse Polish Notation (RPN) expressions with feature extraction compared to the traditional Naïve Bayesian classifier used for spam detection assuming the same features. The performance of the two is investigated using a public corpus and a recent private spam collection, concluding that the system based on RPN LGP (Linear Genetic Programming) gave better results compared to two popularly used open source Bayesian spam filters. © Springer International Publishing Switzerland 2016
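For readers unfamiliar with Reverse Polish Notation, the sketch below evaluates an RPN expression over numeric message features with a stack. The feature names (`links`, `caps`), the operator set and the example expression are hypothetical and are not taken from the paper; the sketch only shows how such an expression is evaluated, not how it would be evolved.

```python
import operator
from typing import Dict, List

# Binary operators available to the expression (an assumed set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_rpn(tokens: List[str], features: Dict[str, float]) -> float:
    """Evaluate an RPN token list; operands are feature names or numbers."""
    stack: List[float] = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()   # second operand is on top of the stack
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok in features:
            stack.append(features[tok])
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

# Hypothetical features extracted from one message.
message = {"links": 3.0, "caps": 1.5}
value = eval_rpn("links caps * 2 -".split(), message)
flagged = value > 0  # assumed decision rule: positive output means spam
```

Because an RPN program is a flat token list, it fits a linear genome naturally, which may be one reason it is paired with linear genetic programming here.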

Crossref

Institutional repository of Tomas Bata University Library

Document Clustering with Evolved Multi-Word Search Queries Where the Number of Classes is Unknown

Author: Hirsch Laurence
Hirsch Robin
Ogunleye Bayode
Publication venue: SSRN
Publication date: 25/08/2023

University of Brighton Research Portal

Document Clustering with Evolved Single Word Search Queries

Author: Di Nuovo Alessandro
Hirsch Laurence
Prasanna Haddela
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 09/08/2021

We present a novel, hybrid approach for clustering text databases. We use a genetic algorithm to generate and evolve a set of single-word search queries in Apache Lucene format. Clusters are formed as the set of documents matching a search query. The queries are optimized to maximize the number of documents returned and to minimize the overlap between clusters (documents returned by more than one query in a set). Optionally, the number of clusters can be specified in advance, which will normally result in an improvement in performance. Some documents in a collection are not returned by any of the search queries in a set, so once the search query evolution is completed a second stage is performed whereby a KNN algorithm is applied to assign all unassigned documents to their nearest cluster. We describe the method and compare its effectiveness with other well-known existing systems on 8 different text datasets. We note that the search query format has the qualitative benefits of being interpretable and providing an explanation of cluster construction.
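The second stage described in this abstract, assigning leftover documents with KNN, might look roughly like the sketch below. Jaccard similarity over token sets, the value `k = 3` and the toy documents are all assumptions; the abstract does not state the document representation or distance measure.

```python
from collections import Counter
from typing import List, Set, Tuple

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap in [0, 1]; an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_assign(doc: Set[str],
               assigned: List[Tuple[Set[str], int]],
               k: int = 3) -> int:
    """Return the majority cluster among the k most similar assigned documents."""
    ranked = sorted(assigned, key=lambda item: jaccard(doc, item[0]), reverse=True)
    votes = Counter(cluster for _, cluster in ranked[:k])
    return votes.most_common(1)[0][0]

# Documents already placed in clusters 0 and 1 by (hypothetical) evolved queries.
assigned_docs = [
    ({"oil", "crude", "price"}, 0),
    ({"oil", "opec", "output"}, 0),
    ({"wheat", "harvest", "grain"}, 1),
    ({"corn", "grain", "crop"}, 1),
]

# A document that no query returned.
leftover = {"grain", "crop", "yield"}
cluster = knn_assign(leftover, assigned_docs)
```

Here the two most similar neighbours both belong to cluster 1, so the leftover document joins that cluster even though one of the three votes comes from cluster 0.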

Sheffield Hallam University Research Archive

Text categorization using machine learning techniques applied to the classification of complaints in the processes of the Universidad Tecnológica de Bolívar

Author: Florián Noriega Jorge Andrés
Publication venue: Universidad Tecnológica de Bolívar
Publication date: 01/01/2013

In most organizations it is not enough merely to achieve the stated objectives; how they are achieved also matters, which is why organizations are constantly searching for mechanisms that let them improve the processes used to reach those objectives. One such process is the handling of internal complaints that arise within companies. Every process in an organization must be evaluated in one way or another in order to guarantee its quality and to assess opportunities for improvement and needs for change. By collecting complaints, organizations make sure they maintain the quality of all their processes and ensure their continuing suitability, adequacy and effectiveness. This is why complaint and grievance management processes play an important role in the functioning of any kind of company, and the Universidad Tecnológica de Bolívar is no exception. Includes bibliography and appendix.

Universidad Tecnológica de Bolívar: Repositorio Digital

Management, Technology and Learning for Individuals, Organisations and Society in Turbulent Environments

Author: Putnik Goran
Ávila Paulo
Publication venue: Polytechnic of Porto
Publication date: 01/01/2010

This book collects the fifty-two papers presented at the First International Conference on BUSINESS SUSTAINABILITY ’08 - Management, Technology and Learning for Individuals, Organisations and Society in Turbulent Environments, held in Ofir, Portugal, from 25 to 27 June 2008. The main motive of the meeting was the growing awareness of the importance of the sustainability issue. This importance emerged from the growing uncertainty of market behaviour, which leads to the characterization of the market, i.e. the environment, as turbulent. The characterization of the environment as uncertain and turbulent reflects the fact that the traditional technocratic and/or socio-technical approaches cannot deal effectively and efficiently with the present situation. In other words, the rise of the sustainability issue means a quest for new instruments to deal with uncertainty and/or turbulence. The sustainability issue is complex, and solutions are sought across a wide range of domains and instruments. The domains range from environmental sustainability (referring to the natural environment) through organisational and business sustainability to social sustainability. The instruments range from traditional engineering and management methodologies to “soft” instruments such as knowledge, learning and creativity. The papers in this book address virtually the whole sustainability problem space to a greater or lesser extent. However, although the uncertainty and/or turbulence, or in other words the dynamic properties, come from the coupling of management, technology, learning, individuals, organisations and society, meaning that everything is at once effect and cause, we wanted to put the emphasis on business, with the intention of addressing primarily companies and their businesses.
For this reason, the main title of the book is “Business Sustainability”, with the approach of coupling Management, Technology and Learning for individuals, organisations and society in Turbulent Environments. The particularity of the First International Conference on BUSINESS SUSTAINABILITY was that it served primarily as a learning environment in which the papers published in this book were the ground for further individual and collective growth in the understanding and perception of sustainability and in the capacity for building new instruments for business sustainability. In that respect, the methodology of the conference was basically dialogical, meaning that it promoted dialogue on the papers while also including formal paper presentations. In this way, the conference offered a rich space for satisfying the needs of different authors and participants. Additionally, to promote the widest, global learning environment and participation, the Conference Organisation broadcast the Conference sessions, both dialogical and formal presentations, over the Internet to all authors’ and participants’ institutions, as an innovative Conference feature. In these terms, this book can also be understood as a complementary instrument for the Conference authors and participants, and also for the wider readership interested in sustainability issues. The book brought together 97 authors from 10 countries: Australia, Finland, France, Germany, Ireland, Portugal, Russia, Serbia, Sweden and the United Kingdom. The authors ranged from senior and renowned scientists to young researchers, providing a rich learning environment. Finally, the editors hope that this book will be useful, meeting the expectations of the authors and the wider readership, enhancing individual and collective learning, and encouraging further scientific development and the creation of new papers.
The editors would also like to use this opportunity to announce their intention to continue with new editions of the conference and of the accompanying books on the subject of BUSINESS SUSTAINABILITY, the second of which is planned for 2011.

Repositório Científico do Instituto Politécnico do Porto

Autonomous Document Classification for Business

Author: Chris Clack
Jonny Farringdon
Peter Lidwell
Tina Yu
Publication venue: ACM Press
Publication date: 01/01/1997

With the continuing exponential growth of the Internet and the more recent growth of business intranets, the commercial world is becoming increasingly aware of the problem of electronic information overload. This has encouraged interest in developing agents/softbots that can act as electronic personal assistants and can develop and adapt representations of users' information needs, commonly known as profiles. As th

CiteSeerX