Cache Hierarchy Inspired Compression: a Novel Architecture for Data Streams

Author: Holmes Geoffrey
Kirkby Richard Brendon
Pfahringer Bernhard
Publication date: 01/01/2006

We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta-level analyser from a number of levels constructed over time from a data stream. We present the general architecture for such a system and an application to classification. This architecture is an instance of the general wrapper idea, allowing us to reuse standard batch learning algorithms in an inherently incremental learning environment. By artificially generating data sources, we demonstrate that a hierarchy containing a mixture of models is able to adapt over time to the source of the data. In these experiments the hierarchies use an elementary performance-based replacement policy and unweighted voting for making classification decisions.

Research Commons@Waikato

Stacking classifiers for anti-spam filtering of e-mail

Author: Androutsopoulos I.
Karkaletsis V.
Paliouras G.
Sakkis G.
Spyropoulos C. D.
Stamatopoulos P.
Publication date: 01/01/2001

We empirically evaluate a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, or "spam", floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the efficiency of automatically induced anti-spam filters, and that such filters can be used in real-life applications.

arXiv.org e-Print Archive

CiteSeerX

Feature extraction and classification of movie reviews

Author: Awukam Awukam Ojang
Mtetwa Nhamo
Yousefi Mehdi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/05/2019

ResearchOnline@GCU

Testing Market Response to Auditor Change Filings: a comparison of machine learning classifiers

Author: Holowczak Richard
Louton David
Saraoglu Hakan
Publication venue: Bryant Digital Repository
Publication date: 23/08/2018

The use of textual information contained in company filings with the Securities and Exchange Commission (SEC), including annual reports on Form 10-K, quarterly reports on Form 10-Q, and current reports on Form 8-K, has gained increasing attention from finance and accounting researchers. In this paper we use a set of machine learning methods to predict the market response to changes in a firm's auditor as reported in public filings. We vectorize the text of 8-K filings to test whether the resulting feature matrix can explain the sign of the market response to the filing. Specifically, using classification algorithms and a sample consisting of the Item 4.01 text of 8-K documents, which provides information on changes in auditors of companies that are registered with the SEC, we predict the sign of the cumulative abnormal return (CAR) around 8-K filing dates. We report the correct classification performance and time efficiency of the classification algorithms. Our results show some improvement over the naïve classification method.

DigitalCommons@Bryant University

Stemming text-based web page classification using machine learning algorithms: a comparison

Author: Daud S. M.
Razali A.
Shahidi F.
Zin N. A. M.
Publication venue: The Science and Information Organization
Publication date: 01/01/2020

The aim of this research is to determine the effect of word stemming on web page classification using different machine learning classifiers, namely Naive Bayes (NB), k-Nearest Neighbour (k-NN), Support Vector Machine (SVM) and Multilayer Perceptron (MP). Each classifier's performance is evaluated in terms of accuracy and processing time. This research uses the BBC dataset, which has five predefined categories. The results demonstrate that the classifiers' performance is better without word stemming, whereby all classifiers show higher classification accuracy, with the highest accuracy produced by NB and SVM at 97% for F1 score, while NB takes a shorter training time than SVM. With word stemming, the effect on training and classification time is negligible, except for the Multilayer Perceptron, for which word stemming effectively reduced the training time.

Universiti Teknologi Malaysia Institutional Repository

Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

Publication venue: Sabanci University Information Center
Publication date: 01/01/2011

Sabanci University Research Database