4,490 research outputs found

Text classification stream-based R-measure approach using frequency of substring repetition

Authors: Ashurov Mikhail F.; Poddubny Vasiliy V.
Publication date: 01/01/2015

Tomsk State University Repository

Verifying a Chinese collection for text categorization

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2004

Crossref

Rcv1: A new benchmark collection for text categorization research

Authors: Lewis David D; Li Fan; Russell-Rose Tony; Yang Yiming
Publication date: 01/01/2004

Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of the real world constraints under which the data was produced. Drawing on interviews with Reuters personnel and access to Reuters documentation, we describe the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data. We refer to the original data as RCV1-v1, and the corrected data as RCV1-v2. We benchmark several widely used supervised learning methods on RCV1-v2, illustrating the collection’s properties, suggesting new directions for research, and providing baseline results for future studies. We make available detailed, per-category experimental results, as well a

CiteSeerX

Goldsmiths Research Online

A robust authorship attribution on big period

Authors: Prasad Rajesh; Tamboli Mubin Shoukat
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2019

Authorship attribution is the task of identifying the writer of an unknown text by assigning it to a known writer. Each author's writing style is distinct and can be used for discrimination. There are different parameters responsible for rectifying such changes. When the writing samples collected for an author belong to a short period, they can contribute effectively to identifying an unknown sample. In this paper, the author identification problem is considered for the case where writing samples are not available from the same time period; such evidence is collected over a long period of time. Character n-gram, word n-gram, and PoS n-gram features are used to build the model, as they contribute to the writer's style in terms of both content and the statistical characteristics of the writing. We applied the support vector machine algorithm for classification. The experiments produced effective results. While discriminating among multiple authors, corpus selection and construction were the most tedious tasks, and they were implemented effectively. Accuracy was observed to vary with feature type: word and character n-grams showed better accuracy than PoS n-grams.

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Music Similarity Estimation

Author: Sridharan Anusha
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2018

Music is a complicated form of communication, where creators and culture communicate and expose their individuality. After music digitalization took place, recommendation systems and other online services have become indispensable in the field of Music Information Retrieval (MIR). To build these systems and recommend the right choice of song to the user, classification of songs is required. In this paper, we propose an approach for finding similarity between pieces of music based on mid-level attributes like pitch, the MIDI value corresponding to pitch, interval, contour, and duration, and applying text-based classification techniques. Our system predicts jazz, metal, and ragtime for western music. The experiment to predict the genre of music is conducted on 450 music files, and the maximum accuracy achieved is 95.8% across different n-grams. We have also analyzed Indian classical Carnatic music and classify it based on its raga. Our system predicts the Sankarabharam, Mohanam, and Sindhubhairavi ragas. The experiment to predict the raga of a song is conducted on 95 music files, and the maximum accuracy achieved is 90.3% across different n-grams. Performance evaluation is done using the accuracy score of scikit-learn.

SJSU ScholarWorks

Legal Documents Categorization by Compression

Authors: Mastropaolo Antonio; Pallante Francesco; Radicioni Daniele Paolo
Publication venue: ACM - Association for Computing Machinery
Publication date: 01/01/2013

Institutional Research Information System University of Turin

Analyzing spatial data from mouse tracker methodology: An entropic approach

Authors: A Calcagnì; A Di Crescenzo; A Fishbach; A Johnson; A Resulaj; A Voss; Antonio Calcagnì; AP Georgopoulos; BT McClintock; C O’Really; CM Bergman; CN White; D Mottet; D Norris; DE Meyer; E Ciavolino; E Hehman; F Hwang; F Wang; G Chen; G Flodgren; GJ Koop; H Dillen; H Shimazaki; I Kapsouras; ID Jonsen; J Freeman; J Friedman; J Wang; J-H Song; JB Freeman; JC Lucero; JG Phillips; K McRae; L Barca; L Birgé; LJ Rips; LM Morett; Luigi Lombardi; M Rao; MG Glaholt; MJ Spivey; MJ Yap; MW Smith; N Hogan; N Walker; ND Duran; O Hauk; PH Eilers; R Dale; R Plamondon; S Baratpour; S Brown; SE Engelbrecht; Simone Sulpizio; T Flash; TJ Faulkenberry; U Demšar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Crossref

Archivio istituzionale della ricerca - Università di Padova