Two-layer classification and distinguished representations of users and documents for grouping and authorship identification

Author: Ahmed Amr
Mohtasseb Haytham
Publication date: 01/11/2009

Most studies on authorship identification report a drop in identification performance when the number of authors exceeds 20-25. In this paper, we introduce a new user representation to address this problem and split classification across two layers. The paper makes at least three novel contributions. First, the two-layer approach allows authorship identification to be applied to a larger number of authors (tested over 100 authors), and it is extensible. The authors are divided into groups, each containing a smaller number of authors. Given an anonymous document, the primary layer detects the group to which the document belongs. The secondary layer then determines the particular author within the selected group. To extract groups that link similar authors, clustering is applied over users rather than documents. Hence, the second novelty is a new user representation that differs from the document representation. Without the proposed user representation, clustering over documents would scatter each author's documents across several clusters instead of assigning each author to a single cluster. Third, the extracted clusters are descriptive of, and meaningful for, their users, because the dimensions have psychological backgrounds. For authorship identification, the documents are labelled with the extracted groups and fed into machine learning algorithms to build classification models that predict the group and author of a given document. The results show that the documents are highly correlated with their corresponding extracted groups, and that the proposed model can be accurately trained to determine the group and the author identity.
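
The two-layer idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the toy 2-D feature vectors, the author names, and the `author_group` mapping (which stands in for the result of clustering over users) are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def nearest(x, centroids):
    """Return the key whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda key: dist(x, centroids[key]))


def train_two_layer(docs, author_group):
    """docs: list of (feature_vector, author); author_group: author -> group."""
    by_group, by_author = defaultdict(list), defaultdict(list)
    for x, author in docs:
        by_group[author_group[author]].append(x)
        by_author[author].append(x)
    # Primary layer: one centroid per group of authors.
    group_centroids = {g: centroid(v) for g, v in by_group.items()}
    # Secondary layer: per group, one centroid per author in that group.
    author_centroids = defaultdict(dict)
    for author, vectors in by_author.items():
        author_centroids[author_group[author]][author] = centroid(vectors)
    return group_centroids, author_centroids


def predict(x, model):
    """Pick the group first, then the author inside that group only."""
    group_centroids, author_centroids = model
    group = nearest(x, group_centroids)
    return group, nearest(x, author_centroids[group])


# Hypothetical toy data: four authors, assumed to be pre-clustered into two groups.
author_group = {"ann": "g1", "bob": "g1", "cat": "g2", "dan": "g2"}
docs = [
    ((0, 0), "ann"), ((0, 1), "ann"), ((2, 0), "bob"), ((2, 1), "bob"),
    ((10, 10), "cat"), ((10, 11), "cat"), ((12, 10), "dan"), ((12, 11), "dan"),
]
model = train_two_layer(docs, author_group)
print(predict((1.8, 0.4), model))    # ('g1', 'bob')
print(predict((10.2, 10.9), model))  # ('g2', 'cat')
```

The secondary layer only ever compares a document against the authors of one group, which is what keeps each individual classification problem small as the total number of authors grows.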

University of Lincoln Institutional Repository

Crossref

Edge Hill University Research Information Repository

The Five Factor Model of personality and evaluation of drug consumption risk

Author: A. Terracciano
A.N. Gorban
A.N. Kopstein
C.A. Ventura
D.N. Gujarati
D.W. Hosmer Jr
D.W. Scott
E.M. Mirkes
F. Bulut
G. Biau
G.P. McCabe
H.F. Kaiser
I.D. Dinov
J. Hoare
J.R. Quinlan
K. Pearson
K.L. Clarkson
L. Guttman
M. Linting
M. Zuckerman
M.J. Cleveland
M.S. Stanford
P.T. Costa
Q. Li
R. Beaglehole
R.A. Fisher
R.R. McCrae
S. Arlot
S. Russell
S. Valeroa
S.Y. Lee
T. Bogg
T. Hastie
V. Egan
Y. Benjamini
Y. Koren
Publication date: 15/01/2017

The problem of evaluating an individual's risk of drug consumption and misuse is highly important. An online survey methodology was employed to collect data including Big Five personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation seeking (ImpSS), and demographic information. The data set contained information on the consumption of 18 central nervous system psychoactive drugs. Correlation analysis demonstrated the existence of groups of drugs with strongly correlated consumption patterns. Three correlation pleiades were identified, each named after its central drug: the ecstasy, heroin, and benzodiazepines pleiades. An exhaustive search was performed to select the most effective subset of input features and data mining methods to classify users and non-users for each drug and pleiad. A number of classification methods were employed (decision tree, random forest, k-nearest neighbors, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression, and naïve Bayes), and the most effective classifier was selected for each drug. The quality of classification was surprisingly high, with sensitivity and specificity (evaluated by leave-one-out cross-validation) greater than 70% for almost all classification tasks. The best results, with sensitivity and specificity greater than 75%, were achieved for cannabis, crack, ecstasy, legal highs, LSD, and volatile substance abuse (VSA).

Comment: Significantly extended report with 67 pages, 27 tables, 21 figures
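
The evaluation protocol named in the abstract (sensitivity and specificity estimated by leave-one-out cross-validation) can be sketched as follows. This is only an illustration with one of the listed classifier families, k-nearest neighbors. The toy 2-D points, the labels (1 = user, 0 = non-user), and the function names are assumptions, not the paper's data or code.

```python
from math import dist


def knn_predict(train, x, k=3):
    """Majority vote among the k nearest labelled points (labels are 0 or 1)."""
    neighbours = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > k else 0


def loocv_sens_spec(data, k=3):
    """Leave-one-out CV: hold out each point once, train on all the others."""
    tp = tn = fp = fn = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        pred = knn_predict(train, x, k)
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)  # share of true users that are detected
    specificity = tn / (tn + fp)  # share of true non-users that are detected
    return sensitivity, specificity


# Hypothetical toy data: the non-user at (5, 4) sits inside the user cluster.
data = [
    ((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 0), ((5, 4), 0),
    ((5, 5), 1), ((5, 6), 1), ((6, 5), 1), ((6, 6), 1),
]
sens, spec = loocv_sens_spec(data, k=3)
print(sens, spec)  # 1.0 0.8
```

Reporting the two rates separately matters when users and non-users are unbalanced, since a classifier that always predicts the majority class can score high accuracy while one of the two rates is zero.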

arXiv.org e-Print Archive

Crossref

Leicester Research Archive

ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities

Author: Arif Rezoana Bente
Khan Mohammad Mahmudur Rahman
Oishe Mahjabin Rahman
Siddique Md. Abu Bakr
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/11/2018

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm that performs well on datasets whose clusters have a constant density of data points. One of the significant attributes of this algorithm is noise cancellation. However, DBSCAN performs worse on clusters with different densities. Therefore, this paper proposes an adaptive DBSCAN that works well for identifying clusters with varying densities.

Comment: To be published in the 4th IEEE International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT 2018)
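
The limitation described above can be reproduced with a minimal sketch of standard DBSCAN. This is the classic algorithm, not the adaptive variant proposed in the paper, whose details the abstract does not give. The toy points and the parameter values `eps` and `min_pts` are assumptions chosen for illustration.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Standard DBSCAN; returns one label per point (cluster id or NOISE)."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
        cluster += 1
    return labels


# Hypothetical toy data: a dense square (spacing 1) and a sparse square (spacing 3).
dense = [(0, 0), (1, 0), (0, 1), (1, 1)]
sparse = [(20, 20), (23, 20), (20, 23), (23, 23)]
points = dense + sparse

tight = dbscan(points, eps=1.5, min_pts=3)
loose = dbscan(points, eps=4.5, min_pts=3)
print(tight)  # [0, 0, 0, 0, -1, -1, -1, -1]
print(loose)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With `eps` tuned to the dense square, the sparse square is discarded as noise. A larger `eps` recovers it here, but in less well-separated data the same radius could merge neighbouring dense clusters, which is the motivation for adapting the parameters per density.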

arXiv.org e-Print Archive

Crossref

Typical Phone Use Habits: Intense Use Does Not Predict Negative Well-Being

Author: Arapakis Ioannis
Katevas Kleomenis
Pielot Martin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/07/2018

Not all smartphone owners use their device in the same way. In this work, we uncover broad, latent patterns of mobile phone use behavior. We conducted a study in which, via a dedicated logging app, we collected daily mobile phone activity data from a sample of 340 participants over a period of four weeks. Through an unsupervised learning approach and a methodologically rigorous analysis, we reveal five generic phone use profiles, each describing at least 10% of the participants: limited use, business use, power use, personality-induced problematic use, and externally induced problematic use. We provide evidence that intense mobile phone use alone does not predict negative well-being. Instead, our approach automatically revealed two groups with tendencies for lower well-being, which are characterized by nightly phone use sessions.

Comment: 10 pages, 6 figures, conference paper

arXiv.org e-Print Archive

Crossref

Experiences in Mining Educational Data to Analyze Teacher's Performance: A Case Study with High Educational Teachers

Author: Almasri Abdelbaset
Publication date: 01/01/2017

Educational Data Mining (EDM) is a new paradigm that aims to mine and extract the knowledge needed to optimize the effectiveness of the teaching process. In a normal educational system, fine-grained optimization is often unlikely because of the large amount of data collected and tangled throughout the system. EDM addresses this problem through its ability to mine and explore this raw data and, as a consequence, extract knowledge. This paper describes several experiments on real educational data that illustrate the effectiveness of data mining in turning educational data into knowledge. The experiments first aim to identify important factors in teacher behavior that influence student satisfaction. In addition to presenting the experience gained through the experiments, the paper aims to provide practical guidance on data mining solutions in a real application.

PhilPapers