Application of a simple likelihood ratio approximant to protein sequence classification
Abstract
Motivation: Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. This study explores their utility as a scoring (ranking) function for classifying protein sequences.
Results: We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith–Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarity between proteins that are distantly related in terms of structure or evolution. We found that LRA-based scoring can significantly outperform simple scoring methods.
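To make the idea concrete, here is a minimal Python sketch of a top-two LRA score. It assumes that the approximant is the ratio of the query's best score against the top-ranking class to its best score against the runner-up class, inverted for distances so that a larger value always means a more confident assignment. The exact formula used in the paper may differ, and the function name `top_two_lra` and the class labels are hypothetical.

```python
from typing import Dict, List, Tuple


def top_two_lra(
    class_scores: Dict[str, List[float]],
    higher_is_better: bool = True,
    eps: float = 1e-12,
) -> Tuple[str, float]:
    """Return the top-ranking class and a top-two likelihood ratio approximant.

    class_scores maps each class label to the scores of the query against
    that class's member sequences (similarities if higher_is_better, else
    distances). The ratio form below is an illustrative assumption.
    """
    if len(class_scores) < 2:
        raise ValueError("need at least two classes to form a ratio")
    # Best score per class: maximal similarity or minimal distance
    best = {
        label: (max(scores) if higher_is_better else min(scores))
        for label, scores in class_scores.items()
    }
    # Rank classes from most to least similar to the query
    ranked = sorted(best, key=best.get, reverse=higher_is_better)
    first, second = ranked[0], ranked[1]
    if higher_is_better:
        # Ratio of the two highest similarities (>= 1 for positive scores)
        ratio = best[first] / max(best[second], eps)
    else:
        # Ratio of the two smallest distances, inverted so larger = more confident
        ratio = best[second] / max(best[first], eps)
    return first, ratio


sims = {"globins": [52.0, 40.0], "kinases": [26.0, 31.0], "proteases": [18.0]}
print(top_two_lra(sims))  # ('globins', 52/31)
```

The same function covers any of the base scores (an alignment score, a kernel value or a compression distance), because only the two best class-level scores enter the ratio.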
A multi-granularity pattern-based sequence classification framework for educational data
In many application domains, such as education, sequences of events occurring over time need to be studied in order to understand the generative process behind them and hence to classify new examples. In this paper, we propose a novel multi-granularity sequence classification framework that generates features based on frequent patterns at multiple levels of time granularity. Feature selection techniques are then applied to identify the most informative features, which are used to construct the classification model. We show the applicability and suitability of the proposed framework to educational data mining through experiments on a dataset collected from an asynchronous communication tool in which students interact to accomplish a group project. The experimental results show that our model achieves competitive performance in detecting the students' roles in their projects, compared with a baseline similarity-based approach.
Detecting hierarchical relationships and roles from online interaction networks
In social networks, analysing the explicit interactions among users can help infer hierarchical relationships and roles that may be implicit. In this thesis, we focus on two objectives: detecting hierarchical relationships between users and inferring the hierarchical roles of users interacting via the same online communication medium. In both cases, we show that considering the temporal dimension of interaction substantially improves the detection of relationships and roles.
The first focus of this thesis is the problem of inferring implicit relationships from interactions between users. Building on promising results obtained by standard link-analysis methods such as PageRank and Rooted-PageRank (RPR), we introduce three novel time-based approaches: "Time-F", based on a defined time function; Filter and Refine (FiRe), a hybrid approach based on RPR and Time-F; and Time-sensitive Rooted-PageRank (T-RPR), which applies RPR in a way that takes into account the time dimension of interactions when detecting hierarchical ties.
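Rooted PageRank, the link-analysis baseline that the time-based approaches build on, can be sketched as a random walk with restart over a weighted interaction graph: candidate superiors of a user are ranked by how often the walk visits them when it keeps restarting at that user. The optional exponential half-life weighting below is only a hypothetical way of letting recent interactions count more; it is not Time-F, FiRe or T-RPR, whose definitions are not given here. Function names and parameters are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Interaction = Tuple[str, str, float]  # (sender, receiver, timestamp)
Graph = Dict[str, Dict[str, float]]


def edge_weights(interactions: Iterable[Interaction], now: Optional[float] = None,
                 half_life: Optional[float] = None) -> Graph:
    """Undirected weights: plain interaction counts, or (hypothetically) time-decayed counts."""
    if half_life is not None and now is None:
        raise ValueError("`now` is required when `half_life` is set")
    w = defaultdict(lambda: defaultdict(float))
    for a, b, t in interactions:
        # Without a half-life each interaction counts 1; with one, older ones count less
        x = 1.0 if half_life is None else 0.5 ** ((now - t) / half_life)
        w[a][b] += x
        w[b][a] += x
    return {u: dict(nbrs) for u, nbrs in w.items()}


def rooted_pagerank(w: Graph, root: str, alpha: float = 0.15,
                    max_iter: int = 500, tol: float = 1e-12) -> Dict[str, float]:
    """Random walk with restart: each step returns to `root` with probability alpha,
    otherwise moves to a neighbour with probability proportional to edge weight."""
    if root not in w:
        raise KeyError(root)
    out = {u: sum(nbrs.values()) for u, nbrs in w.items()}
    p = {u: 0.0 for u in w}
    p[root] = 1.0
    for _ in range(max_iter):
        nxt = {u: 0.0 for u in w}
        nxt[root] = alpha  # restart mass; the total mass stays 1
        for u, nbrs in w.items():
            if out[u] == 0.0:
                # All edges decayed to zero weight: treat the step as a restart
                nxt[root] += (1 - alpha) * p[u]
                continue
            for v, weight in nbrs.items():
                nxt[v] += (1 - alpha) * p[u] * weight / out[u]
        converged = sum(abs(nxt[u] - p[u]) for u in w) < tol
        p = nxt
        if converged:
            break
    return p


graph = edge_weights([("boss", "a", 0.0), ("boss", "b", 0.0), ("a", "b", 0.0), ("boss", "a", 1.0)])
scores = rooted_pagerank(graph, "a")
candidates = sorted((v for v in scores if v != "a"), key=scores.get, reverse=True)  # ['boss', 'b']
```

Because the walk restarts at the queried user, the scores are specific to that user, which is what makes the method suitable for ranking that user's likely superiors rather than computing a single global importance.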
We experiment on two datasets: the Enron email dataset, to infer manager-subordinate relationships from email exchanges, and a scientific publication co-authorship dataset, to detect PhD advisor-advisee relationships from paper co-authorships. Our experiments demonstrate that the time-based methods perform better in terms of recall. In particular, T-RPR outperforms the most recent competitor methods as well as all the other approaches we propose.
The second focus of this thesis is examining the online communication behaviour of users working on the same activity in order to identify the different hierarchical roles they play. We propose two approaches. In the first, supervised learning is used to train different classification algorithms. In the second, we address the problem as a sequence classification problem: we define a novel sequence classification framework that generates time-dependent features based on frequent patterns at multiple levels of time granularity. Our framework is a flexible sequence classification technique that can be applied in different domains.
We experiment on an educational dataset collected from an asynchronous communication tool used by students to accomplish a group project. Our experimental findings show that the first, supervised approach achieves the best mapping of students to their roles when the students' individual attributes, information about the reply relationships among them, and quantitative time-based features are all considered. Similarly, our multi-granularity pattern-based framework shows competitive performance in detecting the students' roles. Both approaches are significantly better than the baselines considered.
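As an illustration of the kind of inputs the supervised approach combines, the sketch below derives reply-relationship and time-based features per student from a message log. The message format, the function name and the feature names are hypothetical, and the students' individual attributes (which are not specified here) would be appended separately.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Set, Tuple

Message = Tuple[str, str, float, Optional[str]]  # (msg_id, author, timestamp, reply_to)


def student_features(messages: List[Message]) -> Dict[str, Dict[str, float]]:
    """Hypothetical per-student reply-network and time-based features."""
    author_of = {mid: author for mid, author, _, _ in messages}
    times: Dict[str, List[float]] = defaultdict(list)
    sent: Dict[str, int] = defaultdict(int)
    received: Dict[str, int] = defaultdict(int)
    partners: Dict[str, Set[str]] = defaultdict(set)
    for _, author, t, parent in messages:
        times[author].append(t)
        target = author_of.get(parent)  # None for top-level posts or unknown parents
        if target is not None and target != author:  # ignore self-replies
            sent[author] += 1
            received[target] += 1
            partners[author].add(target)
            partners[target].add(author)
    features = {}
    for author, ts in times.items():
        ts.sort()
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        features[author] = {
            "n_messages": float(len(ts)),
            "replies_sent": float(sent[author]),
            "replies_received": float(received[author]),
            "n_partners": float(len(partners[author])),
            "mean_gap": mean(gaps) if gaps else 0.0,  # average time between own posts
            "active_span": ts[-1] - ts[0],  # time from first to last post
        }
    return features


log = [
    ("m1", "ana", 0.0, None),
    ("m2", "ben", 10.0, "m1"),
    ("m3", "ana", 30.0, "m2"),
    ("m4", "cai", 40.0, "m1"),
]
feats = student_features(log)
```

Each student's feature dictionary can then be flattened into one row of a feature matrix for an off-the-shelf classifier.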