4 research outputs found

    Application of a simple likelihood ratio approximant to protein sequence classification

    Abstract Motivation: Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study was undertaken to explore their utility as a scoring (ranking) function in the classification of protein sequences. Results: We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith–Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods. Contact: [email protected]. Supplementary information:
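    The abstract above does not give the exact formula, so the following is only a minimal sketch of one plausible reading: take the maximal similarity score within each class, then divide the best class score by the second-best. The function name `lra_score`, the input layout, and the requirement that scores be positive are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def lra_score(similarities, labels):
    """Return (top_class, ratio) for one query sequence.

    similarities: similarity scores of the query against database sequences
                  (assumed positive, higher means more similar).
    labels:       class label of each database sequence, same order.

    Hypothetical helper: the ratio of the two top-ranking class maxima is an
    assumed reading of "simple LRA", not a formula taken from the paper.
    """
    # Maximal similarity score observed within each class.
    best_per_class = defaultdict(lambda: float("-inf"))
    for score, label in zip(similarities, labels):
        best_per_class[label] = max(best_per_class[label], score)

    # Rank classes by their maximal score, best first.
    ranked = sorted(best_per_class.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        raise ValueError("at least two classes are needed to form a ratio")

    (top_class, top_score), (_, second_score) = ranked[0], ranked[1]
    if second_score <= 0:
        raise ValueError("scores must be positive for the ratio to be meaningful")
    return top_class, top_score / second_score


# Example with made-up scores: the best "kinase" hit scores 120, the best "globin" hit 60.
print(lra_score([120.0, 40.0, 60.0, 30.0], ["kinase", "kinase", "globin", "globin"]))
```

    For distance-based measures, the same idea would presumably use the minimal distance per class and invert the ratio; that variant is likewise an assumption.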

    A multi-granularity pattern-based sequence classification framework for educational data

    In many application domains, such as education, sequences of events occurring over time need to be studied in order to understand the generative process behind these sequences, and hence classify new examples. In this paper, we propose a novel multi-granularity sequence classification framework that generates features based on frequent patterns at multiple levels of time granularity. Feature selection techniques are applied to identify the most informative features, which are then used to construct the classification model. We show the applicability and suitability of the proposed framework to the area of educational data mining by experimenting on an educational dataset collected from an asynchronous communication tool in which students interact to accomplish an underlying group project. The experimental results showed that our model can achieve competitive performance in detecting the students' roles in their corresponding projects, compared to a baseline similarity-based approach.

    Detecting hierarchical relationships and roles from online interaction networks

    In social networks, analysing the explicit interactions among users can help in inferring hierarchical relationships and roles that may be implicit. In this thesis, we focus on two objectives: detecting hierarchical relationships between users and inferring the hierarchical roles of users interacting via the same online communication medium. In both cases, we show that considering the temporal dimension of interaction substantially improves the detection of relationships and roles. The first focus of this thesis is on the problem of inferring implicit relationships from interactions between users. Based on promising results obtained by standard link-analysis methods such as PageRank and Rooted-PageRank (RPR), we introduce three novel time-based approaches: "Time-F", based on a defined time function; Filter and Refine (FiRe), a hybrid approach based on RPR and Time-F; and Time-sensitive Rooted-PageRank (T-RPR), which applies RPR in a way that takes into account the time dimension of interactions in the process of detecting hierarchical ties. We experiment on two datasets: the Enron email dataset, to infer manager-subordinate relationships from email exchanges, and a scientific publication co-authorship dataset, to detect PhD advisor-advisee relationships from paper co-authorships. Our experiments demonstrate that time-based methods perform better in terms of recall. In particular, T-RPR turns out to be superior to the most recent competitor methods as well as all other approaches we propose. The second focus of this thesis is examining the online communication behaviour of users working on the same activity in order to identify the different hierarchical roles played by the users. We propose two approaches. In the first approach, supervised learning is used to train different classification algorithms. In the second approach, we address the problem as a sequence classification problem. A novel sequence classification framework is defined that generates time-dependent features based on frequent patterns at multiple levels of time granularity. Our framework is a flexible technique for sequence classification that can be applied in different domains. We experiment on an educational dataset collected from an asynchronous communication tool used by students to accomplish an underlying group project. Our experimental findings show that the first supervised approach achieves the best mapping of students to their roles when the individual attributes of the students, information about the reply relationships among them, and quantitative time-based features are considered. Similarly, our multi-granularity pattern-based framework shows competitive performance in detecting the students' roles. Both approaches are significantly better than the baselines considered.