
Analyzing User Behavior in Collaborative Environments

Author: Saadat Samaneh
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2020

Discrete sequences are the building blocks for many real-world problems in domains including genomics, e-commerce, and social sciences. While there are machine learning methods to classify and cluster sequences, they fail to explain what makes groups of sequences distinguishable. Although in some cases having a black box model is sufficient, there is a need for increased explainability in research areas focused on human behaviors. For example, psychologists are less interested in having a model that predicts human behavior with high accuracy and more concerned with identifying differences between actions that lead to divergent human behavior. This dissertation presents techniques for understanding differences between classes of discrete sequences. We leveraged our developed approaches to study two online collaborative environments: GitHub, a software development platform, and Minecraft, a multiplayer online game. The first approach measures the differences between groups of sequences by comparing k-gram representations of sequences using the silhouette score and characterizing the differences by analyzing the distance matrix of subsequences. The second approach discovers subsequences that are significantly more similar to one set of sequences vs. other sets. This approach, which is called contrast motif discovery, first finds a set of motifs for each group of sequences and then refines them to include the motifs that distinguish that group from other groups of sequences. Compared to existing methods, our technique is scalable and capable of handling long event sequences. Our first case study is GitHub. GitHub is a social coding platform that facilitates distributed, asynchronous collaborations in open source software development. It has an open API to collect metadata about users, repositories, and the activities of users on repositories. 
To study the dynamics of teams on GitHub, we focused on discrete event sequences that are generated when GitHub users perform actions on this platform. Specifically, we studied the differences that automated accounts (aka bots) make on software development processes and outcomes. We trained black box supervised learning methods to classify sequences of GitHub teams and then utilized our sequence analysis techniques to measure and characterize differences between event sequences of teams with bots and teams without bots. Teams with bots have relatively distinct event sequences from teams without bots in terms of the existence and frequency of short subsequences. Moreover, teams with bots have more novel and less repetitive sequences compared to teams with no bots. In addition, we discovered contrast motifs for human-bot and human-only teams. Our analysis of contrast motifs shows that in human-bot teams, discussions are scattered throughout other activities, while in human-only teams discussions tend to cluster together. For our second case study, we applied our sequence mining approaches to analyze player behavior in Minecraft, a multiplayer online game that supports many forms of player collaboration. As a sandbox game, it provides players with a large amount of flexibility in deciding how to complete tasks; this lack of goal-orientation makes the problem of analyzing Minecraft event sequences more challenging than event sequences from more structured games. Using our approaches, we were able to measure and characterize differences between low-level sequences of high-level actions, and despite variability in how different players accomplished the same tasks, we discovered contrast motifs for many player actions. Finally, we explored how the level of player collaboration affects the contrast motifs.
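
The first approach in this abstract (comparing k-gram representations of sequences with the silhouette score) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the Euclidean distance on raw k-gram counts, the function names, and the toy event names are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kgram_counts(seq, k):
    """Count every contiguous k-gram (as a tuple) in one event sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def euclidean(a, b):
    """Euclidean distance between two sparse k-gram count vectors (assumed metric)."""
    return sqrt(sum((a[g] - b[g]) ** 2 for g in set(a) | set(b)))


def silhouette(vectors, labels):
    """Mean silhouette coefficient; needs at least two distinct labels."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two groups")
    scores = []
    for i, v in enumerate(vectors):
        dists = {}  # label -> distances from sequence i to that group's members
        for j, w in enumerate(vectors):
            if i != j:
                dists.setdefault(labels[j], []).append(euclidean(v, w))
        own = dists.pop(labels[i], None)
        if own is None:  # singleton group: coefficient is 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                            # mean distance within own group
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: event names and group labels are made up.
teams = [
    ["push", "comment", "push", "comment"],
    ["push", "comment", "push", "comment"],
    ["comment", "comment", "push", "push"],
    ["comment", "comment", "push", "push"],
]
labels = ["bot", "bot", "human", "human"]
vectors = [kgram_counts(t, 2) for t in teams]
score = silhouette(vectors, labels)
```

A score near 1 means the two groups of sequences are well separated in k-gram space; a score near 0 means they overlap.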

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)


Composing Deep Learning and Bayesian Nonparametric Methods

Author: Zhang Aonan
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Recent progress in Bayesian methods largely focuses on non-conjugate models that make extensive use of black-box functions: continuous functions implemented with neural networks. Using deep neural networks, Bayesian models can reasonably fit big data while at the same time capturing model uncertainty. This thesis targets a more challenging problem: how do we model general random objects, including discrete ones, using random functions? Our conclusion is: many (discrete) random objects are in nature a composition of Poisson processes and random functions. Thus, all discreteness is handled through the Poisson process while random functions capture the remaining complexities of the object. Thus the title: composing deep learning and Bayesian nonparametric methods. This conclusion is not a conjecture. In special cases such as latent feature models, we can prove this claim by working on infinite dimensional spaces, and that is where Bayesian nonparametrics comes in. Moreover, we will impose some regularity assumptions on random objects, such as exchangeability. Then the representations follow from representation theorems. We will see this two times throughout this thesis. One may ask: when a random object is too simple, such as a non-negative random vector in the case of latent feature models, how can we exploit exchangeability? The answer is to aggregate infinitely many random objects, map them all together onto an infinite dimensional space, and then assume exchangeability on that space. We demonstrate two examples of latent feature models by (1) concatenating them as an infinite sequence (Sections 2 and 3) and (2) stacking them as a 2d array (Section 4). Besides, we will see that Bayesian nonparametric methods are useful to model discrete patterns in time series data. 
We will showcase two examples: (1) using variance Gamma processes to model change points (Section 5), and (2) using Chinese restaurant processes to model speech with switching speakers (Section 6). We are also aware that the inference problem can be non-trivial in popular Bayesian nonparametric models. In Section 7, we find a novel solution for online inference in the popular HDP-HMM model.

Columbia University Academic Commons

The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)

Author
Publication venue: Joint Conference on Language Evolution (JCoLE)
Publication date: 01/01/2022

MPG.PuRe

Advances in Evolutionary Algorithms

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021

With the recent trends towards massive data sets and significant computational power, combined with evolutionary algorithmic advances, evolutionary computation is becoming much more relevant to practice. The aim of the book is to present recent improvements, innovative ideas and concepts in a part of the huge EA field.

Directory of Open Access Books (DOAB)

Proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering

Author
Publication venue: CWI
Publication date: 01/12/2012

These are the online proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering (WITMSE), which was held in the Trippenhuis, Amsterdam, in August 2012.

CWI's Institutional Repository

Some theoretical and applied developments to support cognitive learning and adaptive testing

Author: Wang Shiyu
Publication venue
Publication date: 01/05/2016

Cognitive Diagnostic Modeling (CDM) and Computerized Adaptive Testing (CAT) are useful tools to measure subjects' latent abilities from two different aspects. CDM plays a very important role in fine-grained assessment, where the primary purpose is to accurately classify subjects according to the skills or attributes they possess, while CAT is a useful tool for coarse-grained assessment, which provides a single number to indicate the student's overall ability. This thesis discusses and solves several theoretical and applied issues related to these two areas. The first problem we investigate is related to a nonparametric classifier in cognitive diagnosis. Latent class models for cognitive diagnosis have been developed to classify examinees into one of the 2^K attribute profiles arising from a K-dimensional vector of binary skill indicators. These models recognize that response patterns tend to deviate from the ideal responses that would arise if skills and items generated item responses through a purely deterministic conjunctive process. An alternative to employing these latent class models is to minimize the distance between observed item response patterns and ideal response patterns, in a nonparametric fashion that utilizes no stochastic terms for these deviations. Theorems are presented that show the consistency of this approach when the true model is one of several common latent class models for cognitive diagnosis. Consistency of classification is independent of sample size, because no model parameters need to be estimated. Simultaneous consistency for a large group of subjects can also be shown given some conditions on how sample size and test length grow with one another. The second issue we consider is still within the CDM framework; however, our focus is on model misspecification. The maximum likelihood classification rule is a standard method to classify examinee attribute profiles in cognitive diagnosis models. 
Its asymptotic behavior is well understood when the model is assumed to be correct, but has not been explored in the case of misspecified latent class models. We investigate the consequences of using a simple model when the true model is different. In general, when a CDM is misspecified as a conjunctive model, the MLE for attribute profiles is not necessarily consistent. A sufficient condition for the MLE to be a consistent estimator under a misspecified DINA model is found. The true model can be any conjunctive model or even a compensatory model. Two examples are provided to show the consistency and inconsistency of the MLE under a misspecified DINA model. A Robust DINA MLE technique is proposed to overcome the inconsistency issue, and theorems are presented to show that it is a consistent estimator for the attribute profile as long as the true model is a conjunctive model. Simulation results indicate that when the true model is a conjunctive model, the Robust DINA MLE and the DINA MLE based on the simulated item parameters can result in relatively good classification results even when the test length is short. These findings demonstrate that simple models can be fitted without severely affecting classification accuracy in some cases. The last part discusses and solves a controversial issue related to CAT. In Computerized Adaptive Testing (CAT), items are selected in real time and are adjusted to the test-taker's ability. A long-debated issue with CATs is that they do not allow test-takers to review and revise their responses. The last chapter of this thesis presents a CAT design that preserves the efficiency of a conventional CAT, but allows test-takers to revise their previous answers at any time during the test; the only imposed restriction is on the number of revisions to the same item. 
The proposed method relies on a polytomous Item Response Theory model that is used to describe the first response to each item, as well as any subsequent revisions to it. The test-taker's ability is updated on-line with the maximizer of a partial likelihood function. I have established the strong consistency and asymptotic normality of the final ability estimator under minimal conditions on the test-taker's revision behavior. Simulation results also indicate that the proposed design can reduce measurement error and is robust against several well-known test-taking strategies.
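
The nonparametric classifier described in this abstract (assigning an examinee to the attribute profile whose ideal response pattern is closest to the observed responses) can be sketched as follows. This is an illustrative sketch only: the Hamming distance, the Q-matrix encoding, the tie-breaking rule, and all names are assumptions, not details taken from the thesis.

```python
from itertools import product


def ideal_response(alpha, q_row):
    """Conjunctive ideal response: 1 only if every skill the item requires is mastered."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))


def classify(responses, q_matrix):
    """Return (profile, distance) minimizing Hamming distance to the ideal pattern.

    q_matrix[j][k] is 1 if item j requires skill k (hypothetical encoding).
    Ties are broken by enumeration order, which is an arbitrary choice.
    """
    n_skills = len(q_matrix[0])
    best = None
    for alpha in product((0, 1), repeat=n_skills):  # all 2^K attribute profiles
        ideal = [ideal_response(alpha, row) for row in q_matrix]
        dist = sum(r != e for r, e in zip(responses, ideal))
        if best is None or dist < best[1]:
            best = (alpha, dist)
    return best


# Hypothetical test: five items, two skills.
q_matrix = [[1, 0], [0, 1], [1, 1], [1, 0], [0, 1]]
observed = [1, 1, 0, 1, 1]  # consistent with mastering both skills, one slip on item 3
profile, distance = classify(observed, q_matrix)
```

No item parameters are estimated here, which is the sense in which the approach is nonparametric; the exhaustive search over 2^K profiles is practical only for a small number of skills.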

Illinois Digital Environment for Access to Learning and Scholarship Repository

Action recognition in depth videos using nonparametric probabilistic graphical models

Author: Raman Natraj
Publication venue
Publication date

Action recognition involves automatically labelling videos that contain human motion with action classes. It has applications in diverse areas such as smart surveillance, human computer interaction and content retrieval. The recent advent of depth sensing technology that produces depth image sequences has offered opportunities to solve the challenging action recognition problem. The depth images facilitate robust estimation of a human skeleton’s 3D joint positions and a high level action can be inferred from a sequence of these joint positions. A natural way to model a sequence of joint positions is to use a graphical model that describes probabilistic dependencies between the observed joint positions and some hidden state variables. A problem with these models is that the number of hidden states must be fixed a priori even though for many applications this number is not known in advance. This thesis proposes nonparametric variants of graphical models with the number of hidden states automatically inferred from data. The inference is performed in a full Bayesian setting by using the Dirichlet Process as a prior over the model’s infinite dimensional parameter space. This thesis describes three original constructions of nonparametric graphical models that are applied in the classification of actions in depth videos. Firstly, the action classes are represented by a Hidden Markov Model (HMM) with an unbounded number of hidden states. The formulation enables information sharing and discriminative learning of parameters. Secondly, a hierarchical HMM with an unbounded number of actions and poses is used to represent activities. The construction produces a simplified model for activity classification by using logistic regression to capture the relationship between action states and activity labels. Finally, the action classes are modelled by a Hidden Conditional Random Field (HCRF) with the number of intermediate hidden states learned from data. 
Tractable inference procedures based on Markov Chain Monte Carlo (MCMC) techniques are derived for all these constructions. Experiments with multiple benchmark datasets confirm the efficacy of the proposed approaches for action recognition.

Birkbeck Institutional Research Online

ISIPTA'07: Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications

Author: De Cooman Gert, Vejnarová Jirina, Zaffalon Marco
Publication venue: SIPTA - International Society for Imprecise Probability: Theories and Applications
Publication date: 01/01/2007

Ghent University Academic Bibliography

Archivsystem Ask23

Constructivism Learning: A Learning Paradigm for Transparent Predictive Analytics

Author: Li Xiaoli
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2018

Aiming to achieve the learning capabilities possessed by intelligent beings, especially humans, researchers in the machine learning field have a long-standing tradition of borrowing ideas from human learning, such as reinforcement learning, active learning, and curriculum learning. Motivated by a philosophical theory called "constructivism", in this work we propose a new machine learning paradigm, constructivism learning. The constructivism theory has had wide-ranging impact on various human learning theories about how humans acquire knowledge. To adapt this human learning theory to the context of machine learning, we first studied how to improve learning performance by exploring inductive bias or prior knowledge from multiple learning tasks with multiple data sources, that is, multi-task multi-view learning, in both offline and lifelong settings. Then we formalized a Bayesian nonparametric approach using sequential Dirichlet Process Mixture Models to support constructivism learning. To further exploit constructivism learning, we also developed a constructivism deep learning method utilizing Uniform Process Mixture Models.

KU ScholarWorks

Sparse Classification - Methods & Applications

Author: Einarsson Gudmundur
Publication venue: DTU Compute
Publication date: 01/01/2018

Online Research Database In Technology