
    Analyzing User Behavior in Collaborative Environments

    Discrete sequences are the building blocks for many real-world problems in domains including genomics, e-commerce, and social sciences. While there are machine learning methods to classify and cluster sequences, they fail to explain what makes groups of sequences distinguishable. Although a black-box model is sometimes sufficient, research areas focused on human behavior need greater explainability. For example, psychologists are less interested in a model that predicts human behavior with high accuracy than in identifying the differences between actions that lead to divergent human behavior. This dissertation presents techniques for understanding differences between classes of discrete sequences. We applied these approaches to study two online collaborative environments: GitHub, a software development platform, and Minecraft, a multiplayer online game. The first approach measures the differences between groups of sequences by comparing k-gram representations of sequences using the silhouette score, and characterizes the differences by analyzing the distance matrix of subsequences. The second approach discovers subsequences that are significantly more similar to one set of sequences than to other sets. This approach, called contrast motif discovery, first finds a set of motifs for each group of sequences and then refines them to keep only the motifs that distinguish that group from the other groups. Compared to existing methods, our technique is scalable and capable of handling long event sequences. Our first case study is GitHub, a social coding platform that facilitates distributed, asynchronous collaboration in open source software development. It provides an open API for collecting metadata about users, repositories, and users' activities on repositories.
    To study the dynamics of teams on GitHub, we focused on the discrete event sequences generated when GitHub users perform actions on the platform. Specifically, we studied the differences that automated accounts (aka bots) make to software development processes and outcomes. We trained black-box supervised learning methods to classify the event sequences of GitHub teams and then used our sequence analysis techniques to measure and characterize the differences between the event sequences of teams with bots and teams without bots. The event sequences of teams with bots are relatively distinct from those of teams without bots in terms of the existence and frequency of short subsequences. Moreover, teams with bots have more novel and less repetitive sequences than teams without bots. In addition, we discovered contrast motifs for human-bot and human-only teams. Our analysis of these contrast motifs shows that in human-bot teams, discussions are scattered throughout other activities, while in human-only teams, discussions tend to cluster together. For our second case study, we applied our sequence mining approaches to analyze player behavior in Minecraft, a multiplayer online game that supports many forms of player collaboration. As a sandbox game, Minecraft gives players a great deal of flexibility in deciding how to complete tasks; this lack of goal orientation makes Minecraft event sequences more challenging to analyze than event sequences from more structured games. Using our approaches, we measured and characterized differences between low-level sequences of high-level actions, and, despite variability in how different players accomplished the same tasks, we discovered contrast motifs for many player actions. Finally, we explored how the level of player collaboration affects the contrast motifs.
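    The first approach compares groups of sequences through k-gram representations and the silhouette score. The abstract does not say how sequences are vectorized or which distance is used, so the Python sketch below assumes k-gram count vectors and cosine distance; all function names are hypothetical. The silhouette score is computed from scratch over a precomputed distance matrix (scikit-learn's `silhouette_score` with `metric="precomputed"` computes the same quantity). A score near 1 means the groups' k-gram profiles are well separated; a score near 0 or below means they overlap.

```python
import math
from collections import Counter


def kgram_counts(seq, k):
    """Count every contiguous k-gram (as a tuple of events) in one sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(count * v.get(gram, 0) for gram, count in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    if nu == 0 or nv == 0:
        return 1.0  # treat an empty k-gram profile as maximally distant
    return max(0.0, 1.0 - dot / (nu * nv))  # clamp tiny negative rounding error


def silhouette(dist, labels):
    """Mean silhouette coefficient over a precomputed distance matrix.

    Requires at least two distinct labels.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # usual convention for singleton groups
            continue
        a = sum(same) / len(same)  # mean distance to the point's own group
        b = min(  # mean distance to the nearest other group
            sum(dist[i][j] for j in range(n) if labels[j] == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def group_separation(sequences, labels, k=2):
    """Silhouette score of the group labels under k-gram cosine distance."""
    profiles = [kgram_counts(s, k) for s in sequences]
    dist = [[cosine_distance(p, q) for q in profiles] for p in profiles]
    return silhouette(dist, labels)
```

    The same `dist` matrix is also the natural starting point for the characterization step, since it shows which sequences (and hence which k-grams) drive the separation.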

    Evaluation of Hydrological Processes and Environmental Impacts of Free and Controlled Subsurface Drainage

    Controlled drainage is a management strategy designed to mitigate water quality issues caused by subsurface drainage. To improve controlled drainage system management and better understand its hydrological and environmental effects, this study analyzed the water table recession rate, drain flow, and nitrate and phosphorus loads of both free and controlled drainage systems, and simulated the hydrology of a free drainage system to evaluate surface runoff and ponding at the Davis Purdue Agricultural Center in Eastern Indiana. Statistical analyses, including the paired watershed approach and a paired t-test, indicated that controlled drainage had a statistically significant effect (p < 0.01) on the rate of water table fall, reducing the water table recession rate by 29% to 62%. The slower recession rate caused by controlled drainage can harm crop growth and trafficability by keeping the water table at a detrimental level for longer. Farmers and other decision-makers can use this finding to improve the management of controlled drainage systems by actively managing the system during storm events. A method was developed to estimate drain flow during periods of missing data using the Hooghoudt equation and continuous water table observations. Estimated drain flow was combined with nutrient concentrations to show that controlled drainage significantly decreased annual nitrate loads (p < 0.05), by 25% and 39% in two paired plots, while annual soluble reactive phosphorus (SRP) and total phosphorus (TP) loads were not significantly different. These results underscore the potential of controlled drainage to reduce nitrate losses from drained landscapes, with the higher level of outlet control during the non-growing season (winter) providing about 70% of the annual water quality benefits and the lower level used during the growing season (summer) providing about 30%.
    Three methods (monitored water table depth, a digital photo time series, and DRAINMOD model simulations) were used to determine how surface ponding and runoff were generated and how frequently they occurred. The estimated annual water balance indicated that only 7% of annual precipitation contributed to surface runoff. Results from both simulations and observations indicated that all ponding events were generated by saturation excess rather than infiltration excess. Overall, nitrate transport through controlled drainage was lower than through free drainage, indicating the water quality benefits of controlled drainage, but the water table remained at a higher level for longer when drainage was controlled. This can reduce crop yields when the water table is above a detrimental level, and can also increase the potential for nutrient transport through surface runoff, since saturation excess was the main mechanism generating runoff in this field.
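    The drain-flow gap-filling step in this study combines the Hooghoudt equation with continuous water table observations. The abstract gives neither the exact equation form nor the site parameters, so the Python sketch below assumes the common steady-state form q = (8·K_b·d·m + 4·K_a·m²) / L², where m is the midspan water table height above the drains, K_a and K_b are the hydraulic conductivities above and below drain level, d is the equivalent depth to the impermeable layer, and L is the drain spacing. All function names and parameter values are hypothetical. The last function converts a daily flux and a nutrient concentration into a load, mirroring how estimated drain flow was combined with concentrations.

```python
def hooghoudt_flux(m, k_above, k_below, d_equiv, spacing):
    """Steady-state drain flux (m/day) from the Hooghoudt equation (assumed form).

    m:        midspan water table height above drain level (m)
    k_above:  saturated hydraulic conductivity above drain level (m/day)
    k_below:  saturated hydraulic conductivity below drain level (m/day)
    d_equiv:  equivalent depth from the drains to the impermeable layer (m)
    spacing:  drain spacing (m)
    """
    if m <= 0:
        return 0.0  # water table at or below the drains: no drain flow
    return (8 * k_below * d_equiv * m + 4 * k_above * m ** 2) / spacing ** 2


def fill_missing_drain_flow(observed_flux, water_table_depths, drain_depth, **params):
    """Keep measured daily flux where present; estimate gaps (None) from the
    observed water table depth below the surface (m) via Hooghoudt."""
    filled = []
    for q, wtd in zip(observed_flux, water_table_depths):
        if q is not None:
            filled.append(q)
        else:
            filled.append(hooghoudt_flux(drain_depth - wtd, **params))
    return filled


def daily_load_kg_per_ha(flux_m_per_day, conc_mg_per_l):
    """Nutrient load: 1 m of water over 1 ha is 1e7 L, so mg/L * m * 10 = kg/ha."""
    return flux_m_per_day * conc_mg_per_l * 10.0
```

    For example, with hypothetical values K_a = K_b = 1 m/day, d = 1 m, L = 10 m, and drains 1 m deep, a water table 0.5 m below the surface gives m = 0.5 m and an estimated flux of 0.05 m/day.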

    Contrast Motif Discovery in Minecraft

    Understanding event sequences is an important aspect of game analytics, since it is relevant to many player modeling questions. This paper introduces a method for analyzing event sequences by detecting contrast motifs; the aim is to discover subsequences that are significantly more similar to one set of sequences than to other sets. Compared to existing methods, our technique is scalable and capable of handling long event sequences. We applied our proposed sequence mining approach to analyze player behavior in Minecraft, a multiplayer online game that supports many forms of player collaboration. As a sandbox game, Minecraft gives players a great deal of flexibility in deciding how to complete tasks; this lack of goal orientation makes Minecraft event sequences more challenging to analyze than event sequences from more structured games. Using our approach, we were able to discover contrast motifs for many player actions, despite variability in how different players accomplished the same tasks. Furthermore, we explored how the level of player collaboration affects the contrast motifs. Although this paper focuses on applications within Minecraft, our tool, which we have made publicly available along with our dataset, can be used on any set of game event sequences.
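    Contrast motif discovery first finds candidate motifs for each group and then keeps only those that distinguish the group from the others. The abstract does not describe the motif-discovery step or the similarity measure, so the Python sketch below swaps in a much simpler, exact-match stand-in: candidate "motifs" are k-grams that occur in at least `min_support` of a group's sequences, and the refinement step keeps those whose support exceeds every other group's by at least `min_gap`. All names and thresholds are hypothetical, and this version counts only exact matches, whereas the abstract describes similarity to a set of sequences.

```python
from collections import defaultdict


def support(sequences, k):
    """Fraction of sequences that contain each k-gram at least once."""
    counts = defaultdict(int)
    for seq in sequences:
        for gram in {tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)}:
            counts[gram] += 1
    return {gram: c / len(sequences) for gram, c in counts.items()}


def contrast_kgrams(groups, target, k=3, min_support=0.5, min_gap=0.3):
    """Return {k-gram: support gap} for k-grams that distinguish `target`.

    groups: dict mapping group name -> list of event sequences.
    Step 1 (candidates): k-grams frequent within the target group.
    Step 2 (refinement): keep candidates whose target support beats the
    highest support in any other group by at least `min_gap`.
    """
    candidates = {g: s for g, s in support(groups[target], k).items() if s >= min_support}
    others = [support(seqs, k) for name, seqs in groups.items() if name != target]
    kept = {}
    for gram, s in candidates.items():
        best_other = max((o.get(gram, 0.0) for o in others), default=0.0)
        if s - best_other >= min_gap:
            kept[gram] = s - best_other
    # Most discriminative k-grams first.
    return dict(sorted(kept.items(), key=lambda kv: -kv[1]))
```

    On invented toy data where one group interleaves `push` and `comment` events and the other chains `comment` events back to back, calling this with `k=2` ranks `("push", "comment")` highest for the first group and `("comment", "comment")` highest for the second, while k-grams common to both groups are filtered out.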

    Effect of Shoot Pruning and Flower Thinning on Quality and Quantity of Semi-Determinate Tomato (Lycopersicon esculentum Mill.)

    Space, light, and the number of fruits available for harvest are major constraints in greenhouse tomato production. Therefore, two experiments were carried out to determine the effect of shoot pruning and flower thinning on the quality and quantity of fruits of semi-determinate tomato in a greenhouse of the Faculty of Agriculture and Natural Resources, Persian Gulf University of Bushehr. Both experiments used a randomized complete block design, in which the effects of shoot pruning (single-branch pruning, double-branch pruning, pyramidal pruning, and a control) and flower thinning (clusters with 4 or 5 remaining flowers, and a control) were studied separately. Leaf area and plant yield were higher in the pruned treatments than in the control. Yields from pyramidal pruning and from cluster thinning with 5 remaining flowers were significantly higher than those of the other treatments. In terms of fruit quality, pyramidal pruning increased vitamin C content but had no significant effect on total soluble solids.

    Initializing Agent-Based Models With Clustering Archetypes

    Agent-based models are a powerful tool for predicting population-level behaviors; however, their performance can be sensitive to the initial simulation conditions. This paper introduces a procedure for leveraging large datasets to initialize agent-based simulations in which the population is abstracted into a set of archetypes. We show that these archetypes can be discovered using clustering, and we evaluate the benefits of selecting clusters based on their stability over time. Our experiments on the GitHub dataset demonstrate that simulation runs performed with the clustering archetypes are more successful at predicting large-scale activity patterns.
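    The archetype procedure clusters the population and favors clusters that stay stable over time. The abstract does not name the clustering algorithm or the stability measure, so the Python sketch below assumes cluster labels have already been computed for two time windows (for example, with k-means) and scores a cluster's stability as the fraction of its members that end up together in the later window. Because labels from separate clustering runs need not match, members are matched by majority vote. Stable clusters then become archetypes whose population shares seed the simulated agents. All function names and the 0.7 threshold are hypothetical.

```python
import random
from collections import Counter, defaultdict


def cluster_stability(labels_t1, labels_t2):
    """Fraction of each t1 cluster's members that share one cluster at t2.

    labels_t1, labels_t2: dicts mapping agent id -> cluster label. Labels need
    not match across windows; members are matched by majority vote. Agents
    missing from t2 count against stability.
    """
    members = defaultdict(list)
    for agent, cluster in labels_t1.items():
        members[cluster].append(agent)
    stability = {}
    for cluster, agents in members.items():
        later = Counter(labels_t2[a] for a in agents if a in labels_t2)
        kept = later.most_common(1)[0][1] if later else 0
        stability[cluster] = kept / len(agents)
    return stability


def archetype_mix(labels_t1, stability, threshold=0.7):
    """Population share of each stable cluster, renormalized to sum to 1."""
    counts = Counter(c for c in labels_t1.values() if stability[c] >= threshold)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def seed_population(n_agents, mix, seed=0):
    """Draw an archetype for each simulated agent in proportion to `mix`.

    `mix` must be non-empty (at least one stable cluster).
    """
    rng = random.Random(seed)
    archetypes = list(mix)
    return rng.choices(archetypes, weights=[mix[a] for a in archetypes], k=n_agents)
```

    Majority matching keeps the stability score independent of arbitrary cluster numbering, which matters because re-running a clustering algorithm on a new time window can permute its labels.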

    Evaluation of the effective factors on Bipolar I Disorder frequent recurrence in a 5 years longitudinal study using generalized estimation equations method

    Background and Purpose: Patients with recurrent Bipolar I Disorder experience mood swings between mania and depression over time, so longitudinal studies of patients with Bipolar Disorder are needed. This study aims to evaluate the factors affecting frequent recurrence of Bipolar I Disorder in a 5-year longitudinal study using the generalized estimating equations (GEE) method. Materials and Methods: Data were collected through repeated measurements on 255 patients with Bipolar I Disorder in Mazandaran, Iran, in a longitudinal study between 2007 and 2011. The outcome variable was Bipolar I Disorder recurrence, and the predictor variables were sex, age of onset, family history (first-degree relatives), economic status, and education level. SAS PROC GENMOD was used to fit a GEE regression model and estimate the parameters for the factors associated with recurrence. Results: Patient ages ranged from 13 to 55 years, and the mean age of onset was 24.1 years. Most patients were male, were in the upper or middle economic deciles, and had a diploma-level education or lower. The GEE results showed that a first-degree family history increased the odds of recurrence (odds ratio [OR] > 1; P < 0.05), whereas a higher age of onset decreased the odds of recurrence in patients with Bipolar I Disorder (OR < 1; P < 0.05). Conclusion: Predictors of Bipolar I Disorder recurrence include a psychiatric history in first-degree relatives and age of onset. Understanding these factors and educating patients and their families are valuable for preventing recurrence and planning treatment.
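    The abstract reports odds ratios from a GEE regression fit with SAS PROC GENMOD but does not spell out the model. A hedged sketch of the standard marginal logistic GEE it most likely corresponds to, assuming a binary recurrence indicator Y_it for patient i at measurement occasion t, the five listed predictors treated as baseline covariates, and a working correlation matrix R(α) whose structure the abstract does not state:

```latex
\operatorname{logit}\Pr(Y_{it}=1 \mid \mathbf{x}_i)
  = \beta_0 + \beta_1\,\mathrm{sex}_i + \beta_2\,\mathrm{onset}_i
  + \beta_3\,\mathrm{famhist}_i + \beta_4\,\mathrm{econ}_i + \beta_5\,\mathrm{edu}_i

\sum_{i=1}^{255} D_i^{\top} V_i^{-1}\,(\mathbf{Y}_i - \boldsymbol{\mu}_i) = \mathbf{0},
\qquad D_i = \frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}},
\qquad V_i = A_i^{1/2}\,R(\alpha)\,A_i^{1/2},
\qquad A_i = \operatorname{diag}\bigl(\mu_{it}(1-\mu_{it})\bigr)

\mathrm{OR}_j = e^{\beta_j}
```

    Under this model, OR_j > 1 (β_j > 0), as reported for first-degree family history, means higher odds of recurrence, and OR_j < 1 for age of onset means the odds of recurrence fall as age of onset increases, with the other covariates held fixed. The covariate names are placeholders, not the study's variable names.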