300,968 research outputs found
Analyzing collaborative learning processes automatically
In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time-consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential for enabling substantially improved on-line instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context-sensitive collaborative learning support on an as-needed basis. In this article, we report on an interdisciplinary research project that has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based multidimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important piece of the work towards making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions that the CSCL community is interested in.
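To make the feature-engineering idea concrete, the following is a minimal sketch (not TagHelper's actual implementation) of combining surface n-gram features with one hand-built linguistic pattern detector and training a classifier on human-coded discourse segments; the toy segments, the "hedge cue" feature, and the category labels are all illustrative assumptions.

    # Hedged sketch: n-gram features plus one hand-built "linguistic pattern
    # detector", in the spirit of the feature-design work described above.
    # The toy corpus, labels, and hedge-cue feature are illustrative only.
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import FeatureUnion, Pipeline

    class HedgeCueDetector(BaseEstimator, TransformerMixin):
        """Binary feature: does the segment contain a hedging cue?"""
        CUES = ("maybe", "perhaps", "i think", "probably", "might")

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return np.array([[any(c in text.lower() for c in self.CUES)]
                             for text in X], dtype=float)

    # Toy stand-in for a human-coded CSCL corpus (segment text -> discourse code).
    segments = ["I think we should try the second equation",
                "What did you get for problem three?",
                "You are wrong, the answer is 42",
                "Maybe we could split the task between us"]
    codes = ["proposal", "question", "challenge", "proposal"]

    model = Pipeline([
        ("features", FeatureUnion([
            ("ngrams", CountVectorizer(ngram_range=(1, 2))),
            ("hedges", HedgeCueDetector()),
        ])),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(segments, codes)
    print(model.predict(["perhaps we could check the answer together"]))

In practice, agreement between such a classifier and human coders would be assessed with a chance-corrected statistic such as Cohen's kappa on held-out data rather than raw accuracy.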
On the factors causing processing difficulty of multiple-scene displays
Multiplex viewing of static or dynamic scenes is an increasing feature of screen media. Most existing multiplex experiments have examined detection across increasing scene numbers, but currently no systematic evaluation of the factors that might produce difficulty in processing multiplexes exists. Across five experiments we provide such an evaluation. Experiment 1 characterises difficulty in change detection when the number of scenes is increased. Experiment 2 reveals that the increased difficulty across multiple-scene displays is caused by the total amount of visual information, which accounts for differences in change detection times regardless of whether this information is presented across multiple scenes or contained in one scene. Experiment 3 shows that whether quadrants of a display were drawn from the same or different scenes did not affect change detection performance. Experiment 4 demonstrates that knowing which scene the change will occur in means participants can perform at monoplex level. Finally, Experiment 5 finds that changes of central interest in multiplexed scenes are detected far more easily than marginal-interest changes, to such an extent that a centrally interesting object removal in nine screens is detected more rapidly than a marginally interesting object removal in four screens. Processing multiple-screen displays therefore seems dependent on the amount of information, and the importance of that information to the task, rather than simply the number of scenes in the display. We discuss the theoretical and applied implications of these findings.
Why we (usually) don't have to worry about multiple comparisons
This is an Accepted Manuscript of an article published by Taylor & Francis Group in the Journal of Research on Educational Effectiveness on 04/03/2012, available online at https://doi.org/10.1080/19345747.2011.618213. Applied researchers often find themselves making statistical inferences in settings that would seem to require multiple comparisons adjustments. We challenge the Type I error paradigm that underlies these corrections. Moreover, we posit that the problem of multiple comparisons can disappear entirely when viewed from a hierarchical Bayesian perspective. We propose building multilevel models in the settings where multiple comparisons arise. Multilevel models perform partial pooling (shifting estimates toward each other), whereas classical procedures typically keep the centers of intervals stationary, adjusting for multiple comparisons by making the intervals wider (or, equivalently, adjusting the p-values corresponding to intervals of fixed width). Thus, multilevel models address the multiple comparisons problem and also yield more efficient estimates, especially in settings with low group-level variation, which is where multiple comparisons are a particular concern.
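To make the partial pooling contrast concrete, here is a minimal numerical sketch of the standard normal hierarchical shrinkage formula; the per-group estimates, standard errors, and between-group standard deviation below are invented numbers, not quantities from the article.

    # Partial pooling in a normal hierarchical model: each group estimate is a
    # precision-weighted compromise between its own raw value and the grand
    # mean, so noisier groups are shrunk more. All numbers are made up.
    import numpy as np

    y = np.array([2.0, -1.0, 0.5, 3.5])     # raw per-group estimates
    se = np.array([1.5, 1.5, 0.5, 2.0])     # their standard errors
    mu, tau = y.mean(), 1.0                 # grand mean, between-group sd

    theta = (y / se**2 + mu / tau**2) / (1 / se**2 + 1 / tau**2)
    print(theta)                            # estimates pulled toward mu

A classical correction would instead leave the raw estimates unchanged and widen each interval; the multilevel estimate moves the interval centers themselves, most strongly where the data are least informative.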
An Empirical Comparison of Multiple Imputation Methods for Categorical Data
Multiple imputation is a common approach for dealing with missing values in
statistical databases. The imputer fills in missing values with draws from
predictive models estimated from the observed data, resulting in multiple,
completed versions of the database. Researchers have developed a variety of
default routines to implement multiple imputation; however, there has been
limited research comparing the performance of these methods, particularly for
categorical data. We use simulation studies to compare repeated sampling
properties of three default multiple imputation methods for categorical data,
including chained equations using generalized linear models, chained equations
using classification and regression trees, and a fully Bayesian joint
distribution based on Dirichlet Process mixture models. We base the simulations
on categorical data from the American Community Survey. In the circumstances of
this study, the results suggest that default chained equations approaches based
on generalized linear models are dominated by the default regression tree and
Bayesian mixture model approaches. They also suggest competing advantages for
the regression tree and Bayesian mixture model approaches, making both
reasonable default engines for multiple imputation of categorical data.
Supplementary material for this article is available online.
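As a rough illustration of what "chained equations using classification and regression trees" looks like, the sketch below cycles through incomplete categorical columns, each time refitting a decision tree on the other columns and refilling the missing cells; the toy data frame is an assumption, and a real default engine would additionally inject randomness into the draws and produce several completed datasets rather than one.

    # Hedged sketch of chained-equations imputation with classification trees.
    # Toy data; a production routine would draw (not just predict) values and
    # return multiple completed datasets, as multiple imputation requires.
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    df = pd.DataFrame({
        "sex":      ["m", "f", None, "f", "m", None],
        "employed": ["yes", None, "no", "yes", None, "no"],
        "region":   ["a", "b", "b", None, "a", "a"],
    })
    filled = df.copy()
    for c in filled.columns:                       # crude initial fill: column modes
        filled[c] = filled[c].fillna(filled[c].mode()[0])

    for _ in range(5):                             # a few chained-equations sweeps
        for c in df.columns:
            miss = df[c].isna()
            if not miss.any():
                continue
            X = pd.get_dummies(filled.drop(columns=[c]))
            tree = DecisionTreeClassifier(min_samples_leaf=2)
            tree.fit(X[~miss], df.loc[~miss, c])
            filled.loc[miss, c] = tree.predict(X[miss])

    print(filled)                                  # one completed version of the data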
Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yields insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as λ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems based on this multi-scale view. Finally, we introduce the Python
library NetComp, which implements the graph distances used in this work.
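For readers who want to try a spectral (λ) distance without special tooling, here is a minimal sketch using only networkx and numpy: it compares two graphs by the Euclidean distance between their sorted Laplacian spectra. The specific matrix (Laplacian rather than adjacency), the models compared, and the truncation to k eigenvalues are choices made for illustration; the NetComp library mentioned above provides more careful implementations.

    # Minimal sketch of a spectral ("lambda") distance between two graphs:
    # Euclidean distance between their largest Laplacian eigenvalues.
    import numpy as np
    import networkx as nx

    def lambda_distance(G1, G2, k=None):
        s1 = np.sort(nx.laplacian_spectrum(G1))[::-1]
        s2 = np.sort(nx.laplacian_spectrum(G2))[::-1]
        k = k or min(len(s1), len(s2))
        return float(np.linalg.norm(s1[:k] - s2[:k]))

    # A modular graph versus an unstructured random graph of similar density.
    modular = nx.planted_partition_graph(4, 25, p_in=0.3, p_out=0.02, seed=1)
    uniform = nx.gnp_random_graph(100, 0.09, seed=1)
    print(lambda_distance(modular, uniform))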
Resolving Structure in Human Brain Organization: Identifying Mesoscale Organization in Weighted Network Representations
Human brain anatomy and function display a combination of modular and
hierarchical organization, suggesting the importance of both cohesive
structures and variable resolutions in the facilitation of healthy cognitive
processes. However, tools to simultaneously probe these features of brain
architecture require further development. We propose and apply a set of methods
to extract cohesive structures in network representations of brain connectivity
using multi-resolution techniques. We employ a combination of soft
thresholding, windowed thresholding, and resolution in community detection
that enables us to identify and isolate structures associated with different
weights. One such mesoscale structure is bipartivity, which quantifies the
extent to which the brain is divided into two partitions with high connectivity
between partitions and low connectivity within partitions. A second,
complementary mesoscale structure is modularity, which quantifies the extent to
which the brain is divided into multiple communities with strong connectivity
within each community and weak connectivity between communities. Our methods
lead to multi-resolution curves of these network diagnostics over a range of
spatial, geometric, and structural scales. For statistical comparison, we
contrast our results with those obtained for several benchmark null models. Our
work demonstrates that multi-resolution diagnostic curves capture complex
organizational profiles in weighted graphs. We apply these methods to the
identification of resolution-specific characteristics of healthy weighted graph
architecture and altered connectivity profiles in psychiatric disease.
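As a loose illustration of the windowed-thresholding idea, the sketch below slides a weight window across a synthetic weighted connectivity matrix, keeps only edges whose weights fall inside the window, and records modularity and a spectral bipartivity score for each window; the random matrix, window width, and diagnostics shown are assumptions made for illustration, not the paper's data or exact pipeline.

    # Loose sketch of windowed thresholding: retain edges whose weights fall in
    # a sliding window, then track two mesoscale diagnostics per window.
    # The random weighted matrix and window settings are illustrative only.
    import numpy as np
    import networkx as nx
    from networkx.algorithms import community

    rng = np.random.default_rng(0)
    n = 60
    W = rng.random((n, n))
    W = np.triu(W, 1) + np.triu(W, 1).T          # symmetric weights, zero diagonal

    def spectral_bipartivity(G):
        # Estrada-style measure: sum cosh(eig) / sum exp(eig) of the adjacency.
        eig = np.linalg.eigvalsh(nx.to_numpy_array(G))
        return float(np.sum(np.cosh(eig)) / np.sum(np.exp(eig)))

    width = 0.2
    for lo in np.arange(0.0, 1.0 - width + 1e-9, 0.2):
        hi = lo + width
        A = ((W >= lo) & (W < hi)).astype(int)   # keep edges inside the window
        np.fill_diagonal(A, 0)
        G = nx.from_numpy_array(A)
        comms = community.greedy_modularity_communities(G)
        Q = community.modularity(G, comms)
        print(f"window [{lo:.1f}, {hi:.1f}): Q={Q:.2f}, "
              f"bipartivity={spectral_bipartivity(G):.2f}")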