What are the Best Hierarchical Descriptors for Complex Networks?
This work reviews several hierarchical measurements of the topology of
complex networks and then applies feature selection concepts and methods in
order to quantify the relative importance of each measurement with respect to
the discrimination between four representative theoretical network models,
namely Erdős-Rényi, Barabási-Albert, Watts-Strogatz as well as a
geographical type of network. The obtained results confirmed that the four
models can be well-separated by using a combination of measurements. In
addition, the relative contribution of each considered feature for the overall
discrimination of the models was quantified in terms of the respective weights
in the canonical projection into two dimensions, with the traditional
clustering coefficient, hierarchical clustering coefficient and neighborhood
clustering coefficient proving particularly effective. Interestingly, the
average shortest path length and hierarchical node degrees contributed little
to the separation of the four network models.
Comment: 9 pages, 4 figures
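As a rough illustration of the traditional clustering coefficient named in the abstract above, the following minimal sketch computes it for an undirected graph stored as a dictionary of neighbour sets. The function names and the toy graph are hypothetical and not taken from the paper; the hierarchical and neighborhood variants it mentions are not shown.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        # Assumed convention: nodes with fewer than two neighbours score 0.
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

if __name__ == "__main__":
    for name in sorted(toy):
        print(name, local_clustering(toy, name))
    print("average", average_clustering(toy))
```

In the toy graph only one of the three possible pairs among the neighbours of `a` is linked, so its coefficient is 1/3, while `b` and `c` sit in a closed triangle and score 1.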
Searching for differentially expressed gene combinations
We propose 'CorScor', a novel approach for identifying gene pairs with joint differential expression. This is defined as a situation with good phenotype discrimination in the bivariate, but not in the two marginal distributions. CorScor can be used to detect phenotype-related dependencies and interactions among genes. Our easily interpretable approach is scalable to current microarray dimensions and yields promising results on several cancer gene expression datasets.
Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression
Random Forests (Breiman, 2001) is a successful and widely used regression and
classification algorithm. Part of its appeal and reason for its versatility is
its (implicit) construction of a kernel-type weighting function on training
data, which can also be used for targets other than the original mean
estimation. We propose a novel forest construction for multivariate responses
based on their joint conditional distribution, independent of the estimation
target and the data model. It uses a new splitting criterion based on the MMD
distributional metric, which is suitable for detecting heterogeneity in
multivariate distributions. The induced weights define an estimate of the full
conditional distribution, which in turn can be used for arbitrary and
potentially complicated targets of interest. The method is very versatile and
convenient to use, as we illustrate on a wide range of examples. The code is
available as the Python and R packages drf.
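To make the MMD idea in the abstract above concrete, here is a minimal sketch of a squared MMD estimate between two samples, such as the multivariate responses falling into two candidate child nodes of a split. It assumes a Gaussian kernel with a fixed bandwidth and the simple biased estimator; it is not the drf packages' implementation, and all names below are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length numeric tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical bivariate responses in a left child and two candidate right children.
left = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)]
right_similar = [(0.1, 0.0), (0.0, -0.2), (-0.2, 0.1)]
right_shifted = [(3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]

if __name__ == "__main__":
    print("similar children:", mmd_squared(left, right_similar))
    print("shifted children:", mmd_squared(left, right_shifted))
```

A split whose children have clearly different response distributions yields a larger value, which is the kind of heterogeneity a distribution-based splitting criterion is meant to detect.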
Measuring Categorical Perception in Color-Coded Scatterplots
Scatterplots commonly use color to encode categorical data. However, as
datasets increase in size and complexity, the efficacy of these channels may
vary. Designers lack insight into how robust different design choices are to
variations in category numbers. This paper presents a crowdsourced experiment
measuring how the number of categories and choice of color encodings used in
multiclass scatterplots influence viewers' abilities to analyze data
across classes. Participants estimated relative means in a series of
scatterplots with 2 to 10 categories encoded using ten color palettes drawn
from popular design tools. Our results show that the number of categories and
color discriminability within a color palette notably impact people's
perception of categorical data in scatterplots and that the judgments become
harder as the number of categories grows. We examine existing palette design
heuristics in light of our results to help designers make robust color choices
informed by the parameters of their data.
Comment: The paper has been accepted to ACM CHI 2023. 14 pages, 7 figures
Phonological awareness in preschool age children with developmental disabilities
Reading skills are critically important for a child’s development and continued growth in school. The home and school literacy experiences of children who have developmental disabilities have been found to be qualitatively different from the experiences of their same-age peers without disabilities. In addition to access to instruction, a number of intrinsic factors including cognitive ability, receptive language and expressive speech skills have been suggested as factors that may place children with developmental disabilities at a greater risk for limited development of reading skills. Currently, little is understood about how children who have developmental disabilities and may have limitations in productive speech learn to read. This study identifies key intrinsic and extrinsic factors that are related to the development of phonological awareness in 42 children between 4 years and 5 years 9 months of age with developmental disabilities and a range of speech abilities. The aims of this project were to (1) systematically assess children’s intrinsic factors of speech ability, receptive and expressive language and vocabulary, cognitive skills and phonological awareness to determine key intrinsic factors related to phonological awareness and (2) describe the extrinsic factors of home literacy experience and preschool literacy instruction provided to children. Children were found to have frequent and positive home literacy experiences. No significant correlations between speech ability and frequency of shared reading experiences were found. Parents reported low levels of preschool literacy instruction. Significant correlations were found between instruction in decoding and word recognition and children’s sound-symbol awareness. Correlations were found between the use of technology and media and Augmentative and Alternative Communication (AAC) and children’s speech ability.
Positive, significant relationships were found between phonological awareness and all direct assessment measures of developmental skill, speech ability and early reading skills but were not found between phonological awareness and home or school literacy experiences. Speech ability did not predict a significant amount of variance in phonological awareness skill beyond what would be expected by cognitive development, receptive language and orthographic knowledge. This study provides important implications for practitioners and researchers alike concerning the factors related to early reading development in children with limited speech ability.
Rapid Visual Categorization is not Guided by Early Salience-Based Selection
The current dominant visual processing paradigm in both human and machine
research is the feedforward, layered hierarchy of neural-like processing
elements. Within this paradigm, visual saliency is seen by many to have a
specific role, namely that of early selection. Early selection is thought to
enable very fast visual performance by limiting processing to only the most
salient candidate portions of an image. This strategy has led to a plethora of
saliency algorithms that have indeed improved processing time efficiency in
machine algorithms, which in turn have strengthened the suggestion that human
vision also employs a similar early selection strategy. However, at least one
set of critical tests of this idea has never been performed with respect to the
role of early selection in human vision. How would the best of the current
saliency models perform on the stimuli used by experimentalists who first
provided evidence for this visual processing paradigm? Would the algorithms
really provide correct candidate sub-images to enable fast categorization on
those same images? Do humans really need this early selection for their
impressive performance? Here, we report on a new series of tests of these
questions whose results suggest that it is quite unlikely that such an early
selection process has any role in human rapid visual categorization.
Comment: 22 pages, 9 figures
Leaf Morphology, Taxonomy and Geometric Morphometrics: A Simplified Protocol for Beginners
Taxonomy relies greatly on morphology to discriminate groups. Computerized geometric morphometric methods for quantitative shape analysis measure, test and visualize differences in form in a highly effective, reproducible, accurate and statistically powerful way. Plant leaves are commonly used in taxonomic analyses and are particularly suitable to landmark-based geometric morphometrics. However, botanists do not yet seem to have taken advantage of this set of methods in their studies as much as zoologists have done. Using free software and an example dataset from two geographical populations of sessile oak leaves, we describe in detailed but simple terms how to: a) compute size and shape variables using Procrustes methods; b) test measurement error and the main levels of variation (population and trees) using a hierarchical design; c) estimate the accuracy of group discrimination; d) repeat this estimate after controlling for the effect of size differences on shape (i.e., allometry). Measurement error was completely negligible; individual variation in leaf morphology was large and differences between trees were generally bigger than within trees; differences between the two geographic populations were small in both size and shape; despite a weak allometric trend, controlling for the effect of size on shape slightly increased discrimination accuracy. Procrustes-based methods for the analysis of landmarks were highly efficient in measuring the hierarchical structure of differences in leaves and in revealing very small-scale variation. In taxonomy and many other fields of botany and biology, the application of geometric morphometrics helps increase scientific rigour in the description of important aspects of the phenotypic dimension of biodiversity.
Easy to follow but detailed step-by-step example studies can promote a more extensive use of these numerical methods, as they provide an introduction to the discipline which, for many biologists, is less intimidating than the often inaccessible specialist literature.
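Step a) in the abstract above, computing size and shape variables with Procrustes methods, can be sketched for a single pair of 2D landmark configurations as follows. This is only an illustrative sketch under stated assumptions: landmarks are treated as complex numbers, reflections are not handled, and it aligns two specimens rather than running a generalized Procrustes analysis over a whole sample. The function names and landmark coordinates are hypothetical and do not come from the paper or its software.

```python
import cmath
import math


def normalise(points):
    """Centre landmarks at the origin and scale them to unit centroid size."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    centred = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in centred))
    return [p / size for p in centred]


def procrustes_distance(reference, target):
    """Distance between two shapes after removing position, size and rotation."""
    a = normalise(reference)
    b = normalise(target)
    # The least-squares rotation of b onto a is the unit complex number s / |s|.
    s = sum(p * q.conjugate() for p, q in zip(a, b))
    rotation = s / abs(s) if abs(s) > 0 else 1.0
    aligned = [rotation * q for q in b]
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, aligned)))


def transform(points, angle, scale, shift):
    """Rotate, scale and translate landmarks (used only to build test data)."""
    factor = scale * cmath.exp(1j * angle)
    moved = [factor * complex(x, y) + complex(*shift) for x, y in points]
    return [(p.real, p.imag) for p in moved]


# Hypothetical leaf outlines described by four landmarks each.
leaf = [(0.0, 0.0), (2.0, 0.6), (4.0, 0.0), (2.0, -0.4)]
same_shape = transform(leaf, angle=0.7, scale=2.5, shift=(3.0, -1.0))
wider_leaf = [(0.0, 0.0), (2.0, 1.5), (4.0, 0.0), (2.0, -1.2)]

if __name__ == "__main__":
    print("same shape, different pose:", procrustes_distance(leaf, same_shape))
    print("different shape:", procrustes_distance(leaf, wider_leaf))
```

The first distance is zero up to rounding error because the second configuration differs from the first only in position, size and orientation, which is exactly the information Procrustes superimposition discards.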