    What are the Best Hierarchical Descriptors for Complex Networks?

    This work reviews several hierarchical measurements of the topology of complex networks and then applies feature selection concepts and methods to quantify the relative importance of each measurement for discriminating between four representative theoretical network models, namely Erdős-Rényi, Barabási-Albert, Watts-Strogatz and a geographical type of network. The results confirmed that the four models can be well separated by using a combination of measurements. In addition, the relative contribution of each feature to the overall discrimination of the models was quantified in terms of its weight in the canonical projection into two dimensions, with the traditional clustering coefficient, hierarchical clustering coefficient and neighborhood clustering coefficient proving particularly effective. Interestingly, the average shortest path length and hierarchical node degrees contributed little to the separation of the four network models.
    Comment: 9 pages, 4 figures
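For reference, the traditional (local) clustering coefficient that the abstract singles out can be computed as C_i = 2 e_i / (k_i (k_i - 1)), where k_i is the degree of node i and e_i is the number of edges among its neighbors. The sketch below assumes an undirected graph stored as adjacency sets; the graph and function names are illustrative, and the hierarchical variants reviewed in the paper are not reproduced here.

```python
from itertools import combinations


def clustering_coefficient(adj: dict[int, set[int]], node: int) -> float:
    """Local clustering coefficient C_i = 2 e_i / (k_i (k_i - 1)) of an undirected graph."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for degree < 2; reported as 0 by convention here
    # e_i: number of edges that actually exist among the neighbors of `node`
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj: dict[int, set[int]]) -> float:
    """Mean of the local coefficients over all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 0
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(graph, 0))  # only 1 of 3 neighbor pairs is linked
print(average_clustering(graph))
```

Node 0 has three neighbors but only one linked pair among them, so its coefficient is 1/3, while nodes 1 and 2 sit in a closed triangle and score 1.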

    Searching for differentially expressed gene combinations

    We propose 'CorScor', a novel approach for identifying gene pairs with joint differential expression, defined as a situation with good phenotype discrimination in the bivariate distribution but not in either of the two marginal distributions. CorScor can be used to detect phenotype-related dependencies and interactions among genes. Our easily interpretable approach scales to current microarray dimensions and yields promising results on several cancer gene-expression datasets.
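The abstract does not give the CorScor statistic itself, so the following is only a hypothetical illustration of the situation it targets: two genes whose group means are identical (no marginal signal) while their joint pattern separates the phenotypes perfectly. The expression values, the "tumor"/"normal" labels and all helper names are invented for this example, and the within-group correlation gap is used purely as an illustrative contrast, not as the CorScor score.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical centered expression values (gene_a, gene_b) for two phenotype groups
tumor = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
normal = [(1.0, -1.0), (2.0, -2.0), (-1.0, 1.0), (-2.0, 2.0)]


def marginal_mean_gap(g1, g2, idx):
    """Absolute difference in group means for a single gene (marginal view)."""
    return abs(mean(p[idx] for p in g1) - mean(p[idx] for p in g2))


def correlation_gap(g1, g2):
    """Absolute difference in within-group gene-gene correlation (joint view)."""
    r1 = pearson([p[0] for p in g1], [p[1] for p in g1])
    r2 = pearson([p[0] for p in g2], [p[1] for p in g2])
    return abs(r1 - r2)


def product_rule_accuracy(g1, g2):
    """Hypothetical bivariate rule: call group 1 when both genes deviate in the same direction."""
    hits = sum(1 for a, b in g1 if a * b > 0) + sum(1 for a, b in g2 if a * b <= 0)
    return hits / (len(g1) + len(g2))


print(marginal_mean_gap(tumor, normal, 0), marginal_mean_gap(tumor, normal, 1))
print(correlation_gap(tumor, normal), product_rule_accuracy(tumor, normal))
```

Neither gene on its own distinguishes the groups (both mean gaps are zero), yet a rule that looks at the two genes jointly classifies every sample correctly.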

    Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

    Random Forests (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal, and a reason for its versatility, is its (implicit) construction of a kernel-type weighting function on the training data, which can also be used for targets other than the original mean estimation. We propose a novel forest construction for multivariate responses based on their joint conditional distribution, independent of the estimation target and the data model. It uses a new splitting criterion based on the maximum mean discrepancy (MMD) distributional metric, which is suitable for detecting heterogeneity in multivariate distributions. The induced weights define an estimate of the full conditional distribution, which in turn can be used for arbitrary and potentially complicated targets of interest. The method is versatile and convenient to use, as we illustrate on a wide range of examples. The code is available as the Python and R packages drf.
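To make the splitting idea more concrete, here is a minimal sketch of a biased (V-statistic) estimate of the squared MMD between two multivariate samples using a Gaussian kernel. In a forest, a quantity like this could score a candidate split by comparing the responses that fall into the two child nodes. The kernel, the bandwidth of 1.0 and the function names are assumptions for illustration; the drf packages may differ in kernel choice, normalization and computational shortcuts.

```python
from math import exp


def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys."""
    def avg_k(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    return avg_k(xs, xs) + avg_k(ys, ys) - 2.0 * avg_k(xs, ys)


# Hypothetical bivariate responses in two child nodes
sample = [(0.0, 0.0), (0.5, -0.2), (-0.3, 0.4)]
shifted = [(x + 2.0, y + 2.0) for x, y in sample]  # same shape, shifted location

# Same (zero) mean, opposite correlation: only the joint distribution differs
pos_corr = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
neg_corr = [(1.0, -1.0), (2.0, -2.0), (-1.0, 1.0), (-2.0, 2.0)]

print(mmd_squared(sample, list(sample)))  # identical samples
print(mmd_squared(sample, shifted))
print(mmd_squared(pos_corr, neg_corr))
```

The last comparison is the case that motivates a distributional criterion: a split score that only compares child means would see no difference between `pos_corr` and `neg_corr`, whereas the MMD estimate is clearly positive.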

    Measuring Categorical Perception in Color-Coded Scatterplots

    Scatterplots commonly use color to encode categorical data. However, as datasets increase in size and complexity, the efficacy of these channels may vary. Designers lack insight into how robust different design choices are to variations in the number of categories. This paper presents a crowdsourced experiment measuring how the number of categories and the choice of color encodings used in multiclass scatterplots influence viewers' ability to analyze data across classes. Participants estimated relative means in a series of scatterplots with 2 to 10 categories encoded using ten color palettes drawn from popular design tools. Our results show that the number of categories and the color discriminability within a palette notably affect people's perception of categorical data in scatterplots, and that judgments become harder as the number of categories grows. We examine existing palette design heuristics in light of our results to help designers make robust color choices informed by the parameters of their data.
    Comment: The paper has been accepted to ACM CHI 2023. 14 pages, 7 figures

    Phonological awareness in preschool age children with developmental disabilities

    Reading skills are critically important for a child's development and continued growth in school. The home and school literacy experiences of children who have developmental disabilities have been found to be qualitatively different from those of their same-age peers without disabilities. In addition to access to instruction, a number of intrinsic factors, including cognitive ability, receptive language and expressive speech skills, have been suggested as placing children with developmental disabilities at greater risk for limited development of reading skills. Currently, little is understood about how children who have developmental disabilities, and who may have limitations in productive speech, learn to read. This study identifies key intrinsic and extrinsic factors related to the development of phonological awareness in 42 children aged between 4 years and 5 years 9 months with developmental disabilities and a range of speech abilities. The aims of this project were (1) to systematically assess children's intrinsic factors of speech ability, receptive and expressive language and vocabulary, cognitive skills and phonological awareness, in order to determine the key intrinsic factors related to phonological awareness, and (2) to describe the extrinsic factors of home literacy experience and preschool literacy instruction provided to the children. Children were found to have frequent and positive home literacy experiences. No significant correlations were found between speech ability and the frequency of shared reading experiences. Parents reported low levels of preschool literacy instruction. Significant correlations were found between instruction in decoding and word recognition and children's sound-symbol awareness. Correlations were also found between children's speech ability and their use of technology and media and of Augmentative and Alternative Communication (AAC).
Positive, significant relationships were found between phonological awareness and all direct assessment measures of developmental skill, speech ability and early reading skills, but not between phonological awareness and home or school literacy experiences. Speech ability did not predict a significant amount of variance in phonological awareness skill beyond that explained by cognitive development, receptive language and orthographic knowledge. This study has important implications for practitioners and researchers alike concerning the factors related to early reading development in children with limited speech ability.

    Rapid Visual Categorization is not Guided by Early Salience-Based Selection

    The currently dominant visual processing paradigm in both human and machine research is the feedforward, layered hierarchy of neural-like processing elements. Within this paradigm, visual saliency is seen by many to have a specific role, namely that of early selection. Early selection is thought to enable very fast visual performance by limiting processing to only the most salient candidate portions of an image. This strategy has led to a plethora of saliency algorithms that have indeed improved processing-time efficiency in machine algorithms, which in turn has strengthened the suggestion that human vision also employs a similar early selection strategy. However, at least one set of critical tests of this idea has never been performed with respect to the role of early selection in human vision. How would the best of the current saliency models perform on the stimuli used by the experimentalists who first provided evidence for this visual processing paradigm? Would the algorithms really provide correct candidate sub-images to enable fast categorization on those same images? Do humans really need this early selection for their impressive performance? Here, we report on a new series of tests of these questions whose results suggest that it is quite unlikely that such an early selection process has any role in human rapid visual categorization.
    Comment: 22 pages, 9 figures

    Leaf Morphology, Taxonomy and Geometric Morphometrics: A Simplified Protocol for Beginners

    Taxonomy relies greatly on morphology to discriminate groups. Computerized geometric morphometric methods for quantitative shape analysis measure, test and visualize differences in form in a highly effective, reproducible, accurate and statistically powerful way. Plant leaves are commonly used in taxonomic analyses and are particularly suitable for landmark-based geometric morphometrics. However, botanists do not yet seem to have taken advantage of this set of methods in their studies as much as zoologists have. Using free software and an example dataset from two geographical populations of sessile oak leaves, we describe in detailed but simple terms how to: a) compute size and shape variables using Procrustes methods; b) test measurement error and the main levels of variation (populations and trees) using a hierarchical design; c) estimate the accuracy of group discrimination; d) repeat this estimate after controlling for the effect of size differences on shape (i.e., allometry). Measurement error was completely negligible; individual variation in leaf morphology was large, and differences between trees were generally bigger than within trees; differences between the two geographic populations were small in both size and shape; despite a weak allometric trend, controlling for the effect of size on shape slightly increased discrimination accuracy. Procrustes-based methods for the analysis of landmarks were highly efficient in measuring the hierarchical structure of differences in leaves and in revealing very small-scale variation. In taxonomy and many other fields of botany and biology, the application of geometric morphometrics helps increase scientific rigour in the description of important aspects of the phenotypic dimension of biodiversity.
Easy-to-follow but detailed step-by-step example studies can promote wider use of these numerical methods, as they provide an introduction to the discipline that, for many biologists, is less intimidating than the often inaccessible specialist literature.
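Step a) of the protocol, computing size and shape variables with Procrustes methods, can be sketched for two 2-D landmark configurations by storing each landmark as a complex number x + iy: translate each shape to its centroid, scale it to unit centroid size, then apply the rotation that best aligns one shape to the other. This is a minimal ordinary Procrustes fit of two shapes under those assumptions, not the free software used by the authors; a full generalized Procrustes analysis of a sample would iterate this alignment against a running mean shape. The landmark coordinates and function names are hypothetical.

```python
import math
from math import sqrt


def centroid_size(shape):
    """Square root of the summed squared distances of landmarks from their centroid."""
    c = sum(shape) / len(shape)
    return sqrt(sum(abs(z - c) ** 2 for z in shape))


def normalize(shape):
    """Translate to the origin and scale to unit centroid size."""
    c = sum(shape) / len(shape)
    centered = [z - c for z in shape]
    size = sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]


def procrustes_fit(reference, target):
    """Rotate the normalized target onto the normalized reference (no reflection).

    Returns the fitted target and the residual distance after superimposition,
    which for unit-size configurations corresponds to the partial Procrustes distance.
    """
    ref, tgt = normalize(reference), normalize(target)
    s = sum(r * t.conjugate() for r, t in zip(ref, tgt))
    rotation = s / abs(s)  # unit complex number e^{i*theta} minimizing the squared residuals
    fitted = [rotation * t for t in tgt]
    distance = sqrt(sum(abs(r - f) ** 2 for r, f in zip(ref, fitted)))
    return fitted, distance


# Hypothetical landmarks on a leaf outline
leaf_a = [0 + 0j, 4 + 1j, 6 + 0j, 4 - 1j]
# leaf_b: leaf_a rotated by 30 degrees, scaled by 2.5 and translated (same shape)
leaf_b = [2.5 * complex(math.cos(math.pi / 6), math.sin(math.pi / 6)) * z + (3 - 2j) for z in leaf_a]
# leaf_c: one landmark moved, so the shape genuinely differs
leaf_c = [0 + 0j, 4 + 1.5j, 6 + 0j, 4 - 1j]

_, d_same = procrustes_fit(leaf_a, leaf_b)
_, d_other = procrustes_fit(leaf_a, leaf_c)
print(centroid_size(leaf_a), centroid_size(leaf_b))
print(d_same, d_other)
```

Because leaf_b is an exactly rotated, scaled and translated copy of leaf_a, its residual distance is numerically zero, while centroid size keeps the scale difference as a separate size variable; moving a single landmark in leaf_c yields a non-zero shape distance.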