1,992 research outputs found

    Faster K-Means Cluster Estimation

    There has been considerable work on improving the popular clustering algorithm K-means in terms of both mean squared error (MSE) and speed. However, most K-means variants compute the distance from each data point to each cluster centroid in every iteration. We propose a fast heuristic to overcome this bottleneck with only a marginal increase in MSE. We observe that across all iterations of K-means, a data point changes its membership only among a small subset of clusters. Our heuristic predicts such clusters for each data point by looking at nearby clusters after the first iteration of K-means. We augment well-known variants of K-means with our heuristic to demonstrate its effectiveness. For various synthetic and real-world datasets, our heuristic achieves a speed-up of up to 3 times when compared to efficient variants of K-means. Comment: 6 pages, Accepted at ECIR 201
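The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how "nearby clusters" are chosen, so the sketch assumes they are the clusters whose centroids lie closest to a point's assigned centroid after the first iteration. The function names and the `n_candidates` parameter are hypothetical.

```python
import numpy as np


def update_centroids(X, labels, centroids):
    """Recompute each centroid as the mean of its members (empty clusters keep their old centroid)."""
    new = centroids.copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members):
            new[j] = members.mean(axis=0)
    return new


def kmeans_candidate_heuristic(X, k, n_candidates=3, n_iter=20, seed=0):
    """Lloyd-style k-means that restricts later iterations to a short candidate list per point."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    # First iteration: full n-by-k distance computation.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = update_centroids(X, labels, centroids)

    # Assumption: "nearby" means the n_candidates clusters whose centroids are
    # closest to the point's assigned centroid (normally including that cluster itself).
    centroid_dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    nearest_clusters = np.argsort(centroid_dists, axis=1)[:, :n_candidates]
    candidates = nearest_clusters[labels]  # shape (n_points, n_candidates), fixed from here on

    for _ in range(n_iter - 1):
        # Distances are computed only to each point's candidate centroids.
        d = np.linalg.norm(X[:, None, :] - centroids[candidates], axis=2)
        new_labels = candidates[np.arange(len(X)), d.argmin(axis=1)]
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        centroids = update_centroids(X, labels, centroids)
    return centroids, labels
```

With `n_candidates` much smaller than `k`, each later iteration computes roughly `n * n_candidates` distances instead of `n * k`, which is where a speed-up of the kind described would come from; the price is that a point can never move to a cluster outside its candidate list.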

    Infant cortex responds to other humans from shortly after birth

    A significant feature of the adult human brain is its ability to selectively process information about conspecifics. Much debate has centred on whether this specialization is primarily a result of phylogenetic adaptation, or whether the brain acquires expertise in processing social stimuli as a result of its being born into an intensely social environment. Here we study the haemodynamic response in cortical areas of newborns (1–5 days old) while they passively viewed dynamic human or mechanical action videos. We observed activation selective to a dynamic face stimulus over bilateral posterior temporal cortex, but no activation in response to a moving human arm. This selective activation to the social stimulus correlated with age in hours over the first few days post partum. Thus, even very limited experience of face-to-face interaction with other humans may be sufficient to elicit social stimulus activation of relevant cortical regions

    CWRML: representing crop wild relative conservation and use data in XML

    Background: Crop wild relatives are wild species that are closely related to crops. They are valuable as potential gene donors for crop improvement and may help to ensure food security for the future. However, they are becoming increasingly threatened in the wild and are inadequately conserved, both in situ and ex situ. Information about the conservation status and utilisation potential of crop wild relatives is diverse and dispersed, and no single agreed standard exists for representing such information; yet, this information is vital to ensure these species are effectively conserved and utilised. The European Community-funded project, European Crop Wild Relative Diversity Assessment and Conservation Forum, determined the minimum information requirements for the conservation and utilisation of crop wild relatives and created the Crop Wild Relative Information System, incorporating an eXtensible Markup Language (XML) schema to aid data sharing and exchange. Results: Crop Wild Relative Markup Language (CWRML) was developed to represent the data necessary for crop wild relative conservation and ensure that they can be effectively utilised for crop improvement. The schema partitions data into taxon-, site-, and population-specific elements, to allow for integration with other more general conservation biology schemata which may emerge as accepted standards in the future. These elements are composed of sub-elements, which are structured in order to facilitate the use of the schema in a variety of crop wild relative conservation and use contexts. Pre-existing standards for data representation in conservation biology were reviewed and incorporated into the schema as restrictions on element data contents, where appropriate. Conclusion: CWRML provides a flexible data communication format for representing in situ and ex situ conservation status of individual taxa as well as their utilisation potential. The development of the schema highlights a number of instances where additional standards-development may be valuable, particularly with regard to the representation of population-specific data and utilisation potential. As crop wild relatives are intrinsically no different to other wild plant species there is potential for the inclusion of CWRML data elements in the emerging standards for representation of biodiversity data.
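The taxon/site/population partition described in this abstract might look roughly like the sketch below when read with Python's standard library. The element names (`cwrRecord`, `taxon`, `site`, `population` and their children) and all values are hypothetical placeholders chosen for illustration; they are not taken from the actual CWRML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are placeholders, not real CWRML.
SAMPLE = """<cwrRecord>
  <taxon><scientificName>Example taxon</scientificName></taxon>
  <site><siteName>Example site</siteName></site>
  <population><populationSize>120</populationSize></population>
</cwrRecord>"""


def split_record(xml_text):
    """Return {section: {sub-element tag: text}} for each top-level section of the record."""
    root = ET.fromstring(xml_text)
    return {
        section.tag: {child.tag: (child.text or "").strip() for child in section}
        for section in root
    }


record = split_record(SAMPLE)
print(record["taxon"]["scientificName"])
print(record["population"]["populationSize"])
```

Keeping the three groups as separate top-level elements means each one could, in principle, be mapped onto a more general biodiversity schema independently of the others, which is the integration argument the abstract makes.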

    The importance of specifying and studying causal mechanisms in school-based randomised controlled trials: lessons from two studies of cross-age peer tutoring

    Based on the experience of evaluating 2 cross-age peer-tutoring interventions, we argue that researchers need to pay greater attention to causal mechanisms within the context of school-based randomised controlled trials. Without studying mechanisms, researchers are less able to explain the underlying causal processes that give rise to results from randomised controlled trials. Studying implementation fidelity is necessary but not sufficient for causal explanation; the study of causal mechanisms through the application of mixed methods is also required. Due to the increasingly complicated nature of many classroom-based innovations that are subject to evaluation, and the potentially distal nature of hypothesised effects, particularly on attainment, programme theory and articulation of mechanisms are essential in enhancing causal explanation and promoting the accumulation of knowledge of what works and why in classroom settings

    Durham Shared Maths Project. Evaluation report and executive summary

    Published in July 2015, this report details the findings of the Durham University Shared Maths intervention project on pupils from 82 primary schools across four local authorities. The intervention was a form of cross-age peer tutoring, developed at Durham University, in which older pupils (Year 5/Year 6) work with younger pupils (Year 3/Year 4) to discuss and work through maths problems using a structured approach. The intervention was delivered by teachers, with training and support from a Local Co-ordinator, and participating pupils spent 20 minutes each week using the approach, for two blocks of 16 weeks over consecutive years.

    k is the Magic Number -- Inferring the Number of Clusters Through Nonparametric Concentration Inequalities

    Most convex and nonconvex clustering algorithms come with one crucial parameter: the k in k-means. To this day, there is not one generally accepted way to accurately determine this parameter. Popular methods are simple yet theoretically unfounded, such as searching for an elbow in the curve of a given cost measure. In contrast, statistically founded methods often make strict assumptions about the data distribution or come with their own optimization scheme for the clustering objective. This limits either the set of applicable datasets or clustering algorithms. In this paper, we strive to determine the number of clusters by answering a simple question: given two clusters, is it likely that they jointly stem from a single distribution? To this end, we propose a bound on the probability that two clusters originate from the distribution of the unified cluster, specified only by the sample mean and variance. Our method is applicable as a simple wrapper to the result of any clustering method minimizing the objective of k-means, which includes Gaussian mixtures and Spectral Clustering. We focus in our experimental evaluation on an application for nonconvex clustering and demonstrate the suitability of our theoretical results. Our SpecialK clustering algorithm automatically determines the appropriate value for k, without requiring any data transformation or projection, and without assumptions on the data distribution. Additionally, it is capable of deciding that the data consists of only a single cluster, which many existing algorithms cannot.

    Decentralized Estimation over Orthogonal Multiple-access Fading Channels in Wireless Sensor Networks - Optimal and Suboptimal Estimators

    Optimal and suboptimal decentralized estimators in wireless sensor networks (WSNs) over orthogonal multiple-access fading channels are studied in this paper. Considering multiple-bit quantization before digital transmission, we develop maximum likelihood estimators (MLEs) with both known and unknown channel state information (CSI). When training symbols are available, we derive an MLE that is a special case of the MLE with unknown CSI. It implicitly uses the training symbols to estimate the channel coefficients and exploits the estimated CSI in an optimal way. To reduce the computational complexity, we propose suboptimal estimators. These estimators exploit redundant information at both the signal and data levels to improve the estimation performance. The proposed MLEs reduce to traditional fusion-based or diversity-based estimators when communications or observations are perfect. By introducing a general message function, the proposed estimators can be applied when various analog or digital transmission schemes are used. The simulations show that the estimators using digital communications with multiple-bit quantization outperform the estimator using analog-and-forwarding transmission in fading channels. When considering the total bandwidth and energy constraints, the MLE using multiple-bit quantization is superior to that using binary quantization at medium and high observation signal-to-noise ratio levels.

    Don’t turn your back on the symptoms of psychosis : a proof-of-principle, quasi-experimental public health trial to reduce the duration of untreated psychosis in Birmingham, UK

    Background: Reducing the duration of untreated psychosis (DUP) is an aspiration of international guidelines for first episode psychosis; however, public health initiatives have met with mixed results. Systematic reviews suggest that greater focus on the sources of delay within care pathways (which will vary between healthcare settings) is needed to achieve sustainable reductions in DUP (BJP 198: 256-263; 2011). Methods/Design: A quasi-experimental trial comparing a targeted intervention area with a ‘detection as usual’ area in the same city. As this is a proof-of-principle trial, no a priori assumptions are made regarding effect size; the key outcome will be an estimate of the potential effect size for a definitive trial. DUP and number of new cases will be collected over an 18-month period in target and control areas and compared; historical data on DUP collected in both areas over the previous three years will serve as a benchmark. The intervention will focus on reducing two significant DUP component delays within the overall care pathway: delays within the mental health service and help-seeking delay. Discussion: This pragmatic trial will be the first to target known delays within the care pathway for those with a first episode of psychosis. If successful, this will provide a generalizable methodology that can be implemented in a variety of healthcare contexts with differing sources of delay. Trial registration: http://www.controlled-trials.com/ISRCTN45058713 Keywords: Public mental health campaign, First-episode psychosis, Early detection, Duration of untreated psychosis, Youth mental health