
    UPMASK: unsupervised photometric membership assignment in stellar clusters

    We develop a method for membership assignment in stellar clusters using only photometry and positions. The method, UPMASK, is designed to be unsupervised, data-driven, model-free, and to rely on as few assumptions as possible. It is based on an iterative process combining principal component analysis, a clustering algorithm, and kernel density estimation, and it is able to take arbitrary error models into account. An implementation in R was tested on simulated clusters covering a broad range of ages, masses, distances, and reddenings, as well as on real data of cluster fields. Running UPMASK on simulations showed that it effectively separates the cluster and field populations. The overall spatial structure and the distribution of cluster member stars in the colour-magnitude diagram were recovered under a broad variety of conditions. For a set of 360 simulations, the resulting true positive rates (a measurement of purity) and member recovery rates (a measurement of completeness) at the 90% membership probability level reached high values for a range of open cluster ages (10^7.1–10^9.5 yr), initial masses (0.5–10×10^3 M_sun), and heliocentric distances (0.5–4.0 kpc). UPMASK was also tested on real data from the field of the open cluster Haffner 16 and of the closely projected clusters Haffner 10 and Czernik 29. These tests showed that even for moderate variable extinction and cluster superposition, the method yielded useful cluster membership probabilities and provided some insight into the stellar contents of the clusters. The UPMASK implementation will be available at the CRAN archive. Comment: 12 pages, 13 figures, accepted for publication in Astronomy and Astrophysics.
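The iterative idea behind UPMASK can be sketched as follows. This is a hedged simplification, not the published R implementation: the clustering and the spatial test below (k-means plus a nearest-neighbour concentration statistic in place of a kernel density estimate) are illustrative substitutes, and all parameter values are assumptions.

```python
import numpy as np
from numpy.random import default_rng

def upmask_iteration(photometry, positions, n_groups=8, n_random=50, rng=None):
    """One UPMASK-style pass: PCA on photometry, k-means into small groups,
    keep stars in groups that are spatially more concentrated than random."""
    rng = rng or default_rng(0)
    # 1. Standardise photometry and project onto principal components.
    X = (photometry - photometry.mean(0)) / photometry.std(0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pcs = X @ vt[:2].T                       # first two principal components

    # 2. Simple k-means in PCA space (a few iterations suffice for a sketch).
    centers = pcs[rng.choice(len(pcs), n_groups, replace=False)]
    for _ in range(20):
        labels = np.argmin(((pcs[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_groups):
            if np.any(labels == k):
                centers[k] = pcs[labels == k].mean(0)

    # 3. Spatial concentration test: compare the mean nearest-neighbour
    #    distance of each group against same-sized random draws of all stars.
    def mean_nn(pts):
        d = np.sqrt(((pts[:, None] - pts[None]) ** 2).sum(-1))
        np.fill_diagonal(d, np.inf)
        return d.min(1).mean()

    keep = np.zeros(len(positions), dtype=bool)
    for k in range(n_groups):
        members = positions[labels == k]
        if len(members) < 3:
            continue
        random_nn = [mean_nn(positions[rng.choice(len(positions),
                                                  len(members), replace=False)])
                     for _ in range(n_random)]
        # A group counts as "clustered" if tighter than ~95% of random draws.
        if mean_nn(members) < np.quantile(random_nn, 0.05):
            keep[labels == k] = True
    return keep
```

In the full method this pass is repeated, removing probable field stars each time, and membership probabilities come from rerunning the process over realisations of the photometric errors.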

    Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management

    Spreadsheet software is the tool of choice for interactive ad-hoc data management, with adoption by billions of users. However, spreadsheets are not scalable, unlike database systems. Database systems, on the other hand, while highly scalable, do not support interactivity as a first-class primitive. We are developing DataSpread to holistically integrate spreadsheets as a front-end interface with databases as a back-end datastore, providing scalability to spreadsheets and interactivity to databases, an integration we term presentational data management (PDM). In this paper, we take a first step towards this vision: developing a storage engine for PDM, studying how to flexibly represent spreadsheet data within a database and how to support and maintain access by position. We first conduct an extensive survey of spreadsheet use to motivate the functional requirements for a storage engine for PDM. We develop a natural set of mechanisms for flexibly representing spreadsheet data and demonstrate that identifying the optimal representation is NP-hard; however, we develop an efficient approach to identify the optimal representation from an important and intuitive subclass of representations. We extend our mechanisms with positional access mechanisms that do not suffer from cascading update issues, leading to constant-time access and modification performance. We evaluate these representations on a workload of typical spreadsheets and spreadsheet operations, achieving up to a 20% reduction in storage and up to a 50% reduction in formula evaluation time.
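To illustrate why naive positional access cascades, consider storing rows keyed by their row number: inserting a row then forces renumbering every following row. One common way around this is sparse monotonic keys, sketched below. This is a generic illustration under assumed parameters, not DataSpread's actual positional-mapping data structure (which the paper develops in detail).

```python
# Sketch: positional access via sparse monotonic row keys, so inserting a
# row picks a key between its neighbours instead of renumbering everything.
class PositionalStore:
    GAP = 1 << 16                      # assumed initial spacing between keys

    def __init__(self, rows):
        # Assign widely spaced keys; a database would store (key, row) pairs
        # and index the keys (e.g. with a B+-tree) for positional lookup.
        self.keys = [(i + 1) * self.GAP for i in range(len(rows))]
        self.rows = list(rows)

    def get(self, pos):
        """Access by 0-based position, as a spreadsheet front-end would."""
        return self.rows[pos]

    def insert(self, pos, row):
        """Insert before `pos` by choosing a key between its neighbours."""
        lo = self.keys[pos - 1] if pos > 0 else 0
        hi = self.keys[pos] if pos < len(self.keys) else lo + 2 * self.GAP
        if hi - lo < 2:                # local key space exhausted: rebalance
            self.keys = [(i + 1) * self.GAP for i in range(len(self.keys))]
            return self.insert(pos, row)
        self.keys.insert(pos, (lo + hi) // 2)
        self.rows.insert(pos, row)
```

The Python lists here are O(n) stand-ins; the point of the sketch is only the key-assignment scheme, under which an insert touches one key rather than all keys after it.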

    The Use of Selected Methods of Linear Ordering to Assess the Innovation Performance of the European Union Member States

    The growing interest in measuring economic and social phenomena that are difficult to observe directly increases the need for researchers to broaden the use of multivariate statistical analysis methods. The ease of interpreting results presented in the form of rankings makes it common practice to use different methods of linear ordering of objects. If the appropriate assumptions are met, the determined set of variables allows for the construction of a synthetic measure whose ordered values provide a ranking. Such a statistical approach is quite often used in assessing the level of innovativeness of economies, and the literature abounds in innovation indices. The starting point of this paper is the set of 27 variables on the basis of which the Summary Innovation Index is developed. After verifying the statistical assumptions and reducing the database to 21 diagnostic factors, the authors construct a total of nine innovation rankings, using different methods of linear ordering and selected procedures for the normalisation of variables. The aim of the paper is therefore to assess the impact of selected methods of linear ordering (Hellwig's method, the TOPSIS method, the GDM method) and of various normalisation procedures (classic standardisation, positional standardisation, quotient transformation) on the final ranking of the EU Member States according to their level of innovation performance. The obtained results confirm that the applied method of linear ordering and the choice of normalisation procedure affect the final ranking of the examined objects – in this case, the ranking of the EU Member States by their level of innovativeness.
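As a concrete example of one of the linear-ordering methods named above, the TOPSIS procedure builds a synthetic measure from the distances of each object to an ideal and an anti-ideal point. The sketch below uses made-up data and equal weights, and assumes all variables are stimulants (larger is better); it is an illustration of the textbook method, not the authors' exact computation.

```python
import numpy as np

def topsis(X, weights=None):
    """Score rows of X in [0, 1]; higher score means a better-ranked object.
    Assumes every column is a stimulant (larger values are better)."""
    X = np.asarray(X, dtype=float)
    w = np.ones(X.shape[1]) / X.shape[1] if weights is None else np.asarray(weights)
    V = w * X / np.sqrt((X ** 2).sum(axis=0))     # vector normalisation + weights
    ideal, anti = V.max(axis=0), V.min(axis=0)    # ideal and anti-ideal points
    d_plus = np.sqrt(((V - ideal) ** 2).sum(axis=1))
    d_minus = np.sqrt(((V - anti) ** 2).sum(axis=1))
    return d_minus / (d_plus + d_minus)           # relative closeness

# Hypothetical data: 4 objects, 3 diagnostic variables.
scores = topsis([[7, 9, 9], [8, 7, 8], [9, 6, 8], [6, 7, 8]])
ranking = np.argsort(-scores)                     # best object first
```

Swapping the normalisation step (e.g. positional standardisation based on the median) or the distance definition yields the method variants whose effect on the final ranking the paper investigates.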

    Thermal error modelling of machine tools based on ANFIS with fuzzy c-means clustering using a thermal imaging camera

    Thermal errors are often quoted as being the largest contributor to CNC machine tool errors, but they can be effectively reduced using error compensation. The performance of a thermal error compensation system depends on the accuracy and robustness of the thermal error model and the quality of the inputs to the model. The locations of temperature measurement must provide a representative measurement of the change in temperature that will affect the machine structure. The number of sensors and their locations are not always intuitive, and the time required to identify the optimal locations is often prohibitive, resulting in compromise and poor results. In this paper, a new intelligent compensation system for reducing thermal errors of machine tools using data obtained from a thermal imaging camera is introduced. Different groups of key temperature points were identified from thermal images using a novel schema based on a grey model GM(0, N) and the fuzzy c-means (FCM) clustering method. An Adaptive Neuro-Fuzzy Inference System with fuzzy c-means clustering (FCM-ANFIS) was employed to design the thermal prediction model. In order to optimise the approach, a parametric study was carried out by changing the number of inputs and the number of membership functions of the FCM-ANFIS model, and comparing the relative robustness of the designs. According to the results, the FCM-ANFIS model with four inputs and six membership functions achieves the best predictive accuracy. The residual value of the model is smaller than ±2 μm, which represents a 95% reduction in the thermally induced error on the machine. Finally, the proposed method is shown to compare favourably against an Artificial Neural Network (ANN) model.
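The fuzzy c-means step used above to group temperature points can be sketched compactly. This is the standard FCM iteration, not the paper's full GM(0, N) + FCM schema; the fuzzifier m = 2 and the random initialisation are illustrative choices.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
    """Standard FCM: returns (centers, U), where U[i, k] is the degree of
    membership of point i in cluster k and each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # normalise memberships
    for _ in range(iters):
        W = U ** m                               # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))            # membership update rule
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U
```

Unlike hard k-means, every temperature point retains a graded membership in each group, which is what makes the cluster output a natural input to the fuzzy inference system.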

    From structural to functional glycomics: core substitutions as molecular switches for shape and lectin affinity of N-glycans

    Glycan epitopes of cellular glycoconjugates act as versatile biochemical signals (sugar coding). Here, we test the hypothesis that the common N-glycan modifications by core fucosylation and introduction of the bisecting N-acetylglucosamine moiety have long-range effects with functional consequences. Molecular dynamics simulations indicate a shift in conformational equilibria between linear extension and backfolding of the glycan antennae upon substitution. We also present a new fingerprint-like mode of presentation for this multi-parameter system. In order to delineate definite structure-function relationships, we strategically combined chemoenzymatic synthesis with bioassaying of cell binding and of the distribution of radioiodinated neoglycoproteins in vivo. Of clinical relevance, tailoring the core region affects serum clearance markedly, e.g., prolonging circulation time for the neoglycoprotein presenting the N-glycan with both substitutions. α2,3-Sialylation is another means toward this end, and a similar effect is seen for type II branching in triantennary N-glycans. This discovery signifies that rational glycoengineering along these lines is an attractive perspective for optimising the pharmacokinetic behaviour of glycosylated pharmaproteins. Of general importance for the concept of the sugar code, the presented results teach the fundamental lesson that N-glycan core substitutions convey distinct characteristics to the oligosaccharide concerned, relevant for cis and trans biorecognition processes. These modifications are thus molecular switches.

    Consensus clustering and functional interpretation of gene-expression data

    Microarray analysis using clustering algorithms can suffer from a lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas.
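A common way to combine several clusterings into a consensus is evidence accumulation: count how often each pair of genes is co-clustered across methods, then group pairs that agree often enough. The sketch below is a generic illustration of that idea under an assumed agreement threshold, not the specific consensus procedure of this paper.

```python
import numpy as np

def co_association(labelings):
    """labelings: list of 1-D label arrays over the same n items.
    Returns the n x n matrix of co-clustering fractions."""
    labelings = [np.asarray(l) for l in labelings]
    n = len(labelings[0])
    C = np.zeros((n, n))
    for l in labelings:
        C += (l[:, None] == l[None, :])          # 1 where co-clustered
    return C / len(labelings)

def consensus_clusters(labelings, threshold=0.5):
    """Greedy consensus: items co-clustered in more than `threshold` of the
    base clusterings join the same group (via transitive closure)."""
    C = co_association(labelings) > threshold
    group, g = [-1] * len(C), 0
    for i in range(len(C)):
        if group[i] == -1:
            stack = [i]
            while stack:                         # flood-fill one component
                j = stack.pop()
                if group[j] == -1:
                    group[j] = g
                    stack.extend(np.nonzero(C[j])[0].tolist())
            g += 1
    return group
```

Gene pairs on which the base methods disagree fall below the threshold and are left in separate consensus groups, which is the source of the increased confidence the abstract describes.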

    Direct Phenotypic Screening in Mice: Identification of Individual, Novel Antinociceptive Compounds from a Library of 734 821 Pyrrolidine Bis-piperazines

    The hypothesis of the current study is that the simultaneous direct in vivo testing of thousands to millions of systematically arranged mixture-based libraries will facilitate the identification of enhanced individual compounds. Individual compounds identified from such libraries may have increased specificity and decreased side effects early in the discovery phase. Testing began by screening ten diverse scaffolds as single mixtures (ranging from 17 340 to 4 879 681 compounds) for analgesia directly in the mouse tail-withdrawal model. The “all X” mixture representing the library TPI-1954 was found to produce significant antinociception and lacked respiratory depression and hyperlocomotor effects in the Comprehensive Laboratory Animal Monitoring System (CLAMS). The TPI-1954 library is a pyrrolidine bis-piperazine library totalling 738 192 compounds. It has 26 functionalities at each of the first three positions of diversity, whose mixtures comprise 28 392 compounds each (26 × 26 × 42), and 42 functionalities at the fourth position, whose mixtures comprise 17 576 compounds each (26 × 26 × 26). The 120 resulting mixtures, representing the four variable positions, were screened directly in vivo in the mouse 55 °C warm-water tail-withdrawal assay (ip administration). The 120 samples were then ranked in terms of their antinociceptive activity, and 54 individual compounds were synthesised. Nine of the individual compounds produced dose-dependent antinociception equivalent to morphine. In practical terms, one would not expect multi-exponential increases in activity when moving from the all-X mixture to the positional-scanning libraries to the individual compounds; rather, because of the systematic formatting, one would anticipate steady increases in activity as the complexity of the mixtures is reduced, and this is what we observe in the current study. One of the final individual compounds identified, TPI 2213-17, lacked significant respiratory depression, locomotor impairment, or sedation. Our results represent an example of this unique approach for screening large mixture-based libraries directly in vivo to rapidly identify individual compounds.
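The combinatorics of the positional-scanning design above can be made explicit with a few lines of arithmetic, using the functionality counts stated in the abstract (26 at each of positions 1–3, 42 at position 4):

```python
# Worked arithmetic for the TPI-1954 positional-scanning library.
R1 = R2 = R3 = 26          # functionalities at diversity positions 1-3
R4 = 42                    # functionalities at diversity position 4

total = R1 * R2 * R3 * R4            # full library size
mix_at_pos1 = R2 * R3 * R4           # mixture size when position 1 is fixed
mix_at_pos4 = R1 * R2 * R3           # mixture size when position 4 is fixed
n_mixtures = R1 + R2 + R3 + R4       # mixtures screened in vivo

print(total, mix_at_pos1, mix_at_pos4, n_mixtures)
# 738192 28392 17576 120
```

Each of the 120 screened samples is thus a defined mixture of tens of thousands of compounds, and the most active functionality at each position points to the individual compounds worth synthesising.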