98 research outputs found

    Statistical HOmogeneous Cluster SpectroscopY (SHOCSY): an optimized statistical approach for clustering of ¹H NMR spectral data to reduce interference and enhance robust biomarkers selection.

    We propose a novel statistical approach to improve the reliability of ¹H NMR spectral analysis in complex metabolic studies. The Statistical HOmogeneous Cluster SpectroscopY (SHOCSY) algorithm aims to reduce the variation within biological classes by selecting subsets of homogeneous ¹H NMR spectra that contain specific spectroscopic metabolic signatures related to each biological class in a study. In SHOCSY, we used a clustering method to partition the whole data set into clusters of samples, with each cluster showing similar spectral features and hence similar biochemical composition, and we then used an enrichment test to identify the associations between the clusters and the biological classes in the data set. We evaluated the performance of the SHOCSY algorithm using a simulated ¹H NMR data set that emulates renal tubule toxicity, and further exemplified the method with a ¹H NMR spectroscopic study of hydrazine-induced liver toxicity in rats. The SHOCSY algorithm improved the predictive ability of the orthogonal partial least-squares discriminatory analysis (OPLS-DA) model through the use of "truly" representative samples in each biological class (i.e., homogeneous subsets). This method ensures that the analyses are no longer confounded by idiosyncratic responders and thus improves the reliability of biomarker extraction. SHOCSY is a useful tool for removing irrelevant variation that interferes with the interpretation and predictive ability of models, and it has widespread applicability to other spectroscopic data, as well as to other "omics" types of data.
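
    The second SHOCSY step, testing whether a cluster is associated with a biological class, can be illustrated with a small sketch. The abstract does not say which enrichment test the authors used; the one-sided hypergeometric test below is an assumption, as are the function names, the toy labels, and the 0.05 threshold. Cluster labels are taken as given, so any clustering method could supply them.

```python
from collections import Counter
from math import comb


def hypergeom_enrichment_p(hits, cluster_size, class_size, total):
    """One-sided p-value P(X >= hits), where X counts members of one class
    among `cluster_size` samples drawn without replacement from `total`
    samples, `class_size` of which belong to that class."""
    upper = min(cluster_size, class_size)
    tail = sum(
        comb(class_size, k) * comb(total - class_size, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, cluster_size)


def enriched_clusters(cluster_labels, class_labels, alpha=0.05):
    """Return {(cluster, class): p} for every pair whose p-value is below alpha.
    No multiple-testing correction is applied in this sketch."""
    total = len(class_labels)
    cluster_sizes = Counter(cluster_labels)
    class_sizes = Counter(class_labels)
    joint = Counter(zip(cluster_labels, class_labels))
    results = {}
    for (cluster, cls), hits in joint.items():
        p = hypergeom_enrichment_p(
            hits, cluster_sizes[cluster], class_sizes[cls], total
        )
        if p < alpha:
            results[(cluster, cls)] = p
    return results


# Hypothetical toy data: 16 samples, three clusters, two classes.
cluster_labels = [0] * 6 + [1] * 6 + [2] * 4
class_labels = (
    ["control"] * 6 + ["dosed"] * 6 + ["control", "dosed", "control", "dosed"]
)
homogeneous = enriched_clusters(cluster_labels, class_labels)
for (cluster, cls), p in sorted(homogeneous.items()):
    print(f"cluster {cluster} is enriched for {cls}: p = {p:.4f}")
```

    In this toy example clusters 0 and 1 are flagged as homogeneous subsets for "control" and "dosed", while the mixed cluster 2 is not; only the flagged samples would then be passed to a downstream model such as OPLS-DA.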

    Understanding Group Structures and Properties in Social Media

    Abstract. The rapid growth of social networking sites enables people to connect to each other more conveniently than ever. With easy-to-use social media, people contribute and consume content, leading to a new form of human interaction and the emergence of online collective behavior. In this chapter, we aim to understand group structures and properties by extracting and profiling communities in social media. We present some challenges of community detection in social media. A prominent one is that networks in social media are often heterogeneous. We introduce two types of heterogeneity present in online social networks and elaborate corresponding community detection approaches for each type. Social media provides not only interaction information but also textual and tag data. This variety of data can be exploited to profile individual groups in understanding group formation and relationships. We also suggest some future work in understanding group structures and properties. Key words: social media, community detection, group profiling, heterogeneous networks, multi-mode networks, multi-dimensional networks

    Weighting and selection of variables for cluster analysis

    Clustering, Variable selection, Feature selection, Variable weighting, Variable importance, Pattern recognition, Discriminant analysis

    Variable selection in clustering

    Variable selection, Cluster analysis of two-mode data, Scaling of variables, Pillai trace statistic, Interactive data analysis

    Preparation of new substituted alkylamide derivatives of teicoplanin as antibacterials

    The title compds. [I; R = H, protecting group; Y = NR1X1(XX2)p(TX3)qW; R1 = H, alkyl; T, X = O, (substituted) imino; X1, X2, X3 = C2-10 alkylene; W = OH, amino; p = 1-50; q = 0-12; A = H, N-acylated β-D-2-deoxy-2-aminoglucopyranosyl; B = H, N-acetyl-β-D-2-deoxy-2-aminoglucopyranosyl; M = H, α-D-mannopyranosyl; B = H only when both A, M = H], were prepd. Thus, teicoplanin A1 component 2 in Et3N/DMF was treated with PhCH2O2CCl in acetone to give ~96% of the N-15 CBZ deriv. This was esterified with ClCH2CN in DMF/Et3N in ~98% yield and the ester was treated with H2N(CH2)2NH(CH2)2NH2 in DMF followed by hydrogenolysis to give I [A = N-(8-methylnonanoyl)-β-D-2-deoxy-2-aminoglucopyranosyl, B = N-acetyl-β-D-2-deoxy-2-aminoglucopyranosyl, M = α-D-mannopyranosyl, Y = H2NCH2CH2NHCH2CH2NH, R = H] (II). II had an ED50 of 0.09 mg/kg s.c. against Streptococcus pyogenes C203 in mice. Several I were active against multi-resistant Pseudomonas aeruginosa with MIC of 4-128 μg/mL.

    Constrained canonical correlation

    Canonical correlation, Constrained multivariate analysis, Response surface analysis

    Effect of data standardization on chemical clustering and similarity searching

    Standardization is used to ensure that the variables in a similarity calculation make an equal contribution to the computed similarity value. This paper compares seven methods that have been suggested previously for the standardization of integer-valued or real-valued data, and compares their results with those from unstandardized data. Sets of structures from the MDL Drug Data Report and IDAlert databases, represented by Pipeline Pilot physicochemical parameters, molecular holograms and Molconn-Z parameters, are clustered using the k-means and Ward’s clustering methods. The resulting classifications are evaluated in terms of the degree of clustering of active compounds selected from eleven different biological activity classes, with these classes also being used in similarity searches. It is shown that there is no consistent pattern when the various standardization methods are ranked in order of decreasing effectiveness and that there is no obvious performance benefit (when compared to unstandardized data) that is likely to be obtained from the use of any particular standardization method.
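
    To make concrete what "equal contribution" means here, the sketch below applies two widely used standardizations, z-score and range scaling, to a toy descriptor table before computing a Euclidean distance. The abstract does not name the seven methods it compares, so these two are illustrative assumptions, as are the function names and the made-up descriptor values.

```python
from statistics import mean, pstdev


def zscore(column):
    """Centre a variable on its mean and divide by its population std. dev."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)
    return [(x - mu) / sigma for x in column]


def range_scale(column):
    """Rescale a variable linearly so that it spans the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def standardize(rows, method):
    """Apply `method` to each variable (column) of a row-major table."""
    columns = [list(col) for col in zip(*rows)]
    scaled = [method(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical descriptors: a large-valued one and a fraction in [0, 1].
rows = [[100.0, 0.1], [200.0, 0.9], [300.0, 0.5]]

raw_distance = euclidean(rows[0], rows[1])
ranged = standardize(rows, range_scale)
ranged_distance = euclidean(ranged[0], ranged[1])
zscored = standardize(rows, zscore)

print(f"raw distance:          {raw_distance:.3f}")
print(f"range-scaled distance: {ranged_distance:.3f}")
```

    On the raw values the distance is dominated by the first descriptor; after range scaling both descriptors can contribute on the same footing. Whether that improves clustering or similarity searching in practice is exactly what the paper reports as unclear.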