
    Macrostate Data Clustering

    We develop an effective nonhierarchical data clustering method using an analogy to the dynamic coarse graining of a stochastic system. Analyzing the eigensystem of an interitem transition matrix identifies fuzzy clusters corresponding to the metastable macroscopic states (macrostates) of a diffusive system. A "minimum uncertainty criterion" determines the linear transformation from eigenvectors to cluster-defining window functions. Eigenspectrum gap and cluster certainty conditions identify the proper number of clusters. The physically motivated fuzzy representation and associated uncertainty analysis distinguish macrostate clustering from spectral partitioning methods. Macrostate data clustering solves a variety of test cases that challenge other methods. Comment: keywords: cluster analysis, clustering, pattern recognition, spectral graph theory, dynamic eigenvectors, machine learning, macrostates, classification
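
    The eigenspectrum-gap idea can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: it assumes a Gaussian affinity with a hypothetical width `sigma` to build the interitem transition matrix, and it only estimates the number of clusters from the largest gap among the leading eigenvalues. The minimum uncertainty criterion and the fuzzy window functions are not implemented.

```python
import numpy as np


def affinity(points, sigma):
    """Gaussian affinity W_ij = exp(-|x_i - x_j|^2 / (2 sigma^2)); the kernel choice is an assumption."""
    x = np.asarray(points, dtype=float)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def transition_matrix(points, sigma):
    """Row-stochastic interitem transition matrix T = D^-1 W (a diffusion on the data)."""
    w = affinity(points, sigma)
    return w / w.sum(axis=1, keepdims=True)


def estimate_num_clusters(points, sigma=1.0, max_k=10):
    """Return (k, eigenvalues), placing k at the largest gap in the leading eigenvalues of T."""
    w = affinity(points, sigma)
    d = w.sum(axis=1)
    # T = D^-1 W is similar to the symmetric S = D^-1/2 W D^-1/2,
    # so T has real eigenvalues and eigvalsh on S returns them.
    s = w / np.sqrt(np.outer(d, d))
    eigvals = np.sort(np.linalg.eigvalsh(s))[::-1]
    max_k = min(max_k, len(eigvals) - 1)
    gaps = eigvals[:max_k] - eigvals[1:max_k + 1]
    return int(np.argmax(gaps)) + 1, eigvals


if __name__ == "__main__":
    # Two tight, well-separated groups of three points each.
    pts = [[0, 0], [0, 0.1], [0.1, 0], [10, 10], [10, 10.1], [10.1, 10]]
    k, eigvals = estimate_num_clusters(pts)
    print(k)  # 2
```

    Because the transition matrix is row-stochastic, its leading eigenvalue is 1; each additional well-separated group of items contributes another eigenvalue close to 1, so a sharp drop after the k-th eigenvalue suggests k metastable groups.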

    Ab-initio density functional study of O on the Ag(001) surface

    The adsorption of oxygen on the Ag(001) surface is investigated by means of density functional techniques. Starting from a characterization of the clean silver surfaces, oxygen adsorption in several modifications (molecular, on-surface, sub-surface, Ag₂O) was studied for varying coverage. In addition to structural parameters and adsorption energies, work-function changes, vibrational frequencies and core-level energies were calculated to better characterize the adsorption structures and to ease comparison with the rich experimental data. Comment: 26 pages, 8 figures, Surf. Sci. accepted

    Dietary elimination of children with food protein induced gastrointestinal allergy – micronutrient adequacy with and without a hypoallergenic formula?

    Background: The cornerstone of management of food protein-induced gastrointestinal allergy (FPGIA) is dietary exclusion; however, the micronutrient intake of this population has been poorly studied. We set out to determine the dietary intake of children on an elimination diet for this food allergy and hypothesised that the type of elimination diet and the use of a hypoallergenic formula (HF) significantly affect micronutrient intake. Method: A prospective observational study was conducted on children diagnosed with FPGIA on an exclusion diet, who completed a 3-day semi-quantitative food diary 4 weeks after commencing the diet. The nutritional intake of children consuming a HF was compared with that of children not consuming a HF, with or without a vitamin and mineral supplement (VMS). Results: One hundred and five food diaries were included in the data analysis: 70 boys (66.7%), with a median age of 21.8 months [IQR: 10–67.7]. Fifty-three children (50.5%) consumed a HF, and the volume consumed was correlated with micronutrient intake. Significantly (p < 0.05) more children met their micronutrient requirements if a HF was consumed. Among those without a HF, some still did not meet requirements, in particular for vitamin D and zinc, despite VMS. Conclusion: This study points to the important micronutrient contribution of a HF in children with FPGIA. Children who are not on a HF and not taking a VMS are at increased risk of low intakes, in particular of vitamin D and zinc. Further studies are needed to assess whether dietary intake translates into actual biological deficiencies.

    Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments, where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotation between the reference and the experimental system can lead to incorrect interpretation of the effect of an experimental treatment or mutation on RNA expression in the system under study. Until recently, genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for Human, Chicken and A. thaliana samples by the novel single-molecule, strand-specific Direct RNA Sequencing technology from Helicos Biosciences, which locates 3' polyadenylation sites to within ±2 nt, were combined with archival EST and RNA-seq data. Nine examples illustrate where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression; and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and to those seeking to interpret existing publicly available annotations in the context of their own experimental data. Comment: 44 pages, 9 figures
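
    The value of strand-specific 3' end data for UTR re-annotation can be illustrated with a toy rule. The sketch below is hypothetical: the `Gene` record, the 1-based coordinates, the `max_extension` window and the choice of the most distal supported site are illustrative assumptions, not the procedure used in the paper, and a real pipeline would also weigh read support and neighbouring genes. Polyadenylation sites on the opposite strand are ignored, which is only possible because the reads carry strand information.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical minimal annotation record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def extend_three_prime(gene, polya_sites, max_extension=10_000):
    """Return the gene's (possibly extended) 3' end coordinate.

    polya_sites: iterable of (chrom, position, strand) tuples from strand-specific reads.
    Only same-chromosome, same-strand sites lying downstream of the annotated 3' end
    and within max_extension nt are considered; the most distal one is used (an assumption).
    """
    if gene.strand == "+":
        end3 = gene.end  # 3' end of a plus-strand gene is its right coordinate
        hits = [p for c, p, s in polya_sites
                if c == gene.chrom and s == "+" and end3 < p <= end3 + max_extension]
        return max(hits, default=end3)
    end3 = gene.start  # 3' end of a minus-strand gene is its left coordinate
    hits = [p for c, p, s in polya_sites
            if c == gene.chrom and s == "-" and end3 - max_extension <= p < end3]
    return min(hits, default=end3)
```

    Keeping the rule this simple makes the strand check explicit; the 5.9 kb extension mentioned above is the kind of change such a rule would propose, but the paper's own criteria for accepting an extension are not reproduced here.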

    A General Realizability Method for the Reynolds Stress for 2-Equation RANS Models


    Efficient Dynamic Importance Sampling of Rare Events in One Dimension

    Exploiting stochastic path integral theory, we obtain, by simulation, substantial gains in efficiency for the computation of reaction rates in one-dimensional, bistable, overdamped stochastic systems. Using a well-defined measure of efficiency, we compare implementations of "Dynamic Importance Sampling" (DIMS) methods with unbiased simulation. The best DIMS algorithms are shown to increase efficiency by factors of approximately 20 for a 5 k_B T barrier height and 300 for 9 k_B T, compared with unbiased simulation. The gains result from close emulation of natural (unbiased), instanton-like crossing events with artificially decreased waiting times between events, which are corrected for in rate calculations. The artificial crossing events are generated using the closed-form solution for the most probable crossing event described by the Onsager-Machlup action. While the best biasing methods require the second derivative of the potential (resulting from the "Jacobian" term in the action, which is discussed at length), algorithms employing solely the first derivative do nearly as well. We discuss the importance of one-dimensional models for larger systems and suggest extensions to higher-dimensional systems. Comment: version to be published in Phys. Rev.
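
    To make the role of the "Jacobian" term concrete, the sketch below evaluates a discretized Onsager-Machlup action for an overdamped particle obeying dx = F(x) dt + sqrt(2D) dW with F(x) = -U'(x), using the midpoint rule, under which S = ∫ [(ẋ - F)² / (4D) + F'(x)/2] dt. The double-well potential U(x) = h (x² - 1)², unit friction and the function names are assumptions for illustration; h plays the role of the barrier height in units of k_B T. The Jacobian term F'(x)/2 = -U''(x)/2 is the only place the second derivative of the potential enters.

```python
import numpy as np


def du(x, h):
    """U'(x) for the assumed double well U(x) = h * (x**2 - 1)**2 (barrier height h at x = 0)."""
    return 4.0 * h * x * (x ** 2 - 1.0)


def d2u(x, h):
    """U''(x) for the same potential."""
    return h * (12.0 * x ** 2 - 4.0)


def onsager_machlup_action(path, dt, h, diffusion=1.0, jacobian=True):
    """Midpoint-discretized Onsager-Machlup action of a path sampled every dt.

    Units: energies in k_B T and friction 1, so the diffusion constant D = 1 by default.
    """
    x = np.asarray(path, dtype=float)
    xm = 0.5 * (x[1:] + x[:-1])  # midpoint of each time step
    v = np.diff(x) / dt  # path velocity
    drift = -du(xm, h)  # overdamped drift F(x) = -U'(x)
    lagrangian = (v - drift) ** 2 / (4.0 * diffusion)
    if jacobian:
        lagrangian += -0.5 * d2u(xm, h)  # (1/2) F'(x) = -(1/2) U''(x)
    return float(np.sum(lagrangian) * dt)


if __name__ == "__main__":
    up = np.linspace(-1.0, 0.0, 201)  # climb from the left minimum to the barrier top
    print(onsager_machlup_action(up, 0.01, h=5.0))
    print(onsager_machlup_action(up[::-1], 0.01, h=5.0))  # the time-reversed, downhill path
```

    A path resting in a minimum has negative action from the Jacobian term alone, and an uphill path costs more than its time reverse, which is the asymmetry that makes barrier crossings rare. This sketch only scores paths; it does not generate the biased crossing events or the rate corrections used by DIMS.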

    Evolutionary distances in the twilight zone -- a rational kernel approach

    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ONE
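
    The positive semi-definite property matters because any PSD similarity k induces a proper distance, d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)), which is the Euclidean distance between the two sequences in the kernel's feature space. The sketch below shows that construction with a simple k-mer count (spectrum) kernel swapped in for illustration; it does not model substitutions or indels and is not the transducer-based score proposed in the paper.

```python
import math
from collections import Counter


def spectrum_kernel(a, b, k=3):
    """Inner product of k-mer count vectors: a simple PSD string kernel (illustrative stand-in)."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(n * cb[w] for w, n in ca.items())


def kernel_distance(a, b, k=3):
    """Distance induced by a PSD kernel: ||phi(a) - phi(b)|| in feature space."""
    d2 = spectrum_kernel(a, a, k) + spectrum_kernel(b, b, k) - 2 * spectrum_kernel(a, b, k)
    return math.sqrt(max(d2, 0))  # guard against tiny negative round-off for real-valued kernels


if __name__ == "__main__":
    print(kernel_distance("ACGTAC", "ACGTTC"))  # 2.0: two shared 3-mers out of four each
```

    Because the kernel is PSD, the resulting distance is non-negative, symmetric and satisfies the triangle inequality, so it can be fed directly into distance-based tree builders such as neighbour joining.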

    Increased entropy of signal transduction in the cancer metastasis phenotype

    Studies into the statistical properties of biological networks have led to important biological insights, such as the presence of hubs and hierarchical modularity. There is also a growing interest in studying the statistical properties of networks in the context of cancer genomics. However, relatively little is known about which network features differ between cancer and normal cell physiologies, or between different cancer cell phenotypes. Based on the observation that frequent genomic alterations underlie a more aggressive cancer phenotype, we asked whether such an effect could be detected as an increase in the randomness of local gene expression patterns. Using a breast cancer gene expression data set and a model network of protein interactions, we derive constrained weighted networks defined by a stochastic information flux matrix reflecting expression correlations between interacting proteins. Based on this stochastic matrix, we propose and compute an entropy measure that quantifies the degree of randomness in the local pattern of information flux around single genes. By comparing the local entropies in the non-metastatic versus metastatic breast cancer networks, we show that breast cancers that metastasize are characterised by a small yet significant increase in the degree of randomness of local expression patterns. We validate this result in three additional breast cancer expression data sets and demonstrate that local entropy characterises the metastatic phenotype better than other non-entropy-based measures. We show that increases in entropy can be used to identify genes and signalling pathways implicated in breast cancer metastasis. Further exploration of such integrated cancer expression and protein interaction networks will therefore be a fruitful endeavour. Comment: 5 figures, 2 Supplementary Figures and Table
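
    A local entropy of this kind can be sketched as follows. The weighting is an assumption: each network edge (i, j) gets weight (1 + c_ij)/2 from the expression correlation c_ij, rows are normalized into a stochastic flux matrix p_ij, and the entropy S_i = -Σ_j p_ij log p_ij is divided by log k_i (k_i = number of neighbours) so that it lies between 0 (all flux through one neighbour) and 1 (flux spread evenly). The paper's exact definitions of the flux matrix and normalization may differ.

```python
import numpy as np


def local_entropies(adjacency, corr):
    """Normalized local flux entropy per gene (hypothetical weighting, see lead-in)."""
    a = np.asarray(adjacency, dtype=float)
    c = np.asarray(corr, dtype=float)
    w = a * (1.0 + c) / 2.0  # map correlation in [-1, 1] to a non-negative edge weight
    row = w.sum(axis=1, keepdims=True)
    p = np.divide(w, row, out=np.zeros_like(w), where=row > 0)  # stochastic flux matrix
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, -p * np.log(p), 0.0)  # treat 0 * log 0 as 0
    s = terms.sum(axis=1)
    degree = a.sum(axis=1)
    norm = np.log(np.maximum(degree, 2.0))  # avoid log(1) = 0 for degree <= 1
    return np.where(degree > 1, s / norm, 0.0)  # genes with < 2 neighbours get 0


if __name__ == "__main__":
    triangle = np.ones((3, 3)) - np.eye(3)
    print(local_entropies(triangle, np.zeros((3, 3))))  # evenly spread flux: all ones
```

    Comparing these per-gene values between two expression data sets on the same interaction network, such as metastatic versus non-metastatic samples, is the kind of comparison the abstract describes; the statistical test for a significant shift is not shown.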