
    A multiple sequential orthogonal least squares algorithm for feature ranking and subset selection

    High-dimensional data analysis involving a large number of variables or features is commonly encountered in multiple regression and multivariate pattern recognition. In many cases, not all of the original variables are necessary for characterizing the overall features; more often, only a small subset of significant variables is required. Detecting significant variables from a library consisting of all the original variables is therefore a key and challenging step in dimensionality reduction. Principal component analysis is a useful tool for dimensionality reduction, but principal components suffer from two main deficiencies: they always involve all the original variables, and they are usually difficult to interpret physically. This study introduces a new multiple sequential orthogonal least squares algorithm for feature ranking and subset selection. The new method assesses, in a stepwise way, the capability of each candidate feature to recover the first few principal components. At each step, only the significant variable with the strongest capability to represent the first few principal components is selected. Unlike principal components, which carry no clear physical meaning, features selected by the new method preserve their original measurement meanings.
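The stepwise selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' exact algorithm: the function name `greedy_ols_select` is hypothetical, the target columns are supplied by the caller as stand-ins for the first few principal components, and the score (a sum of error reduction ratios over the targets) is an assumed choice.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def greedy_ols_select(features, targets, n_select):
    """Forward selection with Gram-Schmidt orthogonalisation (illustrative only).

    features: dict mapping a feature name to its column (list of floats).
    targets:  list of columns to be recovered; assumed here to stand in for
              the leading principal components.
    """
    residual = {name: list(col) for name, col in features.items()}
    selected = []
    for _ in range(n_select):
        best_name, best_score = None, -1.0
        for name, w in residual.items():
            ww = dot(w, w)
            if ww < 1e-12:  # candidate already explained by earlier picks
                continue
            # Assumed score: error reduction ratio summed over all targets.
            score = sum(dot(w, y) ** 2 / (ww * dot(y, y)) for y in targets)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            break
        selected.append(best_name)
        q = residual.pop(best_name)
        qq = dot(q, q)
        # Remove the chosen direction from every remaining candidate.
        for name in list(residual):
            w = residual[name]
            c = dot(w, q) / qq
            residual[name] = [wi - c * qi for wi, qi in zip(w, q)]
    return selected


# Toy data: "c" is largely redundant with "a"; the target equals a + b.
features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, -1.0, 1.0, -1.0],
    "c": [1.0, 2.0, 3.0, 5.0],
}
targets = [[2.0, 1.0, 4.0, 3.0]]
selected = greedy_ols_select(features, targets, 2)
```

Because each remaining candidate is orthogonalised against the features already chosen, a variable that mostly duplicates an earlier pick scores poorly in later steps, and the selected names remain original, interpretable variables.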

    Validation of suitable internal control genes for expression studies in aging.

    Quantitative data from gene expression experiments are often normalized to the transcription levels of housekeeping genes, on the assumption that expression of these genes is highly uniform. This practice is being questioned as it becomes increasingly clear that the level of housekeeping gene expression may vary considerably in certain biological samples. To date, the validation of reference genes in aging has received little attention and suitable reference genes have not yet been defined. Our aim was to evaluate the expression stability of frequently used reference genes in human peripheral blood mononuclear cells (PBMCs) with respect to aging. Using quantitative RT-PCR, we carried out an extensive evaluation of five housekeeping genes, i.e. 18s rRNA, ACTB, GAPDH, HPRT1 and GUSB, for stability of expression in samples from donors in the age range 35-74 years. The consistency in the expression stability was quantified on the basis of the coefficient of variation and two algorithms termed geNorm and NormFinder. Our results indicated GUSB to be the most suitable transcript and 18s the least suitable for accurate normalization in PBMCs. We also demonstrated that aging is a confounding factor with respect to the stability of 18s, HPRT1 and ACTB expression, which were particularly prone to variability in aged donors.
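Of the three stability measures named in this abstract, the coefficient of variation is simple enough to sketch. The gene labels and values below are hypothetical placeholders, not data from the study, and applying the measure directly to raw quantification values is an assumption; geNorm and NormFinder are not reproduced here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical per-donor measurements for two candidate reference genes.
measurements = {
    "GENE_A": [20.0, 20.5, 19.5, 20.0],
    "GENE_B": [15.0, 18.0, 12.0, 15.0],
}

# A lower coefficient of variation means more stable expression.
ranking = sorted(measurements, key=lambda g: coefficient_of_variation(measurements[g]))
```

Under this measure, the gene listed first in `ranking` would be the more stable candidate for normalization.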

    Sequential Complexity as a Descriptor for Musical Similarity

    We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15500 track excerpts of Western popular music, for which we obtain 7800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted version
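The idea of a track-wise compression rate over quantised features can be sketched as follows. The abstract does not name a compressor or a quantisation scheme, so the use of `zlib`, the uniform quantiser, and the helper names `quantise` and `compression_rate` are all assumptions made for illustration.

```python
import random
import zlib


def quantise(values, n_levels, lo, hi):
    """Map each value in [lo, hi] to one of n_levels symbols (n_levels <= 256)."""
    step = (hi - lo) / n_levels
    return bytes(min(n_levels - 1, max(0, int((v - lo) / step))) for v in values)


def compression_rate(symbols):
    """Compressed size divided by original size; lower means more regular."""
    return len(zlib.compress(symbols, 9)) / len(symbols)


# Hypothetical one-dimensional feature sequences of equal length.
repetitive = [0.1, 0.9] * 500
rng = random.Random(0)
irregular = [rng.random() for _ in range(1000)]

rate_repetitive = compression_rate(quantise(repetitive, 16, 0.0, 1.0))
rate_irregular = compression_rate(quantise(irregular, 16, 0.0, 1.0))
```

Repeating the computation with several values of `n_levels`, and on sequences downsampled to different frame rates, would give one descriptor value per granularity and temporal resolution, in the spirit of the multi-scale representation the abstract describes.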

    Complex exon-intron marking by histone modifications is not determined solely by nucleosome distribution

    It has recently been shown that nucleosome distribution, histone modifications and RNA polymerase II (Pol II) occupancy show preferential association with exons (“exon-intron marking”), linking chromatin structure and function to co-transcriptional splicing in a variety of eukaryotes. Previous ChIP-sequencing studies suggested that these marking patterns reflect the nucleosomal landscape. By analyzing ChIP-chip datasets across the human genome in three cell types, we have found that this marking system is far more complex than previously observed. We show here that a range of histone modifications and Pol II are preferentially associated with exons. However, there is noticeable cell-type specificity in the degree of exon marking by histone modifications and, surprisingly, this is also reflected in some histone modification patterns showing biases towards introns. Exon-intron marking is laid down in the absence of transcription on silent genes, with some marking biases changing or becoming reversed for genes expressed at different levels. Furthermore, the relationship of this marking system with splicing is not simple, with only some histone modifications reflecting exon usage/inclusion, while others mirror patterns of exon exclusion. By examining nucleosomal distributions in all three cell types, we demonstrate that these histone modification patterns cannot solely be accounted for by differences in nucleosome levels between exons and introns. In addition, because of inherent differences between ChIP-chip array and ChIP-sequencing approaches, these platforms report different nucleosome distribution patterns across the human genome. Our findings confound existing views and point to active cellular mechanisms which dynamically regulate histone modification levels and account for exon-intron marking. We believe that these histone modification patterns provide links between chromatin accessibility, Pol II movement and co-transcriptional splicing.

    Graph-Embedding Empowered Entity Retrieval

    In this research, we improve upon the current state of the art in entity retrieval by re-ranking the result list using graph embeddings. The paper shows that graph embeddings are useful for entity-oriented search tasks. We demonstrate empirically that encoding information from the knowledge graph into (graph) embeddings yields a larger gain in entity retrieval effectiveness than using plain word embeddings. We analyze the impact of the accuracy of the entity linker on the overall retrieval effectiveness. Our analysis further deploys the cluster hypothesis to explain the observed advantages of graph embeddings over the more widely used word embeddings for user tasks involving ranking entities.
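Re-ranking a result list with embeddings can be sketched in a few lines. The abstract does not give the scoring formula, so everything here is an assumption: the hypothetical `rerank` function interpolates each candidate's initial retrieval score with its mean cosine similarity to the embeddings of the entities linked in the query, and the `weight` parameter and toy vectors are invented for illustration.

```python
import math


def cosine(u, v):
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def rerank(candidates, query_entity_vecs, embeddings, weight=0.5):
    """candidates: list of (entity_id, initial_score) pairs.

    Returns the candidates sorted by an interpolated score (assumed formula).
    """
    rescored = []
    for entity, score in candidates:
        vec = embeddings.get(entity)
        sim = 0.0
        if vec is not None and query_entity_vecs:
            sim = sum(cosine(vec, q) for q in query_entity_vecs) / len(query_entity_vecs)
        rescored.append((entity, (1 - weight) * score + weight * sim))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy example: "e2" lies close to the linked query entity in embedding space.
embeddings = {"e1": [1.0, 0.0], "e2": [0.0, 1.0]}
query_entity_vecs = [[0.0, 1.0]]
initial = [("e1", 0.6), ("e2", 0.5)]
reranked = [entity for entity, _ in rerank(initial, query_entity_vecs, embeddings)]
```

In a scheme of this kind the quality of the query's entity links determines `query_entity_vecs`, which is one way to see why entity linker accuracy would affect the overall retrieval effectiveness.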