A novel framework to elucidate core classes in a dataset
In this paper we present an original framework to extract representative groups from a dataset, and we validate it over a novel case study. The framework specifies the application of different clustering algorithms; several statistical and visualisation techniques are then used to characterise the results, and core classes are defined by consensus clustering. Classes may be verified using supervised classification algorithms to obtain a set of rules which may be useful for new data points in the future. This framework is validated over a novel set of histone markers for breast cancer patients. From a technical perspective, the resultant classes are well separated and characterised by low, medium and high levels of biological markers. Clinically, the groups appear to distinguish patients with poor overall survival from those with low grading score and better survival. Overall, this framework offers a promising methodology for elucidating core consensus groups from data.
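The consensus step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how agreement between algorithms is measured, so the rule used here (a core class is a set of points that every clustering places together) is an assumption, and `core_classes` and `min_size` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def core_classes(partitions, min_size=2):
    """Return (core, unassigned) from several clusterings of the same points.

    partitions: list of label sequences, one per clustering algorithm.
    Label values need not match across algorithms; only co-membership matters.
    Assumption: a core class requires unanimous agreement across algorithms.
    """
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same points")

    # Points with the same tuple of labels are grouped together by every algorithm.
    groups = defaultdict(list)
    for i in range(n):
        signature = tuple(p[i] for p in partitions)
        groups[signature].append(i)

    # Groups that are large enough become core classes; the rest stay unassigned.
    core = [members for members in groups.values() if len(members) >= min_size]
    unassigned = sorted(
        i for members in groups.values() if len(members) < min_size for i in members
    )
    return core, unassigned


# Three hypothetical clusterings of six points.
partitions = [
    [0, 0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0, 2],
    [5, 5, 5, 7, 7, 7],
]
core, unassigned = core_classes(partitions)
print(core)        # [[0, 1], [3, 4]]
print(unassigned)  # [2, 5]
```

Points left unassigned are those on which the algorithms disagree; a less strict rule, such as majority agreement, would assign more points at the cost of less sharply defined classes.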
Long non-coding RNA profiling of human lymphoid progenitor cells reveals transcriptional divergence of B cell and T cell lineages.
To elucidate the transcriptional 'landscape' that regulates human lymphoid commitment during postnatal life, we used RNA sequencing to assemble the long non-coding transcriptome across human bone marrow and thymic progenitor cells spanning the earliest stages of B lymphoid and T lymphoid specification. Over 3,000 genes encoding previously unknown long non-coding RNAs (lncRNAs) were revealed through the analysis of these rare populations. Lymphoid commitment was characterized by lncRNA expression patterns that were highly stage specific and were more lineage specific than those of protein-coding genes. Protein-coding genes co-expressed with neighboring lncRNA genes showed enrichment for ontologies related to lymphoid differentiation. The exquisite cell-type specificity of global lncRNA expression patterns independently revealed new developmental relationships among the earliest progenitor cells in the human bone marrow and thymus
Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance.
Mycobacterium tuberculosis is a serious human pathogen threat exhibiting complex evolution of antimicrobial resistance (AMR). Accordingly, the many publicly available datasets describing its AMR characteristics demand disparate data-type analyses. Here, we develop a reference strain-agnostic computational platform that uses machine learning approaches, complemented by both genetic interaction analysis and 3D structural mutation-mapping, to identify signatures of AMR evolution to 13 antibiotics. This platform is applied to 1595 sequenced strains to yield four key results. First, a pan-genome analysis shows that M. tuberculosis is highly conserved with sequenced variation concentrated in PE/PPE/PGRS genes. Second, the platform corroborates 33 genes known to confer resistance and identifies 24 new genetic signatures of AMR. Third, 97 epistatic interactions across 10 resistance classes are revealed. Fourth, detailed structural analysis of these genes yields mechanistic bases for their selection. The platform can be used to study other human pathogens
MPRAnalyze: statistical framework for massively parallel reporter assays.
Massively parallel reporter assays (MPRAs) can measure the regulatory function of thousands of DNA sequences in a single experiment. Despite growing popularity, MPRA studies are limited by a lack of a unified framework for analyzing the resulting data. Here we present MPRAnalyze: a statistical framework for analyzing MPRA count data. Our model leverages the unique structure of MPRA data to quantify the function of regulatory sequences, compare sequences' activity across different conditions, and provide necessary flexibility in an evolving field. We demonstrate the accuracy and applicability of MPRAnalyze on simulated and published data and compare it with existing methods