    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

    Employment Is Associated with the Health-Related Quality of Life of Morbidly Obese Persons

    Published version of an article in the journal: Obesity Surgery. The original publication is available at Springerlink. http://dx.doi.org/10.1007/s11695-010-0289-6. Open Access. Background: We aimed to investigate whether employment status was associated with health-related quality of life (HRQoL) in a population of morbidly obese subjects. Methods: A total of 143 treatment-seeking morbidly obese patients completed the Medical Outcome Study 36-Item Short-Form Health Survey (SF-36) and the Obesity and Weight-Loss Quality of Life (OWLQOL) questionnaires. The former (SF-36) is a generic measure of physical and mental health status and the latter (OWLQOL) is an obesity-specific measure of emotional status. Multiple linear regression analyses included various measures of the HRQoL as dependent variables and employment status, education, marital status, gender, age, body mass index (BMI), type 2 diabetes, hypertension, obstructive sleep apnea, and treatment choice as independent variables. Results: The patients (74% women, 56% employed) had a mean (SD, range) age of 44 (11, 19–66) years and a mean BMI of 44.3 (5.4) kg/m². The employed patients reported significantly higher HRQoL scores within all eight subscales of SF-36, while the OWLQOL scores were comparable between the two groups. Multiple linear regression confirmed that employment was a strong independent predictor of HRQoL according to the SF-36. Based on part correlation coefficients, employment explained 16% of the variation in the physical and 9% in the mental component summaries of SF-36, while gender explained 22% of the variation in the OWLQOL scores. Conclusion: Employment is associated with the physical and mental HRQoL of morbidly obese subjects, but is not associated with the emotional aspects of quality of life

    Plant growth environments with programmable relative humidity and homogeneous nutrient availability

    We describe the design, characterization, and use of “programmable”, sterile growth environments for individual (or small sets of) plants. The specific relative humidity and nutrient availability experienced by the plant are established (RH between 15% and 95%; nutrient concentration as desired) during the setup of the growth environment, which takes about 5 minutes and <$1 in disposable cost. These systems maintain these environmental parameters constant for at least 14 days with minimal intervention (one minute every two days). The design is composed entirely of off-the-shelf components (e.g., LEGO® bricks) and is characterized by (i) a separation of root and shoot environment (which is physiologically relevant and facilitates imposing specific conditions on the root system, e.g., darkness), (ii) the development of the root system on a flat surface, where the root enjoys constant contact with nutrient solution and air, and (iii) compatibility with root phenotyping. We demonstrate phenotyping by characterizing root systems of Brassica rapa plants growing in different relative humidities (55%, 75%, and 95%). While most phenotypes were found to be sensitive to these environmental changes, a phenotype tightly associated with root system topology – the size distribution of the areas encircled by roots – appeared to be remarkably and counterintuitively insensitive to humidity changes. These setups combine many of the advantages of hydroponics conditions (e.g., root phenotyping, complete control over nutrient composition, scalability) and soil conditions (e.g., aeration of roots, shading of roots), while being comparable in cost and setup time to Magenta® boxes

    Revealing the missing expressed genes beyond the human reference genome by RNA-Seq

    Background: A complete and accurate human reference genome is important for functional genomics research. Therefore, the incomplete reference genome and individual-specific sequences have significant effects on various studies. Results: We used two RNA-Seq datasets from human brain tissues and 10 mixed cell lines to investigate the completeness of the human reference genome. First, we demonstrated that in previously identified ~5 Mb Asian and ~5 Mb African novel sequences that are absent from the human reference genome of NCBI build 36, ~211 kb and ~201 kb of them could be transcribed, respectively. Our results suggest that many of those transcribed regions are not specific to Asian and African individuals, but are also present in Caucasians. Then, we found that the expression levels of 104 RefSeq genes that are unalignable to NCBI build 37 are higher than 0.1 RPKM in brain and cell lines. 55 of them are conserved across human, chimpanzee and macaque, suggesting that there are still a significant number of functional human genes absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes and EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. Conclusion: Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the human reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. The comparative approach between the reference genome and other related human genomes based on the transcriptome provides an alternative way to refine the human reference genome.

    Search for new phenomena in final states with an energetic jet and large missing transverse momentum in pp collisions at $\sqrt{s} = 8$ TeV with the ATLAS detector

    Results of a search for new phenomena in final states with an energetic jet and large missing transverse momentum are reported. The search uses $20.3\ \mathrm{fb}^{-1}$ of $\sqrt{s} = 8$ TeV data collected in 2012 with the ATLAS detector at the LHC. Events are required to have at least one jet with $p_T > 120$ GeV and no leptons. Nine signal regions are considered with increasing missing transverse momentum requirements between $E_T^{\mathrm{miss}} > 150$ GeV and $E_T^{\mathrm{miss}} > 700$ GeV. Good agreement is observed between the number of events in data and Standard Model expectations. The results are translated into exclusion limits on models with either large extra spatial dimensions, pair production of weakly interacting dark matter candidates, or production of very light gravitinos in a gauge-mediated supersymmetric model. In addition, limits on the production of an invisibly decaying Higgs-like boson leading to similar topologies in the final state are presente

    Heritability in the Efficiency of Nonsense-Mediated mRNA Decay in Humans

    BACKGROUND: In eukaryotes mRNA transcripts of protein-coding genes in which an intron has been retained in the coding region normally result in premature stop codons and are therefore degraded through the nonsense-mediated mRNA decay (NMD) pathway. There is evidence in the form of selective pressure for in-frame stop codons in introns and a depletion of length three introns that this is an important and conserved quality-control mechanism. Yet recent reports have revealed that the efficiency of NMD varies across tissues and between individuals, with important clinical consequences. PRINCIPAL FINDINGS: Using previously published Affymetrix exon microarray data from cell lines genotyped as part of the International HapMap project, we investigated whether there are heritable, inter-individual differences in the abundance of intron-containing transcripts, potentially reflecting differences in the efficiency of NMD. We identified intronic probesets using EST data and report evidence of heritability in the extent of intron expression in 56 HapMap trios. We also used a genome-wide association approach to identify genetic markers associated with intron expression. Among the top candidates was a SNP in the DCP1A gene, which forms part of the decapping complex, involved in NMD. CONCLUSIONS: While we caution that some of the apparent inter-individual difference in intron expression may be attributable to different handling or treatments of cell lines, we hypothesize that there is significant polymorphism in the process of NMD, resulting in heritable differences in the abundance of intronic mRNA. Part of this phenotype is likely to be due to a polymorphism in a decapping enzyme on human chromosome 3

    FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data

    We have developed FusionSeq to identify fusion transcripts from paired-end RNA-sequencing. FusionSeq includes filters to remove spurious candidate fusions with artifacts, such as misalignment or random pairing of transcript fragments, and it ranks candidates according to several statistics. It also has a module to identify exact sequences at breakpoint junctions. FusionSeq detected known and novel fusions in a specially sequenced calibration data set, including eight cancers with and without known rearrangements

    Differential Expression of Non-Coding RNAs and Continuous Evolution of the X Chromosome in Testicular Transcriptome of Two Mouse Species

    BACKGROUND: Tight regulation of testicular gene expression is a prerequisite for male reproductive success, while differentiation of gene activity in spermatogenesis is important during speciation. Thus, comparison of testicular transcriptomes between closely related species can reveal unique regulatory patterns and shed light on evolutionary constraints separating the species. METHODOLOGY/PRINCIPAL FINDINGS: Here, we compared testicular transcriptomes of two closely related mouse species, Mus musculus and Mus spretus, which diverged more than one million years ago. We analyzed testicular expression using tiling arrays overlapping Chromosomes 2, X, Y and the mitochondrial genome. An excess of differentially regulated non-coding RNAs was found on Chromosome 2, including intronic antisense RNAs, intergenic RNAs and premature forms of Piwi-interacting RNAs (piRNAs). Moreover, a striking difference was found in the expression of the X-linked G6pdx gene, the parental gene of the autosomal retrogene G6pd2. CONCLUSIONS/SIGNIFICANCE: The prevalence of non-coding RNAs among differentially expressed transcripts indicates their role in species-specific regulation of spermatogenesis. The postmeiotic expression of G6pdx in Mus spretus points towards the continuous evolution of X-chromosome silencing and provides an example of expression change accompanying the out-of-the X-chromosomal retroposition