184 research outputs found

    Unsupervised feature selection for noisy data

    Feature selection techniques are widely applied in a variety of data analysis tasks in order to reduce dimensionality. According to the type of learning, feature selection algorithms are categorized as supervised or unsupervised. In unsupervised learning scenarios, selecting features is a much harder problem, due to the lack of class labels that would facilitate the search for relevant features. The difficulty of selecting features is amplified when the data is corrupted by different kinds of noise. Almost all traditional unsupervised feature selection methods are not robust against noise in the samples. These approaches have no explicit mechanism for detaching and isolating the noise, and thus they cannot produce an optimal feature subset. In this article, we propose an unsupervised approach for feature selection on noisy data, called Robust Independent Feature Selection (RIFS). Specifically, we choose a feature subset that contains most of the underlying information, using the same criteria as independent component analysis (ICA). Simultaneously, the noise is separated as an independent component. The isolation of representative noise samples is achieved using oblique factor rotation, whereas noise identification is performed using factor pattern loadings. Extensive experimental results over diverse real-life data sets have shown the efficiency and advantage of the proposed algorithm. We thankfully acknowledge the support of the Comisión Interministerial de Ciencia y Tecnología (CICYT) under contract No. TIN2015-65316-P, which has partially funded this work. Peer reviewed. Postprint (author's final draft).

    Cost performance and risk in the construction of offshore and onshore wind farms

    This article investigates the risk of cost overruns and underruns occurring in the construction of 51 onshore and offshore wind farms commissioned between 2000 and 2015 in 13 countries. In total, these projects required about $39 billion in investment and reached about 11 GW of installed capacity. We use this original dataset to test six hypotheses about construction cost overruns related to (i) technological learning, (ii) fiscal control, (iii) economies of scale, (iv) configuration, (v) regulation and markets and (vi) manufacturing experience. We find that across the entire dataset, the mean cost escalation per project is 6.5% or about $63 million per wind farm, although 20 projects within the sample (39%) did not exhibit cost overruns. The majority of onshore wind farms exhibit cost underruns, while for offshore wind farms the results have a larger spread. Interestingly, no significant relationship exists between the size (in total MW or per individual turbine capacity) of a wind farm and the severity of a cost overrun. Nonetheless, there is an indication that the risk increases for larger wind farms at greater distances offshore using new types of turbines and foundations. Overall, the mean cost escalation is 1.7% for onshore projects and 9.6% for offshore projects, amounts much lower than those for other energy infrastructure.
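The summary statistics quoted in this abstract (mean cost escalation, share of projects without overruns) can be illustrated with a minimal sketch. It assumes that cost escalation means the percentage difference between final and estimated cost, which the abstract does not state explicitly, and the project figures below are invented for illustration, not taken from the study.

```python
# Hypothetical projects: (estimated cost, final cost), in millions.
# These numbers are made up; they are NOT data from the article.
PROJECTS = [
    (100.0, 110.0),  # overrun
    (200.0, 190.0),  # underrun
    (50.0, 50.0),    # on budget
]


def cost_escalation_pct(estimated, final):
    """Percentage cost escalation; positive = overrun, negative = underrun.

    Assumed definition: 100 * (final - estimated) / estimated.
    """
    return 100.0 * (final - estimated) / estimated


def summarize(projects):
    """Return (mean escalation in %, share of projects with no overrun)."""
    escalations = [cost_escalation_pct(e, f) for e, f in projects]
    mean_escalation = sum(escalations) / len(escalations)
    no_overrun_share = sum(1 for x in escalations if x <= 0) / len(escalations)
    return mean_escalation, no_overrun_share


mean_escalation, no_overrun_share = summarize(PROJECTS)
```

Averaging per-project percentages, as done here, weights every project equally regardless of its size; a cost-weighted mean would be a different statistic.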

    Genome Trees from Conservation Profiles

    The concept of the genome tree depends on the potential evolutionary significance in the clustering of species according to similarities in the gene content of their genomes. In this respect, genome trees have often been identified with species trees. With the rapid expansion of genome sequence data, it becomes increasingly important to develop accurate methods for grasping global trends in the phylogenetic signals that mutually link the various genomes. We therefore derive here the methodological concept of genome trees based on protein conservation profiles in multiple species. The basic idea in this derivation is that the multi-component “presence-absence” protein conservation profiles permit tracking of common evolutionary histories of genes across multiple genomes. We show that a significant reduction in informational redundancy is achieved by considering only the subset of distinct conservation profiles. Beyond these basic ideas, we point out various pitfalls and limitations associated with the data handling, paving the way for further improvements. As an illustration of the methods, we analyze a genome tree based on the above principles, along with a series of other trees derived from the same data and based on pair-wise comparisons (ancestral duplication-conservation and shared orthologs). In all trees we observe a sharp discrimination between the three primary domains of life: Bacteria, Archaea, and Eukarya. The new genome tree, based on conservation profiles, displays a significant correspondence with classically recognized taxonomical groupings, along with a series of departures from such conventional clusterings.
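The idea of presence-absence conservation profiles, and of keeping only the distinct ones, can be sketched as follows. This is a toy illustration rather than the authors' pipeline: the protein families, the genome names, and the Hamming-style distance between genomes are all assumptions introduced here, since the abstract does not specify a distance measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: protein family -> genomes in which it is present.
FAMILIES = {
    "famA": {"g1", "g2", "g3"},
    "famB": {"g1", "g2"},
    "famC": {"g1", "g2"},  # same profile as famB, hence redundant
    "famD": {"g3"},
    "famE": {"g2", "g3"},
}
GENOMES = ["g1", "g2", "g3"]


def profile(present_in, genomes):
    """Presence-absence profile: 1 if the family occurs in a genome, else 0."""
    return tuple(int(g in present_in) for g in genomes)


def distinct_profiles(families, genomes):
    """Count how many families share each distinct profile."""
    return Counter(profile(members, genomes) for members in families.values())


def profile_distances(profiles, genomes):
    """Assumed distance: fraction of distinct profiles where two genomes differ."""
    distances = {}
    for i, j in combinations(range(len(genomes)), 2):
        differing = sum(1 for p in profiles if p[i] != p[j])
        distances[(genomes[i], genomes[j])] = differing / len(profiles)
    return distances


profiles = distinct_profiles(FAMILIES, GENOMES)
distances = profile_distances(list(profiles), GENOMES)
```

Five families collapse to four distinct profiles here, which is the redundancy reduction the abstract refers to; a pairwise distance table like `distances` could then be passed to any standard tree-building method.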

    Quantification of codon selection for comparative bacterial genomics

    Background: Statistics measuring codon selection seek to compare genes by their sensitivity to selection for translational efficiency, but existing statistics lack a model for testing the significance of differences between genes. Here, we introduce a new statistic for measuring codon selection, the Adaptive Codon Enrichment (ACE). Results: This statistic represents codon usage bias in terms of a probabilistic distribution, quantifying the extent that preferred codons are over-represented in the gene of interest relative to the mean and variance that would result from stochastic sampling of codons. Expected codon frequencies are derived from the observed codon usage frequencies of a broad set of genes, such that they are likely to reflect nonselective, genome-wide influences on codon usage (e.g. mutational biases). The relative adaptiveness of synonymous codons is deduced from the frequency of codon usage in a pre-selected set of genes relative to the expected frequency. The ACE can predict both transcript abundance during rapid growth and the rate of synonymous substitutions, with accuracy comparable to or greater than existing metrics. We further examine how the composition of reference gene sets affects the accuracy of the statistic, and suggest methods for selecting appropriate reference sets for any genome, including bacteriophages. Finally, we demonstrate that the ACE may naturally be extended to quantify the genome-wide influence of codon selection in a manner that is sensitive to a large fraction of codons in the genome. This reveals substantial variation among genomes, correlated with the tRNA gene number, even among groups of bacteria where previously proposed whole-genome measures show little variation. Conclusions: The statistical framework of the ACE allows rigorous comparison of the level of codon selection acting on genes, both within a genome and between genomes.
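The general idea of scoring over-representation of preferred codons against the mean and variance expected from random sampling can be sketched as a z-score. This is a simplified stand-in, not the published ACE formula: it treats each codon as simply preferred or not, whereas the abstract describes graded relative adaptiveness. The codon table, preferred codons, and background probabilities below are hypothetical.

```python
import math

# Hypothetical background probability that a randomly sampled codon for each
# amino acid is the "preferred" one (not real genome-derived frequencies).
EXPECTED_PREFERRED = {"K": 0.5, "L": 0.2, "E": 0.5}
# Hypothetical preferred codon per amino acid.
PREFERRED = {"K": "AAA", "L": "CTG", "E": "GAA"}
# Partial codon table covering only the codons used in this example.
AMINO_ACID = {
    "AAA": "K", "AAG": "K",
    "CTG": "L", "CTC": "L", "TTA": "L",
    "GAA": "E", "GAG": "E",
}


def enrichment_z(codons):
    """Z-score of the preferred-codon count versus independent random sampling.

    Each position is modeled as a Bernoulli trial with success probability
    EXPECTED_PREFERRED[aa], so the expected count is the sum of p and the
    variance is the sum of p * (1 - p).
    """
    observed, mean, variance = 0, 0.0, 0.0
    for codon in codons:
        aa = AMINO_ACID[codon]
        p = EXPECTED_PREFERRED[aa]
        observed += int(codon == PREFERRED[aa])
        mean += p
        variance += p * (1.0 - p)
    if variance == 0.0:
        raise ValueError("variance is zero; z-score is undefined")
    return (observed - mean) / math.sqrt(variance)


z_biased = enrichment_z(["AAA", "AAA", "GAA", "GAA"])   # all preferred
z_neutral = enrichment_z(["AAA", "AAG", "GAA", "GAG"])  # half preferred
```

Because the score is expressed in standard deviations from the sampling expectation, genes of different lengths and amino acid compositions are placed on a common scale, which is the kind of comparison the abstract emphasizes.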

    Neurophysiological Defects and Neuronal Gene Deregulation in Drosophila mir-124 Mutants

    miR-124 is conserved in sequence and neuronal expression across the animal kingdom and is predicted to have hundreds of mRNA targets. Diverse defects in neural development and function were reported from miR-124 antisense studies in vertebrates, but a nematode knockout of mir-124 surprisingly lacked detectable phenotypes. To provide genetic insight from Drosophila, we deleted its single mir-124 locus and found that it is dispensable for gross aspects of neural specification and differentiation. On the other hand, we detected a variety of mutant phenotypes that were rescuable by a mir-124 genomic transgene, including short lifespan, increased dendrite variation, impaired larval locomotion, and aberrant synaptic release at the NMJ. These phenotypes reflect extensive requirements of miR-124 even under optimal culture conditions. Comparison of the transcriptomes of cells from wild-type and mir-124 mutant animals, purified on the basis of mir-124 promoter activity, revealed broad upregulation of direct miR-124 targets. However, in contrast to the proposed mutual exclusion model for miR-124 function, its functional targets were relatively highly expressed in miR-124–expressing cells and were not enriched in genes annotated with epidermal expression. A notable aspect of the direct miR-124 network was coordinate targeting of five positive components in the retrograde BMP signaling pathway, whose activation in neurons increases synaptic release at the NMJ, similar to mir-124 mutants. Derepression of the direct miR-124 target network also had many secondary effects, including over-activity of other post-transcriptional repressors and a net incomplete transition from a neuroblast to a neuronal gene expression signature. Altogether, these studies demonstrate complex consequences of miR-124 loss on neural gene expression and neurophysiology.

    Dynamics of Genome Rearrangement in Bacterial Populations

    Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”, that is, inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes.
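The notion of a symmetric inversion on a circular chromosome, described in this abstract as an inversion whose endpoints are equally distant from the origin of replication, can be made concrete with a small sketch. The coordinates, genome length, and the `tolerance` parameter are hypothetical; the abstract does not say how closely the two distances must match.

```python
def circular_distance(a, b, genome_length):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(a - b) % genome_length
    return min(d, genome_length - d)


def is_symmetric_inversion(start, end, origin, genome_length, tolerance):
    """True if both inversion endpoints lie (nearly) equally far from the origin.

    `tolerance` is an assumed slack, in the same units as the coordinates.
    """
    d_start = circular_distance(start, origin, genome_length)
    d_end = circular_distance(end, origin, genome_length)
    return abs(d_start - d_end) <= tolerance


# Toy chromosome of length 1000 with the origin at position 0.
symmetric = is_symmetric_inversion(900, 100, origin=0, genome_length=1000, tolerance=50)
asymmetric = is_symmetric_inversion(100, 300, origin=0, genome_length=1000, tolerance=50)
```

Two distinct endpoints at equal distance from the origin necessarily sit on opposite replichores, so an inversion of this kind leaves the lengths of the two replichores unchanged.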

    Polymorphisms in the Tlr4 and Tlr5 Gene Are Significantly Associated with Inflammatory Bowel Disease in German Shepherd Dogs

    Inflammatory bowel disease (IBD) is considered to be the most common cause of vomiting and diarrhoea in dogs, and the German shepherd dog (GSD) is particularly susceptible. The exact aetiology of IBD is unknown; however, associations have been identified between specific single-nucleotide polymorphisms (SNPs) in Toll-like receptors (TLRs) and human IBD. To date, no genetic studies have been undertaken in canine IBD. The aim of this study was to investigate whether polymorphisms in the canine TLR2, TLR4 and TLR5 genes are associated with IBD in GSDs. Mutational analysis of TLR2, TLR4 and TLR5 was performed in 10 unrelated GSDs with IBD. Four non-synonymous SNPs (T23C, G1039A, A1571T and G1807A) were identified in the TLR4 gene, and three non-synonymous SNPs (G22A, C100T and T1844C) were identified in the TLR5 gene. The non-synonymous SNPs identified in TLR4 and TLR5 were evaluated further in a case-control study using a SNaPSHOT multiplex reaction. Sequencing information from 55 unrelated GSDs with IBD was compared to a control group consisting of 61 unrelated GSDs. The G22A SNP in TLR5 was significantly associated with IBD in GSDs, whereas the remaining two TLR5 SNPs were found to be significantly protective for IBD. Furthermore, the two SNPs in TLR4 (A1571T and G1807A) were in complete linkage disequilibrium, and were also significantly associated with IBD. The TLR5 risk haplotype (ACC) without the two associated TLR4 SNP alleles was significantly associated with IBD; however, the presence of the two TLR4 SNP risk alleles without the TLR5 risk haplotype was not statistically associated with IBD. Our study suggests that the three TLR5 SNPs and two TLR4 SNPs (A1571T and G1807A) could play a role in the pathogenesis of IBD in GSDs. Further studies are required to confirm the functional importance of these polymorphisms in the pathogenesis of this disease.

    Structure of the ATP synthase catalytic complex (F(1)) from Escherichia coli in an autoinhibited conformation.

    ATP synthase is a membrane-bound rotary motor enzyme that is critical for cellular energy metabolism in all kingdoms of life. Despite conservation of its basic structure and function, autoinhibition by one of its rotary stalk subunits occurs in bacteria and chloroplasts but not in mitochondria. The crystal structure of the ATP synthase catalytic complex (F(1)) from Escherichia coli described here reveals the structural basis for this inhibition. The C-terminal domain of subunit ɛ adopts a heretofore unknown, highly extended conformation that inserts deeply into the central cavity of the enzyme and engages both rotor and stator subunits in extensive contacts that are incompatible with functional rotation. As a result, the three catalytic subunits are stabilized in a set of conformations and rotational positions distinct from previous F(1) structures.