
    Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes

    Background: Gene expression data frequently contain missing values; however, most downstream analyses of microarray experiments require complete data. Many methods have been proposed in the literature to estimate missing values from the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions under which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. Results: We found that the best-performing imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior across all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where complexity indicates the difficulty of mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that the entropy-based selection (EBS) scheme, which incorporates this information, is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously in microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy, but at an increased computational cost. Conclusion: Our findings provide insight into the problem of which imputation method is optimal for a given data set. The three top-performing methods (LSA, LLS, and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity, while neighbour-based methods (KNN, OLS, LSA, LLS) performed better on data with higher complexity. We also found that the EBS and STS schemes serve as complementary and effective tools for selecting the optimal imputation algorithm.
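    The abstract names an entropy-based complexity measure and neighbour-based imputers without giving formulas. The Python sketch below is illustrative only and rests on assumptions the abstract does not state: complexity is taken as the normalized Shannon entropy of the matrix's singular-value spectrum (near 0 when one component dominates, 1 when all components carry equal weight), and the neighbour-based imputer is a simplified, unweighted k-nearest-neighbour scheme. Neither function is the authors' EBS or KNN implementation; `svd_entropy` and `knn_impute` are hypothetical names.

```python
import numpy as np


def svd_entropy(X):
    """Assumed complexity measure: normalized Shannon entropy of the
    singular-value spectrum of a complete (genes x arrays) matrix.
    Near 0 when one component dominates, 1 when all components are equal.
    Requires min(rows, cols) >= 2."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    p = s**2 / np.sum(s**2)  # fraction of total variance per component
    p = p[p > 0]             # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)) / np.log(len(s)))


def knn_impute(X, k=10):
    """Simplified, unweighted KNN imputation: each missing entry in a gene (row)
    is replaced by the mean of that column over the k complete genes closest in
    Euclidean distance on the columns the target gene does have."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing_rows = np.isnan(X).any(axis=1)
    complete = X[~missing_rows]  # candidate neighbours: rows with no missing values
    for i in np.flatnonzero(missing_rows):
        observed = ~np.isnan(X[i])
        dist = np.sqrt(((complete[:, observed] - X[i, observed]) ** 2).sum(axis=1))
        neighbours = complete[np.argsort(dist)[:k]]
        out[i, ~observed] = neighbours[:, ~observed].mean(axis=0)
    return out
```

    Read against the abstract's findings, a lower `svd_entropy` would point toward global methods (PLS, SVD, BPCA) and a higher one toward neighbour-based methods; the abstract gives no cut-off value, so none is assumed here.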

    Advanced Diagnostics for the Study of Linearly Polarized Emission. II: Application to Diffuse Interstellar Radio Synchrotron Emission

    Diagnostics of polarized emission provide us with valuable information on the Galactic magnetic field and the state of turbulence in the interstellar medium, which cannot be obtained from synchrotron intensity alone. In Paper I (Herron et al. 2017b), we derived polarization diagnostics that are rotationally and translationally invariant in the Q–U plane, similar to the polarization gradient. In this paper, we apply these diagnostics to simulations of ideal magnetohydrodynamic turbulence that have a range of sonic and Alfvénic Mach numbers. We generate synthetic images of Stokes Q and U for these simulations, for the cases where the turbulence is illuminated from behind by uniform polarized emission, and where the polarized emission originates from within the turbulent volume. From these simulated images we calculate the polarization diagnostics derived in Paper I, for different lines of sight relative to the mean magnetic field, and for a range of frequencies. For all of our simulations, we find that the polarization gradient is very similar to the generalized polarization gradient, and that both trace spatial variations in the magnetoionic medium for the case where emission originates within the turbulent volume, provided that the medium is not supersonic. We propose a method for distinguishing the cases of emission coming from behind or within a turbulent, Faraday-rotating medium, and a method to partly map the rotation measure of the observed region. We also speculate on statistics of these diagnostics that may allow us to constrain the physical properties of an observed turbulent region.
    Comment: 34 pages, 25 figures, accepted for publication in Ap
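    The polarization gradient referred to above is usually defined (Gaensler et al. 2011) as |∇P| = sqrt((∂Q/∂x)² + (∂U/∂x)² + (∂Q/∂y)² + (∂U/∂y)²). As a minimal numerical sketch, assuming Q and U are 2-D images on a regular grid with equal pixel spacing and using finite differences from `np.gradient`, the function below computes it and checks the invariance property mentioned in the abstract: rotating the (Q, U) vector by a fixed angle or adding constant offsets leaves |∇P| unchanged. The generalized polarization gradient and the other Paper I diagnostics are not reproduced; `polarization_gradient` is a hypothetical helper name.

```python
import numpy as np


def polarization_gradient(Q, U, spacing=1.0):
    """Polarization gradient magnitude
    |grad P| = sqrt((dQ/dx)^2 + (dU/dx)^2 + (dQ/dy)^2 + (dU/dy)^2)
    from 2-D Stokes Q and U images sampled on a regular grid."""
    # np.gradient returns the derivative along axis 0 (y, rows) first, then axis 1 (x, columns).
    dQ_dy, dQ_dx = np.gradient(np.asarray(Q, dtype=float), spacing)
    dU_dy, dU_dx = np.gradient(np.asarray(U, dtype=float), spacing)
    return np.sqrt(dQ_dx**2 + dU_dx**2 + dQ_dy**2 + dU_dy**2)


# Invariance check on random toy images (not simulation data).
rng = np.random.default_rng(0)
Q, U = rng.normal(size=(2, 64, 64))
theta = 0.7  # arbitrary rotation angle in the Q-U plane
Q_rot = Q * np.cos(theta) - U * np.sin(theta) + 5.0  # rotate, then translate
U_rot = Q * np.sin(theta) + U * np.cos(theta) - 2.0
print(np.allclose(polarization_gradient(Q, U), polarization_gradient(Q_rot, U_rot)))  # True
```

    The invariance follows from linearity: derivatives discard the constant offsets, and a rotation of (Q, U) rotates the derivative pairs without changing their summed squares.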

    Effort required to finish shotgun-generated genome sequences differs significantly among vertebrates

    Background: The approaches for shotgun-based sequencing of vertebrate genomes are now well-established, and have resulted in the generation of numerous draft whole-genome sequence assemblies. In contrast, the process of refining those assemblies to improve contiguity and increase accuracy (known as 'sequence finishing') remains tedious, labor-intensive, and expensive. As a result, the vast majority of vertebrate genome sequences generated to date remain at a draft stage. Results: To date, our genome sequencing efforts have focused on comparative studies of targeted genomic regions, requiring sequence finishing of large blocks of orthologous sequence (average size 0.5–2 Mb) from various subsets of 75 vertebrates. This experience has provided a unique opportunity to compare the relative effort required to finish shotgun-generated genome sequence assemblies from different species, which we report here. Importantly, we found that the sequence assemblies generated for the same orthologous regions from various vertebrates show substantial variation with respect to misassemblies and, in particular, the frequency and characteristics of sequence gaps. As a consequence, the work required to finish different species' sequences varied greatly. Application of the same standardized methods for finishing provided a novel opportunity to "assay" characteristics of genome sequences among many vertebrate species. It is important to note that many of the problems we have encountered during sequence finishing reflect unique architectural features of a particular vertebrate's genome, which in some cases may have important functional and/or evolutionary implications. Finally, based on our analyses, we have been able to improve our procedures to overcome some of these problems and to increase the overall efficiency of the sequence-finishing process, although significant challenges still remain. Conclusion: Our findings have important implications for the eventual finishing of the draft whole-genome sequences that have now been generated for a large number of vertebrates.

    Quantification of endogenous levels of IAA, IAAsp and IBA in micro-propagated shoots of hybrid chestnut pre-treated with IBA

    Endogenous levels of indole-3-acetic acid (IAA), indole-3-acetylaspartic acid (IAAsp) and indole-3-butyric acid (IBA) were measured by high-performance liquid chromatography (HPLC) during the first 8 d of in vitro rooting of rootstock shoots of the chestnut ‘M3’ hybrid. Rooting was induced either by dipping the basal ends of the shoots into a 4.92-mM IBA solution for 1 min or by subculturing the shoots for 5 d on solid rooting medium supplemented with 14.8-μM IBA. For root development, the induced shoots were transferred to auxin-free solid medium. Auxins were measured separately in the apical and basal parts of the shoots. Endogenous levels of IAA and IAAsp were higher in IBA-treated shoots than in control shoots. In extracts of the basal parts of the shoots, the concentration of free IAA showed a significant peak 2 d after either root-inductive treatment, followed by a gradual decrease for the remainder of the time course. The concentration of IAAsp peaked at day 6 in extracts of the basal parts of shoots induced with 14.8-μM IBA for 5 d, whereas in shoots induced by dipping it increased until day 2 and then remained stable. In extracts from basal shoot portions induced by dipping, IBA concentration showed a transient peak at day 1 and a plateau between days 2 and 4; in contrast, shoots induced on auxin-containing medium showed a significant reduction in IBA between 4 and 6 d after transfer to auxin-free medium. All quantified auxins remained at relatively low, virtually constant levels in extracts from apical shoot portions and from control non-rooting shoots. In conclusion, the natural auxin IAA is the signal responsible for root induction, although it is driven by exogenous IBA irrespective of the method of IBA application.
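    The two IBA doses look oddly specific in molar units. Converting them with the molar mass of IBA (C12H13NO2, about 203.24 g/mol) suggests they correspond to round mass concentrations of 1 g/L and 3 mg/L; this reading is an inference, not something the abstract states. A small sketch of the conversion, with `mg_per_l_to_molar` as a hypothetical helper:

```python
# Molar mass of indole-3-butyric acid (C12H13NO2), in g/mol.
IBA_MOLAR_MASS = 203.24


def mg_per_l_to_molar(mg_per_l, molar_mass=IBA_MOLAR_MASS):
    """Convert a mass concentration (mg/L) to a molar concentration (mol/L)."""
    return (mg_per_l / 1000.0) / molar_mass


print(f"{mg_per_l_to_molar(1000) * 1e3:.2f} mM")  # 4.92 mM (1 g/L dipping solution)
print(f"{mg_per_l_to_molar(3) * 1e6:.1f} uM")     # 14.8 uM (3 mg/L in rooting medium)
```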

    The completion of the Mammalian Gene Collection (MGC)

    Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and for at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag (EST) screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

    In vivo and in vitro synthesis of CM-proteins (A-hordeins) from barley (Hordeum vulgare L.)

    CM-proteins from barley endosperm (CMa, CMb, CMc, CMd), which are the main components of the A-hordein fraction, are synthesized most actively 10 to 30 d after anthesis (maximum at 15–20 d). They are synthesized by membrane-bound polysomes as precursors of higher apparent molecular weight (13,000–21,000) than the mature proteins (12,000–16,000). The largest in vitro product (21,000) is the putative precursor of protein CMd (16,000), as it is selected by anti-CMd monospecific IgGs and is encoded by an mRNA with a greater sedimentation coefficient (9 S) than those encoding the other three proteins (7.5 S). CM-proteins always appear in the soluble fraction, regardless of the homogenization and subcellular fractionation procedure used, indicating that these proteins are transferred to the soluble fraction after processing.

    Living on the edge: utilising lidar data to assess the importance of vegetation structure for avian diversity in fragmented woodlands and their edges

    Context: In agricultural landscapes, small woodland patches can be important wildlife refuges. Their value in maintaining biodiversity may, however, be compromised by isolation, so knowledge about the role of habitat structure is vital for understanding the drivers of diversity. This study examined how avian diversity and abundance were related to habitat structure in four small woods in an agricultural landscape in eastern England. Objectives: The aims were to examine the edge effect on bird diversity and abundance, and the contributory role of vegetation structure. Specifically: what is the role of vegetation structure in edge effects, and which edge structures support the greatest bird diversity? Methods: Annual breeding bird census data for 28 species were combined with airborne lidar data in linear mixed models fitted separately (i) at the whole-wood level and (ii) for the woodland edges only. Results: Despite the relatively small woodland areas (4.9–9.4 ha), bird diversity increased significantly towards the edges, driven in part by vegetation structure. At the whole-wood level, diversity was positively associated with increased vegetation above 0.5 m and especially with increasing vegetation density in the understorey layer, which was more abundant at the woodland edges. Diversity along the edges was largely driven by the density of vegetation below 4 m. Conclusions: The results demonstrate that bird diversity was maximised by a diverse vegetation structure across the wood and especially by a dense understorey along the edge. These findings can assist bird conservation by guiding habitat management of the remaining woodland patches.
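    The abstract does not say which diversity index was used or how vegetation density was derived from the lidar point cloud. As a hedged illustration only, the sketch below assumes the Shannon index for bird diversity and, as a simple density proxy, the fraction of lidar returns whose height falls within a band (for example 0.5 to 4 m, echoing the thresholds in the abstract). Per-plot values like these are the kind of covariates that could enter a linear mixed model, for instance with wood as a grouping factor; the model fitting itself is not shown, and `shannon_diversity` and `band_density` are hypothetical names.

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def band_density(return_heights_m, low_m, high_m):
    """Fraction of lidar returns with height above ground in [low_m, high_m)."""
    if not return_heights_m:
        return 0.0
    inside = sum(1 for h in return_heights_m if low_m <= h < high_m)
    return inside / len(return_heights_m)


# Toy example with made-up numbers: one plot's species counts and return heights (m).
counts = [4, 2, 1, 1]
heights = [0.1, 0.8, 1.5, 2.7, 3.9, 6.0, 11.2, 14.5]
print(round(shannon_diversity(counts), 3))  # 1.213
print(band_density(heights, 0.5, 4.0))      # 0.5 (4 of 8 returns between 0.5 m and 4 m)
```

    A return-fraction proxy is easy to compute but ignores occlusion by the upper canopy, which tends to under-sample the understorey; a real analysis would need to account for that.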