754 research outputs found

    Benchmarking database systems for Genomic Selection implementation

    Get PDF
    Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses needed to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix

    On the Parity Problem in One-Dimensional Cellular Automata

    Full text link
    We consider the parity problem in one-dimensional, binary, circular cellular automata: if the initial configuration contains an odd number of 1s, the lattice should converge to all 1s; otherwise, it should converge to all 0s. It is easy to see that the problem is ill-defined for even-sized lattices (which, by definition, would never be able to converge to 1). We then consider only odd lattices. We are interested in determining the minimal neighbourhood that allows the problem to be solvable for any initial configuration. On the one hand, we show that radius 2 is not sufficient, proving that there exists no radius 2 rule that can possibly solve the parity problem from arbitrary initial configurations. On the other hand, we design a radius 4 rule that converges correctly for any initial configuration and we formally prove its correctness. Whether or not there exists a radius 3 rule that solves the parity problem remains an open problem.Comment: In Proceedings AUTOMATA&JAC 2012, arXiv:1208.249

    AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication

    Get PDF
    Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication–informed collinear anchor identification between genomes and performs base pair–resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor–binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.This project is supported by the United States Department of Agriculture Agricultural Research Service, NSF No. 1822330, NSF No. 1854828, the European Union's Horizon 2020 Framework Programme under the DeepHealth project [825111], the European Union Regional Development Fund within the framework of The European Regional Development Fund Operational Program of Catalonia 2014 to 2020 with a grant of 50% of total cost eligible under the DRAC project [001-P-001723], and National Natural Science Foundation of China No. 31900486. M.C.S. was supported by NSF Postdoctoral Research Fellowship in Biology No. 1907343. M.M. was partially supported by the Spanish Ministry of Economy, Industry, and Competitiveness under Ramón y Cajal (RYC) fellowship number RYC-2016-21104.Peer ReviewedPostprint (published version

    Declarative operations on nets

    Get PDF
    To increase the expressiveness of knowledge representations, the graph-theoretical basis of semantic networks is reconsidered. Directed labeled graphs are generalized to directed recursive labelnode hypergraphs, which permit a most natural representation of multi-level structures and n-ary relationships. This net formalism is embedded into the relational/functional programming language RELFUN. Operations on (generalized) graphs are specified in a declarative fashion to enhance readability and maintainability. For this, nets are represented as nested RELFUN terms kept in a normal form by rules associated directly with their constructors. These rules rely on equational axioms postulated in the formal definition of the generalized graphs as a constructor algebra. Certain kinds of sharing in net diagrams are mirrored by binding common subterms to logical variables. A package of declarative transformations on net terms is developed. It includes generalized set operations, structure-reducing operations, and extended path searching. The generation of parts lists is given as an application in mechanical engineering. Finally, imperative net storage and retrieval operations are discussed

    A Single Molecule Scaffold for the Maize Genome

    Get PDF
    About 85% of the maize genome consists of highly repetitive sequences that are interspersed by low-copy, gene-coding sequences. The maize community has dealt with this genomic complexity by the construction of an integrated genetic and physical map (iMap), but this resource alone was not sufficient for ensuring the quality of the current sequence build. For this purpose, we constructed a genome-wide, high-resolution optical map of the maize inbred line B73 genome containing >91,000 restriction sites (averaging 1 site/∼23 kb) accrued from mapping genomic DNA molecules. Our optical map comprises 66 contigs, averaging 31.88 Mb in size and spanning 91.5% (2,103.93 Mb/∼2,300 Mb) of the maize genome. A new algorithm was created that considered both optical map and unfinished BAC sequence data for placing 60/66 (2,032.42 Mb) optical map contigs onto the maize iMap. The alignment of optical maps against numerous data sources yielded comprehensive results that proved revealing and productive. For example, gaps were uncovered and characterized within the iMap, the FPC (fingerprinted contigs) map, and the chromosome-wide pseudomolecules. Such alignments also suggested amended placements of FPC contigs on the maize genetic map and proactively guided the assembly of chromosome-wide pseudomolecules, especially within complex genomic regions. Lastly, we think that the full integration of B73 optical maps with the maize iMap would greatly facilitate maize sequence finishing efforts that would make it a valuable reference for comparative studies among cereals, or other maize inbred lines and cultivars

    A cumulative index to the 1973 issues of Aeronautical engineering: A special bibliography

    Get PDF
    This publication is a cumulative index to the abstracts contained in NASA SP-7037 (28) through NASA SP-7037 (39) of Aeronautical Engineering: A Special Bibliography. NASA SP-7037 and its supplements have been compiled through the cooperative efforts of the American Institute of Aeronautics and Astronautics (AIAA) and the National Aeronautics and Space Administration (NASA). This cumulative index includes subject, personal author, corporate source, contract, and report number indexes

    IView: introgression library visualization and query tool

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>An introgression library is a family of near-isogenic lines in a common genetic background, each of which carries one or more genomic regions contributed by a donor genome. Near-isogenic lines are powerful genetic resources for the analysis of phenotypic variation and are important for map-base cloning genes underlying mutations and traits. With many thousands of distinct genotypes, querying introgression libraries for lines of interest is an issue. </p> <p>Results</p> <p>We have created IView, a tool to graphically display and query near-isogenic line libraries for specific introgressions. This tool incorporates a web interface for displaying the location and extent of introgressions. Each genetic marker is associated with a position on a reference map. Users can search for introgressions using marker names, or chromosome number and map positions. This search results in a display of lines carrying an introgression at the specified position. Upon selecting one of the lines, color-coded introgressions on all chromosomes of the line are displayed graphically.</p> <p>The source code for IView can be downloaded from <url>http://xrl.us/iview</url>. </p> <p>Conclusions</p> <p>IView will be useful for those wanting to make introgression data from their stock of germplasm searchable. </p
    corecore