1,735 research outputs found
Formal Verification of Neural Network Controlled Autonomous Systems
In this paper, we consider the problem of formally verifying the safety of an
autonomous robot equipped with a Neural Network (NN) controller that processes
LiDAR images to produce control actions. Given a workspace that is
characterized by a set of polytopic obstacles, our objective is to compute the
set of safe initial conditions such that a robot trajectory starting from these
initial conditions is guaranteed to avoid the obstacles. Our approach is to
construct a finite state abstraction of the system and use standard
reachability analysis over the finite state abstraction to compute the set of
the safe initial states. The first technical problem in computing the finite
state abstraction is to mathematically model the imaging function that maps the
robot position to the LiDAR image. To that end, we introduce the notion of
imaging-adapted sets as partitions of the workspace in which the imaging
function is guaranteed to be affine. We develop a polynomial-time algorithm to
partition the workspace into imaging-adapted sets along with computing the
corresponding affine imaging functions. Given this workspace partitioning, a
discrete-time linear dynamics of the robot, and a pre-trained NN controller
with Rectified Linear Unit (ReLU) nonlinearity, the second technical challenge
is to analyze the behavior of the neural network. To that end, we utilize a
Satisfiability Modulo Convex (SMC) encoding to enumerate all the possible
segments of different ReLUs. SMC solvers then use a Boolean satisfiability
solver and a convex programming solver and decompose the problem into smaller
subproblems. To accelerate this process, we develop a pre-processing algorithm
that could rapidly prune the space feasible ReLU segments. Finally, we
demonstrate the efficiency of the proposed algorithms using numerical
simulations with increasing complexity of the neural network controller
The Repetitive Landscape of the Barley Genome
While transposable elements (TEs) comprise the bulk of plant genomic DNA, how they contribute to genome structure and organization is still poorly understood. Especially, in large genomes where TEs make the majority of genomic DNA, it is still unclear whether TEs target specific chromosomal regions or whether they simply accumulate where they are best tolerated. The barley genome with its vast repetitive fraction is an ideal system to study chromosomal organization and evolution of TEs. Genes make only about 2% of the genome, while over 80% is derived from TEs. The TE fraction is composed of at least 350 different families. However, 50% of the genome is comprised of only 15 high-copy TE families, while all other TE families are present in moderate or low-copy numbers. The barley genome is highly compartmentalized with different types of TEs occupying different chromosomal “niches”, such as distal, interstitial or proximal regions of chromosome arms. Furthermore, gene space represents its own distinct genomic compartment that is enriched in small non-autonomous DNA transposons, suggesting that these TEs specifically target promoters and downstream regions. Some TE families also show a strong preference to insert in specific sequence motifs which may, in part, explain their distribution. The family-specific distribution patterns result in distinct TE compositions of different chromosomal compartments.Peer reviewe
Sequencing of BAC pools by different next generation sequencing platforms and strategies
<p>Abstract</p> <p>Background</p> <p>Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable.</p> <p>Results</p> <p>Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably less misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library.</p> <p>Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%.</p> <p>Conclusion</p> <p>Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.</p
Repeat-length variation in a wheat cellulose synthase-like gene is associated with altered tiller number and stem cell wall composition
The tiller inhibition gene (tin) that reduces tillering in wheat (Triticum aestivum) is also associated with large spikes, increased grain weight, and thick leaves and stems. In this study, comparison of near-isogenic lines (NILs) revealed changes in stem morphology, cell wall composition, and stem strength. Microscopic analysis of stem cross-sections and chemical analysis of stem tissue indicated that cell walls in tin lines were thicker and more lignified than in free-tillering NILs. Increased lignification was associated with stronger stems in tin plants. A candidate gene for tin was identified through map-based cloning and was predicted to encode a cellulose synthase-like (Csl) protein with homology to members of the CslA clade. Dinucleotide repeat-length polymorphism in the 5′UTR region of the Csl gene was associated with tiller number in diverse wheat germplasm and linked to expression differences of Csl transcripts between NILs. We propose that regulation of Csl transcript and/or protein levels affects carbon partitioning throughout the plant, which plays a key role in the tin phenotype.J. Hyles, S. Vautrin, F. Pettolino, C. MacMillan, Z. Stachurski, J. Breen, H. Berges, T. Wicker, and W. Spielmeye
Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats
BACKGROUND: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. RESULTS: We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation allowed for the identification of dozens of novel repeat sequences, though, which were not recognised by hand-annotation. The MDR data was also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands indeed gene sequences could be identified. MDR data were only of limited use, when mapped on genomic sequences from the closely related species Triticum monococcum as only a fraction of the repetitive sequences was recognised. CONCLUSION: An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences) regions in uncharacterised genomic sequences. The restriction that a particular MDR index can not be used across species is outweighed by the low costs of Illumina/Solexa sequencing which makes any chosen genome accessible for whole-genome sequence sampling
Substantial biases in ultra-short read data sets from high-throughput DNA sequencing
Novel sequencing technologies permit the rapid production of large sequence data sets. These technologies are likely to revolutionize genetics and biomedical research, but a thorough characterization of the ultra-short read output is necessary. We generated and analyzed two Illumina 1G ultra-short read data sets, i.e. 2.8 million 27mer reads from a Beta vulgaris genomic clone and 12.3 million 36mers from the Helicobacter acinonychis genome. We found that error rates range from 0.3% at the beginning of reads to 3.8% at the end of reads. Wrong base calls are frequently preceded by base G. Base substitution error frequencies vary by 10- to 11-fold, with A > C transversion being among the most frequent and C > G transversions among the least frequent substitution errors. Insertions and deletions of single bases occur at very low rates. When simulating re-sequencing we found a 20-fold sequencing coverage to be sufficient to compensate errors by correct reads. The read coverage of the sequenced regions is biased; the highest read density was found in intervals with elevated GC content. High Solexa quality scores are over-optimistic and low scores underestimate the data quality. Our results show different types of biases and ways to detect them. Such biases have implications on the use and interpretation of Solexa data, for de novo sequencing, re-sequencing, the identification of single nucleotide polymorphisms and DNA methylation sites, as well as for transcriptome analysis
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes
<p>Abstract</p> <p>Background</p> <p>We investigate if pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using <it>in silico </it>simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence.</p> <p>Results</p> <p>The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3kb insert size) reads (15L-5P) on <it>Arabidopsis</it>. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most.</p> <p>Conclusions</p> <p>BAC pooling works better than WGS, however, both require a physical map to order the scaffolds. Pool sizes up to 12Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.</p
- …