Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies
Twelve cDNA libraries from two species of catfish have been sequenced, resulting in the generation of nearly 500,000 ESTs
Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding
<div><p>With the rapid advance of next-generation sequencing technologies, more and more species are being added to the list of organisms whose whole genomes have been sequenced. However, the assembled draft genomes of many organisms consist of numerous small contigs because of the short reads generated by next-generation sequencing platforms. To improve assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. A two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, which dramatically reduced the cost. A total of 94,111,841 100-bp reads and 315,277 assembled contigs were identified as containing physical map contig-specific tags. The physical map contig-specific sequences, along with the currently available BAC end sequences, were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of the whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources for linking the physical map, the genetic linkage map and the draft whole genome sequence; they can consequently improve whole genome sequence assembly and scaffolding, as well as genome-wide comparative analysis. The strategy developed in this study could also be adopted for other species whose whole genome assembly remains a challenge.</p></div>
The distribution of contig sizes.
<p>The X axis represents the length of sequence contigs, starting with 200-300 bp, since the minimum length of the contig is 200 bp, followed by 301-400 bp, 401-500 bp, 501-600 bp, 601-700 bp, 701-800 bp, 801-900 bp, 901-1000 bp, 1001-2000 bp and > 2000 bp. The Y axis represents the number of sequence contigs.</p>
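<p>As a minimal sketch of how the size classes in this caption could be tallied, the Python below bins contig lengths into the same ranges. The function names and the example lengths are hypothetical and are not taken from the study.</p>

```python
from collections import Counter

# Upper bound (inclusive) and label of each size class named in the caption.
BINS = [
    (300, "200-300"), (400, "301-400"), (500, "401-500"), (600, "501-600"),
    (700, "601-700"), (800, "701-800"), (900, "801-900"), (1000, "901-1000"),
    (2000, "1001-2000"),
]


def bin_label(length):
    """Return the size class for one contig length in bp."""
    if length < 200:
        # The caption states that the minimum contig length is 200 bp.
        raise ValueError("contigs shorter than 200 bp are not expected")
    for upper, label in BINS:
        if length <= upper:
            return label
    return ">2000"


def size_distribution(lengths):
    """Count contigs per size class (the Y axis of the figure)."""
    return Counter(bin_label(n) for n in lengths)


# Made-up lengths, for illustration only.
example = size_distribution([200, 300, 301, 999, 1500, 2001])
```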
Identification and Analysis of Genome-Wide SNPs Provide Insight into Signatures of Selection and Domestication in Channel Catfish (<i>Ictalurus punctatus</i>)
<div><p>Domestication and selection for important performance traits can impact the genome, which is most often reflected by reduced heterozygosity in and around genes related to the traits under selection. In this study, the genomic impact of domestication and artificial selection was analyzed by investigating signatures of selection using single nucleotide polymorphisms (SNPs) in channel catfish (<i>Ictalurus punctatus</i>). A total of 8.4 million candidate SNPs were identified using next-generation sequencing. On average, the channel catfish genome harbors one SNP per 116 bp. Approximately 6.6 million, 5.3 million, 4.9 million, 7.1 million and 6.7 million SNPs were detected in the Marion, Thompson, USDA103, Hatchery strain, and wild population, respectively. The allele frequencies of 407,861 SNPs differed significantly between the domestic and wild populations. With these SNPs, 23 genomic regions with putative selective sweeps were identified that included 11 genes. Although the functions of the majority of these genes remain unknown in catfish, several genes with known functions related to aquaculture performance traits were included in the regions with selective sweeps. These included hypoxia-inducible factor 1β (<i>HIF1β</i>) and the transporter gene ATP-binding cassette sub-family B member 5 (<i>ABCB5</i>). HIF1β is important for the response to hypoxia, and tolerance to low oxygen levels is a critical aquaculture trait. The large number of SNPs identified in this study is valuable for the development of high-density SNP arrays for genetic and genomic studies of performance traits in catfish.</p></div>
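<p>The SNP density quoted in the abstract is a simple ratio, and the sketch below shows the arithmetic in Python. The function names are hypothetical. Because 8.4 million is a rounded count, the span computed here is only the length implied by the two published figures, not a genome size reported by the study.</p>

```python
def bp_per_snp(span_bp, snp_count):
    """Average spacing between SNPs, in base pairs per SNP."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return span_bp / snp_count


def implied_span(snp_count, spacing_bp):
    """Sequence length implied by a SNP count and an average spacing."""
    return snp_count * spacing_bp


# Figures from the abstract: about 8.4 million SNPs, one SNP per 116 bp.
span = implied_span(8_400_000, 116)     # 974,400,000 bp, roughly 0.97 Gb
spacing = bp_per_snp(span, 8_400_000)   # recovers the 116 bp spacing
```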
The workflow of data processing.
<p>The raw reads were first trimmed to remove low-quality reads (Q20) and BAC vector sequences. <i>De novo</i> assembly was then conducted with the filtered high-quality reads. The assembled contigs carrying a tag, plus singletons carrying a tag, were then assigned to each physical contig based on the specific tag. The tags were then removed. The clean sequences were used as queries in BLAST searches against the draft catfish whole genome contigs. The targeted genome contigs were then retrieved and annotated.</p>
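<p>The tag-based assignment and tag removal steps in this workflow could look roughly like the Python sketch below. It assumes, purely for illustration, that every tag has the same length and sits at the start of the sequence. The tag strings, contig IDs and function name are hypothetical, and the trimming, assembly and BLAST steps are not reproduced.</p>

```python
from collections import defaultdict


def assign_by_tag(sequences, tag_to_contig):
    """Group tagged sequences by physical contig and strip the tag.

    sequences:     iterable of (name, sequence) pairs
    tag_to_contig: dict mapping a tag string to a physical contig ID
                   (assumed: equal-length tags at the start of the sequence)
    """
    if not tag_to_contig:
        return {}
    tag_len = len(next(iter(tag_to_contig)))
    pools = defaultdict(list)
    for name, seq in sequences:
        contig_id = tag_to_contig.get(seq[:tag_len])
        if contig_id is None:
            continue  # no recognisable tag, so the sequence is left unassigned
        # Store the clean sequence with the tag removed.
        pools[contig_id].append((name, seq[tag_len:]))
    return dict(pools)


# Made-up tags and sequences, for illustration only.
tags = {"ACGT": "ctg0001", "TGCA": "ctg0002"}
reads = [
    ("read1", "ACGTGGGTTTAA"),
    ("read2", "TGCACCCAAATT"),
    ("read3", "NNNNGGGTTTAA"),  # untagged, will be skipped
    ("read4", "ACGTTTTT"),
]
pools = assign_by_tag(reads, tags)
```

<p>The clean sequences in each pool would then serve as queries against the draft genome contigs, as the caption describes.</p>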
Flow chart illustrating the physical map contig-specific fragment preparation.
<p>The minimal tiling path BAC clones from each physical contig were selected (highlighted). The pooled BAC DNA from each physical map contig was digested with two 4-bp restriction enzymes, <i>Mse</i> I and <i>Bfa</i> I, respectively. The digestion product was then ligated with in-house designed adaptors, followed by PCR using in-house designed primers. The combination of adaptor and primer formed a specific tag representing each physical contig ID. All PCR products with a physical map contig-specific tag were then pooled together and sequenced on the Illumina HiSeq 2000 platform.</p>
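<p>One way to see why combining an adaptor with a primer keeps costs down is that the number of distinct tags grows as the product of the two sets, while the number of oligos to synthesize grows only as their sum. The Python sketch below illustrates this. The counts of 48 adaptors and 38 primers are hypothetical values chosen only because 48 x 38 = 1,824; the listing does not state how many adaptors and primers were actually used, and the names are placeholders.</p>

```python
from itertools import product


def build_tag_table(adaptors, primers, contig_ids):
    """Give each physical contig a unique (adaptor, primer) pair."""
    pairs = list(product(adaptors, primers))
    if len(pairs) < len(contig_ids):
        raise ValueError("not enough adaptor/primer combinations")
    # zip stops at the shorter input, so surplus pairs stay unused.
    return dict(zip(contig_ids, pairs))


# Hypothetical oligo sets; the real designs are not given in the listing.
adaptors = [f"A{i:02d}" for i in range(1, 49)]      # 48 adaptors
primers = [f"P{i:02d}" for i in range(1, 39)]       # 38 primers
contigs = [f"ctg{i:04d}" for i in range(1, 1825)]   # 1,824 physical contigs

table = build_tag_table(adaptors, primers, contigs)
oligos_needed = len(adaptors) + len(primers)        # 86 oligos, not 1,824
```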
The distribution of number of contigs per physical map contig.
<p>The X axis represents the number of sequence contigs contained in each physical map contig, starting with 0-100, followed by 101-200, 201-300, 301-400 and > 400. The Y axis represents the number of physical map contigs.</p>