In vitro identification and in silico utilization of interspecies sequence similarities using GeneChip® technology
BACKGROUND: Genomic approaches in large animal models (canine, ovine, etc.) are challenging because genomic information for these species is insufficient and corresponding microarray platforms are not available. To address this problem, we speculated that conserved interspecies genetic sequences can be experimentally detected by cross-species hybridization. The probe redundancy of the Affymetrix platform offers flexibility in selecting individual probes with high sequence similarity between related species for gene expression analysis. RESULTS: Gene expression profiles of 40 canine samples were generated using the human HG-U133A GeneChip (U133A). Due to interspecies genetic differences, only 14 ± 2% of canine transcripts were detected by U133A probe sets, whereas profiling of 40 human samples detected 49 ± 6% of human transcripts. However, when these probe sets were deconstructed into individual probes and the performance of each probe was examined, we found that 47% of human probes were able to find their targets in canine tissues and generate a detectable hybridization signal. Therefore, we restricted gene expression analysis to these probes and observed a 60% increase in the number of identified canine transcripts. These results were validated by comparing the transcripts identified by our restricted analysis of cross-species hybridization with transcripts identified by hybridization of total canine lung mRNA to the new Affymetrix Canine GeneChip®. CONCLUSION: Experimentally identifying probes with a detectable hybridization signal and restricting gene expression analysis to them drastically increases transcript detection in canine-human hybridization, suggesting that cross-hybridization between related species could be broadly used with GeneChip technology.
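The probe-level restriction described in the abstract above can be illustrated with a minimal sketch. The abstract does not state how a "detectable hybridization signal" was defined, so the intensity threshold, the `min_fraction` rule, the median summary, and the function names below are all assumptions made for illustration, not the authors' procedure.

```python
from statistics import median

def restrict_probes(signals, detection_threshold, min_fraction=0.5):
    """Return ids of probes that give a detectable signal often enough.

    signals: dict mapping probe id -> list of intensities, one per sample.
    A probe counts as detected in a sample when its intensity exceeds
    detection_threshold (an assumed criterion, not taken from the paper).
    """
    kept = []
    for probe_id, values in signals.items():
        if not values:
            continue
        detected = sum(v > detection_threshold for v in values)
        if detected / len(values) >= min_fraction:
            kept.append(probe_id)
    return kept

def probe_set_signal(signals, probe_ids, sample_index):
    """Summarize one sample using only the retained probes (median here)."""
    values = [signals[p][sample_index] for p in probe_ids]
    return median(values) if values else None

# Hypothetical intensities for three probes of one probe set, three samples.
example = {
    "probe_1": [120, 130, 90],
    "probe_2": [10, 20, 15],     # never detected, so it is dropped
    "probe_3": [200, 40, 300],
}
retained = restrict_probes(example, detection_threshold=100)
summary = probe_set_signal(example, retained, sample_index=0)
```

Dropping probes that never hybridize keeps them from diluting the probe-set summary, which is one plausible reason a restricted analysis would detect more transcripts.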
DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks
Pre-trained large language models demonstrate potential in extracting
information from DNA sequences, yet adapting to a variety of tasks and data
modalities remains a challenge. To address this, we propose DNAGPT, a
generalized DNA pre-training model trained on over 200 billion base pairs from
all mammals. By enhancing the classic GPT model with a binary classification
task (DNA sequence order), a numerical regression task (guanine-cytosine
content prediction), and a comprehensive token language, DNAGPT can handle
versatile DNA analysis tasks while processing both sequence and numerical data.
Our evaluation of genomic signal and region recognition, mRNA abundance
regression, and artificial genomes generation tasks demonstrates DNAGPT's
superior performance compared to existing models designed for specific
downstream tasks, benefiting from pre-training using the newly designed model
structure
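The regression target named in the abstract above, guanine-cytosine content, is a simple quantity to compute. The sketch below shows one common definition; how DNAGPT itself computes or encodes the value is not stated here, so the handling of ambiguous bases and the function name are assumptions.

```python
def gc_content(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C.

    Ambiguous symbols such as N are ignored, which is an assumed
    convention rather than something the abstract specifies.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total

# Two of the four unambiguous bases are G or C.
example_value = gc_content("ATGCNN")
```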
The neuropeptide transcriptome of a model echinoderm, the sea urchin Strongylocentrotus purpuratus
The work reported here was supported by a grant from the University of London Central Research Fund
Local Binary Patterns as a Feature Descriptor in Alignment-free Visualisation of Metagenomic Data
Shotgun sequencing has facilitated the analysis of complex microbial communities. However, clustering and visualising these communities without prior taxonomic information is a major challenge. Feature descriptor methods can be utilised to extract these taxonomic relations from the data. Here, we present a novel approach consisting of local binary patterns (LBP) coupled with randomised singular value decomposition (RSVD) and Barnes-Hut t-stochastic neighbor embedding (BH-tSNE) to highlight the underlying taxonomic structure of the metagenomic data. The effectiveness of our approach is demonstrated using several simulated metagenomic datasets and one real metagenomic dataset.
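As a rough illustration of the feature descriptor named in the abstract above, the sketch below computes one-dimensional local binary patterns over a numeric signal and turns them into a normalised histogram. The abstract does not say how sequences are mapped to numbers or which LBP variant is used, so the neighbourhood radius, the `>=` comparison, and the function names are assumptions; the RSVD and BH-tSNE stages are not shown.

```python
from collections import Counter

def lbp_1d(signal, radius=2):
    """Return one LBP code per interior position of a numeric signal.

    Each code has 2 * radius bits; a bit is set when the corresponding
    neighbour is greater than or equal to the centre value.
    """
    values = list(signal)
    codes = []
    for i in range(radius, len(values) - radius):
        centre = values[i]
        neighbours = values[i - radius:i] + values[i + 1:i + radius + 1]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= centre:
                code |= 1 << bit
        codes.append(code)
    return codes

def lbp_histogram(signal, radius=2):
    """Normalised histogram of LBP codes, usable as a fixed-length feature."""
    codes = lbp_1d(signal, radius)
    counts = Counter(codes)
    n_bins = 1 << (2 * radius)
    total = len(codes)
    return [counts.get(b, 0) / total if total else 0.0 for b in range(n_bins)]

# Hypothetical numeric encoding of a short read (the mapping is an assumption).
features = lbp_histogram([1, 2, 3, 2, 1, 0, 3, 3, 2], radius=2)
```

The histogram has a fixed length of 2^(2 * radius) regardless of read length, which is what makes it convenient as input to a dimensionality-reduction step.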
Ks1, an epithelial cell-specific gene, responds to early signals of head formation in Hydra
As a molecular marker for head specification in Hydra, we have cloned an epithelial cell-specific gene which responds to early signals of head formation. The gene, designated ks1, encodes a 217-amino acid protein lacking significant sequence similarity to any known protein. KS1 contains an N-terminal signal sequence and is rich in charged residues which are clustered in several domains. ks1 is expressed in tentacle-specific epithelial cells (battery cells) as well as in a small fraction of ectodermal epithelial cells in the gastric region subjacent to the tentacles. Treatment with the protein kinase C activator 12-O-tetradecanoylphorbol-13-acetate (TPA) causes a rapid increase in the level of ks1 mRNA in head-specific epithelial cells and also induces ectopic ks1 expression in cells of the gastric region. Sequence elements in the 5′-flanking region of ks1 that are related to TPA-responsive elements may mediate the TPA inducibility of ks1 expression. The pattern of expression of ks1 suggests that a ligand-activated diacylglycerol second messenger system is involved in head-specific differentiation.
The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization.
Sorghum bicolor is a drought tolerant C4 grass used for the production of grain, forage, sugar, and lignocellulosic biomass and a genetic model for C4 grasses due to its relatively small genome (approximately 800 Mbp), diploid genetics, diverse germplasm, and colinearity with other C4 grass genomes. In this study, deep sequencing, genetic linkage analysis, and transcriptome data were used to produce and annotate a high-quality reference genome sequence. Reference genome sequence order was improved, 29.6 Mbp of additional sequence was incorporated, the number of genes annotated increased 24% to 34 211, average gene length and N50 increased, and error frequency was reduced 10-fold to 1 per 100 kbp. Subtelomeric repeats with characteristics of Tandem Repeats in Miniature (TRIM) elements were identified at the termini of most chromosomes. Nucleosome occupancy predictions identified nucleosomes positioned immediately downstream of transcription start sites and at different densities across chromosomes. Alignment of more than 50 resequenced genomes from diverse sorghum genotypes to the reference genome identified approximately 7.4 M single nucleotide polymorphisms (SNPs) and 1.9 M indels. Large-scale variant features in euchromatin were identified with periodicities of approximately 25 kbp. A transcriptome atlas of gene expression was constructed from 47 RNA-seq profiles of growing and developed tissues of the major plant organs (roots, leaves, stems, panicles, and seed) collected during the juvenile, vegetative and reproductive phases. Analysis of the transcriptome data indicated that tissue type and protein kinase expression had large influences on transcriptional profile clustering. The updated assembly, annotation, and transcriptome data represent a resource for C4 grass research and crop improvement
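The N50 statistic mentioned in the abstract above has a standard definition: the length L such that sequences of length at least L together cover at least half of the total assembly length. A minimal sketch follows; the input lengths are hypothetical, and the abstract does not say whether its N50 refers to contigs or scaffolds.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half is the N50.
        if running >= half_total:
            return length

# Hypothetical contig lengths: total 54, half 27, and 10 + 9 + 8 reaches 27.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```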