
77 research outputs found

An introduction to scripting in Ruby for biologists

Author: Andy Law
D Thomas
DW Mount
J Ousterhout
Jan Aerts
L Carlson
M Fitzgerald
SJ Wheelan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

The Ruby programming language has a lot to offer to any scientist with electronic data to process. Not only is the initial learning curve very shallow, but its reflection and meta-programming capabilities allow for the rapid creation of relatively complex applications while still keeping the code short and readable. This paper provides a gentle introduction to this scripting language for researchers without formal informatics training, such as many wet-lab scientists. We hope this will give such researchers an idea of how powerful a tool Ruby can be for their data management tasks and encourage them to learn more about it.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Establishing the baseline level of repetitive element expression in the human cortex

Author: B Conrad
B Langmead
BT Wilhelm
C Ladd-Acosta
C Nellaker
D Karolchik
DE Montoya-Durango
E Balada
GF Richard
GJ Faulkner
HH Kazazian
JA Armour
Jennifer Parla
JL Weber
JR Landry
M Barak
M Lafon
Melissa Kramer
O Frank
P Jern
PA Callinan
R Cordaux
R Lower
Robert H Yolken
RS Harris
S Mi
S Weis
Sarah J Wheelan
Sarven Sabunciyan
SJ Wheelan
Svitlana Tyekucheva
W Richard McCombie
WA Schulz
YHY Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Although nearly half of the human genome consists of repetitive sequences, the expression profile of these elements remains largely uncharacterized. Recently developed high-throughput sequencing technologies provide us with a powerful new set of tools to study repeat elements. Hence, we performed whole transcriptome sequencing to investigate the expression of repetitive elements in human frontal cortex using postmortem tissue obtained from the Stanley Medical Research Institute. Results: We found that a significant number of reads from the human frontal cortex originate from repeat elements. We also noticed that Alu elements were expressed at levels higher than expected by random or background transcription. In contrast, L1 elements were expressed at lower than expected amounts. Conclusions: Repetitive elements are expressed abundantly in the human brain. This expression pattern appears to be element-specific and cannot be explained by random or background transcription. These results demonstrate that our knowledge about repetitive elements is far from complete. Further characterization is required to determine the mechanism, the control, and the effects of repeat element expression.

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

PubMed Central

Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index

Author: A Andreeva
Abdur R Sikder
Albert Y Zomaya
AR Sikder
FMG Pearl
G Pollastri
HM Berman
J Cheng
J Liu
J Sim
JE Gewehr
L Kong
M Dumontier
M Suyama
N Nagarajan
OV Galzitskaya
RA George
RL Marsden
S Veretnik
SF Altschul
SJ Wheelan
T Joachims
TA Holland
V Vapnik
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Knowledge of protein domain boundaries is critical for the characterisation and understanding of protein function. The ability to identify domains without knowledge of the structure, using sequence information only, is an essential step in many types of protein analyses. In the present study, we demonstrate that the performance of DomainDiscovery is improved significantly by including the inter-domain linker index value for domain identification from sequence-based information. Improved DomainDiscovery uses a Support Vector Machine (SVM) approach and a unique training dataset built on the principle of consensus among experts in defining domains in protein structure. The SVM was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and the inter-domain linker index to detect possible domain boundaries for a target sequence. RESULTS: Improved DomainDiscovery is compared with other methods by benchmarking against a structurally non-redundant dataset and also CASP5 targets. Improved DomainDiscovery achieves 70% accuracy for domain boundary identification in multi-domain proteins. CONCLUSION: Improved DomainDiscovery compares favourably to the performance of other methods and excels in the identification of domain boundaries for multi-domain proteins as a result of introducing a support vector machine with the benchmark_2 dataset.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Collection of Epithelial Cells from Rodent Mammary Gland Via Laser Capture Microdissection Yielding High-Quality RNA Suitable for Microarray Analysis

Author: A Mikulowska-Mennis
B Domazet
D Gugic
GA Balogh
H Wang
Henry J. Thompson
HJ Thompson
HS Erickson
IA Kerman
IM Leiva
JJ Upson
John N. McGinley
K Specht
M Schena
M Srinivasan
ML Cox
N Masuda
NL Simone
O Kabbarah
P Sarder
P Sluka
R Singh
S Weis
SJ Wheelan
SM Goldsworthy
V Espina
V Vincek
Weiqin Jiang
Z Chen
Z Zhu
Zongjian Zhu
Publication venue: BioMed Central
Publication date: 01/01/2010

Laser capture microdissection (LCM) enables collection of cell populations highly enriched for specific cell types that have the potential of yielding critical information about physiological and pathophysiological processes. One use of cells collected by LCM is for gene expression profiling. Samples intended for transcript analyses should be of the highest quality possible. RNA degradation is an ever-present concern in molecular biological assays, and LCM is no exception. This paper identifies issues related to preparation, collection, and processing in a lipid-rich tissue, rodent mammary gland, in which the epithelial to stromal cell ratio is low and the stromal component is primarily adipocytes, a situation that presents numerous technical challenges for high-quality RNA isolation. Our goal was to improve the procedure so that a greater probe set present call rate would be obtained when isolated RNA was evaluated using Affymetrix microarrays. The results showed that the quality of RNA isolated from epithelial cells of both mammary gland and mammary adenocarcinomas was high with a probe set present call rate of 65% and a high signal-to-noise ratio

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Multi-ancestry fine mapping of interferon lambda and the outcome of acute hepatitis C virus infection

Author: Alexander G
Alric L
Busch MP
Chung RT
Cox AL
Cramp ME
Donfield SM
Duggal P
Edlin BR
Goedert JJ
Johnson EO
Khakoo S
Kim AY
Kirk GD
Kral AH
Latanich R
Lauer GM
Mangia A
Mehta SH
Murphy EL
O'Brien TR
Peters MG
Piazzola V
Rosen HR
Taub MA
Thio CL
Thomas DL
Timp W
Valencia A
Vergara C
Wheelan SJ
Wojcik GL
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2020

Clearance of acute infection with hepatitis C virus (HCV) is associated with the chr19q13.13 region containing the rs368234815 (TT/ΔG) polymorphism. We fine-mapped this region to detect possible causal variants that may contribute to HCV clearance. First, we sequenced the IFNL1-IFNL4 region in 64 individuals sampled according to rs368234815 genotype: TT/clearance (N = 16) and ΔG/persistent (N = 15) (genotype-outcome concordant) or TT/persistent (N = 19) and ΔG/clearance (N = 14) (discordant). 25 SNPs had a difference in counts of the alternative allele >5 between clearance and persistence individuals. Then, we evaluated those markers in an association analysis of HCV clearance conditioning on rs368234815 in two groups of European (692 clearance/1,025 persistence) and African ancestry (320 clearance/1,515 persistence) individuals. 10/25 variants were associated (P < 0.05) in the conditioned analysis, led by rs4803221 (P value = 4.9 × 10^−4) and rs8099917 (P value = 5.5 × 10^−4). In the European ancestry group, individuals with the haplotype rs368234815ΔG/rs4803221C were 1.7× more likely to clear than those with the rs368234815ΔG/rs4803221G haplotype (P value = 3.6 × 10^−5). For another nearby SNP, the haplotype of rs368234815ΔG/rs8099917T was associated with HCV clearance compared to rs368234815ΔG/rs8099917G (OR: 1.6, P value = 1.8 × 10^−4). We identified four possible causal variants: rs368234815, rs12982533, rs10612351 and rs4803221. Our results suggest a main signal of association represented by rs368234815, with contributions from rs4803221 and/or nearby SNPs including rs8099917.

UCL Discovery

A Systematic Survey of Mini-Proteins in Bacteria and Archaea

Author: A de la Pena-Moctezuma
B Imperiali
DJ Lipman
DT Krieger
Fengyu Wang
Guoqiang Zhang
H Amiri
H Seligmann
J Ebedes
JC Hotopp
Jingfa Xiao
Josh Bongard
JP Kastenmayer
Jun Yu
KA Brayton
L Brocchieri
L Martin
Linlin Pan
Ming Yang
P Setlow
R Apweiler
RL Tatusov
S Gribaldo
S Kumar
SH Gellman
Shouguang Jin
SJ Wheelan
UH Ha
W Wu
Publication venue: Public Library of Science

BACKGROUND: Mini-proteins, defined as polypeptides containing no more than 100 amino acids, are ubiquitous in prokaryotes and eukaryotes. They play significant roles in various biological processes, and their regulatory functions are gradually attracting the attention of scientists. However, the functions of the majority of mini-proteins are still largely unknown due to the constraints of experimental methods and bioinformatic analysis. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we extracted a total of 180,879 mini-proteins from the annotations of 532 sequenced genomes, including 491 strains of Bacteria and 41 strains of Archaea. The average proportion of mini-proteins among all genomic proteins is approximately 10.99%, but different strains exhibit remarkable fluctuations. These mini-proteins display two notable characteristics. First, the majority are species-specific proteins, with an average proportion of 58.79% among six representative phyla. Second, an even larger proportion (70.03% among all strains) consists of hypothetical proteins. However, a fraction of highly conserved hypothetical proteins potentially play crucial roles in organisms. Among mini-proteins with known functions, it seems that regulatory and metabolic proteins are more abundant than essential structural proteins. Furthermore, domains in mini-proteins seem to have greater distributions in Bacteria than Eukarya. Analysis of the evolutionary progression of these domains reveals that they have diverged to new patterns from a single ancestor. CONCLUSIONS/SIGNIFICANCE: Mini-proteins are ubiquitous in bacterial and archaeal species and play significant roles in various functions. The number of mini-proteins in each genome displays remarkable fluctuation, likely resulting from the differential selective pressures that reflect the respective life-styles of the organisms. The answers to many questions surrounding mini-proteins remain elusive and need to be resolved experimentally.
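
The length-based definition in this abstract (no more than 100 amino acids) can be illustrated with a minimal Python sketch. The function names, the dictionary layout, and the toy sequences are assumptions made for illustration; the abstract does not describe the authors' actual extraction pipeline.

```python
# Hypothetical helpers: only the 100-residue cutoff comes from the abstract.
# The {protein_id: sequence} layout and the toy data are assumptions.
MAX_MINI_PROTEIN_LENGTH = 100


def select_mini_proteins(proteins, max_length=MAX_MINI_PROTEIN_LENGTH):
    """Return the entries whose sequence has at most max_length residues."""
    return {pid: seq for pid, seq in proteins.items() if len(seq) <= max_length}


def mini_protein_fraction(proteins, max_length=MAX_MINI_PROTEIN_LENGTH):
    """Return the share of proteins in one genome that are mini-proteins."""
    if not proteins:
        return 0.0
    return len(select_mini_proteins(proteins, max_length)) / len(proteins)


# Toy annotation for a single made-up genome.
example = {
    "short_a": "M" + "A" * 49,    # 50 residues: mini-protein
    "edge_b": "M" + "K" * 99,     # exactly 100 residues: still a mini-protein
    "long_c": "M" + "L" * 250,    # 251 residues: not a mini-protein
}

minis = select_mini_proteins(example)
fraction = mini_protein_fraction(example)
print(sorted(minis), round(fraction, 4))
```

Computing the fraction per genome, rather than over all genomes pooled, matches how the abstract reports an average proportion across strains.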

Crossref

Directory of Open Access Journals

PubMed Central

BAC-Based Sequencing of Behaviorally-Relevant Genes in the Prairie Vole

Author: B Aragona
BJ Aragona
BS Cushing
CS Carter
EA Hammock
J Davis
J Zhang
James W. Thomas
Jamie K. Davis
JD Thompson
JE Pool
JF Baines
JF Storz
JI Kim
JW Thomas
K Lei
KA Young
KM Kramer
L Hellborg
LA McGraw
Larry J. Young
Lisa A. McGraw
LJ Young
M Kimura
M Lim
M Lynch
O Berton
OJ Bosch
Pamela J. Thomas
PC Ng
R Li
RW Blakesley
S Levy
S Schwartz
S Steppan
SJ Wheelan
WJ Murphy
Y Liu
Z Yang
Zhanjiang Liu
ZR Donaldson
Publication venue: Public Library of Science
Publication date: 06/01/2012

The prairie vole (Microtus ochrogaster) is an important model organism for the study of social behavior, yet our ability to correlate genes and behavior in this species has been limited due to a lack of genetic and genomic resources. Here we report the BAC-based targeted sequencing of behaviorally-relevant genes and flanking regions in the prairie vole. A total of 6.4 Mb of non-redundant or haplotype-specific sequence assemblies were generated that span the partial or complete sequence of 21 behaviorally-relevant genes as well as an additional 55 flanking genes. Estimates of nucleotide diversity from 13 loci based on alignments of 1.7 Mb of haplotype-specific assemblies revealed an average pair-wise heterozygosity (8.4 × 10^−3). Comparative analyses of the prairie vole proteins encoded by the behaviorally-relevant genes identified >100 substitutions specific to the prairie vole lineage. Finally, our sequencing data indicate that a duplication of the prairie vole AVPR1A locus likely originated from a recent segmental duplication spanning a minimum of 105 kb. In summary, the results of our study provide the genomic resources necessary for the molecular and genetic characterization of a high-priority set of candidate genes for regulating social behavior in the prairie vole.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Mathematical model for empirically optimizing large scale production of soluble protein domains

Author: A Fontana
A Kouranov
Atsushi Kurotani
BH Dessailly
C Zhang
D Christ
DT Jones
E Chikayama
Eisuke Chikayama
F Corpet
GE Folkers
GE Tusnady
HM Berman
JM Chandonia
M Dumontier
M Suyama
PB Card
R Kikuno
RL Marsden
S Cabantous
S Dokudovskaya
S Miyazaki
Satoshi Miyazaki
SF Altschul
Shigeyuki Yokoyama
SJ Wheelan
T Hondoh
T Kigawa
T Niwa
T Tanaka
Takanori Tanaka
Takashi Yabuki
TC Terwilliger
X Gao
Y Kuroda
Yutaka Kuroda
Publication venue: 'Springer Science and Business Media LLC'

Crossref

The Impact of CpG Island on Defining Transcriptional Activation of the Mouse L1 Retrotransposable Elements

Author: A Huda
A Mazo
A Rabinovich
AF Smit
AR Muotri
AV Furano
C Esnault
CY Ko
Danny Rangasamy
DE Montoya-Durango
DE Schones
DV Babushok
E Allen
EM Ostertag
ES Lander
FM Pauler
H Khan
H Khatib
I. King Jordan
JA Bailey
JC Walser
JH Martens
JL Goodier
JM Chen
JN Athanikar
JS Han
Jun Fan
K Hata
KP Lu
M. Frances Shannon
MJ Thomas
N Yang
P Fontanillas
P Nigumann
R Beraldi
RH Waterston
RJ DeBerardinis
RK Slotkin
S Boissinot
SE Dmitriev
SH Lee
SJ Wheelan
Soo-Young Cho
Sung-Hun Lee
T Graham
T Penzkofer
TP Naas
Y Kinoshita
Z Wang
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: L1 retrotransposable elements are potent insertional mutagens responsible for the generation of genomic variation and diversification of mammalian genomes, but reliable estimates of the numbers of actively transposing L1 elements are mostly nonexistent. While the human and mouse genomes contain comparable numbers of L1 elements, several phylogenetic and L1Xplore analyses in the mouse genome suggest that 1,500-3,000 active L1 elements currently exist and that they are still expanding in the genome. Conversely, the human genome contains only 150 active L1 elements. In addition, there is a discrepancy among the nature and number of mouse L1 elements in L1Xplore, in the mouse genome browser at UCSC, and in the literature. To date, the reason why a high copy number of active L1 elements exists in the mouse genome but not in the human genome is unknown, as are the potential mechanisms that are responsible for transcriptional activation of mouse L1 elements. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed the promoter sequences of the 1,501 potentially active mouse L1 elements retrieved from the GenBank and L1Xplore databases and evaluated their transcription factor binding sites and CpG content. We found that a substantial number of mouse L1 elements contain altered binding sites for the transcription factor YY1 in their promoter sequences, sites that are required for transcriptional initiation, suggesting that only half of the L1 elements are capable of being transcriptionally active. Furthermore, we present experimental evidence that previously unreported CpG islands exist in the promoters of the most active T(F) family of mouse L1 elements. The presence of sequence variations and polymorphisms in CpG islands of L1 promoters that arise from transition mutations indicates that CpG methylation could play a significant role in determining the activity of L1 elements in the mouse genome. CONCLUSIONS: A comprehensive analysis of mouse L1 promoters suggests that the number of transcriptionally active elements is significantly lower than the total number of full-length copies from the three active mouse L1 families. As with human L1 elements, the CpG islands and potentially the transcription factor YY1 binding sites are likely to be required for transcriptional initiation of mouse L1 elements.
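
The abstract mentions evaluating CpG content but does not say which criterion was applied. As an illustration only, the sketch below computes two widely used quantities, GC fraction and the observed/expected CpG ratio in the Gardiner-Garden and Frommer form (number of CpG × length / (number of C × number of G)). The function name and the test sequences are hypothetical, and this is not presented as the authors' method.

```python
def cpg_stats(sequence):
    """Return (gc_fraction, observed_expected_cpg) for a DNA sequence.

    Illustrative helper, not taken from the paper. The observed/expected
    ratio is (count of CpG * length) / (count of C * count of G).
    """
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact

    gc_fraction = (c_count + g_count) / length if length else 0.0
    if c_count and g_count:
        obs_exp = (cpg_count * length) / (c_count * g_count)
    else:
        obs_exp = 0.0  # ratio is undefined without both C and G; report 0.0
    return gc_fraction, obs_exp


# Made-up sequences: one CpG-dense, one with C and G but no CpG dinucleotide.
dense = cpg_stats("CGCGCGCG")
sparse = cpg_stats("AACCTTGG")
print(dense, sparse)
```

A commonly cited rule of thumb calls a region a CpG island when it is at least 200 bp long with GC fraction of at least 0.5 and an observed/expected ratio of at least 0.6; real analyses slide a window along the promoter rather than scoring the whole sequence at once.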

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Canberra Research Repository

The Australian National University

Sequences, Annotation and Single Nucleotide Polymorphism of the Major Histocompatibility Complex in the Domestic Cat

Author: AJ Pearks Wilkerson
AL Roca
B Ewing
BH Koller
C Burge
CA Stewart
D Gordon
E Birney
EW Brown
Hans Ellegren
HL Niman
J Klein
J Loconto
J Pontius
JA Traherne
James C. Mullikin
JC Mullikin
JL Troyer
K Okita
K Takahashi
M Krawczyk
MA Carpenter
N Yuhki
Naoya Yuhki
PJ van den Elsen
R Horton
RJ Allcock
RM Younger
Robert Stephens
S Schwartz
SF Altschul
SJ Wheelan
Stephen J. O'Brien
T Beck
TA Tatusova
Thomas Beck
TW Beck
V Klement
WA Nelson-Rees
WD Hardy
Publication venue: Public Library of Science
Publication date: 01/06/2008

Two sequences of major histocompatibility complex (MHC) regions in the domestic cat, 2.976 and 0.362 Mbp, which were separated by an ancient chromosome break (55–80 MYA) followed by a chromosomal inversion, were annotated in detail. Gene annotation of this MHC was completed and identified 183 possible coding regions (147 human homologues, possible functional genes, and 36 pseudo/unidentified genes) using the GENSCAN, BLASTN, BLASTP and RepeatMasker programs. The first region spans 2.976 Mbp of sequence, which encodes six classical class II antigens (three DRA and three DRB antigens) lacking the functional DP, DQ regions, nine antigen processing molecules (DOA/DOB, DMA/DMB, TAPASIN, and LMP2/LMP7, TAP1/TAP2), 52 class III genes, and nineteen class I genes/gene fragments (FLAI-A to FLAI-S). Three class I genes (FLAI-H, I-K, I-E) may encode functional classical class I antigens based on deduced amino acid sequence and promoter structure. The second region spans 0.362 Mbp of sequence encoding no class I genes and 18 cross-species conserved genes, excluding class I, II and their functionally related/associated genes, namely framework genes, including three olfactory receptor genes. One previously identified feline endogenous retrovirus, a baboon retrovirus derived sequence (ECE1), and two new endogenous retrovirus sequences similar to brown bat endogenous retrovirus (FERVmlu1, FERVmlu2) were found within a 140 Kbp interval in the middle of the class I region. MHC SNPs were examined based on comparisons of this BAC sequence and MHC homozygous 1.9× WGS sequences; 11,654 SNPs were found in 2.84 Mbp (0.00411 SNP per bp), a rate 2.4 times higher than that of the average heterozygous region in the WGS (0.0017 SNP per bp of genome) and slightly higher than the SNP rate observed in human MHC (0.00337 SNP per bp).
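
The SNP-rate figures quoted in this abstract can be rechecked with simple arithmetic. The Python sketch below uses only the rounded numbers printed in the abstract, so small differences in the last digit are expected (for example, 11,654 / 2.84 Mbp comes out slightly under the reported 0.00411, presumably because 2.84 Mbp is itself rounded). Variable names are illustrative.

```python
# All inputs are the rounded values quoted in the abstract.
snp_count = 11_654          # SNPs found in the cat MHC comparison
region_bp = 2.84e6          # compared region, 2.84 Mbp
wgs_rate = 0.0017           # average heterozygous region in the WGS, SNP per bp
human_mhc_rate = 0.00337    # human MHC, SNP per bp

cat_mhc_rate = snp_count / region_bp          # SNP per bp in the cat MHC
fold_vs_wgs = cat_mhc_rate / wgs_rate         # abstract reports about 2.4
fold_vs_human = cat_mhc_rate / human_mhc_rate

print(f"cat MHC rate: {cat_mhc_rate:.5f} SNP per bp")
print(f"fold vs WGS average: {fold_vs_wgs:.2f}")
print(f"fold vs human MHC: {fold_vs_human:.2f}")
```

The recomputed values agree with the abstract's statements: the cat MHC rate is roughly 2.4 times the WGS average and modestly above the human MHC rate.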

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

NSU Works