14 research outputs found

An intuitive Python interface for Bioconductor libraries demonstrates the utility of language translators

Author: DG Bobrow
JE Stajich
L Prechelt
Laurent Gautier
MD Robinson
PJ Cock
R Development Core Team
R Knight
RC Gentleman
RCG Holland
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Computer languages can be domain-related, and in multidisciplinary projects knowledge of several languages is needed in order to implement ideas quickly. Moreover, each computer language has its relative strong points, making some languages better suited than others for a given task. The Bioconductor project, based on the R language, has become a reference for the numerical processing and statistical analysis of data coming from high-throughput biological assays, providing a rich selection of methods and algorithms to the research community. At the same time, Python has matured as a rich and reliable language for the agile development of prototypes or final implementations, as well as for handling large data sets. Results: The data structures and functions from Bioconductor can be exposed to Python as a regular library. This allows a fully transparent and native use of Bioconductor from Python, without one having to know the R language and with only a small community of "translators" required to know both. To demonstrate this, we have implemented such Python representations for key infrastructure packages in Bioconductor, letting a Python programmer handle annotation data, microarray data, and next-generation sequencing data. Conclusions: Bioconductor is no longer reserved solely for R users. Building a Python application using Bioconductor functionality can be done just as if Bioconductor were a Python package. Moreover, similar principles can be applied to other languages and libraries. Our Python package is available at: http://pypi.python.org/pypi/rpy2-bioconductor-extensions/

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

OntoGene web services for biomedical text mining

Author: A Davis
AA Morgan
AJ Williams
AR Aronson
C Arighi
C Jonquet
C Stark
CN Arighi
D Campos
D Ferrucci
D Maglott
D Rebholz-Schuhmann
DC Comeau
F Leitner
F Rinaldi
Fabio Rinaldi
G Schneider
GK Savova
H Cunningham
H Hermjakob
Hernani Marques
I Androutsopoulos
I Segura-Bedmar
J Hakenberg
J Kim
JD Kim
K Dolinski
K Haverinen
K Kaljurand
K Sangkuhl
KB Cohen
L Richardson
L Tanabe
M Craven
M Krallinger
M Mintz
Martin Romacker
R Hoffmann
Raul Rodriguez-Esteban
S Clematide
S Federhen
S Gama-Castro
Simon Clematide
T Consortium
T Kappeler
Tilia Ellendorff
W Liu
W Sun
X Wang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

BioC Implementations in Go, Perl, Python and Ruby

Author: Comeau Donald C
Doğan Rezarta Islamaj
Kwon Dongseop
Liu Wanli
Marques Hernani
Rinaldi Fabio
Wilbur W John
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

As part of a community-wide effort to evaluate text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net
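
To make the idea of a simple XML exchange format concrete, the following is a minimal sketch of reading annotations from a BioC-style document using only the Python standard library. It is not one of the BioC implementations described in the abstract. The sample document, the function name read_annotations, and the reduced set of elements (collection, document, passage, annotation, location) are assumptions chosen for illustration; a real BioC file should be checked against the published DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the spirit of the BioC format.
SAMPLE = """<collection>
  <source>example</source>
  <date>20140101</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>BRCA1 is linked to breast cancer.</text>
      <annotation id="a1">
        <infon key="type">gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, annotation id, covered text, annotation text) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            # The passage offset locates the passage within the whole document.
            passage_offset = int(passage.findtext("offset"))
            passage_text = passage.findtext("text") or ""
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                # Convert the document-level offset to a passage-level index.
                start = int(loc.get("offset")) - passage_offset
                end = start + int(loc.get("length"))
                results.append(
                    (doc_id, ann.get("id"), passage_text[start:end], ann.findtext("text"))
                )
    return results


if __name__ == "__main__":
    for row in read_annotations(SAMPLE):
        print(row)
```

Comparing the text recovered from the offsets with the text stored in the annotation is a cheap consistency check when exchanging files between tools.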

Crossref

PubMed Central

ZORA

BioC implementations in Go, Perl, Python and Ruby

Author: D. C. Comeau
D. Kwon
F. Rinaldi
H. Marques
R. Islamaj Doğan
Stajich
W. J. Wilbur
W. Liu
Publication venue: 'Oxford University Press (OUP)'

Crossref

Improving the prediction of transcription factor binding sites to aid the interpretation of non-coding single nucleotide variants

Author: Jayaram N
Publication venue: UCL (University College London)
Publication date: 28/05/2017

Single nucleotide variants (SNVs) that occur in transcription factor binding sites (TFBSs) can disrupt the binding of transcription factors and alter gene expression, which can cause inherited diseases and act as driver SNVs in cancer. The identification of SNVs in TFBSs has historically been challenging given the limited number of experimentally characterised TFBSs. The recent ENCODE project has resulted in the availability of ChIP-Seq data that provide genome-wide sets of regions bound by transcription factors. These data have the potential to improve the identification of SNVs in TFBSs. However, as the ChIP-Seq data identify a broader range of DNA within which a transcription factor binds, computational prediction is required to identify the precise TFBS. Prediction of TFBSs involves scanning a DNA sequence with a Position Weight Matrix (PWM) using a pattern matching tool. This thesis focusses on the prediction of TFBSs by: (a) evaluating a set of locally-installable pattern-matching tools and identifying the best performing tool (FIMO), (b) using the ENCODE ChIP-Seq data to evaluate a set of de novo motif discovery tools that can handle large volumes of data and are used to derive PWMs, (c) identifying the best performing tool (rGADEM), (d) using rGADEM to generate a set of PWMs from the ENCODE ChIP-Seq data and (e) finally checking that the selection of the best pattern matching tool is not unduly influenced by the choice of PWMs. These analyses were exploited to obtain a set of predicted TFBSs from the ENCODE ChIP-Seq data. The predicted TFBSs were utilised to analyse somatic cancer driver and passenger SNVs that occur in TFBSs. Clear signals in conservation, and therefore in Shannon entropy values, were identified and subsequently exploited to identify a threshold that can be used to prioritize somatic cancer driver SNVs for experimental validation.
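
The PWM scanning step mentioned in the abstract can be illustrated with a minimal sketch. This is not the method of FIMO or rGADEM, and it does not reproduce the thesis pipeline: the count matrix, the pseudocount, the uniform background of 0.25, the score threshold and the function names are all assumptions made for the example. It scores each window on the forward strand only, using log-odds against the background.

```python
import math

# Hypothetical count matrix for a 4-base motif (counts per position).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 9, 1],
    "T": [0, 0, 0, 6],
}


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts to log2-odds scores against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        pwm.append(
            {
                base: math.log2(((counts[base][i] + pseudocount) / total) / background)
                for base in "ACGT"
            }
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Slide the PWM along the sequence and return windows scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = counts_to_pwm(COUNTS)
    for start, window, score in scan("TTACGTAA", pwm, threshold=3.0):
        print(start, window, round(score, 2))
```

A production tool would typically also scan the reverse complement and convert scores to p-values; both are omitted here to keep the sketch short.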

UCL Discovery