IEEE Access special section editorial: scalable deep learning for big data
Deep learning (DL) has emerged as a key application exploiting the increasing computational power in systems such as GPUs, multicore processors, Systems-on-Chip (SoC), and distributed clusters. It has also attracted much attention in discovering correlation patterns in data in an unsupervised manner and has been applied in various domains including speech recognition, image classification, natural language processing, and computer vision. Unlike traditional machine learning (ML) approaches, DL also enables dynamic discovery of features from data. In addition, a number of commercial vendors (such as Nvidia, Intel, and Huawei) now offer accelerators for deep learning systems.
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.
Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biology
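The Hamming distance mentioned in the abstract above counts the positions at which two equal-length sequences differ. A minimal sketch follows, assuming plain Python strings as the sequence representation; the function name `hamming_distance` is illustrative and not taken from the primer, and the graphical user interface is omitted.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length sequences differ.

    Hypothetical helper for illustration; not the primer's own code.
    """
    if len(seq_a) != len(seq_b):
        # The Hamming distance is only defined for sequences of equal length.
        raise ValueError("sequences must have the same length")
    # zip pairs up characters position by position; each mismatch counts as 1.
    return sum(a != b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    # Two short, made-up DNA strings used only as sample input.
    print(hamming_distance("GATTACA", "GACTATA"))
```

Raising an error on unequal lengths, rather than silently truncating, keeps the function faithful to the definition of the distance.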
A Revised Publication Model for ECML PKDD
ECML PKDD is the main European conference on machine learning and data
mining. Since its foundation, it has implemented the publication model common in
computer science: there was one conference deadline; conference submissions
were reviewed by a program committee; papers were accepted with a low
acceptance rate. Proceedings were published in several Springer Lecture Notes
in Artificial Intelligence (LNAI) volumes, while selected papers were invited to special
issues of the Machine Learning and Data Mining and Knowledge Discovery
journals. In recent years, however, this model has come under stress. Problems
include: reviews are of highly variable quality; the purpose of bringing the
community together is lost; reviewing workloads are high; the information
content of conferences and journals decreases; there is confusion among
scientists in interdisciplinary contexts. In this paper, we present a new
publication model, which will be adopted for the ECML PKDD 2013 conference, and
aims to solve some of the problems of the traditional model. The key feature of
this model is the creation of a journal track, which is open to submissions all
year long and allows for revision cycles.
Comment: 13 pages
Bioinformatics and Medicine in the Era of Deep Learning
Many of the current scientific advances in the life sciences have their
origin in the intensive use of data for knowledge discovery. In no area is this
as clear as in bioinformatics, led by technological breakthroughs in data
acquisition technologies. It has been argued that bioinformatics could quickly
become the field of research generating the largest data repositories, beating
other data-intensive areas such as high-energy physics or astroinformatics.
Over the last decade, deep learning has become a disruptive advance in machine
learning, giving new life to the long-standing connectionist paradigm in
artificial intelligence. Deep learning methods are ideally suited to
large-scale data and, therefore, they should be ideally suited to knowledge
discovery in bioinformatics and biomedicine at large. In this brief paper, we
review key aspects of the application of deep learning in bioinformatics and
medicine, drawing from the themes covered by the contributions to an ESANN 2018
special session devoted to this topic.
Computational Ontologies and Information Systems II: Formal Specification
This paper extends the study of ontologies in Part I of this study (Volume 14, Article 8) in the context of Information Systems. The basic foundations of computational ontologies presented in Part I are extended to formal specifications in this paper. This paper provides a review of the formalisms, languages, and tools for specifying and implementing computational ontologies. Directions for future research are also provided.
The similarity metric
A new class of distances appropriate for measuring similarity relations
between sequences, say one type of similarity per distance, is studied. We
propose a new "normalized information distance", based on the noncomputable
notion of Kolmogorov complexity, and show that it is in this class and it
minorizes every computable distance in the class (that is, it is universal in
that it discovers all computable similarities). We demonstrate that it is a
metric and call it the similarity metric. This theory forms the
foundation for a new practical tool. To evidence generality and robustness we
give two distinctive applications in widely divergent areas using standard
compression programs like gzip and GenCompress. First, we compare whole
mitochondrial genomes and infer their evolutionary history. This results in a
first completely automatically computed whole mitochondrial phylogeny tree.
Secondly, we fully automatically compute the language tree of 52 different
languages.
Comment: 13 pages, LaTeX, 5 figures. Part of this work appeared in Proc. 14th
ACM-SIAM Symp. Discrete Algorithms, 2003. This is the final, corrected,
version to appear in IEEE Trans Inform. T
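Because Kolmogorov complexity is noncomputable, the abstract above notes that standard compressors stand in for it in practice. The sketch below shows one commonly used compression-based approximation, often called the normalized compression distance; it is an illustration under stated assumptions, not necessarily the exact estimator used by the authors. It assumes Python's `zlib` (the DEFLATE algorithm that gzip also uses) as the compressor, and the names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input.

    Assumption: zlib stands in for the compressors named in the abstract.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Compression-based approximation of a normalized similarity distance.

    Computes (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the
    compressed size. Smaller values suggest more shared structure.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Made-up sample inputs, used only to exercise the function.
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox leaps over the lazy cat " * 20
    print(ncd(a, a), ncd(a, b))
```

A real compressor only approximates the ideal quantity, so the values are estimates: they can be slightly above zero for identical inputs and depend on the compressor's window size and overhead.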