An empirical comparison of supervised machine learning techniques in bioinformatics
Research in bioinformatics is driven by experimental data.
Current biological databases are populated by vast amounts of
experimental data. Machine learning has been widely applied to
bioinformatics and has achieved considerable success in this research
area. At present, with various learning algorithms available in the
literature, researchers face difficulties in choosing the best
method to apply to their data. We performed an empirical
study of 7 individual learning systems and 9 different combined
methods on 4 different biological data sets, and suggest
issues to consider when answering the following
questions: (i) How does one choose the algorithm best
suited to a given data set? (ii) Are combined methods better than
a single approach? (iii) How does one compare the effectiveness
of a particular algorithm with the others?
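Question (ii) concerns combined methods. One of the simplest ways to combine classifiers is plurality (majority) voting over their predictions. The sketch below is a generic illustration, not one of the 9 combined methods evaluated in the study; the toy labels and predictions are hypothetical, and a vote is not guaranteed to beat the best single classifier on real data.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-classifier label lists by plurality vote.

    predictions[i][j] is classifier i's label for example j. Ties go to the
    label encountered first (i.e. from the earliest classifier), because
    Counter.most_common keeps first-encountered order among equal counts.
    """
    n_examples = len(predictions[0])
    combined = []
    for j in range(n_examples):
        votes = Counter(clf[j] for clf in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical toy data: true labels and three classifiers' predictions.
truth = ["a", "b", "a", "b", "a"]
clf1 = ["a", "b", "b", "b", "a"]  # 4/5 correct
clf2 = ["a", "a", "a", "b", "b"]  # 3/5 correct
clf3 = ["b", "b", "a", "a", "a"]  # 3/5 correct

combined = majority_vote([clf1, clf2, clf3])
print(accuracy(clf1, truth), accuracy(combined, truth))
```

In this contrived case the classifiers make their errors on different examples, so the vote corrects all of them. When errors are correlated, voting gains little, which is one reason the comparison has to be made empirically per data set.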
Transforming specifications of observable behaviour into programs
A methodology for deriving programs from specifications of observable
behaviour is described. The class of processes to which this methodology
is applicable includes those whose state changes are fully definable by labelled
transition systems, for example communicating processes without
internal state changes. A logic program representation of such labelled
transition systems is proposed, interpreters based on path searching techniques
are defined, and the use of partial evaluation techniques to derive
the executable programs is described.
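To make "interpreters based on path searching" concrete, here is a minimal sketch in Python rather than a logic program. It assumes a labelled transition system stored as a table of `(action, successor)` pairs per state, and two hypothetical interpreters: a depth-first search that checks whether an observable trace can be performed, and a breadth-first search for the shortest trace between two states. The vending-machine system and all names are illustrative, not the representation used in the paper.

```python
from collections import deque

# Hypothetical LTS: state -> list of (action label, successor state).
VENDING = {
    "idle": [("coin", "paid")],
    "paid": [("tea", "idle"), ("coffee", "idle"), ("refund", "idle")],
}


def accepts(lts, state, trace):
    """Depth-first path search: can the action sequence `trace` be performed
    starting from `state`? Explores every matching transition, so it also
    works for nondeterministic systems."""
    if not trace:
        return True
    action, rest = trace[0], trace[1:]
    return any(
        accepts(lts, nxt, rest)
        for label, nxt in lts.get(state, [])
        if label == action
    )


def shortest_trace(lts, start, goal):
    """Breadth-first path search for the shortest action sequence leading
    from `start` to `goal`; returns None if `goal` is unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for label, nxt in lts.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label]))
    return None


print(accepts(VENDING, "idle", ["coin", "tea", "coin", "coffee"]))
print(shortest_trace(VENDING, "idle", "paid"))
```

Partial evaluation, in general, would specialise an interpreter such as `accepts` with respect to a fixed transition table like `VENDING`, leaving a residual program that takes only the trace as input.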
Integrative machine learning approach for multi-class SCOP protein fold classification
Classification and prediction of protein structure has been a central research theme in structural bioinformatics. Due to the imbalanced distribution of proteins over multiple SCOP classes, most discriminative machine learning methods suffer from the well-known "false positives" problem when learning on such problems. We have devised eKISS, an ensemble machine learning system specifically designed to increase the coverage of positive examples when learning from multiclass imbalanced data sets. We have applied eKISS to classify 25 SCOP folds and show that our learning system improves on classical learning methods.
Characterisation of FAD-family folds using a machine learning approach
Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in
biological processes. They are major organic cofactors and electron carriers
in both enzymatic activities and biochemical pathways. We have analysed
the relationships between sequence and structure of FAD-containing proteins
using a machine learning approach. Decision trees were generated using the
C4.5 algorithm as a means of automatically generating rules from biological
databases (TOPS, CATH and PDB). These rules were then used as
background knowledge for an ILP system to characterise the four different
classes of FAD-family folds classified in Dym and Eisenberg (2001). These
FAD-family folds are: glutathione reductase (GR), ferredoxin reductase (FR),
p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FAD family
was characterised by a set of rules. The "knowledge patterns"
generated by this approach are sets of rules containing conserved sequence
motifs, secondary structure sequence elements and folding information.
Each rule was then verified by a statistical evaluation of its measured
significance. We show that this machine learning approach is
capable of learning and discovering interesting patterns from large biological
databases and can generate "knowledge patterns" that characterise the FAD-containing
proteins and, at the same time, classify these proteins into four
different families.
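C4.5 chooses the attribute to split on using the information gain ratio: the information gain of a split divided by the entropy of the partition it induces. The sketch below computes only that split criterion for categorical attributes. It omits the rest of C4.5 (continuous attributes, missing values, pruning, rule extraction), and the attribute values and family labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of a categorical attribute.

    values[i] is the attribute value of example i and labels[i] its class.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition itself, which penalises
    # attributes that fragment the data into many small subsets.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical examples: a motif attribute that perfectly separates two families,
# and an attribute that carries no information about the family.
families = ["GR", "GR", "FR", "FR"]
print(gain_ratio(["motif", "motif", "none", "none"], families))
print(gain_ratio(["x", "y", "x", "y"], families))
```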
Relations between extensional tectonics and magmatism within the Southern Oklahoma aulacogen
Variations in the geometry, distribution and thickness of Cambrian igneous and sedimentary units within southwest Oklahoma are related to a late Proterozoic to early Paleozoic rifting event that formed the Southern Oklahoma aulacogen. These rock units are exposed in the Wichita Mountains, southwest Oklahoma, located on the northern margin of a Proterozoic basin identified in the subsurface by COCORP reflection data. Overprinting of the Cambrian extensional event by Pennsylvanian tectonism obscured the influence of pre-existing basement structures and contrasting basement lithologies on the initial development of the aulacogen.
Field Driven Thermostated System: A Non-Linear Multi-Baker Map
In this paper, we discuss a simple model for a field-driven, thermostated
random walk that is constructed by a suitable generalization of a multi-baker
map. The map is a standard multi-baker map, perturbed by a thermostated external
field that has many of the properties of the fields used in systems with
Gaussian thermostats. For small values of the driving field, the map is
hyperbolic and has a unique SRB measure, which we solve for analytically to first
order in the field parameter. We then compute the positive and negative
Lyapunov exponents to second order and discuss their relation to the transport
properties. For higher values of the parameter, the system becomes
non-hyperbolic and possesses an attracting fixed point. Comment: 6 pages + 5 figures, to appear in Phys. Rev.
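As background for the abstract above, here is a sketch of the unperturbed multi-baker map only; the thermostated field perturbation studied in the paper is not reproduced. It assumes one common convention (the left half of each unit cell moves to the left neighbour), and conventions vary between papers. Because the x-coordinate evolves by the doubling map, the k-step displacement of the cell index is fixed by the first k binary digits of x, so the mean square displacement for a uniform initial density can be averaged exactly over all 2**k digit strings instead of iterating floating-point orbits, which collapse to x = 0 after a few dozen doublings.

```python
import math
from itertools import product


def multibaker(n, x, y):
    """One step of the unperturbed multi-baker map on cells n with (x, y) in [0, 1)^2.

    The left half of a cell is stretched by 2 in x, compressed by 2 in y and
    moved to cell n - 1; the right half goes to cell n + 1.
    """
    if x < 0.5:
        return n - 1, 2 * x, y / 2
    return n + 1, 2 * x - 1, (y + 1) / 2


# Each branch is affine with Jacobian diag(2, 1/2), so the Lyapunov exponents
# are exactly +ln 2 and -ln 2, and their sum (the mean phase-space contraction
# rate) is zero. A thermostated field makes this sum nonzero.
LYAPUNOV = (math.log(2.0), math.log(0.5))


def mean_square_displacement(k):
    """Exact <(n_k - n_0)^2> for initial x uniform on [0, 1).

    Binary digit 0 (x < 1/2) steps left and digit 1 steps right; for uniform x
    all 2**k strings of the first k digits are equally likely.
    """
    total = 0
    for digits in product((0, 1), repeat=k):
        displacement = sum(1 if d else -1 for d in digits)
        total += displacement ** 2
    return total / 2 ** k


print(multibaker(0, 0.25, 0.5), multibaker(0, 0.75, 0.5))
print(mean_square_displacement(3))
```

The mean square displacement grows linearly, equal to k after k steps, which corresponds to a diffusion coefficient D = <Δn²>/(2k) = 1/2 in units of cell spacing squared per time step.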
Solar Orbiter: Exploring the Sun-heliosphere connection
The heliosphere represents a uniquely accessible domain of space, where
fundamental physical processes common to solar, astrophysical and laboratory
plasmas can be studied under conditions impossible to reproduce on Earth and
infeasible to observe from astronomical distances. Solar Orbiter, the first
mission of ESA's Cosmic Vision 2015-2025 programme, will address the central
question of heliophysics: How does the Sun create and control the heliosphere?
In this paper, we present the scientific goals of the mission and provide an
overview of the mission implementation. Comment: 52 pages, 21 figures, 125 references; accepted for publication in
Solar Physic
Non-equilibrium Lorentz gas on a curved space
The periodic Lorentz gas with external field and iso-kinetic thermostat is
equivalent, by conformal transformation, to a billiard with expanding
phase-space and slightly distorted scatterers, for which the trajectories are
straight lines. A further time rescaling keeps the speed constant in
that new geometry. In the hyperbolic regime, the stationary state of this
billiard is characterized by a phase-space contraction rate, equal to that of
the iso-kinetic Lorentz gas. In contrast to the iso-kinetic Lorentz gas, where
phase-space contraction occurs in the bulk, phase-space contraction
here takes place at the periodic boundaries.
Exons, introns and DNA thermodynamics
The genes of eukaryotes are characterized by protein coding fragments, the
exons, interrupted by introns, i.e. stretches of DNA which do not carry any
useful information for protein synthesis. We have analyzed the melting
behavior of randomly selected human cDNA sequences obtained from the genomic
DNA by removing all introns. A clear correspondence is observed between exons
and melting domains. This finding may provide new insights into the physical
mechanisms underlying the evolution of genes. Comment: 4 pages, 8 figures - Final version as published. See also Phys. Rev.
Focus 15, story 1