An empirical comparison of supervised machine learning techniques in bioinformatics
Research in bioinformatics is driven by experimental data.
Current biological databases are populated by vast amounts of
experimental data. Machine learning has been widely applied to
bioinformatics and has achieved considerable success in this research
area. With the many learning algorithms now available in the
literature, researchers face difficulties in choosing the best
method for their data. We performed an empirical
study of 7 individual learning systems and 9 different combined
methods on 4 different biological data sets, and suggest some
issues to be considered when answering the following
questions: (i) How does one choose the algorithm best
suited to a given data set? (ii) Are combined methods better than
a single approach? (iii) How does one compare the effectiveness
of a particular algorithm with that of the others?
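Question (iii) is commonly approached by evaluating each algorithm on the same cross-validation folds and testing whether the per-fold differences are consistently non-zero. The sketch below computes a paired t statistic from two lists of per-fold scores; it is an illustrative assumption, not the procedure used in this study, and the function name and fold accuracies are hypothetical. Because cross-validation folds share training data, the statistic tends to be optimistic and is best read as a rough guide.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for per-fold scores of two algorithms.

    Both lists must hold scores from the same folds, in the same order.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("identical differences on every fold; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical 5-fold accuracies for two classifiers (illustrative only).
t = paired_t_statistic([0.80, 0.82, 0.78, 0.85, 0.81],
                       [0.75, 0.80, 0.76, 0.79, 0.77])
print(round(t, 2))  # 4.75
```

The statistic is then compared against a t distribution with n - 1 degrees of freedom (4 here, for 5 folds).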
Characterisation of FAD-family folds using a machine learning approach
Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in
biological processes. They are major organic cofactors and electron carriers
in both enzymatic activities and biochemical pathways. We have analysed
the relationships between sequence and structure of FAD-containing proteins
using a machine learning approach. Decision trees were generated using the
C4.5 algorithm as a means of automatically generating rules from biological
databases (TOPS, CATH and PDB). These rules were then used as
background knowledge for an ILP system to characterise the four different
classes of FAD-family folds classified in Dym and Eisenberg (2001). These
FAD-family folds are: glutathione reductase (GR), ferredoxin reductase (FR),
p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FAD family
was characterised by a set of rules. The "knowledge patterns"
generated from this approach are a set of rules containing conserved sequence
motifs, secondary structure sequence elements and folding information.
Each rule was then verified by a statistical evaluation of its measured
significance. We show that this machine learning approach is
capable of learning and discovering interesting patterns from large biological
databases and can generate "knowledge patterns" that characterise the FAD-containing
proteins, and at the same time classify these proteins into four
different families.
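C4.5 chooses the attribute to split on at each node of a decision tree by its gain ratio: the information gain of the split divided by the split information. A minimal sketch of that criterion for categorical attributes, assuming plain Python lists; it omits the rest of C4.5 (continuous attributes, missing values, pruning), and the motif and family values in the usage line are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute, the split criterion used by C4.5."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes that fragment the data into many values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute: presence of a conserved motif vs. FAD-family label.
print(gain_ratio(["yes", "yes", "no", "no"], ["GR", "GR", "FR", "FR"]))  # 1.0
```

An attribute that separates the classes perfectly into two equal halves, as here, reaches the maximum ratio of 1.0; an attribute with a single value cannot split the data and scores 0.0.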
Multi-class protein fold classification using a new ensemble machine learning approach.
Protein structure classification represents an important process in understanding the associations
between sequence and structure as well as possible functional and evolutionary relationships.
Recent structural genomics initiatives and other high-throughput experiments have populated the
biological databases at a rapid pace. The amount of structural data has made traditional methods
such as manual inspection of protein structures infeasible. Machine learning has been
widely applied to bioinformatics and has achieved considerable success in this research area. This work
proposes a novel ensemble machine learning method that improves the coverage of the classifiers
on multi-class imbalanced sample sets by integrating knowledge induced from different base
classifiers, and we illustrate this idea in classifying multi-class SCOP protein fold data. We have
compared our approach with PART and show that our method improves the sensitivity of the
classifier in protein fold classification. Furthermore, we have extended this method to learning over
multiple data types, preserving the independence of their corresponding data sources, and show
that our new approach performs at least as well as the traditional technique over a single joined
data source. These experimental results are encouraging, and the approach can be applied to other bioinformatics
problems similarly characterised by multi-class imbalanced data sets held in multiple data
sources.
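On imbalanced multi-class data, overall accuracy can hide poor performance on small classes, which is why sensitivity is reported per class. A minimal sketch of per-class sensitivity, TP / (TP + FN), and its unweighted mean, assuming labels are plain strings; the function names and fold labels are hypothetical, and this is not the authors' evaluation code.

```python
from collections import Counter


def per_class_sensitivity(y_true, y_pred):
    """Sensitivity (recall) for each class present in y_true: TP / (TP + FN)."""
    positives = Counter(y_true)  # TP + FN for each class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: true_pos[c] / positives[c] for c in positives}


def macro_sensitivity(y_true, y_pred):
    """Unweighted mean of per-class sensitivities, so rare classes count equally."""
    scores = per_class_sensitivity(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical imbalanced fold labels: a classifier that always predicts "f1".
y_true = ["f1", "f1", "f1", "f1", "f2", "f3"]
y_pred = ["f1"] * 6
print(per_class_sensitivity(y_true, y_pred))  # {'f1': 1.0, 'f2': 0.0, 'f3': 0.0}
```

Here overall accuracy is 4/6, about 0.67, yet two of the three classes are never detected and the macro-averaged sensitivity is only 1/3.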
Integrative machine learning approach for multi-class SCOP protein fold classification
Classification and prediction of protein structure has been a central research theme in structural bioinformatics. Due to the imbalanced distribution of proteins over multiple SCOP classes, most discriminative machine learning methods suffer from the well-known "false positives" problem when learning on such problems. We have devised eKISS, an ensemble machine learning method specifically designed to increase the coverage of positive examples when learning under multi-class imbalanced data sets. We have applied eKISS to classify 25 SCOP folds and show that our learning system improves over classical learning methods.
Implementation of Design Changes Towards a More Reliable, Hands-off Magnetron Ion Source
As the main ion source for the accelerator complex, magnetron ion
sources have been used at Fermilab since the 1970s. At the offline test stand,
new R&D is carried out to develop and upgrade the present magnetron-type
sources, which deliver ion beams of up to 80 mA at 35 keV, in the context of
the Proton Improvement Plan. The aim of this plan is to provide high-power
proton beams for the experiments at FNAL. In order to reduce the amount of
tuning and monitoring of these ion sources, a new electronic system consisting
of a current-regulated arc discharge modulator allows the ion source to run at a
constant arc current for improved beam output and operation. A solenoid-type
gas valve feeds gas into the source precisely and independently of
ambient temperature. This summary will cover several studies and design changes
that have been tested and will eventually be implemented on the operational
magnetron sources at Fermilab. Innovations for this type of ion source
include cathode geometries, solenoid gas valves, a current-controlled arc pulser,
a cesium boiler redesign, gas mixtures of hydrogen and nitrogen, and duty factor
reduction, with the aim of improving source lifetime and stability and reducing the
amount of tuning needed. In this summary, I will highlight the advances made in
ion sources at Fermilab and will outline the directions of the continuing R&D
effort. Comment: 4 pp. arXiv admin note: substantial text overlap with
arXiv:1701.0175
Improvements on the Stability and Operation of a Magnetron H- Ion Source
The magnetron H- ion sources, developed in the 1970s and currently in operation at
Fermilab, provide beam to the rest of the accelerator complex. A series of
modifications to these sources has been tested in a dedicated offline test
stand with the aim of addressing various operational issues. The solenoid-type
gas valve was tested as an alternative to the piezoelectric gas valve in order
to avoid the latter's temperature dependence. A new cesium oven was designed and tested
to eliminate the glass pieces present in the previous oven,
improve thermal insulation and fine-tune its temperature. A current-regulated
arc modulator was developed to run the ion source at a constant arc current,
providing very stable beam outputs during operations. In order to reduce beam
noise, the addition of small amounts of N2 gas was explored, as well as testing
different cathode shapes with increasing plasma volume. This paper summarizes
the studies and modifications made to the source over the last three years with
the aim of improving its stability, reliability and overall performance. Comment: 8 pages, 19 figures
Observations of HONO by laser-induced fluorescence at the South Pole during ANTCI 2003
Observations of nitrous acid (HONO) by laser-induced fluorescence (LIF) at the South Pole taken during the Antarctic Troposphere Chemistry Investigation (ANTCI), which took place from Nov. 15, 2003 to Jan. 4, 2004, are presented here. The median observed mixing ratio of HONO 10 m above the snow was 5.8 pptv (mean value 6.3 pptv) with a maximum of 18.2 pptv on Nov 30th, Dec 1st, 3rd, 15th, 17th, 21st, 22nd, 25th, 27th and 28th. The measurement uncertainty is ±35%. The LIF HONO observations are compared with concurrent HONO observations performed by mist chamber/ion chromatography (MC/IC). The HONO levels reported by MC/IC are about 7.2 ± 2.3 times higher than those reported by LIF. Citation: Liao, W., A. T. Case, J. Mastromarino, D. Tan, and J. E. Dibb (2006), Observations of HONO by laser-induced fluorescence at the South Pole during ANTCI 2003, Geophys. Res. Lett., 33, L09810, doi:10.1029/2005GL025470
Radio Images of 3C 58: Expansion and Motion of its Wisp
New 1.4 GHz VLA observations of the pulsar-powered supernova remnant 3C 58
have resulted in the highest-quality radio images of this object to date. The
images show filamentary structure over the body of the nebula. The present
observations were combined with earlier ones from 1984 and 1991 to investigate
the variability of the radio emission on a variety of time-scales. No
significant changes are seen over a 110 day interval. In particular, the upper
limit on the apparent projected velocity of the wisp is 0.05c. The expansion
rate of the radio nebula was determined between 1984 and 2004, and is
0.014 ± 0.003% per year, corresponding to a velocity of 630 ± 70 km/s along the
major axis. If 3C 58 is the remnant of SN 1181, it must have been strongly
decelerated, which is unlikely given the absence of emission from the supernova
shell. Alternatively, the low expansion speed and a number of other arguments
suggest that 3C 58 may be several thousand years old and not the remnant of
SN 1181. Comment: 12 pages; accepted for publication in the Astrophysical Journal
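The age argument for 3C 58 can be made concrete with a little arithmetic. The sketch below assumes, as a simplification not taken from the paper, that the nebula's radius grows as a power law R ∝ t^m, so the fractional expansion rate is (dR/dt)/R = m/t and free expansion corresponds to m = 1. The function names are illustrative.

```python
def free_expansion_age(rate_percent_per_year):
    """Age in years implied by a fractional expansion rate, assuming free expansion (m = 1)."""
    return 100.0 / rate_percent_per_year


def required_index(rate_percent_per_year, age_years):
    """Power-law index m for which m / t equals the measured fractional rate at age t."""
    return rate_percent_per_year / 100.0 * age_years


rate = 0.014   # measured expansion rate of the radio nebula, % per year
sigma = 0.003  # quoted uncertainty on the rate, % per year

print(round(free_expansion_age(rate)))          # 7143
print(round(free_expansion_age(rate + sigma)))  # 5882
print(round(free_expansion_age(rate - sigma)))  # 9091
# Age at the 2004 epoch if 3C 58 is the remnant of SN 1181.
print(round(required_index(rate, 2004 - 1181), 3))  # 0.115
```

Under free expansion the measured rate implies an age of roughly 7,100 years (about 5,900 to 9,100 years within the quoted uncertainty). An age of 823 years at the 2004 epoch would instead require m ≈ 0.115, far below free expansion, which corresponds to the strong deceleration the abstract refers to.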