Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
built from Plasmodium falciparum genomic data, but also drawing on the millions
of genomic records from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes, particularly metabolic pathways, 4)
versatile methods for integrating genomic data, biological representations and
functional profiles obtained from X-omic experiments after drug treatment, and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
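The abstract singles out compositionally atypical malaria sequences as a difficulty for sequence comparison. The Python below is a minimal sketch and not the authors' method. It flags low-complexity regions, such as the long asparagine runs common in P. falciparum proteins, by computing the Shannon entropy of a sliding window. The window width of 12 and the entropy cutoff of 2.2 bits are hypothetical choices. Dedicated tools such as SEG use more refined complexity measures.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue distribution in one window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, width=12, max_entropy=2.2):
    """Return 0-based start positions of windows whose entropy is below max_entropy.

    width and max_entropy are hypothetical defaults chosen for illustration only.
    """
    return [i for i in range(len(seq) - width + 1)
            if shannon_entropy(seq[i:i + width]) < max_entropy]


# Toy sequence (hypothetical): a mixed stretch followed by a poly-asparagine run.
toy = "MKTAYIAKQRQISFVKSHFSRQ" + "N" * 12
print(low_complexity_windows(toy))
```

A window made entirely of one residue has entropy 0, while twelve distinct residues give the maximum of log2(12), about 3.58 bits. Any cutoff in between trades sensitivity against false alarms.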
Omnipresent Maxwell’s demons orchestrate information management in living cells
The development of synthetic biology calls for an accurate understanding of
the critical functions that allow the construction and operation of a living
cell. Besides coding for ubiquitous structures, minimal genomes encode a wealth
of functions that dissipate energy in an unanticipated way. Analysis of these
functions shows that they are meant to manage information under conditions
where discrimination of substrates in a noisy background is preferred over a
simple recognition process. We show here that many of these functions,
including transporters and the ribosome construction machinery, behave like a
material implementation of the information-managing agent that Maxwell
theorized almost 150 years ago, commonly known as Maxwell’s demon (MxD). A core
gene set encoding these functions belongs to the minimal genome required to
allow the construction of an autonomous cell. These MxDs allow the cell to
perform computations far more energy-efficiently than contemporary computers.
Evolving Ensemble Fuzzy Classifier
The concept of ensemble learning offers a promising avenue for learning from
data streams in complex environments because it addresses the bias-variance
dilemma better than its single-model counterpart and features a reconfigurable
structure, which is well suited to this context. While various extensions of
ensemble learning for mining non-stationary data streams can be found in the
literature, most of them are built on a static base classifier and revisit
preceding samples in a sliding window for a retraining step. This makes them
computationally prohibitive and not flexible enough to cope with rapidly
changing environments. Their complexity is often high because they involve a
large collection of offline classifiers, owing to the absence of structural
complexity reduction mechanisms and the lack of an online feature selection
mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble
(pENsemble), is proposed in this paper. pENsemble differs from existing
architectures in that it is built upon an evolving classifier from data
streams, termed Parsimonious Classifier (pClass). pENsemble is equipped with an
ensemble pruning mechanism, which estimates a localized generalization error of
a base classifier. A dynamic online feature selection scenario is integrated
into pENsemble, allowing input features to be selected and deselected on the
fly. pENsemble adopts a dynamic ensemble structure to output a final
classification decision and features a novel drift detection scenario to grow
the ensemble structure. The efficacy of pENsemble has been demonstrated through
rigorous numerical studies with dynamic and evolving data streams, where it
delivers the most encouraging performance in attaining a tradeoff between
accuracy and complexity.
Comment: this paper has been published in IEEE Transactions on Fuzzy Systems
HIVToolbox, an Integrated Web Application for Investigating HIV
Many bioinformatic databases and applications focus on a limited domain of knowledge, federating links to information in other databases. This segregated data structure likely limits our ability to investigate and understand complex biological systems. To facilitate research, therefore, we have built HIVToolbox, which integrates much of the knowledge about HIV proteins and allows virologists and structural biologists to access sequence, structure, and functional relationships in an intuitive web application. The HIV-1 integrase protein was used as a case study to show the utility of this application. We show how data integration makes identifying new questions and hypotheses much more rapid and convenient than current approaches using isolated repositories. Several new hypotheses for integrase were generated as examples, and we experimentally confirmed a predicted CK2 phosphorylation site. Weblink: http://hivtoolbox.bio-toolkit.com
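The abstract reports a predicted CK2 phosphorylation site but does not say how HIVToolbox makes such predictions. As a minimal sketch, this code assumes only the commonly cited CK2 consensus motif S/T-x-x-D/E, meaning an acidic residue three positions after the serine or threonine. Under that assumption, a regular expression can list candidate sites. Real predictors weigh much more sequence context, so a match here is only a candidate.

```python
import re

# Minimal CK2 consensus assumed here: S or T, any two residues, then D or E at +3.
# The lookahead keeps each match zero-width so overlapping sites are all reported.
CK2_CONSENSUS = re.compile(r"(?=[ST]..[DE])")


def ck2_candidate_sites(protein_seq):
    """Return 1-based positions of S/T residues that fit the S/T-x-x-D/E pattern."""
    return [m.start() + 1 for m in CK2_CONSENSUS.finditer(protein_seq.upper())]


# Toy sequence (hypothetical, not an HIV protein).
print(ck2_candidate_sites("MSAAEPTKLD"))  # prints [2, 7]
```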
Switching Between Discrete and Continuous Models To Predict Genetic Activity
Molecular biologists use a variety of models when they predict the behavior of genetic systems. A discrete model of the behavior of individual macromolecular elements forms the foundation for their theory of each system. Yet a continuous model of the aggregate properties of the system is necessary for many predictive tasks.
I propose to build a computer program, called PEPTIDE, which can predict the behavior of moderately complex genetic systems by performing qualitative simulation on the discrete model, generating a continuous model from the discrete model through aggregation, and applying limit analysis to the continuous model. PEPTIDE's initial knowledge of a specific system will be represented with a discrete model that distinguishes between macromolecule structure and function and uses five atomic processes as its functional primitives. Qualitative Process (QP) theory [Forbus 83] provides the representation for the continuous model.
Whenever a system has multiple models of a domain, the decision of which model to use at a given time becomes a critically important issue. Knowledge of the relative significance of differing element concentrations and the behavior of process structure cycles will allow PEPTIDE to determine when to switch reasoning modes.
MIT Artificial Intelligence Laboratory
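Limit analysis in QP theory asks which quantities could next reach a boundary of their current qualitative value, given the signs of their derivatives. The Python below is a greatly simplified sketch of that idea, not PEPTIDE. Each hypothetical quantity has three parts:

- an ordered list of landmarks,
- a position that is either a landmark or an open interval between two landmarks,
- a derivative sign.

The function lists the neighboring qualitative value each changing quantity could move to. It ignores orderings between different quantities, influence resolution, and the ordering constraints QP theory places on simultaneous changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A quantity in a simplified QP-style quantity space (all names hypothetical)."""
    name: str
    landmarks: tuple  # ordered landmark values, e.g. ("zero", "K", "max")
    index: int        # even: at landmark index // 2; odd: open interval above that landmark
    ds: int           # sign of the derivative: -1, 0 or +1

    def describe(self, index=None):
        i = self.index if index is None else index
        if i % 2 == 0:
            return self.landmarks[i // 2]
        return f"({self.landmarks[i // 2]}, {self.landmarks[i // 2 + 1]})"


def limit_analysis(quantities):
    """List (name, current value, next value) for every quantity that can change."""
    transitions = []
    for q in quantities:
        nxt = q.index + q.ds
        # A changing quantity moves to the adjacent point or interval, if one exists.
        if q.ds != 0 and 0 <= nxt <= 2 * (len(q.landmarks) - 1):
            transitions.append((q.name, q.describe(), q.describe(nxt)))
    return transitions


# A repressor concentration rising toward a hypothetical binding threshold K.
repressor = Quantity("repressor", ("zero", "K", "max"), index=1, ds=+1)
print(limit_analysis([repressor]))  # prints [('repressor', '(zero, K)', 'K')]
```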
Classification and reduction of pilot error
Human error is a primary or contributing factor in about two-thirds of commercial aviation accidents worldwide. With the ultimate goal of reducing pilot error accidents, this contract effort is aimed at understanding the factors underlying error events and reducing the probability of certain types of errors by modifying underlying factors such as flight deck design and procedures. A review of the literature relevant to error classification was conducted. Classification includes categorizing types of errors, the information processing mechanisms and factors underlying them, and identifying factor-mechanism-error relationships. The classification scheme developed by Jens Rasmussen was adopted because it provided a comprehensive yet basic error classification shell or structure that could easily accommodate the addition of details on domain-specific factors. For these purposes, factors specific to the aviation environment were incorporated. Hypotheses concerning the relationships among a small number of underlying factors, information processing mechanisms, and error types identified in the classification scheme were formulated. ASRS data were reviewed and a simulation experiment was performed to evaluate and quantify the hypotheses.
BioIMAX : a Web2.0 approach to visual data mining in bioimage data
Loyek C. BioIMAX : a Web2.0 approach to visual data mining in bioimage data. Bielefeld: Universität Bielefeld; 2012