Human pol II promoter prediction: time series descriptors and machine learning
Although several in silico promoter prediction methods have been developed to date, their predictive performance is still limited. The limitations stem from the difficulty of selecting promoter features that distinguish promoters from non-promoters, and from the limited generalization (predictive) ability of the machine-learning algorithms. In this paper we propose a novel approach that uses unique descriptors and machine-learning methods to recognize eukaryotic polymerase II promoters. Non-linear time series descriptors are combined with non-linear machine-learning algorithms, such as support vector machines (SVMs), to discriminate between promoter and non-promoter regions. The basic idea is to use descriptors that do not depend on the primary DNA sequence and that clearly separate promoter from non-promoter regions. The classification model, built on a set of 1000 promoter and 1500 non-promoter sequences, showed a 10-fold cross-validation accuracy of 87%, and on an independent test set it achieved an accuracy above 85% for both promoter and non-promoter identification. The approach correctly identified all 20 experimentally verified promoters of human chromosome 22. The high sensitivity and selectivity indicate that n-mer frequencies combined with non-linear time series descriptors, such as Lyapunov component stability and Tsallis entropy, and supervised machine-learning methods such as SVMs can be useful for identifying pol II promoters.
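Two of the descriptor families named above, n-mer frequencies and Tsallis entropy, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not give the n-mer length, the Tsallis parameter q, or the series on which the entropy is computed, so here the entropy is taken over the n-mer frequency distribution itself (an assumption), with n = 2 and q = 2 as arbitrary defaults. The function names are hypothetical, and the Lyapunov descriptors and the SVM itself are omitted.

```python
import math
from collections import Counter
from itertools import product

VALID = frozenset("ACGT")

def nmer_frequencies(seq: str, n: int = 2) -> dict[str, float]:
    """Relative frequency of every possible n-mer over the ACGT alphabet.

    Overlapping windows are counted; windows containing other symbols
    (e.g. N) are skipped. Hypothetical helper for illustration.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + n]
        for i in range(len(seq) - n + 1)
        if set(seq[i:i + n]) <= VALID
    )
    total = sum(counts.values())
    kmers = ("".join(p) for p in product("ACGT", repeat=n))
    return {k: (counts[k] / total if total else 0.0) for k in kmers}

def tsallis_entropy(probs, q: float = 2.0) -> float:
    """Tsallis entropy S_q = (1 - sum_i p_i**q) / (q - 1).

    In the limit q -> 1 it reduces to the Shannon entropy (natural log).
    """
    if q == 1.0:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)

def descriptor_vector(seq: str, n: int = 2, q: float = 2.0) -> list[float]:
    """Concatenate the 4**n n-mer frequencies with one entropy value.

    Assumption: entropy is computed over the n-mer distribution.
    """
    freqs = nmer_frequencies(seq, n)
    return list(freqs.values()) + [tsallis_entropy(freqs.values(), q)]

if __name__ == "__main__":
    vec = descriptor_vector("ACGTACGTTATAAA")
    print(len(vec), round(vec[-1], 4))
```

Each sequence window would yield one such fixed-length vector (17 values for n = 2), which is the kind of input a supervised classifier such as an SVM expects.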
DivCalc: A Utility for Diversity Analysis and Compound Sampling
Diversity, whether genetic, chemical or otherwise, is an important concept in several areas of scientific research. Calculating diversity is one of the most important considerations in pre-clinical drug discovery, in particular in the design of diverse chemical libraries for combinatorial chemistry and in compound selection for High Throughput Screening (HTS). DivCalc is Windows™-based software that implements a previously published method of diversity calculation [1]. It facilitates sampling of a given data matrix to obtain the most diverse compounds spanning the entire descriptor space.
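The abstract does not describe the diversity method of [1], so the sketch below swaps in a different, widely used technique purely to illustrate what "sampling a data matrix for the most diverse compounds" means: greedy MaxMin (farthest-point) selection with Euclidean distances. It is not DivCalc's algorithm; the function name, the Euclidean metric and the seed choice are all assumptions.

```python
import math

def maxmin_pick(matrix, k, seed_index=0):
    """Greedy MaxMin selection of k rows from a compounds-by-descriptors matrix.

    Starting from seed_index, repeatedly add the row whose Euclidean distance
    to its nearest already-selected row is largest. Hypothetical helper; this
    is a stand-in technique, not the method of reference [1].
    """
    if not 0 < k <= len(matrix):
        raise ValueError("k must be between 1 and the number of rows")
    selected = [seed_index]
    chosen = {seed_index}
    # Distance from every row to its nearest selected row.
    nearest = [math.dist(row, matrix[seed_index]) for row in matrix]
    while len(selected) < k:
        candidate = max(
            (i for i in range(len(matrix)) if i not in chosen),
            key=lambda i: nearest[i],
        )
        selected.append(candidate)
        chosen.add(candidate)
        # Update nearest-selected distances with the newly added row.
        for i, row in enumerate(matrix):
            nearest[i] = min(nearest[i], math.dist(row, matrix[candidate]))
    return selected

if __name__ == "__main__":
    descriptors = [[0, 0], [0.1, 0], [5, 5], [10, 0], [9.9, 0.1]]
    print(maxmin_pick(descriptors, 3))
```

On the toy matrix, the near-duplicate rows (indices 1 and 4) are picked last, which is the intended behaviour of a diversity-driven sampler. The result depends on the seed row, so in practice the seed is often chosen deliberately (for example, the compound farthest from the centroid).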
ERRATUM: Human pol II promoter prediction: time series descriptors and machine learning (doi:10.1093/nar/gki320)
The publishers would like to apologize for an error in the references. Reference 14 (Gangal et al., 2003) should be disregarded, as it is not cited in this paper, and the references following it should be renumbered. The entire article is reprinted here with the references and their citations corrected.
SVMCRYS: An SVM Approach for the Prediction of Protein Crystallization Propensity from Protein Sequence
Potential Application of Network Descriptions for Understanding Conformational Changes and Protonation States of ABC Transporters
The ABC (ATP Binding Cassette) transporter protein superfamily comprises a large number of ubiquitous and functionally versatile proteins conserved from archaea to humans. ABC transporters play a key role in many human diseases and in the development of multidrug resistance in cancer and in parasites. Although dramatic progress has been made in ABC protein studies over the last decades, we are still far from a detailed understanding of their molecular functions. Several aspects of pharmacological ABC transporter targeting also remain unclear. Here we summarize the conformational and protonation changes of ABC transporters and the potential use of this information in pharmacological design. Network-related methods, which have recently become useful tools for describing protein structure and dynamics, have not yet been applied to study allosteric coupling in ABC proteins. We give a detailed description of the strengths and limitations of these methods and outline their potential use in describing ABC transporter dynamics. Finally, we highlight possible future pharmacological uses of network methods and outline future trends in this exciting field.
Comment: 18 pages, 3 figures and 241 references
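To make "network description of protein structure" concrete, here is a minimal sketch of one common construction, not a method taken from this review: residues become nodes, two residues are linked when their representative atoms (assumed here to be C-alpha atoms) lie within a distance cutoff (8 Å is an assumed, commonly used value), and breadth-first shortest paths between distant residues serve as a crude proxy for allosteric communication routes. The function names and the toy coordinates are hypothetical.

```python
import math
from collections import deque

def contact_network(coords, cutoff=8.0):
    """Adjacency sets: residues i and j are linked when their representative
    coordinates (assumed C-alpha, in angstroms) are within `cutoff`."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def shortest_path(adj, source, target):
    """Breadth-first search; returns the residue path or None if disconnected."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nb in sorted(adj[node]):
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    return None

if __name__ == "__main__":
    # Toy "residues" on a line, 5 angstroms apart, plus one distant residue.
    toy = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0), (40, 0, 0)]
    net = contact_network(toy)
    print(shortest_path(net, 0, 3), shortest_path(net, 0, 4))
```

Real analyses typically weight edges (for example by interaction strength or correlated motion) and compare networks across conformational or protonation states; the unweighted version here only shows the basic data structure.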