7 research outputs found

    Full "Laplacianised" posterior naive Bayesian algorithm

    BACKGROUND: In the last decade the standard Naive Bayes (SNB) algorithm has been widely employed in multi-class classification problems in cheminformatics. This popularity is mainly due to the fact that the algorithm is simple to implement and in many cases yields respectable classification results. Using clever heuristic arguments "anchored" by insightful cheminformatics knowledge, Xia et al. have simplified the SNB algorithm further and termed it the Laplacian Corrected Modified Naive Bayes (LCMNB) approach, which has been widely used in cheminformatics since its publication. In this note we mathematically illustrate the conditions under which Xia et al.'s simplification holds. It is our hope that this clarification could help Naive Bayes practitioners in deciding when it is appropriate to employ the LCMNB algorithm to classify large chemical datasets. RESULTS: A general formulation that subsumes the simplified Naive Bayes version is presented. Unlike the widely used NB method, the Standard Naive Bayes description presented in this work is discriminative (not generative) in nature, which may lead to possible further applications of the SNB method. CONCLUSIONS: Starting from a standard Naive Bayes (SNB) algorithm, we have derived mathematically the relationship between Xia et al.'s ingenious, but heuristic algorithm, and the SNB approach. We have also demonstrated the conditions under which Xia et al.'s crucial assumptions hold. We therefore hope that the new insight and recommendations provided can be found useful by the cheminformatics community.
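The abstract above does not state the LCMNB formula, so the following is only a minimal sketch of the Laplacian-corrected feature weighting as it is commonly described in the cheminformatics literature, not the authors' own formulation. The assumed weight for a feature seen in `n_f` training samples, `a_f` of them active, with overall active fraction `p`, is `log((a_f + 1) / (n_f * p + 1))`. The function names, the set-of-features input format, and the zero weight for unseen features are all illustrative assumptions.

```python
import math
from collections import Counter


def train_laplacian_nb(samples, labels):
    """Fit per-feature log weights (assumed Laplacian-corrected form).

    samples: list of sets of feature identifiers (e.g. fingerprint bits)
    labels:  list of bools, True for an active sample
    """
    if not samples or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and equal in length")

    base_rate = sum(labels) / len(samples)  # overall fraction of actives

    feat_total = Counter()   # samples containing each feature
    feat_active = Counter()  # active samples containing each feature
    for feats, is_active in zip(samples, labels):
        for f in feats:
            feat_total[f] += 1
            if is_active:
                feat_active[f] += 1

    # Assumed weight: log((a_f + 1) / (n_f * p + 1)).
    # Rarely seen features are pulled towards a weight of 0.
    return {
        f: math.log((feat_active[f] + 1.0) / (feat_total[f] * base_rate + 1.0))
        for f in feat_total
    }


def score(weights, feats):
    """Sum the weights of the features present; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in feats)


# Hypothetical toy data: features are arbitrary string labels.
toy_samples = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
toy_labels = [True, True, False, False]
toy_weights = train_laplacian_nb(toy_samples, toy_labels)
```

In this toy set a feature found only in actives receives a positive weight, one found only in inactives a negative weight, and one split evenly a weight of zero, so a higher `score` suggests a sample is more likely to be active under these assumptions.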

    Proposed algorithm for image classification using regression-based pre-processing and recognition models

    Image classification algorithms categorise pixels according to image attributes after pre-processing the learner's training samples. Precision and classification accuracy are difficult to compute because of variable image sizes (different widths and heights) and the numerous characteristics of the images themselves. This research proposes an image classification algorithm based on regression-based pre-processing and recognition models. The proposed algorithm focuses on optimising pre-processing results such as accuracy and precision. For evaluation and validation, the recognition model is mapped so as to cluster the digital images, which gives rise to a multidimensional state-space problem. Simulation results show that the proposed method outperforms existing algorithms in classification precision and accuracy and yields a higher matching percentage based on image analytics.

    Granularity analysis of classification and estimation for complex datasets with MOA

    Dispersed and unstructured datasets are significant factors in determining the exact amount of space required. Depending on the size and distribution of the data, and especially when the classes are strongly associated, the level of granularity needed for a precise classification of the datasets increases. Data complexity is one of the major attributes governing the appropriate granularity, as it has a direct impact on performance. Dataset classification is a vital step in complex data analytics and is designed to ensure that a dataset is ready to be scrutinised efficiently. Data collections frequently contain missing, noisy and out-of-range values. Data that has not been carefully classified with such problems in mind can produce unreliable outcomes. Hence, classification of complex data sources helps improve the accuracy that machine learning algorithms achieve on the gathered datasets. Dataset complexity and pre-processing time reflect the effectiveness of an individual algorithm. Once the complexity of the datasets has been characterised, comparatively simpler datasets can be investigated further with a parallel approach. Speedup is measured by running MOA simulations. Our proposed classification approach performs better and improves the granularity level of complex datasets.

    Computer-aided prediction of biological activity spectra for chemical compounds: opportunities and limitations

    An essential characteristic of chemical compounds is their biological activity, since its presence can become the basis for using a substance for therapeutic purposes or, on the contrary, limit its practical application because of side effects and toxicity. Computer assessment of biological activity spectra makes it possible to determine the most promising directions for studying the pharmacological action of particular substances, and to filter out potentially dangerous molecules at the early stages of research. For more than 25 years, we have been developing and improving the computer program PASS (Prediction of Activity Spectra for Substances), designed to predict the biological activity spectrum of a substance from the structural formula of its molecules. The prediction is based on an analysis of structure-activity relationships for the training set, which currently contains information on the structures and known biological activities of more than one million molecules. The structure of an organic compound is represented in PASS using Multilevel Neighborhoods of Atoms descriptors; activity prediction for new compounds is performed by a naive Bayes classifier using the structure-activity relationships determined from the training set. We have created and improved both local versions of the PASS program and freely available web resources based on PASS (http://www.way2drug.com). They predict several thousand biological activities (pharmacological effects, molecular mechanisms of action, specific toxicity and adverse effects, interaction with unwanted targets, metabolism and action on molecular transport), cytotoxicity for tumor and non-tumor cell lines, carcinogenicity, induced changes in gene expression profiles, metabolic sites for the major enzymes of the first and second phases of xenobiotic biotransformation, and membership among the substrates and/or metabolites of metabolic enzymes. The web resource Way2Drug is used by over 19,000 researchers from more than 100 countries, which has allowed them to obtain over 600,000 predictions and publish about 500 papers describing the results. Analysis of the published works shows that in some cases the interpretation of the prediction results presented by the authors of these publications requires adjustment. In this work, we provide the theoretical basis and consider, using particular examples, the opportunities and limitations of computer-aided prediction of biological activity spectra.