Search CORE

3,014 research outputs found

An Overview of the Use of Neural Networks for Data Mining Tasks

Author: Alberts B
Alpaydin E
Ando T
Blake CL
Bramer MA
Castanheira LG
Han J
Lu H
Mitchell M
Ni X
Quinlan RJ
Rumelhart DE
Shafer JC
Shendure J
Simić D
Stahl F
Steinwart I
Surjandari I
Wei JS
Widrow B
Witten IH
Zaslavsky B
Zhang D
Publication venue: 'Wiley'
Publication date: 01/01/2012
Field of study

In the recent years the area of data mining has experienced a considerable demand for technologies that extract knowledge from large and complex data sources. There is a substantial commercial interest as well as research investigations in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies, whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks

Central Archive at the University of Reading

Crossref

Portsmouth University Research Portal (Pure)

Bournemouth University Research Online

On the hierarchical classification of G Protein-Coupled Receptors

Author: A. A. Freitas
A. Secker
Attwood
Bhasin
Bhasin
Bissantz
Cardoso
Christopoulos
D. R. Flower
Das
Davies
Flower
Flower
Foord
Gether
Gloriam
Guo
Horn
H bert
J. Timmis
Karchin
Keerthi
Klabunde
Kolakowski
Lapinsh
M. Mendao
M. N. Davies
Milligan
Papasaikas
Prabhu
Sandberg
Schi th
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/10/2007
Field of study

Motivation: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but in characterizing the interrelationships between known GPCRs. Results: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm. This represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases

CiteSeerX

Crossref

Aberystwyth Research Portal

Kent Academic Repository

Mining Images in Biomedical Publications: Detection and Analysis of Gel Diagrams

Author: Krauthammer Michael
Kuhn Tobias
Luong ThaiBinh
Nagy Mate Levente
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

Authors of biomedical publications use gel images to report experimental results such as protein-protein interactions or protein expressions under different conditions. Gel images offer a concise way to communicate such findings, not all of which need to be explicitly discussed in the article text. This fact together with the abundance of gel images and their shared common patterns makes them prime candidates for automated image mining and parsing. We introduce an approach for the detection of gel images, and present a workflow to analyze them. We are able to detect gel segments and panels at high accuracy, and present preliminary results for the identification of gene names in these images. While we cannot provide a complete solution at this point, we present evidence that this kind of image mining is feasible.Comment: arXiv admin note: substantial text overlap with arXiv:1209.148

arXiv.org e-Print Archive

Repository for Publications and Research Data

Springer - Publisher Connector

PubMed Central

Predicting protein function by machine learning on amino acid sequences – a critical evaluation

Author: Al-Shahib A
Breitling R
Gilbert D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

Copyright @ 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear if this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is very good prospect of successfully predicting the function of yet uncharacterized proteins using machine learning classifiers trained on proteins of known function

University of Groningen

University of Birmingham Research Portal

Directory of Open Access Journals

Enlighten

The University of Manchester - Institutional Repository

Brunel University Research Archive

Crossref

Proceedings - University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

University of Groningen Digital Archive

Dissertations of the University of Groningen

Investigating time series classification techniques for rapid pathogen identification with single-cell MALDI-TOF mass spectrum data

Author: Papagiannopoulou Christina
Parchen René
Waegeman Willem
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

Ghent University Academic Bibliography

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Geometric Path Integrals. A Language for Multiscale Biology and Systems Robustness

Author: A. Altland
A. Minelli
A.J. Izenman
D. Houle
E.O. Voit
G.A. Wray
H. Kitano
H. Kleinert
J. Bochnak
J. Gunawardena
J.M. Irish
L. Ehrenpreis
L. Ehrenpreis
L.S. Schulman
M. Kashiwara
N. Kumano-go
R.P. Feynman
S. Basu
T. Hastie
V. Calvert
V.I. Arnold
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/08/2011
Field of study

In this paper we suggest that, under suitable conditions, supervised learning can provide the basis to formulate at the microscopic level quantitative questions on the phenotype structure of multicellular organisms. The problem of explaining the robustness of the phenotype structure is rephrased as a real geometrical problem on a fixed domain. We further suggest a generalization of path integrals that reduces the problem of deciding whether a given molecular network can generate specific phenotypes to a numerical property of a robustness function with complex output, for which we give heuristic justification. Finally, we use our formalism to interpret a pointedly quantitative developmental biology problem on the allowed number of pairs of legs in centipedes

arXiv.org e-Print Archive

Crossref

A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

Author: Adibi N
Ahmadzadeh MR
Barati E
Mohammadi A
Saraee MH
Publication venue: Cyber Journals
Publication date: 01/03/2011
Field of study

Due to recent technology advances, large volumes of medical data is obtained. These data contain valuable information. Therefore data mining techniques can be used to extract useful patterns. This paper is intended to introduce data mining and its various techniques and a survey of the available literature on medical data mining. We emphasize mainly on the application of data mining on skin diseases. A categorization has been provided based on the different data mining techniques. The utility of the various data mining methodologies is highlighted. Generally association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we have summarized the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. There are different methods like Neural Networks, Genetic Algorithms and fuzzy classifiaction in this topic. Clustering is a useful method in medical images mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges which exist in mining skin data

University of Salford Institutional Repository

Recommended from our members

Comparison of Feature Selection and Classification for MALDI-MS Data

Author: Chen Zhongxue
Deng Youping
Huang Xudong
Liu Qingzhong
Qiao Mengyu
Sung Andrew H
Yang Jack Y
Yang Mary Qu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/03/2011
Field of study

Introduction: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the issue of different feature selection methods and different classification models as they relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare the methods of feature selection and different learning classifiers when applied to MALDI-MS data and to provide a subsequent reference for the analysis of MS proteomics data. Results: We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE), and a recently developed method, Gradient based Leave-one-out Gene Selection (GLGS) that effectively performs microarray data analysis. We also compared several learning classifiers including K-Nearest Neighbor Classifier (KNNC), Naïve Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), uncorrelated normal based quadratic Bayes Classifier recorded as UDC, Support Vector Machines, and a distance metric learning for Large Margin Nearest Neighbor classifier (LMNN) based on Mahanalobis distance. To compare, we conducted a comprehensive experimental study using three types of MALDI-MS data. Conclusion: Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed the best with respect to the expected testing accuracy. However, the distance metric learning LMNN outperformed SVMs and other classifiers on evaluating the best testing. In such cases, the optimum classification model based on LMNN is worth investigating for future study

Harvard University - DASH