289 research outputs found

    Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction

    Protein tertiary structure plays a very important role in determining a protein's possible functional sites and its chemical interactions with other related proteins. Experimental methods for determining protein structure are time-consuming and expensive. As a result, the gap between the number of known protein sequences and the number of solved structures has widened substantially with the rise of high-throughput sequencing techniques. These limitations of experimental methods motivate us to develop computational algorithms for protein structure prediction. In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are explored with an improved K-means clustering algorithm, and the carefully constructed sequence clusters are used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within a sequence cluster influences its structural similarity. The analysis shows that sequence clusters with tight sequence variation have high structural similarity, whereas sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest-quality clusters give highly reliable predictions and that high-quality clusters give reliable predictions. To improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm, which may not capture nonlinear sequence-to-structure relationships effectively. We therefore consider using a Support Vector Machine (SVM) to capture the nonlinear sequence-to-structure relationship. However, SVMs are not well suited to huge datasets containing millions of samples, which is why we propose CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that accuracy for local structure prediction improves noticeably when CSVMs are applied.
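
The cluster-then-classify idea described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy two-dimensional data, every function name, and the local learner are hypothetical. In particular, a linear classifier trained with hinge-loss sub-gradient steps stands in for the kernel SVMs mentioned in the abstract. K-means partitions the samples into granules, and one small model is trained per granule.

```python
def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest(p, centers):
    return min(range(len(centers)), key=lambda i: sqdist(p, centers[i]))

def kmeans(points, k, iters=10):
    # Deterministic farthest-first initialisation, then Lloyd iterations.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def train_hinge(X, y, lr=0.1, epochs=200):
    # Assumed stand-in for an SVM: unregularised hinge-loss sub-gradient descent.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def fit_local(points, labels, k):
    # One local model per cluster ("granule"), on features centred at the centroid.
    centers = kmeans(points, k)
    models = []
    for i, c in enumerate(centers):
        idx = [j for j, p in enumerate(points) if nearest(p, centers) == i]
        X = [tuple(u - v for u, v in zip(points[j], c)) for j in idx]
        models.append(train_hinge(X, [labels[j] for j in idx]))
    return centers, models

def predict(centers, models, p):
    i = nearest(p, centers)
    w, b = models[i]
    x = tuple(u - v for u, v in zip(p, centers[i]))
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def accuracy(centers, models, points, labels):
    hits = sum(predict(centers, models, p) == t for p, t in zip(points, labels))
    return hits / len(points)

# Toy XOR-like data: the sign rule on y flips between the two clusters.
points, labels = [], []
for cx, flip in ((0.0, 1), (10.0, -1)):
    for dx in (-0.5, 0.0, 0.5):
        for y in (-1.0, -0.5, 0.5, 1.0):
            points.append((cx + dx, y))
            labels.append(flip * (1 if y > 0 else -1))

centers, models = fit_local(points, labels, k=2)
local_acc = accuracy(centers, models, points, labels)
global_acc = accuracy(*fit_local(points, labels, k=1), points, labels)
print("per-cluster models:", local_acc, "single model:", global_acc)
```

On this toy set no single linear model can fit every point, while the per-cluster models can. That is the intuition behind training a separate classifier on each partition, and it also keeps each training problem small.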

    Computational Intelligence Based Classifier Fusion Models for Biomedical Classification Applications

    The generalization abilities of machine learning algorithms often depend on the algorithms' initialization, parameter settings, training sets, or feature selections. For instance, SVM classifier performance largely relies on whether the selected kernel functions suit the real application data. To enhance the performance of individual classifiers, this dissertation proposes classifier fusion models that use computational intelligence techniques to combine different classifiers. The first fusion model, called T1FFSVM, combines multiple SVM classifiers by constructing a fuzzy logic system. T1FFSVM can be improved by tuning the fuzzy membership functions (MFs) of linguistic variables using genetic algorithms; the improved model is called GFFSVM. To better handle the uncertainties in fuzzy MFs and in classification data, T1FFSVM can also be improved by applying type-2 fuzzy logic to construct a type-2 fuzzy classifier fusion model (T2FFSVM). T1FFSVM, GFFSVM, and T2FFSVM use accuracy as the classifier performance measure. AUC (the area under an ROC curve) has been shown to be a better classifier performance metric, so AUC-based classifier fusion models are also proposed in the dissertation as a comparison study. Experiments on biomedical datasets demonstrate promising performance of the proposed classifier fusion models compared with the individual component classifiers. The proposed classifier fusion models also perform better than many existing classifier fusion methods. The dissertation also uses machine learning and classifier fusion methods to study one interesting phenomenon in the biology domain, namely how protein structures and sequences are related to each other. The experiments show that protein segments with similar structures also share similar sequences, which adds new insight to the existing knowledge on the relation between protein sequences and structures: similar sequences share high structure similarity, but similar structures may not share high sequence similarity.

    Application of machine learning and deep learning for proteomics data analysis

    Mining complex trees for hidden fruit : a graph–based computational solution to detect latent criminal networks : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Technology at Massey University, Albany, New Zealand.

    The detection of crime is a complex and difficult endeavour. Public and private organisations focusing on law enforcement, intelligence, and compliance commonly apply the rational isolated actor approach, which is premised on observability and materiality. This manifests largely as entity-level risk management, sourcing 'leads' from reactive covert human intelligence sources and/or from proactive sources by applying simple rules-based models. Focusing on discrete observable and material actors ignores that criminal activity exists within a complex system whose fundamental structural fabric derives from the complex interactions between actors, with the least observable actors likely to be both criminally proficient and influential. The graph-based computational solution developed to detect latent criminal networks is a response to the inadequacy of the rational isolated actor approach, which ignores the connectedness and complexity of criminality. The core computational solution, written in the R language, consists of novel entity resolution, link discovery, and knowledge discovery technology. Entity resolution enables the fusion of multiple datasets with high accuracy (mean F-measure of 0.986 versus 0.872 for competitors), generating a graph-based expressive view of the problem. Link discovery comprises link prediction and link inference, enabling the high-performance detection (accuracy of ~0.8 versus ~0.45 for relevant published models) of unobserved relationships such as identity fraud. Knowledge discovery takes the fused graph and applies the "GraphExtract" algorithm to create a set of subgraphs representing latent functional criminal groups, and a mesoscopic graph representing how these criminal groups are interconnected. Latent knowledge is generated from a range of metrics, including the "Super-broker" metric and attitude prediction. The computational solution has been evaluated on a range of datasets that mimic an applied setting, demonstrating a scalable (tested on ~18 million node graphs) and performant (~33 hours runtime on a non-distributed platform) solution that successfully detects relevant latent functional criminal groups in around 90% of cases sampled and enables a contextual understanding of the broader criminal system through the mesoscopic graph and associated metadata. The augmented data assets generated provide a multi-perspective systems view of criminal activity that enables advanced, informed decision making across the microscopic, mesoscopic, and macroscopic spectrum.

    An intelligent recommender system based on short-term disease risk prediction for patients with chronic diseases in a telehealth environment

    Clinical decisions are usually made based on practitioners' experience, with limited support from data-centric analysis of medical databases. This often leads to undesirable biases, human errors, and high medical costs that affect the quality of services provided to patients. Recently, the use of intelligent technologies for clinical decision making in the telehealth environment has begun to play a vital role in improving the quality of patients' lives and reducing the costs and workload involved in their daily healthcare. In the telehealth environment, patients suffering from chronic diseases such as heart disease or diabetes have to take various medical tests, such as measuring blood pressure, blood sugar, and blood oxygen. This practice adversely affects the overall convenience and quality of their everyday living. In this PhD thesis, an effective recommender system is proposed that uses a set of innovative algorithms and models for short-term disease risk prediction to give chronic disease patients appropriate recommendations on whether they need to take a medical test on the coming day. The time series medical data obtained for each chronic disease patient are partitioned into consecutive sliding windows, which are analyzed in both the time domain and the frequency domain. The time series data are already in the time domain and can be analyzed without further conversion. For analysis in the frequency domain, the Fast Fourier Transformation (FFT) and the Dual-Tree Complex Wavelet Transformation (DTCWT) are applied to convert the data into the frequency domain and extract the frequency information. In the time domain, four innovative predictive methods are proposed to study the time series data and produce recommendations: the Basic Heuristic Algorithm (BHA), the Regression-Based Algorithm (RBA), the Hybrid Algorithm (HA), and a structural graph-based method (SG). In the frequency domain, three predictive classifiers, Artificial Neural Network, Least Squares-Support Vector Machine, and Naïve Bayes, are used to produce the recommendations. An ensemble machine learning model combines all the predictive models and algorithms from both the time and frequency domains to produce the final recommendation. Two real-life telehealth datasets collected from chronic disease patients (i.e., heart disease and diabetes patients) are used for a comprehensive experimental evaluation in this study. The results show that the proposed system is effective in analysing time series medical data and providing accurate and reliable (very low risk) recommendations to patients suffering from chronic diseases such as heart disease and diabetes. This research work will help provide high-quality, evidence-based intelligent decision support to clinical disease patients, significantly reducing the workload associated with medical checkups that would otherwise have to be conducted every day in a telehealth environment.
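
The windowing and frequency-analysis step described in this abstract might look roughly like the sketch below. It is an illustration only: the synthetic readings, the window size and step, and all function names are assumptions, and a naive discrete Fourier transform stands in for the FFT. The DTCWT, the predictive algorithms, and the ensemble from the thesis are not reproduced here.

```python
import cmath
import math

def sliding_windows(series, size, step):
    """Split a series into windows of `size` readings, advancing by `step`."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]

def time_features(window):
    """Time-domain summary: mean level and average change per reading."""
    mean = sum(window) / len(window)
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return mean, slope

def dft_magnitudes(window):
    """Magnitude of each DFT coefficient (naive O(n^2) transform)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n)
    ]

def dominant_bin(window):
    """Index of the strongest non-constant frequency component."""
    mags = dft_magnitudes(window)
    return max(range(1, len(window) // 2 + 1), key=lambda k: mags[k])

# Hypothetical readings: a clean oscillation that repeats every 4 samples.
readings = [math.sin(2 * math.pi * t / 4) for t in range(16)]
windows = sliding_windows(readings, size=8, step=4)
dominant = [dominant_bin(w) for w in windows]
features = [time_features(w) + (dominant_bin(w),) for w in windows]
```

Each window yields a small feature tuple that a downstream classifier could consume. With eight readings per window and a period of four samples, the strongest component falls in bin 2 of every window.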

    Development of a Self-Learning Approach Applied to Pattern Recognition and Fuzzy Control

    Systems based on fuzzy rules are widely used in the development of pattern recognition and control systems. Most current methods for designing fuzzy rule-based systems suffer from the following problems: 1. The fuzzification procedure takes into account neither the statistical properties nor the real distribution of the data or signals under consideration. The generated fuzzy membership functions are therefore not really able to express these data. Moreover, the fuzzification process is defined manually. 2. The initial size of the rule base is determined in a blanket fashion. This means that the procedure can produce redundancy in the rules used, which in turn leads to problems of complexity and dimensionality. Avoiding these problems by selecting the relevant rules can itself lead to a computational cost problem. 3. The form of the fuzzy rule suffers from loss of information, which in turn can lead to the variables under consideration being assigned to a different, unrealistic range. 4. Furthermore, adapting the fuzzy membership functions is associated with problems of complexity and computational cost because of the iteration and the multiple parameters involved. This adaptation is also carried out only at the level of each individual rule; that is, no adaptation is performed at the level of the entire fuzzy rule base.

    Recent Developments in Smart Healthcare

    Medicine is undergoing a sector-wide transformation thanks to advances in computing and networking technologies. Healthcare is changing from reactive and hospital-centered to preventive and personalized, and from disease-focused to well-being-centered. In essence, healthcare systems, as well as fundamental medical research, are becoming smarter. We anticipate significant improvements in areas ranging from molecular genomics and proteomics, to decision support for healthcare professionals through big data analytics, to support for behavior change through technology-enabled self-management and social and motivational support. Furthermore, with smart technologies, healthcare delivery could also be made more efficient, higher quality, and lower cost. For this special issue, we received a total of 45 submissions and accepted 19 outstanding papers that span several interesting topics in smart healthcare, including public health, health information technology (Health IT), and smart medicine.

    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and for mobile communications.

    Bioinformatics

    This book is divided into research areas relevant to bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research presented here.
