
    Outcome prediction based on microarray analysis: a critical perspective on methods

    Background: Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, due to the diversity of results produced by the available techniques, their instability on different data sets, and the inability to relate statistical significance to biological relevance. Thus, there is an urgent need to address the statistical framework of microarray analysis and identify its drawbacks and limitations, which will enable us to thoroughly compare methodologies under the same experimental set-up and associate results with confidence intervals meaningful to clinicians. In this study we consider gene-selection algorithms with the aim of revealing inefficiencies in performance evaluation and addressing aspects that can reduce uncertainty in algorithmic validation.
    Results: A computational study is performed on the performance of several gene-selection methodologies using publicly available microarray data. Three basic types of experimental scenario are evaluated, i.e. the independent test-set and 10-fold cross-validation (CV) using maximum and average performance measures. Feature-selection methods behave differently under different validation strategies. The performance results from CV do not match well those from the independent test-set, except for the support vector machine (SVM) and least squares SVM methods. However, these wrapper methods achieve variable (often low) performance, whereas the hybrid methods attain consistently higher accuracies. The use of an independent test-set within CV is important for evaluating the predictive power of algorithms. The optimal size of the selected gene-set also appears to depend on the evaluation scheme. The consistency of selected genes over variation of the training-set is another aspect important in reducing uncertainty in the evaluation of the derived gene signature. In all cases the presence of outlier samples can seriously affect algorithmic performance.
    Conclusion: Multiple parameters can influence the selection of a gene-signature and its predictive power, thus possible biases in validation methods must always be accounted for. This paper illustrates that independent test-set evaluation reduces the bias of CV, and that case-specific measures reveal stability characteristics of the gene-signature over changes of the training set. Moreover, frequency measures on gene selection address the algorithmic consistency in selecting the same gene signature under different training conditions. These issues contribute to the development of an objective evaluation framework and aid the derivation of statistically consistent gene signatures that could eventually be correlated with biological relevance. The benefits of the proposed framework are supported by the evaluation results and methodological comparisons performed for several gene-selection algorithms on three publicly available datasets.
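
    The evaluation scheme discussed above can be illustrated with a minimal sketch: gene selection is re-fitted inside every cross-validation fold on a training split, while a separate independent test set is held out for the final estimate. The synthetic data, the univariate filter, the value of k, and the SVM settings below are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: gene-selection + classification evaluated with 10-fold CV on a
# training split and, separately, on an independent held-out test set.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2000))      # 120 samples x 2000 "genes" (synthetic)
y = rng.integers(0, 2, size=120)      # binary outcome labels

# Independent test set kept completely outside model selection.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=50)),   # gene selection re-run inside each fold
    ("clf", SVC(kernel="linear", C=1.0)),
])

cv_scores = cross_val_score(pipe, X_tr, y_tr, cv=10)   # 10-fold CV estimate
pipe.fit(X_tr, y_tr)
test_score = pipe.score(X_te, y_te)                    # independent test-set estimate
print(f"CV mean accuracy: {cv_scores.mean():.3f} | independent test accuracy: {test_score:.3f}")
```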

    Analysis of Retinal Image Data to Support Glaucoma Diagnosis

    A fundus camera is a widely available imaging device that enables fast and inexpensive examination of the human retina. Hence, many researchers focus on the development of automatic methods for assessing various retinal diseases from fundus images. This dissertation summarizes the current state of the art in glaucoma diagnosis using a fundus camera and proposes a novel methodology for assessment of the retinal nerve fiber layer (RNFL) via texture analysis. Along with it, a method for retinal blood vessel segmentation is introduced as an additional valuable contribution to the state of the art in retinal image processing. Segmentation of the blood vessels also serves as a necessary step preceding evaluation of the RNFL via the proposed methodology. In addition, a new publicly available high-resolution retinal image database with gold-standard data is introduced as a novel opportunity for other researchers to evaluate their segmentation algorithms.
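
    As a rough illustration of the vessel-segmentation step mentioned above, the sketch below enhances tubular structures in the green channel of a fundus photograph with scikit-image's Frangi vesselness filter and thresholds the result. This is a generic baseline, not the segmentation method proposed in the dissertation.

```python
# Sketch: crude retinal vessel map from a fundus image via Frangi filtering.
from skimage import data, filters

rgb = data.retina()                    # sample fundus photograph (downloaded on first use)
green = rgb[..., 1] / 255.0            # vessels show best contrast in the green channel

vesselness = filters.frangi(green)     # tubular-structure (vesselness) enhancement
mask = vesselness > filters.threshold_otsu(vesselness)   # simple global threshold

print(f"vessel pixels: {mask.sum()} of {mask.size}")
```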

    Air quality forecasting using neural networks

    In this thesis project, a special type of neural network, the Extreme Learning Machine (ELM), is implemented to predict air quality from the air-quality time series itself and external meteorological records. A regularized version of ELM with linear components is chosen as the main model for prediction. To take full advantage of this model, its hyper-parameters are studied and optimized. Then a set of variables is selected (or constructed) to maximize the performance of the ELM, for which two different variable-selection approaches (wrapper and filter methods) are evaluated. The wrapper method, ELM-based forward selection, is chosen for variable selection. Meanwhile, a feature-extraction method (Principal Component Analysis) is applied in the hope of reducing the number of candidate meteorological variables for feature selection, which proves to be helpful. Finally, with all the parameters properly optimized, the ELM is used for prediction and generates satisfactory results.
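
    A minimal sketch of a regularized ELM with linear components, of the kind described above: random hidden-layer weights, tanh activations, the raw inputs appended as linear terms, and output weights solved in closed form with a ridge penalty. The toy data, hidden-layer size, and regularization strength are illustrative assumptions, not the thesis settings.

```python
# Sketch: regularized Extreme Learning Machine (ELM) regressor with linear components.
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, y, n_hidden=100, lam=1e-2):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    H = np.hstack([H, X])                         # append raw inputs as linear components
    # Ridge-regularized least squares for the output weights.
    beta = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = np.hstack([np.tanh(X @ W + b), X])
    return H @ beta

# Toy usage: predict a noisy target from lagged observations.
X = rng.normal(size=(500, 8))                     # e.g. lagged air-quality + weather features
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=500)
W, b, beta = elm_fit(X[:400], y[:400])
rmse = np.sqrt(np.mean((elm_predict(X[400:], W, b, beta) - y[400:]) ** 2))
print(f"hold-out RMSE: {rmse:.3f}")
```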

    Algorithms for Neural Prosthetic Applications

    In the last 15 years, there has been a significant increase in the number of motor neural prostheses used for restoring limb function lost due to neurological disorders or accidents. The aim of this technology is to enable patients to control a motor prosthesis using their residual neural pathways (central or peripheral). Recent studies in non-human primates and humans have shown the possibility of controlling a prosthesis to accomplish varied tasks such as self-feeding, typing, reaching, grasping, and performing fine dexterous movements. A neural decoding system mainly comprises three components: (i) sensors to record neural signals, (ii) an algorithm to map neural recordings to upper-limb kinematics, and (iii) a prosthetic arm actuated by control signals generated by the algorithm. Machine learning algorithms that map input neural activity to output kinematics (such as finger trajectory) form the core of the neural decoding system. The choice of algorithm is thus mainly dictated by the neural signal of interest and the output parameter being decoded. The principal stages of a neural decoding system are neural data acquisition, feature extraction, feature selection, and the machine learning algorithm. There have been significant advances in the field of neural prosthetic applications, but challenges remain in translating a neural prosthesis from a laboratory setting to a clinical environment. To achieve a fully functional prosthetic device with maximum user compliance and acceptance, these factors need to be addressed and taken into consideration. Three challenges in developing robust neural decoding systems were addressed by exploring neural variability in the peripheral nervous system for dexterous finger movements, feature-selection methods based on clinically relevant metrics, and a novel method for decoding dexterous finger movements based on ensemble methods.
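
    As a hedged illustration of the decoding stage described above (extracted neural features mapped to a kinematic output by an ensemble learner), the sketch below trains a random-forest regressor on synthetic binned firing rates. The data, feature definition, and model choice are assumptions made only for illustration, not the dissertation's exact ensemble method.

```python
# Sketch: ensemble decoder mapping neural features to a finger-kinematics target.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_samples, n_channels = 2000, 32
features = rng.normal(size=(n_samples, n_channels))     # e.g. binned firing rates per channel
finger_pos = features[:, :4].sum(axis=1) + 0.2 * rng.normal(size=n_samples)  # toy kinematic target

X_tr, X_te, y_tr, y_te = train_test_split(features, finger_pos, test_size=0.2, random_state=0)

decoder = RandomForestRegressor(n_estimators=200, random_state=0)
decoder.fit(X_tr, y_tr)
print(f"decoding R^2 on held-out samples: {decoder.score(X_te, y_te):.3f}")
```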

    Survey of data mining approaches to user modeling for adaptive hypermedia

    The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make them suitable for the automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.

    Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition

    Over the last decade, research has focused on machine learning and data mining to develop frameworks that can improve data analysis and output performance, and to build accurate decision-support systems that benefit from real-life datasets. This leads to the field of clinical data analysis, which has attracted a significant amount of interest in the computing, information systems, and medical fields. To create and develop models with machine learning algorithms, the existing algorithms need a particular type of data to build an efficient model. Clinical datasets pose several issues that can affect classification: missing values, high dimensionality, and class imbalance. In order to build a framework for mining the data, it is necessary first to preprocess it by eliminating patients' records that have too many missing values, imputing the remaining missing values, addressing high dimensionality, and classifying the data for decision support. This thesis investigates a real clinical dataset to address these challenges. An autoencoder is employed as a tool that can compress the data mining methodology by extracting features and classifying data in one model. The first step in the methodology is to impute missing values, so several imputation methods are analysed and employed. High dimensionality is then addressed by discarding irrelevant and redundant features, in order to improve prediction accuracy and reduce computational complexity. Class imbalance is manipulated to investigate its effect on feature-selection and classification algorithms. The first stage of analysis investigates the role of missing values; the results show that imputation techniques based on class separation outperform other techniques in predictive ability. The next stage investigates high dimensionality and class imbalance: a small set of features was found to improve classification performance, while balancing the classes did not affect performance as much as the class imbalance itself.
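
    A minimal sketch of the pipeline described above, under illustrative assumptions: records are mean-imputed, a small autoencoder compresses them into a low-dimensional code, and a classifier is trained on that code. The synthetic data, layer sizes, and the logistic-regression classifier are placeholders rather than the thesis configuration.

```python
# Sketch: imputation -> autoencoder dimensionality reduction -> classification.
import numpy as np
import tensorflow as tf
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 60))                 # 1000 patient records x 60 clinical features
y = (X[:, 0] + X[:, 1] > 0).astype(int)         # toy outcome label
X[rng.random(X.shape) < 0.1] = np.nan           # simulate 10% missing values

X_imp = SimpleImputer(strategy="mean").fit_transform(X)   # imputation step

# Autoencoder: 60 -> 8 -> 60, trained to reconstruct the imputed records.
inputs = tf.keras.Input(shape=(60,))
code = tf.keras.layers.Dense(8, activation="relu")(inputs)
recon = tf.keras.layers.Dense(60)(code)
autoencoder = tf.keras.Model(inputs, recon)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_imp, X_imp, epochs=20, batch_size=32, verbose=0)

encoder = tf.keras.Model(inputs, code)          # dimensionality-reduction part
Z = encoder.predict(X_imp, verbose=0)           # 8-dimensional codes per record

clf = LogisticRegression().fit(Z[:800], y[:800])
print(f"hold-out accuracy on compressed features: {clf.score(Z[800:], y[800:]):.3f}")
```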

    Ant Colony Optimization

    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.
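
    For readers unfamiliar with the metaheuristic, a compact, self-contained ACO sketch for the travelling salesman problem (the problem mentioned above) follows. The colony size, evaporation rate, and alpha/beta weights are arbitrary assumptions chosen to keep the demo small, not values recommended by the book.

```python
# Sketch: basic Ant Colony Optimization for a small random travelling salesman instance.
import numpy as np

rng = np.random.default_rng(4)
n_cities = 15
coords = rng.random((n_cities, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=2) + np.eye(n_cities)

pheromone = np.ones((n_cities, n_cities))
alpha, beta, rho, n_ants, n_iters = 1.0, 3.0, 0.5, 20, 100
best_len, best_tour = np.inf, None

for _ in range(n_iters):
    tours = []
    for _ in range(n_ants):
        tour = [rng.integers(n_cities)]
        unvisited = set(range(n_cities)) - {tour[0]}
        while unvisited:
            i = tour[-1]
            cand = np.array(sorted(unvisited))
            # Transition probability ~ pheromone^alpha * (1/distance)^beta.
            weights = pheromone[i, cand] ** alpha * (1.0 / dist[i, cand]) ** beta
            nxt = rng.choice(cand, p=weights / weights.sum())
            tour.append(nxt)
            unvisited.discard(nxt)
        length = sum(dist[tour[k], tour[(k + 1) % n_cities]] for k in range(n_cities))
        tours.append((length, tour))
        if length < best_len:
            best_len, best_tour = length, tour
    pheromone *= (1 - rho)                       # evaporation
    for length, tour in tours:                   # deposit proportional to tour quality
        for k in range(n_cities):
            a, b = tour[k], tour[(k + 1) % n_cities]
            pheromone[a, b] += 1.0 / length
            pheromone[b, a] += 1.0 / length

print(f"best tour length found: {best_len:.3f}")
```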

    Wavelet-support vector machine for forecasting palm oil price

    This study examines the feasibility of applying a Wavelet-Support Vector Machine (W-SVM) model to forecasting the palm oil price. The W-SVM model is obtained by integrating the discrete wavelet transform (DWT) with a support vector machine (SVM). In the W-SVM model, the wavelet transform is used to decompose the data series into two parts: an approximation series and a detail series. These decomposed series are then used as inputs to the SVM model to forecast the palm oil price. The study also applies partial correlation-based input variable selection as a preprocessing step to determine the best inputs to the model. The performance of the W-SVM model is then compared with a classical SVM model and an artificial neural network (ANN) model. The empirical results show that adding the wavelet technique to the W-SVM model enhances the forecasting performance of the classical SVM and outperforms the ANN.
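
    A minimal sketch of the W-SVM idea under stated assumptions: a single-level discrete wavelet transform splits the series into approximation and detail components, lagged values of both feed a support vector regressor, and the model forecasts the next observation. The synthetic series, the db4 wavelet, the lag length, and the SVR settings are illustrative, not the study's configuration.

```python
# Sketch: wavelet decomposition + support vector regression for one-step-ahead forecasting.
import numpy as np
import pywt
from sklearn.svm import SVR

rng = np.random.default_rng(5)
t = np.arange(400)
series = 2500 + 300 * np.sin(t / 25) + 20 * rng.normal(size=t.size)   # toy "price" series

# Decompose, then reconstruct approximation and detail components at the original length.
cA, cD = pywt.dwt(series, "db4")
approx = pywt.idwt(cA, None, "db4")[: series.size]
detail = pywt.idwt(None, cD, "db4")[: series.size]

lags = 5   # each row holds the last 5 approximation and 5 detail values
X = np.column_stack(
    [approx[i : series.size - lags + i] for i in range(lags)]
    + [detail[i : series.size - lags + i] for i in range(lags)]
)
y = series[lags:]

split = 300
model = SVR(C=100.0, epsilon=0.1).fit(X[:split], y[:split])
pred = model.predict(X[split:])
rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))
print(f"hold-out RMSE: {rmse:.2f}")
```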