563 research outputs found

    Survey analysis for optimization algorithms applied to electroencephalogram

    This paper presents a survey of optimization approaches used to analyze and classify electroencephalogram (EEG) signals. Automatic EEG analysis is challenging because the data are high-dimensional. Optimization algorithms seek to improve accuracy by selecting useful features and discarding unwanted ones. Forty-seven reputable research papers are reviewed in this work, with the developed and executed techniques divided into seven groups according to the optimization algorithm applied (particle swarm optimization (PSO), ant colony optimization (ACO), artificial bee colony (ABC), grey wolf optimizer (GWO), Bat, Firefly, and other optimizer approaches). The main measures used to compare the surveyed papers are accuracy, precision, recall, and F1-score. The included papers use several datasets, such as the Bonn University EEG dataset, CHB-MIT, an electrocardiography (ECG) dataset, and others. The results indicate that the PSO and GWO algorithms achieved the highest accuracy, around 99%, compared with the other techniques.
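The four measures named in this abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch, assuming a two-class problem; the function name `classification_metrics` and the example counts are illustrative and do not come from the survey.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Undefined ratios (zero denominator) are reported as 0.0,
    which is a convention chosen for this sketch.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=45, fp=5, fn=15, tn=35)
print(metrics)
```

F1 is the harmonic mean of precision and recall, so it stays low whenever either of the two is low, which is why it is often reported alongside accuracy on imbalanced data.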

    Phenotype Recognition with Combined Features and Random Subspace Classifier Ensemble

    Background: Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes are able to capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. As such, efficient computational methods are required for automatic cellular phenotype identification capable of dealing with large image data sets. In this paper we investigated an efficient method for the extraction of quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace based classifier ensemble with a multilayer perceptron (MLP) as the base classifier was then exploited for classification. Haralick features estimate image properties related to second-order statistics based on the grey level co-occurrence matrix (GLCM), which has been extensively used for various image processing applications. The curvelet transform gives a sparser representation of the image than the wavelet transform, thus offering a description with higher time-frequency resolution and a high degree of directionality and anisotropy, which is particularly appropriate for images rich in edges and curves. A combined feature description from Haralick features and the curvelet transform can further increase classification accuracy by exploiting their complementary information. We then investigate the applicability of the random subspace (RS) ensemble method for phenotype classification based on microscopy images. Each base classifier is trained with an RS-sampled subset of the original feature set, and the ensemble assigns a class label by majority voting.
Results: Experimental results on phenotype recognition from three benchmark image sets, HeLa, CHO and RNAi, show the effectiveness of the proposed approach. The combined feature gives better classification accuracy than either individual feature. The ensemble model produces better classification performance than its component neural networks. For the three image sets HeLa, CHO and RNAi, the random subspace ensembles achieve classification rates of 91.20%, 98.86% and 91.03% respectively, which compare favorably with the published results of 84%, 93% and 82% from the multi-purpose image classifier WND-CHARM, which applied wavelet transforms and other feature extraction methods. We investigated the problem of estimating the ensemble parameters and found that a satisfactory performance improvement could be obtained with feature subsets of relatively medium dimensionality and a small ensemble size.
Conclusions: The multiscale and multidirectional characteristics of the curvelet transform suit the description of microscopy images very well. It is empirically demonstrated that the curvelet-based feature is clearly preferable to the wavelet-based feature for bioimage description. The random subspace ensemble of MLPs is much better than a number of commonly applied multi-class classifiers in the investigated application of phenotype recognition.
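The random subspace procedure described in this abstract (train each base classifier on a random subset of the features, then label by majority vote) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: a nearest-centroid learner stands in for the MLP base classifier to keep the example dependency-free, and all function names, parameter names and toy data are hypothetical.

```python
import random
from collections import Counter


def train_centroid(X, y, feats):
    """Nearest-centroid stand-in for the base classifier (the paper uses MLPs).

    Only the feature indices listed in feats are seen by this learner.
    """
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, feats, row):
    """Return the label whose centroid is closest in the chosen subspace."""
    def sq_dist(center):
        return sum((row[f] - center[j]) ** 2 for j, f in enumerate(feats))
    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label]))


def fit_random_subspace(X, y, n_estimators=5, subspace_size=2, seed=0):
    """Train n_estimators base learners, each on a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_estimators):
        feats = sorted(rng.sample(range(n_features), subspace_size))
        ensemble.append((feats, train_centroid(X, y, feats)))
    return ensemble


def predict_majority(ensemble, row):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(predict_centroid(c, feats, row) for feats, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two classes separated in every feature.
X = [[0, 1, 0, 1], [1, 0, 1, 0], [10, 9, 10, 9], [9, 10, 9, 10]]
y = ["a", "a", "b", "b"]
ensemble = fit_random_subspace(X, y)
print(predict_majority(ensemble, [1, 1, 0, 0]))
print(predict_majority(ensemble, [8, 9, 10, 9]))
```

The two tunable quantities here, `subspace_size` and `n_estimators`, correspond to the feature-subset dimensionality and ensemble size whose estimation the abstract discusses.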

    Identifying stochastic oscillations in single-cell live imaging time series using Gaussian processes

    Multiple biological processes are driven by oscillatory gene expression at different time scales. Pulsatile dynamics are thought to be widespread, and single-cell live imaging of gene expression has led to a surge of dynamic, possibly oscillatory, data for different gene networks. However, the regulation of gene expression at the level of an individual cell involves reactions between finite numbers of molecules, and this can result in inherent randomness in expression dynamics, which blurs the boundaries between aperiodic fluctuations and noisy oscillators. Thus, there is an acute need for an objective statistical method for classifying whether an experimentally derived noisy time series is periodic. Here we present a new data analysis method that combines mechanistic stochastic modelling with the powerful methods of non-parametric regression with Gaussian processes. Our method can distinguish oscillatory gene expression from random fluctuations of non-oscillatory expression in single-cell time series, despite peak-to-peak variability in period and amplitude of single-cell oscillations. We show that our method outperforms the Lomb-Scargle periodogram in successfully classifying cells as oscillatory or non-oscillatory in data simulated from a simple genetic oscillator model and in experimental data. Analysis of bioluminescent live cell imaging shows a significantly greater number of oscillatory cells when luciferase is driven by a Hes1 promoter (10/19), which has previously been reported to oscillate, than the constitutive MoMuLV 5' LTR (MMLV) promoter (0/25). The method can be applied to data from any gene network to both quantify the proportion of oscillating cells within a population and to measure the period and quality of oscillations. It is publicly available as a MATLAB package. Comment: 36 pages, 17 figures
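For context, the Lomb-Scargle periodogram that this abstract uses as its baseline can be written compactly. The sketch below implements the classical (unnormalized) periodogram for unevenly sampled data and picks the strongest of a few candidate periods. It illustrates the baseline only, not the Gaussian process method of the paper; the function names, sample times and candidate periods are assumptions made for the example.

```python
import math


def lomb_scargle_power(times, values, period):
    """Classical Lomb-Scargle power of a mean-subtracted series at one period.

    Assumes the sample times are irregular enough that neither the cosine
    nor the sine sum of squares below is zero.
    """
    omega = 2.0 * math.pi / period
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2 * omega * t) for t in times),
                     sum(math.cos(2 * omega * t) for t in times)) / (2 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    cos_part = (sum(v * c for v, c in zip(centered, cos_terms)) ** 2
                / sum(c * c for c in cos_terms))
    sin_part = (sum(v * s for v, s in zip(centered, sin_terms)) ** 2
                / sum(s * s for s in sin_terms))
    return 0.5 * (cos_part + sin_part)


def best_period(times, values, candidate_periods):
    """Return the candidate period with the highest periodogram power."""
    return max(candidate_periods,
               key=lambda p: lomb_scargle_power(times, values, p))


# Hypothetical noise-free test signal with period 4 on jittered sample times.
times = [0.5 * j + 0.04 * ((7 * j) % 5) for j in range(40)]
values = [math.sin(2.0 * math.pi * t / 4.0) for t in times]
print(best_period(times, values, [2.0, 3.0, 4.0, 5.0, 6.0]))
```

A periodogram of this kind fits a single fixed-frequency sinusoid, which is one reason it can struggle with the peak-to-peak variability in period and amplitude that the abstract highlights.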

    Machine Learning and Deep Learning Approaches for Brain Disease Diagnosis : Principles and Recent Advances

    This work was supported in part by the National Research Foundation of Korea-Grant funded by the Korean Government (Ministry of Science and ICT) under Grant NRF 2020R1A2B5B02002478, and in part by Sejong University through its Faculty Research Program under Grant 20212023. Peer reviewed.

    Bioinformatics tools for cancer metabolomics

    It is well known that significant metabolic changes take place as cells are transformed from normal to malignant. This review focuses on the use of different bioinformatics tools in cancer metabolomics studies. The article begins by describing different metabolomics technologies and data generation techniques. An overview of data pre-processing techniques is provided, and multivariate data analysis techniques are discussed and illustrated with case studies, including principal component analysis, clustering techniques, self-organizing maps, partial least squares, and discriminant function analysis. Also included is a discussion of available software packages.

    Statistical Data Modeling and Machine Learning with Applications

    The modeling and processing of empirical data is one of the main subjects and goals of statistics. Nowadays, with the development of computer science, the extraction of useful and often hidden information and patterns from data sets of different volumes and complex data sets in warehouses has been added to these goals. New and powerful statistical techniques with machine learning (ML) and data mining paradigms have been developed. To one degree or another, all of these techniques and algorithms originate from a rigorous mathematical basis, including probability theory and mathematical statistics, operational research, mathematical analysis, numerical methods, etc. Popular ML methods, such as artificial neural networks (ANN), support vector machines (SVM), decision trees, random forest (RF), among others, have generated models that can be considered as straightforward applications of optimization theory and statistical estimation. The wide arsenal of classical statistical approaches combined with powerful ML techniques allows many challenging and practical problems to be solved. This Special Issue belongs to the section “Mathematics and Computer Science”. Its aim is to establish a brief collection of carefully selected papers presenting new and original methods, data analyses, case studies, comparative studies, and other research on the topic of statistical data modeling and ML as well as their applications. Particular attention is given, but is not limited, to theories and applications in diverse areas such as computer science, medicine, engineering, banking, education, sociology, economics, among others. The resulting palette of methods, algorithms, and applications for statistical modeling and ML presented in this Special Issue is expected to contribute to the further development of research in this area. 
We also believe that the new knowledge acquired here, as well as the applied results, will be attractive and useful for young scientists, doctoral students, and researchers from various scientific specialties.

    Novel Computer-Aided Diagnosis Schemes for Radiological Image Analysis

    The computer-aided diagnosis (CAD) scheme is a powerful tool in assisting clinicians (e.g., radiologists) to interpret medical images more accurately and efficiently. In developing high-performing CAD schemes, classic machine learning (ML) and deep learning (DL) algorithms play an essential role because of their advantages in capturing meaningful patterns that are important for disease (e.g., cancer) diagnosis and prognosis from complex datasets. This dissertation, organized into four studies, investigates the feasibility of developing several novel ML-based and DL-based CAD schemes for different cancer research purposes. The first study aims to develop and test a unique radiomics-based CT image marker that can be used to detect lymph node (LN) metastasis for cervical cancer patients. A total of 1,763 radiomics features were first computed from the segmented primary cervical tumor depicted on one CT image with the maximal tumor region. Next, a principal component analysis algorithm was applied on the initial feature pool to determine an optimal feature cluster. Then, based on this optimal cluster, machine learning models (e.g., support vector machine (SVM)) were trained and optimized to generate an image marker to detect LN metastasis. The SVM based imaging marker achieved an AUC (area under the ROC curve) value of 0.841 ± 0.035. This study initially verifies the feasibility of combining CT images and the radiomics technology to develop a low-cost image marker for LN metastasis detection among cervical cancer patients. In the second study, the purpose is to develop and evaluate a unique global mammographic image feature analysis scheme to identify case malignancy for breast cancer. From the entire breast area depicted on the mammograms, 59 features were initially computed to characterize the breast tissue properties in both the spatial and frequency domain. 
Given that each case consists of two cranio-caudal and two medio-lateral oblique view images of left and right breasts, two feature pools were built, which contain the computed features from either two positive images of one breast or all the four images of two breasts. For each feature pool, a particle swarm optimization (PSO) method was applied to determine the optimal feature cluster followed by training an SVM classifier to generate a final score for predicting likelihood of the case being malignant. The classification performances measured by AUC were 0.79±0.07 and 0.75±0.08 when applying the SVM classifiers trained using image features computed from two-view and four-view images, respectively. This study demonstrates the potential of developing a global mammographic image feature analysis-based scheme to predict case malignancy without including an arduous segmentation of breast lesions. In the third study, given that the performance of DL-based models in the medical imaging field is generally bottlenecked by a lack of sufficient labeled images, we specifically investigate the effectiveness of applying the latest transferring generative adversarial networks (GAN) technology to augment limited data for performance boost in the task of breast mass classification. This transferring GAN model was first pre-trained on a dataset of 25,000 mammogram patches (without labels). Then its generator and the discriminator were fine-tuned on a much smaller dataset containing 1024 labeled breast mass images. A supervised loss was integrated with the discriminator, such that it can be used to directly classify the benign/malignant masses. Our proposed approach improved the classification accuracy by 6.002%, when compared with the classifiers trained without traditional data augmentation. This investigation may provide a new perspective for researchers to effectively train the GAN models on a medical imaging task with only limited datasets. 
Like the third study, our last study also aims to alleviate DL models’ reliance on large amounts of annotations but uses a totally different approach. We propose employing a semi-supervised method, i.e., virtual adversarial training (VAT), to learn and leverage useful information contained in unlabeled data for better classification of breast masses. Accordingly, our VAT-based models have two types of losses, namely supervised and virtual adversarial losses. The former loss acts as in supervised classification, while the latter loss works towards enhancing the model’s robustness against virtual adversarial perturbation, thus improving model generalizability. A large CNN and a small CNN were used in this investigation, and both were trained with and without the adversarial loss. When the labeled ratios were 40% and 80%, VAT-based CNNs delivered the highest classification accuracy of 0.740±0.015 and 0.760±0.015, respectively. The experimental results suggest that the VAT-based CAD scheme can effectively utilize meaningful knowledge from unlabeled data to better classify mammographic breast mass images. In summary, several innovative approaches have been investigated and evaluated in this dissertation to develop ML-based and DL-based CAD schemes for the diagnosis of cervical cancer and breast cancer. The promising results demonstrate the potential of these CAD schemes in assisting radiologists to achieve a more accurate interpretation of radiological images.
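The two losses mentioned for the VAT-based models are usually combined as follows in the standard formulation of virtual adversarial training. This is a sketch of that general formulation, not the dissertation's exact objective: the weighting coefficient $\alpha$, the perturbation radius $\epsilon$, and the symbols $\mathcal{D}_l$ (labeled set) and $\mathcal{D}_{ul}$ (unlabeled set) are assumed notation that the abstract does not specify.

```latex
% Total objective: supervised loss on labeled data plus a weighted
% virtual adversarial loss on labeled and unlabeled data.
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{sup}}(\mathcal{D}_l, \theta)
  + \alpha \, \frac{1}{|\mathcal{D}_l| + |\mathcal{D}_{ul}|}
    \sum_{x \in \mathcal{D}_l \cup \mathcal{D}_{ul}} \mathcal{L}_{\mathrm{vadv}}(x, \theta)

% Virtual adversarial loss: divergence between the current prediction
% (parameters held fixed at \hat{\theta}) and the prediction at the
% worst-case perturbed input.
\mathcal{L}_{\mathrm{vadv}}(x, \theta) =
  \mathrm{KL}\!\left[ p(y \mid x, \hat{\theta}) \,\middle\|\, p(y \mid x + r_{\mathrm{vadv}}, \theta) \right]

% The perturbation is the direction within an \epsilon-ball that changes
% the prediction the most; it needs no labels.
r_{\mathrm{vadv}} = \operatorname*{arg\,max}_{\|r\|_2 \le \epsilon}
  \mathrm{KL}\!\left[ p(y \mid x, \hat{\theta}) \,\middle\|\, p(y \mid x + r, \hat{\theta}) \right]
```

Because $\mathcal{L}_{\mathrm{vadv}}$ depends only on the model's own predictions, it can be evaluated on unlabeled images, which is what lets the method draw on unlabeled data.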