384 research outputs found

    Machine learning methods for omics data integration

    High-throughput technologies produce genome-scale transcriptomic and metabolomic (omics) datasets that allow for system-level studies of complex biological processes. A key limitation is the small number of samples relative to the large number of features in these datasets. Machine learning methods can help integrate these large-scale omics datasets and identify key features from each dataset. A novel class-dependent feature selection method integrates the F statistic, maximum relevance binary particle swarm optimization (MRBPSO), and a class-dependent multi-category classification (CDMC) system. A set of highly differentially expressed genes is pre-selected using the F statistic as a filter for each dataset. MRBPSO and CDMC act as a wrapper, selecting a desirable feature subset for each class and classifying the samples using those class-dependent subsets. The results indicate that class-dependent approaches can effectively identify unique biomarkers for each cancer type and improve classification accuracy compared to class-independent feature selection methods. The integration of transcriptomic and metabolomic data is based on a classification framework; compared to integration approaches based on principal component analysis and non-negative matrix factorization, the proposed method achieves 20-30% higher prediction accuracy on Arabidopsis tissue development data. Metabolite-predictive genes and gene-predictive metabolites are selected from the transcriptomic and metabolomic data, respectively, and the constructed gene-metabolite correlation network can be used to infer the functions of unknown genes and metabolites. Tissue-specific genes and metabolites are identified by the class-dependent feature selection method, and evidence from subcellular locations, gene ontology, and biochemical pathways supports the involvement of these entities in different developmental stages and tissues in Arabidopsis.
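    As a rough illustration of the filter/wrapper pattern this abstract describes, the sketch below pre-ranks genes with the ANOVA F statistic and then runs a plain binary PSO wrapper scored by cross-validated classification accuracy. It is a minimal sketch under assumed parameter values; the class-dependent CDMC machinery and the MRBPSO specifics are not reproduced here.

```python
# Filter stage (F statistic) followed by a generic binary PSO wrapper.
# Parameter values and the kNN fitness classifier are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def f_statistic_filter(X, y, k=100):
    """Keep the k genes with the largest ANOVA F statistic."""
    f_scores, _ = f_classif(X, y)
    return np.argsort(f_scores)[::-1][:k]

def binary_pso_select(X, y, n_particles=20, n_iters=30, seed=0):
    """Wrapper stage: each particle is a 0/1 mask over the filtered genes."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pos = rng.integers(0, 2, (n_particles, n_feat))
    vel = rng.normal(0.0, 1.0, (n_particles, n_feat))

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        clf = KNeighborsClassifier(n_neighbors=3)
        return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        # Sigmoid transfer rule: flip each bit with probability sigmoid(velocity).
        pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest.astype(bool)
```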

    Optimized Hidden Markov Model based on Constrained Particle Swarm Optimization

    As a Bayesian analysis tool, the Hidden Markov Model (HMM) has been used in a wide range of applications. Most HMMs are trained with the Baum-Welch algorithm (BWHMM) to estimate the model parameters, which has difficulty finding globally optimal solutions. This paper proposes an optimized Hidden Markov Model based on the Particle Swarm Optimization (PSO) algorithm, called PSOHMM. To satisfy the statistical constraints on HMM parameters, the paper develops re-normalization and re-mapping mechanisms that enforce those constraints. The experiments show that PSOHMM finds better solutions than BWHMM and converges faster.
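    A minimal sketch of the constraint-handling idea, assuming the usual stochastic constraints on HMM parameters (non-negative entries, rows summing to one): after a PSO position update, each parameter matrix is clipped and re-normalized row by row. The paper's full re-mapping mechanism and the HMM likelihood evaluation are omitted.

```python
# Re-normalization step for PSO-updated HMM parameters (transition, emission,
# or initial-distribution rows). This illustrates only the constraint handling.
import numpy as np

def renormalize(matrix, eps=1e-12):
    """Project a PSO particle's parameter matrix back onto row-stochastic form."""
    m = np.clip(matrix, 0.0, None)                  # enforce non-negativity
    row_sums = m.sum(axis=1, keepdims=True)
    # Degenerate all-zero rows fall back to a uniform distribution.
    return np.where(row_sums > eps,
                    m / np.maximum(row_sums, eps),
                    np.full_like(m, 1.0 / m.shape[1]))

# Example: a transition matrix perturbed by a PSO velocity step.
A = np.array([[0.7, 0.4], [-0.1, 0.9]])
print(renormalize(A))  # rows are non-negative and sum to 1
```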

    Mutable composite firefly algorithm for gene selection in microarray based cancer classification

    Cancer classification is critical due to the strenuous effort required in cancer treatment and the rising cancer mortality rate. Recent advances in high-throughput technologies have led to the discovery of biomarkers that contribute to resolving cancer-related issues. Computational approaches to gene selection based on microarray data analysis have been applied to many cancer classification problems. However, existing hybrid approaches that pair metaheuristic optimization algorithms with feature selection (specifically gene selection) do not generalize well enough to classify most cancer microarray data accurately while maintaining a small set of genes, which leads to problems with classification accuracy and gene subset size. Hence, this study modifies the Firefly Algorithm (FA), combined with the Correlation-based Feature Selection (CFS) filter, for the gene selection task. An improved FA is proposed to overcome FA's slow convergence by generating mutable-size solutions for the firefly population, and a composite position update strategy is designed for these mutable-size solutions to balance FA exploration and exploitation and thereby address the local optima problem. The proposed hybrid algorithm, known as the CFS-Mutable Composite Firefly Algorithm (CFS-MCFA), was evaluated on cancer microarray data for biomarker selection, with a Support Vector Machine (SVM) deployed as the classifier. Evaluation was performed on two metrics: classification accuracy and feature set size. The results show that CFS-MCFA-SVM outperforms benchmark methods on both metrics; in particular, 100 percent accuracy was achieved on all four datasets with only a few biomarkers (between one and four). This indicates that the proposed algorithm is a competitive alternative for feature selection and for the downstream analysis of microarray data.
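    For illustration, a minimal binary firefly update for gene selection with an SVM fitness might look like the sketch below. The mutable-size solutions and composite position update strategy of CFS-MCFA are not reproduced; all parameter values here are assumptions.

```python
# Plain firefly algorithm over continuous positions, binarized by threshold,
# with cross-validated SVM accuracy as brightness. Illustrative only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def firefly_gene_selection(X, y, n_fireflies=15, n_iters=20,
                           beta0=1.0, gamma=0.1, alpha=0.2, seed=0):
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    pop = rng.random((n_fireflies, n_genes))          # continuous positions

    def brightness(pos):
        mask = pos > 0.5                              # binarize: gene in/out
        if not mask.any():
            return 0.0
        return cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=3).mean()

    light = np.array([brightness(p) for p in pop])
    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:               # move i toward brighter j
                    r2 = np.sum((pop[i] - pop[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)
                    pop[i] += beta * (pop[j] - pop[i]) \
                              + alpha * (rng.random(n_genes) - 0.5)
                    pop[i] = np.clip(pop[i], 0.0, 1.0)
                    light[i] = brightness(pop[i])
    best = pop[light.argmax()] > 0.5
    return np.flatnonzero(best)                       # indices of selected genes
```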

    Storage Capacity Estimation of Commercial Scale Injection and Storage of CO2 in the Jacksonburg-Stringtown Oil Field, West Virginia

    Geological carbon capture, utilization, and storage (CCUS) of carbon dioxide (CO2) in depleted oil and gas reservoirs is one method to reduce greenhouse gas emissions while enhancing oil recovery (EOR) and extending the life of the field. CCUS coupled with EOR is therefore considered an economic approach to demonstrating commercial-scale injection and storage of anthropogenic CO2. Several critical issues must be taken into account before injecting large volumes of CO2, such as storage capacity, project duration, and long-term containment. Reservoir characterization and 3-D geological modeling are the best way to estimate the theoretical CO2 storage capacity in mature oil fields.

    The Jacksonburg-Stringtown field, located in northwestern West Virginia, has produced over 22 million barrels of oil (MMBO) since 1895, with the Late Devonian Gordon Stray sandstone as the primary reservoir. These Upper Devonian fluvial sandstone reservoirs are an ideal candidate for CO2 sequestration coupled with EOR: supercritical depth (>2,500 ft), minimum miscibility pressure (941 psi), favorable API gravity (46.5°), and good waterflood response all facilitate CO2-EOR operations. Moreover, the field is adjacent to a large concentration of CO2 sources located along the Ohio River that could potentially supply enough CO2 for sequestration and EOR without constructing new pipeline facilities.

    Permeability is a critical parameter for understanding subsurface fluid flow and for reservoir management during primary and enhanced hydrocarbon recovery and efficient carbon storage. In this study, a rapid, robust, and cost-effective artificial neural network (ANN) model is constructed to predict permeability, exploiting the model's strong ability to recognize interrelationships between input and output variables. Two commonly available conventional well logs, gamma ray and bulk density, and three derived variables, the slope of GR, the slope of bulk density, and Vsh, were selected as input parameters, with permeability as the desired output, to train and test the network. The results indicate that the ANN model can be applied effectively to permeability prediction.

    Porosity is another fundamental property, characterizing the storage capability of fluid- and gas-bearing formations in a reservoir. In this study, a support vector machine (SVM) with a mixed kernel function (MKF) is used to construct the relationship between limited conventional well-log suites and sparse core data. The input parameters for the SVM model consist of core porosity values and the same log suite as the ANN's inputs, with porosity as the desired output. Compared with an SVM model using a single kernel function, the mixed-kernel SVM provides more accurate porosity predictions.

    Based on the well-log analysis, four reservoir subunits within a marine-dominated estuarine depositional system are defined: barrier sand, central bay shale, tidal channel, and fluvial channel subunits. A 3-D geological model, used to estimate the theoretical CO2 sequestration capacity, is constructed by integrating core data, wireline log data, and geological background knowledge. Based on this model, the best regions for coupled CCUS-EOR are located in the southern portions of the field, and the estimated theoretical CO2 storage capacity for the Jacksonburg-Stringtown oil field varies between 24 and 383 million metric tons. These estimates of CO2 sequestration and EOR potential indicate that the Jacksonburg-Stringtown oil field has significant potential for CO2 storage and value-added EOR.
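    A minimal sketch of the ANN permeability model described above, assuming the five stated inputs (gamma ray, bulk density, their slopes with depth, and Vsh) and using synthetic placeholder data; the study's actual network design and training data are not reproduced.

```python
# ANN permeability prediction from well-log-derived features. The network
# size, scaling, and synthetic arrays below are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_features(depth, gr, rhob, vsh):
    """Stack the five inputs; the 'slopes' are finite differences in depth."""
    gr_slope = np.gradient(gr, depth)
    rhob_slope = np.gradient(rhob, depth)
    return np.column_stack([gr, rhob, gr_slope, rhob_slope, vsh])

# Hypothetical depth-indexed log curves and core permeability targets.
rng = np.random.default_rng(0)
depth = np.linspace(2500, 2600, 200)                 # ft
gr = rng.normal(60, 15, 200)                         # API units
rhob = rng.normal(2.4, 0.1, 200)                     # g/cm3
vsh = rng.random(200)                                # shale volume fraction
perm = rng.lognormal(2.0, 0.5, 200)                  # mD, placeholder target

X = build_features(depth, gr, rhob, vsh)
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16, 8),
                                   max_iter=2000, random_state=0))
model.fit(X, np.log10(perm))                         # predict log-permeability
```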

    Monte Carlo Method with Heuristic Adjustment for Irregularly Shaped Food Product Volume Measurement

    Volume measurement plays an important role in the production and processing of food products. Various methods based on 3-D reconstruction have been proposed to measure the volume of irregularly shaped food products; however, 3-D reconstruction carries a high computational cost, and some reconstruction-based methods have low accuracy. An alternative is the Monte Carlo method, which measures volume using random points: it requires only the information of whether each random point falls inside or outside the object, and no 3-D reconstruction. This paper proposes volume measurement for irregularly shaped food products using a computer vision system, without 3-D reconstruction, based on the Monte Carlo method with a heuristic adjustment. Five images of each food product were captured by five cameras and processed into binary images. Monte Carlo integration with heuristic adjustment was then performed to measure the volume from the information extracted from the binary images. The experimental results show that the proposed method achieves high accuracy and precision relative to the water displacement method, and is both more accurate and faster than the space carving method.
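    A minimal sketch of the silhouette-based Monte Carlo idea, assuming orthographic projections onto axis-aligned binary images; real systems use calibrated camera models, and the paper's heuristic adjustment is omitted here.

```python
# Monte Carlo volume estimation from binary silhouettes: sample random points
# in a bounding box and keep those whose projections land inside every mask.
import numpy as np

def mc_volume(silhouettes, box, n_samples=200_000, seed=0):
    """silhouettes: dict axis -> 2-D boolean mask; box: (3, 2) min/max bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = box[:, 0], box[:, 1]
    pts = lo + rng.random((n_samples, 3)) * (hi - lo)
    inside = np.ones(n_samples, dtype=bool)
    for axis, mask in silhouettes.items():
        keep = [a for a in range(3) if a != axis]    # drop the projection axis
        h, w = mask.shape
        # Map the two remaining coordinates onto pixel indices of the mask.
        u = ((pts[:, keep[0]] - lo[keep[0]])
             / (hi[keep[0]] - lo[keep[0]]) * (h - 1)).astype(int)
        v = ((pts[:, keep[1]] - lo[keep[1]])
             / (hi[keep[1]] - lo[keep[1]]) * (w - 1)).astype(int)
        inside &= mask[u, v]
    # Volume = bounding-box volume times the fraction of accepted points.
    return np.prod(hi - lo) * inside.mean()
```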

    Intelligent Computing for Big Data

    Recent advances in artificial intelligence have the potential to further develop current big data research. The Special Issue on ‘Intelligent Computing for Big Data’ highlighted a number of recent studies on the use of intelligent computing techniques in the processing of big data for text mining, autism diagnosis, behaviour recognition, and blockchain-based storage.

    Automatic Malware Detection

    The problem of automatic malware detection presents challenges for antivirus vendors. Since manual investigation is not possible given the massive number of samples submitted every day, automatic malware classification is necessary. Our work focuses on an automatic malware detection framework based on machine learning algorithms. We proposed several static malware detection systems for the Windows operating system with the primary goal of distinguishing between malware and benign software, and the more practical goal of detecting as much malware as possible while maintaining a sufficiently low false positive rate. The proposed systems use various machine learning techniques, such as ensemble classifiers, recurrent neural networks, and distance metric learning. We designed architectures for the proposed detection systems that are automatic in the sense that feature extraction, preprocessing, training, and evaluation of the detection model can all be automated. However, an antivirus product relies on a more complex system consisting of many components, several of which depend on malware analysts and researchers. Malware authors frequently adapt their malicious programs to bypass antivirus programs, which are regularly updated. Our proposed detection systems are not automatic in the sense of adapting themselves to detect the newest malware, but this problem can be partly addressed by retraining the systems on a training set that includes the newest malware. Our work relies on static analysis only; in this thesis, we discuss its advantages and drawbacks in comparison to dynamic analysis. Static analysis still plays an important role, and it is used as one component of a complex detection system.
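    As a rough illustration of one stage of such a pipeline, the sketch below trains an ensemble classifier on hypothetical static features and raises the decision threshold to keep the false positive rate low; the feature set, data, and threshold are assumptions for illustration, not the thesis's implementation.

```python
# Ensemble classification over stand-in static features (e.g. section count,
# entropy, imported-API count, file size from a hypothetical PE extractor).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((5000, 4))                    # placeholder feature matrix
y = rng.integers(0, 2, 5000)                 # 1 = malware, 0 = benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Raise the decision threshold above 0.5 to trade recall for a lower FPR.
scores = clf.predict_proba(X_te)[:, 1]
threshold = 0.9
flagged = scores >= threshold
fpr = np.mean(flagged[y_te == 0])            # false positive rate on benign set
print(f"FPR at threshold {threshold}: {fpr:.3f}")
```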