
    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy; a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression datasets. With fuzzy granulation, information loss in the gene selection process is decreased; as a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification, and hence we expect the selected genes to be more helpful for further biological studies.
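    The core quantities behind fuzzy association rule mining are fuzzy support and fuzzy confidence, computed over membership degrees rather than crisp set counts. The sketch below illustrates those two quantities on toy data; the triangular membership function, min t-norm, and variable names are assumptions for illustration, not the FARM-DS algorithm itself.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzy_support(antecedent, consequent):
    """Fuzzy support: mean of the min t-norm of the membership degrees."""
    return np.mean(np.minimum(antecedent, consequent))

def fuzzy_confidence(antecedent, consequent):
    """Fuzzy confidence: rule support relative to antecedent support."""
    return fuzzy_support(antecedent, consequent) / np.mean(antecedent)

# Toy normalized gene-expression values and a binary disease label
expr = np.array([0.2, 0.7, 0.9, 0.4, 0.8])
label = np.array([0.0, 1.0, 1.0, 0.0, 1.0])

high = triangular(expr, 0.3, 0.8, 1.3)   # membership in "expression is HIGH"
print("support:", fuzzy_support(high, label))
print("confidence:", fuzzy_confidence(high, label))
```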

    Computational Intelligence Based Classifier Fusion Models for Biomedical Classification Applications

    The generalization abilities of machine learning algorithms often depend on the algorithms' initialization, parameter settings, training sets, or feature selections. For instance, SVM classifier performance largely relies on whether the selected kernel functions are suitable for the real application data. To enhance the performance of individual classifiers, this dissertation proposes classifier fusion models that use computational intelligence techniques to combine different classifiers. The first fusion model, called T1FFSVM, combines multiple SVM classifiers by constructing a fuzzy logic system. T1FFSVM can be improved by tuning the fuzzy membership functions of its linguistic variables with genetic algorithms; the improved model is called GFFSVM. To better handle the uncertainties present in fuzzy membership functions and in classification data, T1FFSVM can also be improved by applying type-2 fuzzy logic to construct a type-2 fuzzy classifier fusion model (T2FFSVM). T1FFSVM, GFFSVM, and T2FFSVM use accuracy as the classifier performance measure. AUC (the area under an ROC curve) has been shown to be a better classifier performance metric, so AUC-based classifier fusion models are also proposed in the dissertation as a comparison study. Experiments on biomedical datasets demonstrate that the proposed classifier fusion models perform promisingly compared with their individual component classifiers, and also perform better than many existing classifier fusion methods. The dissertation also uses machine learning and classifier fusion methods to study an interesting phenomenon in the biology domain: how protein structures and sequences are related to each other. The experiments show that protein segments with similar structures also share similar sequences, which adds a new insight to the existing knowledge on the relation between protein sequences and structures: similar sequences share high structural similarity, but similar structures may not share high sequence similarity.
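    As a rough illustration of score-level fusion of kernel-diverse SVMs, the sketch below averages the decision scores of several SVMs weighted by their in-sample AUC and scores the result with AUC. This is a simplification under stated assumptions (scikit-learn, synthetic data, AUC-weighted averaging), not the dissertation's fuzzy or type-2 fuzzy fusion models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Kernel-diverse base classifiers: each kernel suits different data geometries
kernels = ("linear", "rbf", "poly")
scores, weights = [], []
for k in kernels:
    clf = SVC(kernel=k).fit(X_tr, y_tr)
    weights.append(roc_auc_score(y_tr, clf.decision_function(X_tr)))  # in-sample AUC
    s = clf.decision_function(X_te)
    scores.append((s - s.mean()) / s.std())  # standardize scales across kernels

fused = np.average(np.vstack(scores), axis=0, weights=weights)
for name, s in zip(kernels + ("fused",), scores + [fused]):
    print(f"{name:>6}: AUC = {roc_auc_score(y_te, s):.3f}")
```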

    Protein Tertiary Model Assessment Using Granular Machine Learning Techniques

    The automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and most researched fields in bioinformatics. Because models are predictions rather than experimental structures determined with known accuracy, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to build a machine that learns structures from the PDB (Protein Data Bank) and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using SVMs (support vector machines) and another using fuzzy decision trees (FDTs). Using a preliminary encoding style, the SVM reached around 70% model quality assessment accuracy, and an improved fuzzy decision tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based learning algorithm. Next, an enhanced scheme with a new encoding style is introduced. In the new style, information such as amino acid substitution matrices, polarity, secondary structure, and relative distances between alpha carbon atoms is collected by spatially traversing the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the fuzzy decision tree, we obtained a training accuracy of around 90%, a significant improvement over the previous encoding technique in both prediction accuracy and execution time. This outcome motivates continued exploration of effective machine learning algorithms for accurate protein model quality assessment. Finally, these machines are tested on CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other protein information that could be considered for the same purpose.
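    The spatial-traversal encoding can be pictured as collecting features for every pair of alpha carbons that lie within an interaction radius. The sketch below, with made-up coordinates and properties and an assumed 8 Å cutoff, shows one way such pairwise training vectors could be assembled; it is not the dissertation's exact encoding.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 30.0, size=(60, 3))   # hypothetical C-alpha coordinates (Å)
polarity = rng.uniform(0.0, 1.0, size=60)       # stand-in for a polarity scale
sec_struct = rng.integers(0, 3, size=60)        # stand-in for H/E/C secondary-structure codes

tree = cKDTree(coords)
pairs = tree.query_pairs(r=8.0)                 # C-alpha pairs within 8 Å, i.e. interacting

# One training-vector row per interacting pair: distance plus both residues' properties
vectors = np.array([
    [np.linalg.norm(coords[i] - coords[j]),
     polarity[i], polarity[j], sec_struct[i], sec_struct[j]]
    for i, j in pairs
])
print(vectors.shape)                            # (n_pairs, 5)
```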

    Global Nonlinear Kernel Prediction for Large Dataset with a Particle Swarm Optimized Interval Support Vector Regression

    A new global nonlinear predictor with a particle swarm-optimized interval support vector regression (PSO-ISVR) is proposed to address three issues (kernel selection, model optimization, and kernel method speed) encountered when applying SVR to large datasets. The novel prediction model reduces the SVR computing overhead by dividing the input space and adaptively selecting optimized kernel functions, obtaining optimal SVR parameters by PSO. To quantify the quality of the predictor, its generalization performance and execution speed are investigated based on statistical learning theory. In addition, experiments using synthetic data as well as the stock volume-weighted average price are reported to demonstrate the effectiveness of the developed models. The experimental results show that the proposed PSO-ISVR predictor improves computational efficiency and overall prediction accuracy compared with standard SVR and other regression methods. The proposed PSO-ISVR provides an important tool for nonlinear regression analysis of big data.
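    A minimal sketch of the PSO-for-SVR-parameters idea: particles search the (C, gamma, epsilon) space in log-scale and are scored by cross-validated error. The swarm constants, toy data, and scikit-learn SVR are assumptions for illustration; the paper's interval SVR and input-space partitioning are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

def fitness(p):
    C, gamma, eps = np.exp(p)          # log-space search keeps parameters positive
    model = SVR(C=C, gamma=gamma, epsilon=eps)
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_squared_error").mean()

n_particles, dim, iters = 12, 3, 15
pos = rng.uniform(-3, 3, size=(n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmax()]

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    # Standard velocity update: inertia + cognitive + social terms
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -5.0, 5.0)      # keep parameters in a sane range
    f = np.array([fitness(p) for p in pos])
    improved = f > pbest_f
    pbest[improved] = pos[improved]
    pbest_f[improved] = f[improved]
    gbest = pbest[pbest_f.argmax()]

print("best (C, gamma, epsilon):", np.exp(gbest))
```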

    Application of P300 Event-Related Potential in Brain-Computer Interface

    The primary purpose of this chapter is to demonstrate one application of the P300 event-related potential (ERP): the brain-computer interface (BCI). Researchers and students will find the chapter appealing, beginning with a preliminary description of the P300 ERP. The chapter also highlights the importance and advantages of this noninvasive ERP technique. In noninvasive BCI, P300 ERPs are extracted from brain electrical activity (the electroencephalogram, EEG) as a signature of the underlying electrophysiological mechanism of brain responses to external or internal changes and events. As the chapter proceeds, it covers relevant scholarly work on challenges and new directions in P300 BCI, and presents referenced articles on the advancement of the technique so that the scholarly reviews are accessible to people who are new to this field. To enhance fundamental understanding, stimulation and signal processing methods are discussed from several novel works, with a comparison of the associated results. This chapter meets the need for a concise and practical description of basic as well as advanced P300 ERP techniques, suitable for a broad range of researchers, from today's novice to the experienced cognitive researcher.
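    The standard way a P300 is made visible is by epoching the EEG around stimulus onsets and averaging, so that background activity cancels while the stimulus-locked response remains. The sketch below uses synthetic data with an injected 300 ms deflection purely to illustrate that averaging step; the sampling rate, window length, and oddball ratio are arbitrary assumptions.

```python
import numpy as np

fs = 256                                   # sampling rate (Hz), assumed
rng = np.random.default_rng(2)
eeg = rng.standard_normal(60 * fs)         # 60 s of one synthetic EEG channel

onsets = np.arange(fs, eeg.size - fs, fs // 2)   # stimulus onsets (samples)
is_target = np.zeros(onsets.size, dtype=bool)
is_target[::6] = True                      # rare "oddball" targets elicit the P300

# Inject a synthetic P300: a positive deflection ~300 ms after each target
for o in onsets[is_target]:
    start = o + int(0.30 * fs)
    eeg[start:start + int(0.10 * fs)] += 2.0

win = int(0.6 * fs)                        # 600 ms post-stimulus epoch
epochs = np.stack([eeg[o:o + win] for o in onsets])
target_erp = epochs[is_target].mean(axis=0)     # averaging cancels background EEG
peak_ms = 1000 * target_erp.argmax() / fs
print(f"target ERP peaks at ~{peak_ms:.0f} ms post-stimulus")
```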

    Storage Capacity Estimation of Commercial Scale Injection and Storage of CO2 in the Jacksonburg-Stringtown Oil Field, West Virginia

    Geological carbon capture, utilization, and storage (CCUS) of carbon dioxide (CO2) in depleted oil and gas reservoirs is one method to reduce greenhouse gas emissions while achieving enhanced oil recovery (EOR) and extending the life of the field. CCUS coupled with EOR is therefore considered an economical approach to demonstrating commercial-scale injection and storage of anthropogenic CO2. Several critical issues should be taken into account prior to injecting large volumes of CO2, such as storage capacity, project duration, and long-term containment. Reservoir characterization and 3D geological modeling are the best way to estimate the theoretical CO2 storage capacity of mature oil fields.

    The Upper Devonian fluvial sandstone reservoirs of the Jacksonburg-Stringtown oil field, located in northwestern West Virginia, have produced over 22 million barrels of oil (MMBO) since 1895, with the sandstone of the Late Devonian Gordon Stray as the primary reservoir, and are an ideal candidate for CO2 sequestration coupled with EOR. Supercritical depth (>2,500 ft), minimum miscibility pressure (941 psi), favorable API gravity (46.5°), and good waterflood response are indicators that favor CO2-EOR operations. Moreover, the field is adjacent to a large concentration of CO2 sources located along the Ohio River that could potentially supply enough CO2 for sequestration and EOR without constructing new pipeline facilities.

    Permeability is a critical parameter for understanding subsurface fluid flow and for reservoir management during primary and enhanced hydrocarbon recovery and efficient carbon storage. In this study, a rapid, robust, and cost-effective artificial neural network (ANN) model is constructed to predict permeability, exploiting the model's strong ability to recognize possible interrelationships between input and output variables. Two commonly available conventional well logs, gamma ray and bulk density, and three log-derived variables, the slope of GR, the slope of bulk density, and Vsh, were selected as input parameters, with permeability as the desired output, to train and test the network. The results indicate that the ANN model can be applied effectively to permeability prediction.

    Porosity is another fundamental property that characterizes the storage capability of fluid- and gas-bearing formations in a reservoir. In this study, a support vector machine (SVM) with a mixed kernel function (MKF) is utilized to construct the relationship between limited conventional well log suites and sparse core data. The input parameters for the SVM model consist of core porosity values and the same log suite as the ANN's inputs, with porosity as the desired output. Compared with an SVM model using a single kernel function, the mixed kernel function based SVM model provides more accurate porosity predictions.

    Based on the well log analysis, four reservoir subunits within a marine-dominated estuarine depositional system are defined: barrier sand, central bay shale, tidal channel, and fluvial channel subunits. A 3-D geological model, used to estimate the theoretical CO2 sequestration capacity, is constructed by integrating core data, wireline log data, and geological background knowledge. Based on this model, the best regions for coupled CCUS-EOR are located in the southern portions of the field, and the estimated theoretical CO2 storage capacity for the Jacksonburg-Stringtown oil field varies between 24 and 383 million metric tons. The estimates of CO2 sequestration and EOR potential indicate that the Jacksonburg-Stringtown oil field has significant potential for CO2 storage and value-added EOR.
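    A mixed kernel can be expressed as a convex combination of a local (RBF) and a global (polynomial) kernel and passed to an off-the-shelf SVM. The snippet below is a schematic with synthetic stand-ins for the log suite; the mixing weight, kernel parameters, and feature construction are assumptions, not the study's calibrated model.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def mixed_kernel(X, Y, w=0.7, gamma=0.5, degree=2):
    """Convex combination of a local (RBF) and a global (polynomial) kernel."""
    return (w * rbf_kernel(X, Y, gamma=gamma)
            + (1 - w) * polynomial_kernel(X, Y, degree=degree))

rng = np.random.default_rng(3)
logs = rng.standard_normal((120, 5))       # stand-ins for GR, bulk density, slopes, Vsh
porosity = (0.15 + 0.05 * np.tanh(logs @ rng.standard_normal(5))
            + 0.01 * rng.standard_normal(120))

# Fit on the first 100 "cored" samples, predict the uncored interval
svr = SVR(kernel=mixed_kernel, C=10.0).fit(logs[:100], porosity[:100])
print(svr.predict(logs[100:]))             # predicted porosity for held-out samples
```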

    Development of a Self-Learning Approach Applied to Pattern Recognition and Fuzzy Control

    Systems based on fuzzy rules are widely used in the development of pattern recognition and control systems. Most current methods for designing fuzzy rule-based systems suffer from the following problems: 1. The fuzzification procedure takes into account neither the statistical properties nor the real distribution of the data or signals under consideration, so the generated fuzzy membership functions are not truly able to express these data; moreover, the fuzzification process is defined manually. 2. The initial size of the rule base is fixed wholesale, which means the procedure can produce redundancy among the rules used; this redundancy leads to problems of complexity and dimensionality, and avoiding these problems by selecting only the relevant rules can itself incur a high computational cost. 3. The form of the fuzzy rule suffers from a loss of information, which in turn can cause the variables under consideration to be attributed to unrealistic ranges. 4. Furthermore, the adaptation of the fuzzy membership functions entails complexity and computational cost because of the iteration and the multiple parameters involved; this adaptation is also carried out within the scope of each individual rule, i.e., no adaptation is performed across the entire fuzzy rule base.
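    Problem 1 above, fuzzification that ignores the data's real distribution, is commonly addressed by deriving membership function parameters from data statistics, for instance from cluster centers and spreads. The following sketch (scikit-learn KMeans, Gaussian membership functions, synthetic data) illustrates that general idea; it is not the self-learning approach developed in the dissertation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 0.5, 300),    # synthetic multi-modal signal
                    rng.normal(1, 0.8, 300),
                    rng.normal(4, 0.4, 300)])

# Cluster the signal so the membership functions follow its real distribution
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x.reshape(-1, 1))

def gaussian_mf(v, center, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((v - center) / sigma) ** 2)

for k in range(3):
    center = km.cluster_centers_[k, 0]
    sigma = x[km.labels_ == k].std()             # width from the cluster's spread
    print(f"MF center {center:+.2f}, width {sigma:.2f}, "
          f"membership(0.0) = {gaussian_mf(0.0, center, sigma):.3f}")
```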

    Deep Learning based Prediction of Clogging Occurrences during Lignocellulosic Biomass Feeding in Screw Conveyors

    Over the last decades, there have been substantial government and private-sector investments to establish a commercial biorefining industry that uses lignocellulosic biomass as feedstock to produce fuels, chemicals, and other products. However, several biorefining plants have experienced material conveyance problems due to the variability and complexity of the biomass feedstock. While such problems have been reported in most conveyance unit operations in biorefining plants, screw conveyors merit special attention because they are the most common conveyors used in biomass conveyance and typically function as the last conveyance unit connected to the conversion reactors; their operating status therefore affects the plant production rate. Detecting emerging clogging events and, ultimately, proactively adjusting operating conditions to avoid downtime is crucial to improving overall plant economics. One promising solution is the development of sensor systems that detect clogging to support automated decision-making and process control. In this study, two deep learning based algorithms are developed to detect an imminent clogging event based on the current signature and vibration signals extracted from sensors connected to a benchtop screw conveyor system. The study focuses on three biomass materials (switchgrass, loblolly pine, and hybrid poplar) and is designed around three research objectives. The first objective examines the relationship between the occurrence of clogging in a screw conveyor and the current and vibration signals for the different feedstocks, to establish the presence of a clogging-event fingerprint that could be exploited in automated decision-making and process control. The second objective applies two deep learning algorithms, with an optimization procedure, to the current and vibration signals to detect the imminent occurrence of clogging and its severity for decision making. The third objective examines the robustness of the optimized deep learning algorithm in detecting imminent clogging events when feedstock properties (size distribution and moisture content) vary. In the long term, the early clogging detection methodology developed in this study could be leveraged to develop smart process controls for biomass conveyance.
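    One natural architecture for this kind of task is a 1-D convolutional network over fixed-length windows of the motor-current and vibration channels, with a binary output (normal vs. imminent clogging). The PyTorch sketch below shows that pattern; the window length, channel count, and layer sizes are assumptions for illustration, not the study's optimized models.

```python
import torch
import torch.nn as nn

class CloggingNet(nn.Module):
    """Toy 1-D CNN over multichannel sensor windows."""
    def __init__(self, in_channels=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # global pooling over time
        )
        self.head = nn.Linear(32, 2)          # classes: normal vs imminent clogging

    def forward(self, x):                     # x: (batch, channels, window)
        return self.head(self.features(x).squeeze(-1))

model = CloggingNet()
batch = torch.randn(8, 2, 512)    # 8 windows, 2 sensors (current, vibration)
logits = model(batch)
print(logits.shape)               # torch.Size([8, 2])
```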

    Development of Machine Learning Based Analytical Tools for Pavement Performance Assessment and Crack Detection

    Pavement Management System (PMS) analytical tools mainly consist of pavement condition investigation and evaluation tools, pavement condition rating and assessment tools, pavement performance prediction tools, and treatment prioritization and implementation tools. The effectiveness of a PMS depends highly on the efficiency and reliability of its pavement condition evaluation tools. Traditionally, pavement condition investigation and evaluation practices are based on manual distress surveys and performance-level assessments, which have been criticized for low efficiency and low reliability. Such manual surveys are labor intensive and unsafe due to proximity to live traffic, and their accuracy can suffer from the subjective judgment of the evaluators. Considering these factors, semiautomated and automated pavement condition evaluation tools have been under development for several years. In recent years, highly advanced computerized technologies have produced successful applications in diverse engineering fields; incorporated into pavement condition evaluation and distress detection, such techniques can improve the performance of existing PMSs. Hence, this research aims to bridge the gap between highly advanced Machine Learning Techniques (MLTs) and the existing analytical tools of current PMSs. The research outputs are intended to provide pavement condition evaluation tools that meet the requirements of high efficiency, accuracy, and reliability. To achieve these objectives, six pavement damage condition and performance evaluation methodologies are developed.

    The roughness condition of the pavement surface directly influences the riding quality experienced by users. The International Roughness Index (IRI) is used worldwide by research institutions and by pavement condition evaluation and management agencies to evaluate pavement roughness. IRI is a time-dependent variable that generally tends to increase over the pavement's service life. Accordingly, a multi-granularity fuzzy time series based IRI prediction model is developed, with Particle Swarm Optimization (PSO) used to optimize the model and obtain satisfactory IRI predictions. Historical IRI data extracted from the InfoPave website are used for training and testing the model, and the experimental results prove the effectiveness of this method.

    Automated pavement condition evaluation tools can provide overall performance indices, which can then be used for treatment planning. Calculating those performance indices requires both surface distress level and roughness condition evaluations; however, pavement surface roughness is hard to obtain from surface image indicators. With this in mind, image-indicator-based pavement roughness and overall performance prediction tools are developed. The state-of-the-art machine learning technique XGBoost is utilized as the main method for model training, validation, and testing. To find the dominant image indicators that influence pavement roughness and overall performance, the comprehensive pavement performance evaluation data collected by ARAN 900 are analyzed, and a Back Propagation Neural Network (BPNN) is used to develop the performance prediction models. On this basis, the mean impact values (MIVs) for each input factor are calculated to evaluate the contributions of the input indicators. Indicators of wheel path cracking are observed to have the highest MIVs, which emphasizes the importance of cracking-focused maintenance treatments.

    A further issue is that current automated pavement condition evaluation systems analyze only pavement surface distresses, without considering the structural capacity of the actual pavement. Hence, structural performance analysis based pavement performance prediction tools are developed using Support Vector Machines (SVMs). To guarantee the overall performance of the proposed methodologies, heuristic methods including the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are selected to optimize the models. The experimental results show a promising future for machine learning based pavement structural performance prediction.

    Automated pavement condition analyzers usually detect pavement surface distress from collected pavement surface images; distress types, severities, quantities, and other parameters are then calculated for the overall performance index. Cracks are among the most important pavement surface distresses to quantify, and traditional approaches are less accurate and less efficient at locating, counting, and quantifying the various types of cracks initiated on the pavement surface. An integrated Crack Deep Net (CrackDN) is therefore developed based on deep learning technologies. Through model training, validation, and testing, CrackDN is shown to detect pavement surface cracks against complex backgrounds with high accuracy. Moreover, combining box-level crack locating with pixel-level crack calculation enables comprehensive crack analysis, so that more effective maintenance treatments can be assigned. Hence, a pixel-level crack detection methodology called CrackU-net is proposed. CrackU-net is composed of several convolutional, max-pooling, and up-convolutional layers, and is developed based on innovations in deep learning based segmentation. Pavement crack data are collected by multiple devices, including automated pavement condition survey vehicles, smartphones, and action cameras. The proposed CrackU-net is tested on a separate crack image set that was not used for training, and the results demonstrate a promising future for use in PMSs.

    Finally, the proposed toolboxes are validated through comparative experiments in terms of accuracy (precision, recall, and F-measure) and error levels. The accuracies of all the models are higher than 0.9 and the errors are lower than 0.05. The findings of this research suggest that wheel path cracking should be a priority in maintenance activity planning. Benefiting from highly advanced machine learning technologies, pavement roughness condition and overall performance levels can plausibly be predicted from extracted image indicators, and deep learning methods can achieve both box-level and pixel-level pavement crack detection with satisfactory performance. It is therefore suggested that these state-of-the-art toolboxes be integrated into current PMSs to upgrade their service levels.
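    CrackU-net's stated building blocks, convolution, max-pooling, and up-convolution, follow the general encoder-decoder pattern of segmentation networks. The toy PyTorch sketch below shows that pattern with one skip connection; the layer counts and widths are invented for illustration and do not reproduce CrackU-net's architecture.

```python
import torch
import torch.nn as nn

class TinyCrackUNet(nn.Module):
    """Toy encoder-decoder with one skip connection for per-pixel crack maps."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)    # up-convolution
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, 1, 1)       # per-pixel crack logit

    def forward(self, x):
        e1 = self.enc1(x)                    # full-resolution features
        e2 = self.enc2(self.pool(e1))        # downsampled features
        d = self.up(e2)                      # restore resolution
        d = self.dec(torch.cat([d, e1], dim=1))   # skip connection
        return self.out(d)

model = TinyCrackUNet()
img = torch.randn(1, 3, 128, 128)   # one RGB pavement image patch
print(model(img).shape)             # torch.Size([1, 1, 128, 128])
```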