
    Big Data Analytics for Complex Systems

    The evolution of technology across all fields has led modern systems to generate vast amounts of data. Using data to extract information, make predictions, and make decisions is the current trend in artificial intelligence. The advancement of big data analytics tools has made accessing and storing data easier and faster than ever, and machine learning algorithms help to identify patterns in and extract information from data. Current tools and machines in health, computer technologies, and manufacturing can generate massive raw data about their products or samples. The author of this work proposes a modern integrative system that utilizes big data analytics, machine learning, super-computer resources, and industrial health machines' measurements to build a smart system that can mimic the human intelligence skills of observation, detection, prediction, and decision-making. Applications of the proposed smart systems are included as case studies to highlight the contributions of each system. The first contribution is the ability to utilize big data and deep learning technologies on production lines to diagnose incidents and take proper action. In the current digital transformational industrial era, Industry 4.0 has been receiving researchers' attention because it can be used to automate production-line decisions. Reconfigurable manufacturing systems (RMS) have been widely used to reduce the setup cost of restructuring production lines. However, current RMS modules are not linked to the cloud for online decision-making; to make the proper decision, these modules must connect to an online server (super-computer) with big data analytics and machine learning capabilities. Here, "online" means that data is centralized on the cloud (supercomputer) and accessible in real time. In this study, deep neural networks are utilized to detect the decisive features of a product and build a prediction model with which the iFactory will make the necessary decision for defective products. The Spark ecosystem is used to manage the access, processing, and storage of the streaming big data. This contribution is implemented as a closed cycle; to the best of our knowledge, no one in the literature has introduced big data analysis using deep learning for real-time applications in manufacturing systems. The system achieves a high accuracy of 97% in classifying normal versus defective items. The second contribution, in bioinformatics, is the ability to build supervised machine learning approaches based on the gene expression of patients to predict the proper treatment for breast cancer. In the trial, to personalize treatment, the machine learns the genes that are active in the patient cohort with a five-year survival period. The initial condition is that each group must undergo only one specific treatment. After learning about each group (or class), the machine can personalize the treatment of a new patient by diagnosing the patient's gene expression. The proposed model will help in the diagnosis and treatment of the patient. Future work in this area involves building a protein-protein interaction network with the selected genes for each treatment to first analyze the motifs of the genes and target them with the proper drug molecules. In the learning phase, a couple of feature-selection techniques and supervised standard classifiers are used to build the prediction model.
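    To make the learning phase just described concrete, the following is a minimal sketch of a feature-selection-plus-classifier pipeline of the kind the second contribution names; the synthetic data, the choice of SelectKBest keeping 47 genes, and the linear SVM are illustrative assumptions rather than the thesis's actual components.

```python
# Minimal sketch (not the thesis's actual code): select discriminative
# genes, then train a standard supervised classifier on expression data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))   # patients x genes (synthetic stand-in)
y = rng.integers(0, 4, size=200)   # survival-treatment class per patient

model = Pipeline([
    ("select", SelectKBest(f_classif, k=47)),  # keep 47 discriminative genes
    ("clf", SVC(kernel="linear")),             # a standard supervised classifier
])
print(cross_val_score(model, X, y, cv=5).mean())  # mean cross-validated accuracy
```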
Most of the nodes show high performance, where accuracy, sensitivity, specificity, and F-measure range around 100%. The third contribution is the ability to build semi-supervised learning for breast cancer survival treatment that advances the second contribution. By understanding the relations between the classes, we can design the machine learning phase based on the similarities between classes. In the proposed research, the researcher used the Euclidean distance matrix among the survival-treatment classes to build the hierarchical learning model. The distance information, learned through an unsupervised approach, helps the prediction model to separate the classes that are far from each other, maximizing the distance between classes and yielding wider class groups. The performance measurement of this approach shows a slight improvement over the second model. However, this model reduced the number of discriminative genes from 47 to 37. The model in the second contribution studies each class individually, while this model focuses on the relationships between the classes and uses this information in the learning phase. Hierarchical clustering is performed to draw the borders between groups of classes before building the classification models. Several distance measurements are tested to identify the best linkages between classes. Most of the nodes show high performance, where accuracy, sensitivity, specificity, and F-measure range from 90% to 100%. All the case study models showed high performance in the prediction phase. These modern models can be replicated for different problems within different domains. The comprehensive models of the newer technologies are reconfigurable and modular; a new learning phase can be plugged in at either end of the learning phase. Therefore, the output of the system can be an input for another learning system, and a new feature can be added to the input to be considered in the learning phase.
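    As a sketch of the third contribution's hierarchy-building step, the snippet below clusters class centroids by their pairwise Euclidean distances under several linkage criteria; the six classes, the 37 selected genes, and the synthetic centroids are assumptions for illustration.

```python
# Minimal sketch: group survival-treatment classes by the Euclidean
# distances between their mean expression profiles, testing several linkages.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
centroids = rng.normal(size=(6, 37))  # 6 classes x 37 selected genes (synthetic)

for method in ("single", "complete", "average", "ward"):
    Z = linkage(centroids, method=method, metric="euclidean")
    groups = fcluster(Z, t=2, criterion="maxclust")  # split classes into 2 groups
    print(method, groups)  # which classes end up grouped together
```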

    A data mining approach for training evaluation in simulation-based training

    With the significant evolution of computer technologies, simulation has become a more realistic and effective experiential learning tool to assist in organizational training. Although simulation-based training can improve the effectiveness of training for company employees, there are still many management challenges to overcome. This paper develops a hybrid framework that integrates data mining techniques with simulation-based training to improve the effectiveness of training evaluation. The concept of confidence-based learning is applied to assess trainees' learning outcomes along the two dimensions of knowledge/skill level and confidence level. Data mining techniques are used to analyze trainees' profiles and the data generated from simulation-based training in order to evaluate trainees' performance and learning behaviors. The proposed methodology is illustrated with a real case of simulation-based infantry marksmanship training in Taiwan. The results show that the proposed methodology can accurately evaluate trainees' performance and learning behaviors and can discover latent knowledge for improving trainees' learning outcomes.
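    As an illustration of the confidence-based learning idea, the sketch below crosses a knowledge/skill score with a confidence score to place a trainee in one of four quadrants; the thresholds and quadrant labels are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal sketch: place a trainee in a confidence-based learning quadrant
# from a knowledge/skill score and a self-reported confidence score.
def cbl_quadrant(skill: float, confidence: float,
                 skill_cut: float = 0.7, conf_cut: float = 0.7) -> str:
    # thresholds are assumed values, not taken from the paper
    if skill >= skill_cut and confidence >= conf_cut:
        return "mastery (high skill, high confidence)"
    if skill >= skill_cut:
        return "doubt (high skill, low confidence)"
    if confidence >= conf_cut:
        return "misinformation (low skill, high confidence)"
    return "uninformed (low skill, low confidence)"

# e.g. a marksmanship trainee scoring 0.85 who reports confidence 0.4
print(cbl_quadrant(0.85, 0.4))  # -> doubt (high skill, low confidence)
```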

    Predicting breast cancer risk, recurrence and survivability

    This thesis focuses on predicting breast cancer at early stages by applying machine learning algorithms to biological datasets. The accuracy of these algorithms has been improved to enable physicians to enhance the success of treatment, thus saving lives and avoiding unnecessary further medical tests.

    A method and application of machine learning in design

    This thesis addresses the issue of developing machine learning techniques for the acquisition and organization of design knowledge to be used in knowledge-based design systems. It presents a general method of developing machine learning tools in the design domain. An identification tree is introduced to distinguish different approaches and strategies of machine learning in design. Three existing approaches are identified: the knowledge-oriented, the learner-oriented, and the design-oriented approach. The learner-oriented approach, which focuses on the development of new machine learning tools for design knowledge acquisition, is the critical one. Four strategies suitable for this approach are specialization, generalization, integration, and exploration. A general method of developing machine learning techniques in the design domain, called MLDS (Machine Learning in Design with 5 steps), is presented. It consists of the following steps: 1) identify source data and target knowledge; 2) determine source representation and target representation; 3) identify the background knowledge available; 4) identify the features of data, knowledge and domain; and 5) develop (specialize, generalize, integrate or explore) a machine learning tool. The method is elaborated step by step, and the dependencies between the components are illustrated with a corresponding framework. To assist in characterising the data, knowledge and domain, a set of formal measures is introduced, including density of dataset, size of description space, homogeneity of dataset, complexity of domain, difficulty of domain, stability of domain, and usage of knowledge. Design knowledge is partitioned into two main types: empirical and causal. Empirical knowledge is modelled as empirical associations within categories of design attributes or empirical mappings between these meaningful categories. Eight types of empirical mappings are distinguished. Among them, the mappings from one multidimensional space to another are recognized as the most important for both knowledge-based design systems and machine learning in design. The MLDS method is applied to the preliminary design of a learning model for the integration of design cases and design prototypes. Both source and target representations use the framework of design prototypes. The function-behaviour-structure categorization of design prototypes is used as background knowledge to improve both supervised and unsupervised learning in this task. Many-to-many mappings and time- or order-dependent data are identified as the most important characteristics of the design domain for machine learning. Multiple-attribute prediction and the capture of design concept ‘drift’ are identified as challenging tasks for machine learning in design. After the possibilities and limitations of solving the problem by modifying existing learning methods (both supervised and unsupervised) are considered, a learning model is created by integrating several learning techniques. The basic scheme of this model is that of goal-driven concept formation, which consists of flexible categorization, extensive generalization, temporary suspension, and cognitively-based sequence prediction in design.
The learning process is described as follows: each time, one category of attributes is treated as the predictive feature set and the remaining attributes as the predicted feature set; a conceptual hierarchy or decision tree is constructed incrementally according to the predictive features of design cases (but statistical information is generalized with both feature sets); whenever the predictive or the predicted feature set of a node becomes homogeneous, the construction process at that branch is temporarily suspended until a new case arrives and breaks this homogeneity; and frequency-based prediction at indeterminate nodes is replaced with a cognitively-based sequence prediction, which allows more recent cases to have a stronger influence on the determination of the default or predicted values. An advantage of this scheme is that, with a single learning algorithm, all the types of empirical mappings between function, behaviour and structure, or between design problem specification and design solution description, can be generalized from design cases. To enrich the indexing facilities in a conceptual hierarchy and improve its case-retrieval ability, extensive-generalization-based memory organizations are investigated as alternatives for concept formation. An integration of the above learning techniques reduces the memory requirement of some existing extensive generalization models to a level applicable to practical problems in the design domain. The MLDS method is particularly useful in the preliminary design of a learning system for identifying a learning problem and suitable strategies for solving it in the domain. Although the MLDS method is developed and demonstrated in the context of design, it is independent of any particular design problem and is applicable to other domains as well. The cognitive model of sequence-based prediction developed with this method can be integrated with general concept formation methods to improve their performance in domains where concepts drift or knowledge changes quickly, and where the degree of indeterminacy is high.
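    The replacement of frequency-based prediction with recency-weighted sequence prediction can be illustrated with a short sketch; the exponential decay is an assumed weighting, standing in for the cognitively-based model described above, but it shows how recent cases come to dominate the default value when a concept drifts.

```python
# Minimal sketch: frequency-based vs recency-weighted prediction of a
# default attribute value at an indeterminate node.
from collections import Counter

def frequency_predict(values):
    return Counter(values).most_common(1)[0][0]  # plain majority vote

def recency_weighted_predict(values, decay=0.8):
    # the newest case gets weight 1, the one before it decay, then decay**2, ...
    weights = {}
    for age, v in enumerate(reversed(values)):
        weights[v] = weights.get(v, 0.0) + decay ** age
    return max(weights, key=weights.get)

history = ["steel", "steel", "steel", "timber", "timber"]  # oldest first
print(frequency_predict(history))         # -> steel (majority of all cases)
print(recency_weighted_predict(history))  # -> timber (recent cases dominate)
```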

    Novel data mining techniques for incomplete clinical data in diabetes management

    An important part of health care involves the upkeep and interpretation of medical databases containing patient records for clinical decision making, diagnosis, and follow-up treatment. Missing clinical entries make it difficult to apply data mining algorithms for clinical decision support. This study demonstrates that higher predictive accuracy is possible with conventional data mining algorithms if missing values are dealt with appropriately. We propose a novel algorithm using a convolution of sub-problems to stage a super-problem, where classes are defined by the Cartesian product of the class values of the underlying problems, and Incomplete Information Dismissal and Data Completion techniques are applied to reduce features and impute missing values. Predictive accuracies using decision tree, nearest-neighbour, and naïve Bayes classifiers were compared for predicting diabetes, cardiovascular disease, and hypertension. The data are derived from the Diabetes Screening Complications Research Initiative (DiScRi) conducted at a regional Australian university, involving more than 2400 patient records with more than one hundred clinical risk factors (attributes). The results show substantial improvements in the accuracy achieved with each classifier for an effective diagnosis of diabetes, cardiovascular disease, and hypertension compared to those achieved without substituting missing values. The improvement is 7% for diabetes, 21% for cardiovascular disease, and 24% for hypertension, and our integrated novel approach has resulted in more than 90% accuracy for the diagnosis of any of the three conditions. This work advances data mining research towards an integrated and holistic management of diabetes.
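    The core effect reported here, that handling missing values appropriately lifts classifier accuracy, can be sketched as below; the synthetic data, mean imputation, and decision tree are illustrative stand-ins for the DiScRi data and the proposed Data Completion techniques.

```python
# Minimal sketch: the same classifier trained with a naive fill vs a
# proper imputation of missing clinical entries.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 30))           # clinical risk factors (synthetic)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # e.g. condition present yes/no
X[rng.random(X.shape) < 0.2] = np.nan    # knock out 20% of entries

baseline = make_pipeline(SimpleImputer(strategy="constant", fill_value=0),
                         DecisionTreeClassifier(random_state=0))
completed = make_pipeline(SimpleImputer(strategy="mean"),
                          DecisionTreeClassifier(random_state=0))
for name, model in [("naive fill", baseline), ("mean imputation", completed)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```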

    Advances in Binders for Construction Materials

    The global binder production for construction materials is approximately 7.5 billion tons per year, contributing ~6% of global anthropogenic atmospheric CO2 emissions. Reducing this carbon footprint is a key aim of the construction industry, and current research focuses on developing innovative ways to attain more sustainable binders and concretes/mortars as a real alternative to the current global demand for Portland cement. With this aim, several potential alternative binders are currently being investigated by scientists worldwide, based on calcium aluminate cement, calcium sulfoaluminate cement, alkali-activated binders, calcined clay limestone cements, nanomaterials, or supersulfated cements. This Special Issue presents contributions that address research and practical advances in i) alternative binder manufacturing processes; ii) chemical, microstructural, and structural characterization of unhydrated binders and of hydrated systems; iii) the properties and modelling of concrete and mortars; iv) applications and durability of concrete and mortars; and v) the conservation and repair of historic concrete/mortar structures using alternative binders. We believe this Special Issue will be of high interest to the binder industry and construction community, based upon the novelty and quality of the results and the real potential for applying the findings in practice and industry.

    Vision-based neural network classifiers and their applications

    A thesis submitted for the degree of Doctor of Philosophy at the University of Luton. Visual inspection of defects is an important part of quality assurance in many fields of production. It plays a very useful role in industrial applications, relieving human inspectors, improving inspection accuracy, and hence increasing productivity. Research has previously been done on defect classification of wood veneers using techniques such as neural networks, and a certain degree of success has been achieved. However, improvements in terms of both classification accuracy and running time are necessary if the techniques are to be widely adopted in industry, which has motivated this research. This research presents a method using a rough-sets-based neural network with fuzzy input (RNNFI). A variable precision rough set (VPRS) method is proposed to remove redundant features, utilising the characteristics of VPRS for data analysis and processing. The reduced data is fuzzified to represent the feature data in a form more suitable for input to an improved BP neural network classifier. The BP neural network classifier is improved in three aspects: additional momentum, self-adaptive learning rates, and dynamic error segmenting. Finally, to further refine the classifier, a uniform design (UD) approach is introduced to optimise the key parameters, because UD can generate a minimal set of uniform and representative design points scattered within the experimental domain. Optimal factor settings are achieved using a response surface methodology (RSM) model and a nonlinear quadratic programming algorithm (NLPQL). Experiments have shown that the hybrid method is capable of classifying the defects of wood veneers with fast convergence and high classification accuracy compared with other methods such as a neural network with fuzzy input and a rough-sets-based neural network. The research has demonstrated a methodology for visual inspection of defects, especially for situations where there is a large amount of data and a fast running speed is required. It is expected that this method can be applied to automatic visual inspection on production lines of other products such as ceramic tiles and strip steel.
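    Two of the three BP improvements named above, additional momentum and a self-adaptive learning rate, can be sketched as follows; the update constants and the toy error surface are assumptions, and dynamic error segmenting is omitted.

```python
# Minimal sketch: gradient step with an additional momentum term, plus a
# self-adaptive learning rate (grow while the error falls, shrink when it rises).
import numpy as np

def bp_step(w, grad, velocity, lr, momentum=0.9):
    velocity = momentum * velocity - lr * grad  # momentum-smoothed update
    return w + velocity, velocity

def adapt_lr(lr, err, prev_err, grow=1.05, shrink=0.7):
    return lr * (grow if err < prev_err else shrink)

w, v, lr, prev_err = np.array([4.0]), np.zeros(1), 0.1, np.inf
for _ in range(50):                  # toy error surface E(w) = w^2
    err = float(w[0] ** 2)
    w, v = bp_step(w, 2 * w, v, lr)  # dE/dw = 2w
    lr, prev_err = adapt_lr(lr, err, prev_err), err
print(w)                             # weight driven toward the minimum at 0
```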

    Automated design of genetic programming of classification algorithms.

    Doctoral degree, University of KwaZulu-Natal, Pietermaritzburg. Over the past decades, there has been an increase in the use of evolutionary algorithms (EAs) for data mining and knowledge discovery in a wide range of application domains. Data classification, a real-world application problem, is one of the areas in which EAs have been widely applied. Data classification has been extensively researched, resulting in the development of a number of EA-based classification algorithms. Genetic programming (GP) in particular has been shown to be one of the most effective EAs at inducing classifiers. It is widely accepted that the effectiveness of a parameterised algorithm like GP depends on its configuration. Currently, the design of GP classification algorithms is predominantly performed manually. Manual design follows an iterative trial-and-error approach, which has been shown to be a menial, non-trivial, and time-consuming task with a number of vulnerabilities. The research presented in this thesis is part of a large-scale initiative by the machine learning community to automate the design of machine learning techniques. The study investigates the hypothesis that automating the design of GP classification algorithms for data classification can still lead to the induction of effective classifiers. This research proposes using two evolutionary algorithms, namely a genetic algorithm (GA) and grammatical evolution (GE), to automate the design of GP classification algorithms. The proof-by-demonstration research methodology is used in the study to achieve the stated objectives. To that end, two systems, namely a genetic algorithm system and a grammatical evolution system, were implemented for automating the design of GP classification algorithms. The classification performance of the automatically designed GP classifiers, i.e., GA-designed and GE-designed GP classifiers, was compared to that of manually designed GP classifiers on real-world binary-class and multiclass classification problems. The evaluation was performed on multiple domain problems obtained from the UCI machine learning repository and on two specific domains, cybersecurity and financial forecasting. The automatically designed classifiers were found to outperform the manually designed GP classifiers on all the problems considered in this study. GP classifiers evolved by GE were found to be suitable for binary classification problems, while those evolved by a GA were found to be suitable for multiclass classification problems. Furthermore, the automated design time was found to be less than the manual design time. Fitness landscape analysis of the design spaces searched by the GA and GE was carried out on all the classes of problems considered in this study. Grammatical evolution found the search to be smoother on binary classification problems, while the GA found multiclass problems to be less rugged than binary-class problems.
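    A toy version of the GA side of this automated design loop is sketched below; the configuration space and the stub fitness function are assumptions, where the real system would run a full GP classifier under each configuration and score its classification accuracy.

```python
# Minimal sketch: a GA searching over a GP classification algorithm's
# configuration (population size, tree depth, crossover rate).
import random

random.seed(0)
SPACE = {"pop_size": [100, 200, 500],
         "max_depth": [3, 5, 7],
         "crossover_rate": [0.6, 0.8, 0.9]}

def random_config():
    return {k: random.choice(v) for k, v in SPACE.items()}

def fitness(cfg):
    # stub: in the real system, evolve a GP classifier with cfg and
    # return its classification accuracy on the training problem
    return cfg["pop_size"] / 500 + cfg["crossover_rate"] - cfg["max_depth"] / 10

def mutate(cfg):
    k = random.choice(list(SPACE))  # re-sample one design decision
    return {**cfg, k: random.choice(SPACE[k])}

pop = [random_config() for _ in range(10)]
for _ in range(20):  # generational loop: keep the best, mutate survivors
    pop.sort(key=fitness, reverse=True)
    pop = pop[:5] + [mutate(random.choice(pop[:5])) for _ in range(5)]
print(max(pop, key=fitness))  # best GP configuration found
```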