
    Optimised meta-clustering approach for clustering Time Series Matrices

    The prognostics (health state) of multiple components, represented as time series data stored in vectors and matrices, were processed and clustered more effectively and efficiently using the newly devised ‘Meta-Clustering’ approach. These time series data were gathered from large applications and systems in diverse fields such as communication, medicine, data mining, audio, visual applications, and sensors. Time series data were chosen as the domain of this research because meaningful information can be extracted from them about the characteristics of systems and components found in large applications. Moreover, when it comes to clustering, only time series data allow these components to be grouped according to their life cycle, i.e. from the time at which they were healthy until the time at which they start to develop faults and ultimately fail. A technique that can better process extracted time series data therefore significantly cuts down on space and time consumption, both of which are crucial factors in data mining. As a result, this approach improves current state-of-the-art pattern recognition algorithms such as K-NM, as the clusters are identified faster while consuming less space. The project also has practical implications: calculating the distance between similar components faster while consuming less space means that the prognostics of the clustered components can be realised and understood more efficiently. This was achieved by using the Meta-Clustering approach to process and cluster the time series data, first extracting and storing the time series data as a two-dimensional matrix, then implementing an enhanced K-NM clustering algorithm based on the notion of Meta-Clustering and using the Euclidean distance to measure the similarity between the different sets of failure patterns in space. The approach initially classifies and organises each component within its own refined individual cluster. This provides the most relevant set of failure patterns showing the highest level of similarity and discards any unnecessary data that adds no value towards better understanding the failure/health state of the component. In the second stage, once these clusters have been obtained, the inner clusters formed initially are grouped into one general cluster that represents the prognostics of all the processed components. The approach was tested on multivariate time series data extracted from IGBT components within MATLAB, and the results of this experiment showed that the proposed optimised Meta-Clustering approach does indeed consume less time and space to cluster the prognostics of IGBT components compared with existing data mining techniques.
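    A minimal sketch of the two-stage idea described above, not the thesis's implementation: scikit-learn's Euclidean k-means stands in for the enhanced K-NM algorithm, and the names meta_cluster, inner_k and outer_k are hypothetical. Stage one clusters each component's failure patterns on their own and keeps only the refined centroids; stage two groups those centroids into one general set of clusters representing the prognostics of all components.

        import numpy as np
        from sklearn.cluster import KMeans

        def meta_cluster(component_matrices, inner_k=3, outer_k=2):
            """Two-stage 'meta-clustering' sketch over per-component time series matrices."""
            inner_centroids = []
            for X in component_matrices:                      # stage 1: refine each component
                km = KMeans(n_clusters=min(inner_k, len(X)), n_init=10).fit(X)
                inner_centroids.append(km.cluster_centers_)   # keep only representative patterns
            pooled = np.vstack(inner_centroids)               # discard the redundant raw segments
            outer = KMeans(n_clusters=outer_k, n_init=10).fit(pooled)   # stage 2: general clusters
            return outer.labels_, outer.cluster_centers_

        # toy usage: three synthetic components, each stored as a 20 x 50 matrix of segments
        rng = np.random.default_rng(0)
        components = [rng.normal(loc=i, size=(20, 50)) for i in range(3)]
        labels, centres = meta_cluster(components)
        print(labels)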

    Efficient Decision Support Systems

    This series is directed to diverse managerial professionals who are leading the transformation of individual domains by using expert information and domain knowledge to drive decision support systems (DSSs). The series offers a broad range of subjects addressed in specific areas such as health care, business management, banking, agriculture, environmental improvement, natural resource and spatial management, aviation administration, and hybrid applications of information technology aimed at interdisciplinary issues. The book series is composed of three volumes: Volume 1 covers general concepts and methodology of DSSs; Volume 2 covers applications of DSSs in the biomedical domain; Volume 3 covers hybrid applications of DSSs in multidisciplinary domains. The book shapes decision support strategies within the new infrastructure, assisting readers in making full use of creative technology to manipulate input data and to transform information into useful decisions for decision makers.

    Strategies for sustainable socio-economic development and mechanisms of their implementation in the global dimension

    The authors of the book have come to the conclusion that it is necessary to effectively use modern approaches to developing and implementing strategies of sustainable socio-economic development in order to increase the efficiency and competitiveness of economic entities. The basic research focuses on economic diagnostics of the socio-economic potential and financial results of economic entities, the transition period in the economies of individual countries and ensuring their competitiveness, and the assessment of educational processes and knowledge management. The research results have been implemented in different models and strategies of supply and logistics management, development of non-profit organizations, competitiveness of tourism and transport, financing strategies for small and medium-sized enterprises, and cross-border cooperation. The results of the study can be used in decision-making at the level of economic entities in different areas of activity and organizational-legal forms of ownership, as well as by ministries and departments that promote the development of economic entities on the basis of models and strategies for sustainable socio-economic development. The results can also be used by students and young scientists studying modern concepts and mechanisms for the management of sustainable socio-economic development of economic entities under conditions of global economic transformations and challenges.

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the interdisciplinary field between technologies of effective visual features and the human-brain cognition process. Effective visual features are made possible through rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while the understanding of the human-brain cognition process broadens the ways in which the computer can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.

    Anomaly detection and explanation in big data

    Data quality tests are used to validate the data stored in databases and data warehouses, and to detect violations of syntactic and semantic constraints. Domain experts grapple with the issues related to capturing all the important constraints and checking that they are satisfied. The constraints are often identified in an ad hoc manner based on knowledge of the application domain and the needs of the stakeholders. Constraints can exist over single or multiple attributes as well as over records involving time series and sequences. Constraints involving multiple attributes can involve both linear and non-linear relationships among the attributes. We propose ADQuaTe as a data quality test framework that automatically (1) discovers different types of constraints from the data, (2) marks records that violate the constraints as suspicious, and (3) explains the violations. Domain knowledge is required to determine whether or not the suspicious records are actually faulty. The framework can incorporate feedback from domain experts to improve the accuracy of constraint discovery and anomaly detection. We instantiate ADQuaTe in two ways to detect anomalies in non-sequence and sequence data. The first instantiation (ADQuaTe2) uses an unsupervised autoencoder approach for constraint discovery in non-sequence data. ADQuaTe2 is based on analyzing records in isolation to discover constraints among the attributes. We evaluate the effectiveness of ADQuaTe2 using real-world non-sequence datasets from the human health and plant diagnosis domains. We demonstrate that ADQuaTe2 can discover new constraints that were previously unspecified in existing data quality tests, and can report both previously detected and new faults in the data. We also use non-sequence datasets from the UCI repository to evaluate the improvement in the accuracy of ADQuaTe2 after incorporating ground truth knowledge and retraining the autoencoder model. The second instantiation (IDEAL) uses an unsupervised LSTM-autoencoder for constraint discovery in sequence data. IDEAL analyzes the correlations and dependencies among data records to discover constraints. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and the Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies from these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy of the approach improves after incorporating ground truth knowledge about the injected faults and retraining the LSTM-autoencoder model. The novelty of this research lies in the development of a domain-independent framework that effectively and efficiently discovers different types of constraints from the data, detects and explains anomalous data, and minimizes false alarms through an interactive learning process.
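    The mechanism behind ADQuaTe2, training an autoencoder on the records and marking those with unusually high reconstruction error as suspicious, can be illustrated with a short generic sketch. This is not the ADQuaTe code: the layer sizes, the 95th-percentile threshold rule and the name flag_suspicious are assumptions made for the example.

        import numpy as np
        import torch
        from torch import nn

        def flag_suspicious(X, epochs=200, hidden=8, quantile=0.95):
            """Train a small autoencoder on X (records x attributes) and flag
            records whose reconstruction error exceeds a quantile threshold."""
            X = torch.tensor(X, dtype=torch.float32)
            d = X.shape[1]
            model = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                  nn.Linear(hidden, d))            # encoder + decoder
            opt = torch.optim.Adam(model.parameters(), lr=1e-2)
            loss_fn = nn.MSELoss()
            for _ in range(epochs):
                opt.zero_grad()
                loss = loss_fn(model(X), X)
                loss.backward()
                opt.step()
            with torch.no_grad():
                err = ((model(X) - X) ** 2).mean(dim=1).numpy()    # per-record error
            return err > np.quantile(err, quantile)                # True = suspicious

        # toy usage: records obeying an implicit linear constraint, plus a few violations
        rng = np.random.default_rng(1)
        x = rng.uniform(0, 1, size=(500, 1))
        data = np.hstack([x, 2 * x + 0.01 * rng.normal(size=(500, 1))])
        data[:5, 1] += 5                                           # injected constraint violations
        print(np.where(flag_suspicious(data))[0])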

    Development and application of statistical and quantum mechanical methods for modelling molecular ensembles

    The development of new quantum chemical methods requires extensive benchmarking to establish the accuracy and limitations of a method. Current benchmarking practices in computational chemistry use test sets that are subject to human biases and as such can be fundamentally flawed. This work presents a thorough benchmark of diffusion Monte Carlo (DMC) for a range of systems and properties, as well as a novel method for developing new, unbiased test sets using multivariate statistical techniques. Firstly, the hydrogen abstraction from methanol is used as a test system to develop a more efficient protocol that minimises the computational cost of DMC without compromising accuracy. This protocol is then applied to three test sets of reaction energies, comprising 43 radical stabilisation energies, 14 Diels-Alder reactions and 76 barrier heights of hydrogen and non-hydrogen transfer reactions. The average mean absolute error across all three databases is just 0.9 kcal/mol. The accuracy of the explicitly correlated trial wavefunction used in DMC is demonstrated using the ionisation potentials and electron affinities of first- and second-row atoms. A multi-determinant trial wavefunction reduces the errors for systems with strong multi-configuration character, as well as for predominantly single-reference systems. It is shown that the use of pseudopotentials in place of all-electron basis sets slightly increases the error for these systems. DMC is then tested on a set of eighteen challenging reactions. Incorporating more determinants in the trial wavefunction reduces the errors for most systems, but the results are highly dependent on the active space used in the CISD wavefunction. The accuracy of multi-determinant DMC for strongly multi-reference systems is tested on the isomerisation of diazene. In this case no method was capable of reducing the error of the strongly correlated rotational transition state. Finally, an improved method for selecting test sets is presented using multivariate statistical techniques. Bias-free test sets are constructed by selecting archetypes and prototypes based on numerical representations of molecules. Descriptors based on the one-, two- and three-dimensional structures of a molecule are tested. These new test sets are then used to benchmark a number of methods.
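    The archetype/prototype selection of the final chapter can be loosely sketched as follows. This is a simplification, not the thesis's procedure: prototypes are taken as the molecules nearest k-means centroids, archetypes as the molecules farthest from the mean in a PCA projection, and select_test_set and its parameters are hypothetical names.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.decomposition import PCA

        def select_test_set(descriptors, n_prototypes=10, n_archetypes=5):
            """Pick a small, spread-out test set from molecular descriptor vectors:
            typical members (prototypes) plus extreme members (archetypes)."""
            X = np.asarray(descriptors, dtype=float)
            km = KMeans(n_clusters=n_prototypes, n_init=10).fit(X)
            prototypes = [int(np.argmin(((X - c) ** 2).sum(axis=1)))   # molecule nearest
                          for c in km.cluster_centers_]                # each centroid
            Z = PCA(n_components=min(3, X.shape[1])).fit_transform(X)
            dist = np.linalg.norm(Z - Z.mean(axis=0), axis=1)
            archetypes = [int(i) for i in np.argsort(dist)[-n_archetypes:]]
            return sorted(set(prototypes + archetypes))

        # toy usage with random 20-dimensional "descriptors" for 200 molecules
        rng = np.random.default_rng(2)
        print(select_test_set(rng.normal(size=(200, 20))))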

    3DWays internationalization project - SME Competitiveness Field Lab 2019/2020

    3DWays is a 3D printing service-providing company. Currently, the company wants to internationalize its remotely managed network of 3D printing factories, specifically targeting the healthcare sector. Therefore, this project analyses the current and future state of the 3D printing industry, before assessing potential target markets. Subsequently, five countries, headlined by the United Kingdom, were determined to be the most suitable for the company’s internationalization, which will rely on direct exporting as an entry strategy because of the unique nature of the network. Complementarily, a marketing plan and a financial evaluation were developed for 3DWays in the UK.

    On Practical Machine Learning and Data Analysis

    This thesis discusses and addresses some of the difficulties associated with practical machine learning and data analysis. Introducing data-driven methods in e.g. industrial and business applications can lead to large gains in productivity and efficiency, but the cost and complexity are often overwhelming. Creating machine learning applications in practice often involves a large amount of manual labour, which usually needs to be performed by an experienced analyst who may have little experience with the application area. We discuss some of the hurdles faced in a typical analysis project and suggest measures and methods to simplify the process. One of the most important issues when applying machine learning methods to complex data, such as industrial applications, is that the processes generating the data are modelled in an appropriate way. Relevant aspects have to be formalised and represented in a way that allows us to perform our calculations in an efficient manner. We present a statistical modelling framework, Hierarchical Graph Mixtures, based on a combination of graphical models and mixture models. It allows us to create consistent, expressive statistical models that simplify the modelling of complex systems. Using a Bayesian approach, we allow for the encoding of prior knowledge and make the models applicable in situations where relatively little data are available. Detecting structure in data, such as clusters and dependency structure, is very important both for understanding an application area and for specifying the structure of e.g. a hierarchical graph mixture. We discuss how this structure can be extracted for sequential data. By using the inherent dependency structure of sequential data, we construct an information-theoretic measure of correlation that does not suffer from the problems most common correlation measures have with this type of data. In many diagnosis situations it is desirable to perform classification in an iterative and interactive manner. The matter is often complicated by very limited amounts of knowledge and examples when a new system to be diagnosed is initially brought into use. We describe how to create an incremental classification system based on a statistical model that is trained from empirical data, and show how the limited available background information can still be used initially to provide a functioning diagnosis system. To minimise the effort with which results are achieved within data analysis projects, we need to address not only the models used, but also the methodology and applications that can help simplify the process. We present a methodology for data preparation and a software library intended for rapid analysis, prototyping, and deployment. Finally, we study a few example applications, presenting tasks within classification, prediction and anomaly detection. The examples include demand prediction for supply chain management, approximating complex simulators for increased speed in parameter optimisation, and fraud detection and classification within a media-on-demand system.
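    The information-theoretic correlation measure for sequential data can be hinted at with a generic sketch: estimate the mutual information between one sequence and a lagged copy of another after discretisation, which, unlike Pearson correlation, also responds to nonlinear dependencies. This is not the measure defined in the thesis; lagged_mi, the equal-width binning and the bin count are assumptions.

        import numpy as np
        from sklearn.metrics import mutual_info_score

        def lagged_mi(x, y, lag=0, bins=16):
            """Mutual information between x(t) and y(t + lag) after discretising
            both sequences into equal-width bins."""
            if lag > 0:
                x, y = x[:-lag], y[lag:]
            xb = np.digitize(x, np.histogram_bin_edges(x, bins=bins))
            yb = np.digitize(y, np.histogram_bin_edges(y, bins=bins))
            return mutual_info_score(xb, yb)

        # toy usage: y depends nonlinearly on x delayed by three steps
        rng = np.random.default_rng(3)
        x = rng.normal(size=1000)
        y = np.roll(x, 3) ** 2 + 0.1 * rng.normal(size=1000)
        print([round(lagged_mi(x, y, lag=k), 3) for k in range(6)])   # peaks at lag 3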

    Advanced and novel modeling techniques for simulation, optimization and monitoring of chemical engineering tasks with refinery and petrochemical unit applications

    Engineers predict, optimize, and monitor processes to improve safety and profitability. Models automate these tasks and determine precise solutions. This research studies and applies advanced and novel modeling techniques to automate and aid engineering decision-making. Advancements in computational ability have improved modeling software’s ability to mimic industrial problems. Simulations are increasingly used to explore new operating regimes and design new processes. In this work, we present a methodology for creating structured mathematical models, useful tips to simplify models, and a novel repair method that improves convergence by populating quality initial conditions for the simulation’s solver. A crude oil refinery application is presented, including the simulation, simplification tips, and the implementation of the repair strategy. A crude oil scheduling problem is also presented, which can be integrated with the production unit models. Recently, stochastic global optimization (SGO) has shown success in finding global optima for complex nonlinear processes. When performing SGO on simulations, model convergence can become an issue. The computational load can be decreased by (1) simplifying the model and (2) finding a synergy between the model solver’s repair strategy and the optimization routine by using the formulated initial conditions as points to perturb the neighborhood being searched. Here, a simplifying technique for merging the crude oil scheduling problem and the vertically integrated online refinery production optimization is demonstrated. To optimize the refinery production, a stochastic global optimization technique is employed. Process monitoring has been vastly enhanced through the data-driven modeling technique Principal Component Analysis (PCA). As opposed to first-principles models, which make assumptions about the structure of the model describing the process, data-driven techniques make no assumptions about the underlying relationships. Data-driven techniques search for a projection that maps the data into a space that is easier to analyze. Feature extraction techniques, commonly dimensionality reduction techniques, have been explored extensively to better capture nonlinear relationships. These techniques can extend data-driven process monitoring to nonlinear processes. Here, we employ a novel nonlinear process-monitoring scheme that utilizes Self-Organizing Maps. The novel techniques and implementation methodology are applied to a publicly studied Tennessee Eastman Process and an industrial polymerization unit.
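    The PCA-based monitoring idea mentioned above can be sketched in a few lines: fit principal components on normal operation and raise an alarm when a sample's squared prediction error (SPE) leaves the space they span. This is a generic baseline, not the dissertation's Self-Organizing-Map scheme; pca_monitor, the empirical control limit and the toy data are assumptions.

        import numpy as np
        from sklearn.decomposition import PCA

        def pca_monitor(train, test, n_components=1, quantile=0.99):
            """Fit PCA on normal-operation data and flag test samples whose SPE
            (variation not captured by the retained components) exceeds an
            empirical quantile of the training SPE."""
            mu, sd = train.mean(axis=0), train.std(axis=0) + 1e-12
            Xtr, Xte = (train - mu) / sd, (test - mu) / sd
            pca = PCA(n_components=n_components).fit(Xtr)
            spe = lambda X: ((X - pca.inverse_transform(pca.transform(X))) ** 2).sum(axis=1)
            limit = np.quantile(spe(Xtr), quantile)        # empirical control limit
            return spe(Xte) > limit                        # True = alarm

        # toy usage: three correlated sensors, then a step fault on the second sensor
        rng = np.random.default_rng(4)
        t = rng.normal(size=(500, 1))
        normal = np.hstack([t, t, 2 * t]) + 0.05 * rng.normal(size=(500, 3))
        fault = normal[-50:].copy()
        fault[:, 1] += 3.0                                 # offset breaks the correlation structure
        alarms = pca_monitor(normal[:400], np.vstack([normal[400:450], fault]))
        print(alarms.sum(), "alarms out of", alarms.size)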