7 research outputs found

    Kernel Fisher Discriminant Analysis Based on a Regularized Method for Multiclassification and Application in Lithological Identification

    Get PDF
    This study aimed to construct a kernel Fisher discriminant analysis (KFDA) method from well logs for lithology identification purposes. KFDA, via the use of a kernel trick, greatly improves the multiclassification accuracy compared with Fisher discriminant analysis (FDA). The optimal kernel Fisher projection of KFDA can be expressed as a generalized characteristic equation. However, it is difficult to solve the characteristic equation; therefore, a regularized method is used for it. In the absence of a method to determine the value of the regularized parameter, it is often determined based on expert human experience or is specified by tests. In this paper, it is proposed to use an improved KFDA (IKFDA) to obtain the optimal regularized parameter by means of a numerical method. The approach exploits the optimal regularized parameter selection ability of KFDA to obtain improved classification results. The method is simple and not computationally complex. The IKFDA was applied to the Iris data sets for training and testing purposes and subsequently to lithology data sets. The experimental results illustrated that it is possible to successfully separate data that is nonlinearly separable, thereby confirming that the method is effective

    A survey of the application of soft computing to investment and financial trading

    Get PDF

    Building a Strong Undergraduate Research Culture in African Universities

    Get PDF
    Africa had a late start in the race to setting up and obtaining universities with research quality fundamentals. According to Mamdani [5], the first colonial universities were few and far between: Makerere in East Africa, Ibadan and Legon in West Africa. This last place in the race, compared to other continents, has had tremendous implications in the development plans for the continent. For Africa, the race has been difficult from a late start to an insurmountable litany of problems that include difficulty in equipment acquisition, lack of capacity, limited research and development resources and lack of investments in local universities. In fact most of these universities are very recent with many less than 50 years in business except a few. To help reduce the labor costs incurred by the colonial masters of shipping Europeans to Africa to do mere clerical jobs, they started training ―workshops‖ calling them technical or business colleges. According to Mamdani, meeting colonial needs was to be achieved while avoiding the ―Indian disease‖ in Africa -- that is, the development of an educated middle class, a group most likely to carry the virus of nationalism. Upon independence, most of these ―workshops‖ were turned into national ―universities‖, but with no clear role in national development. These national ―universities‖ were catering for children of the new African political elites. Through the seventies and eighties, most African universities were still without development agendas and were still doing business as usual. Meanwhile, governments strapped with lack of money saw no need of putting more scarce resources into big white elephants. By mid-eighties, even the UN and IMF were calling for a limit on funding African universities. In today‘s African university, the traditional curiosity driven research model has been replaced by a market-driven model dominated by a consultancy culture according to Mamdani (Mamdani, Mail and Guardian Online). The prevailing research culture as intellectual life in universities has been reduced to bare-bones classroom activity, seminars and workshops have migrated to hotels and workshop attendance going with transport allowances and per diems (Mamdani, Mail and Guardian Online). There is need to remedy this situation and that is the focus of this paper

    IoT and Sensor Networks in Industry and Society

    Get PDF
    The exponential progress of Information and Communication Technology (ICT) is one of the main elements that fueled the acceleration of the globalization pace. Internet of Things (IoT), Artificial Intelligence (AI) and big data analytics are some of the key players of the digital transformation that is affecting every aspect of human's daily life, from environmental monitoring to healthcare systems, from production processes to social interactions. In less than 20 years, people's everyday life has been revolutionized, and concepts such as Smart Home, Smart Grid and Smart City have become familiar also to non-technical users. The integration of embedded systems, ubiquitous Internet access, and Machine-to-Machine (M2M) communications have paved the way for paradigms such as IoT and Cyber Physical Systems (CPS) to be also introduced in high-requirement environments such as those related to industrial processes, under the forms of Industrial Internet of Things (IIoT or I2oT) and Cyber-Physical Production Systems (CPPS). As a consequence, in 2011 the German High-Tech Strategy 2020 Action Plan for Germany first envisioned the concept of Industry 4.0, which is rapidly reshaping traditional industrial processes. The term refers to the promise to be the fourth industrial revolution. Indeed, the ïŹrst industrial revolution was triggered by water and steam power. Electricity and assembly lines enabled mass production in the second industrial revolution. In the third industrial revolution, the introduction of control automation and Programmable Logic Controllers (PLCs) gave a boost to factory production. As opposed to the previous revolutions, Industry 4.0 takes advantage of Internet access, M2M communications, and deep learning not only to improve production efficiency but also to enable the so-called mass customization, i.e. the mass production of personalized products by means of modularized product design and ïŹ‚exible processes. Less than five years later, in January 2016, the Japanese 5th Science and Technology Basic Plan took a further step by introducing the concept of Super Smart Society or Society 5.0. According to this vision, in the upcoming future, scientific and technological innovation will guide our society into the next social revolution after the hunter-gatherer, agrarian, industrial, and information eras, which respectively represented the previous social revolutions. Society 5.0 is a human-centered society that fosters the simultaneous achievement of economic, environmental and social objectives, to ensure a high quality of life to all citizens. This information-enabled revolution aims to tackle today’s major challenges such as an ageing population, social inequalities, depopulation and constraints related to energy and the environment. Accordingly, the citizens will be experiencing impressive transformations into every aspect of their daily lives. This book offers an insight into the key technologies that are going to shape the future of industry and society. It is subdivided into five parts: the I Part presents a horizontal view of the main enabling technologies, whereas the II-V Parts offer a vertical perspective on four different environments. The I Part, dedicated to IoT and Sensor Network architectures, encompasses three Chapters. In Chapter 1, Peruzzi and Pozzebon analyse the literature on the subject of energy harvesting solutions for IoT monitoring systems and architectures based on Low-Power Wireless Area Networks (LPWAN). The Chapter does not limit the discussion to Long Range Wise Area Network (LoRaWAN), SigFox and Narrowband-IoT (NB-IoT) communication protocols, but it also includes other relevant solutions such as DASH7 and Long Term Evolution MAchine Type Communication (LTE-M). In Chapter 2, Hussein et al. discuss the development of an Internet of Things message protocol that supports multi-topic messaging. The Chapter further presents the implementation of a platform, which integrates the proposed communication protocol, based on Real Time Operating System. In Chapter 3, Li et al. investigate the heterogeneous task scheduling problem for data-intensive scenarios, to reduce the global task execution time, and consequently reducing data centers' energy consumption. The proposed approach aims to maximize the efficiency by comparing the cost between remote task execution and data migration. The II Part is dedicated to Industry 4.0, and includes two Chapters. In Chapter 4, Grecuccio et al. propose a solution to integrate IoT devices by leveraging a blockchain-enabled gateway based on Ethereum, so that they do not need to rely on centralized intermediaries and third-party services. As it is better explained in the paper, where the performance is evaluated in a food-chain traceability application, this solution is particularly beneficial in Industry 4.0 domains. Chapter 5, by De Fazio et al., addresses the issue of safety in workplaces by presenting a smart garment that integrates several low-power sensors to monitor environmental and biophysical parameters. This enables the detection of dangerous situations, so as to prevent or at least reduce the consequences of workers accidents. The III Part is made of two Chapters based on the topic of Smart Buildings. In Chapter 6, Petroșanu et al. review the literature about recent developments in the smart building sector, related to the use of supervised and unsupervised machine learning models of sensory data. The Chapter poses particular attention on enhanced sensing, energy efficiency, and optimal building management. In Chapter 7, Oh examines how much the education of prosumers about their energy consumption habits affects power consumption reduction and encourages energy conservation, sustainable living, and behavioral change, in residential environments. In this Chapter, energy consumption monitoring is made possible thanks to the use of smart plugs. Smart Transport is the subject of the IV Part, including three Chapters. In Chapter 8, Roveri et al. propose an approach that leverages the small world theory to control swarms of vehicles connected through Vehicle-to-Vehicle (V2V) communication protocols. Indeed, considering a queue dominated by short-range car-following dynamics, the Chapter demonstrates that safety and security are increased by the introduction of a few selected random long-range communications. In Chapter 9, Nitti et al. present a real time system to observe and analyze public transport passengers' mobility by tracking them throughout their journey on public transport vehicles. The system is based on the detection of the active Wi-Fi interfaces, through the analysis of Wi-Fi probe requests. In Chapter 10, Miler et al. discuss the development of a tool for the analysis and comparison of efficiency indicated by the integrated IT systems in the operational activities undertaken by Road Transport Enterprises (RTEs). The authors of this Chapter further provide a holistic evaluation of efficiency of telematics systems in RTE operational management. The book ends with the two Chapters of the V Part on Smart Environmental Monitoring. In Chapter 11, He et al. propose a Sea Surface Temperature Prediction (SSTP) model based on time-series similarity measure, multiple pattern learning and parameter optimization. In this strategy, the optimal parameters are determined by means of an improved Particle Swarm Optimization method. In Chapter 12, Tsipis et al. present a low-cost, WSN-based IoT system that seamlessly embeds a three-layered cloud/fog computing architecture, suitable for facilitating smart agricultural applications, especially those related to wildfire monitoring. We wish to thank all the authors that contributed to this book for their efforts. We express our gratitude to all reviewers for the volunteering support and precious feedback during the review process. We hope that this book provides valuable information and spurs meaningful discussion among researchers, engineers, businesspeople, and other experts about the role of new technologies into industry and society

    Analysing functional genomics data using novel ensemble, consensus and data fusion techniques

    Get PDF
    Motivation: A rapid technological development in the biosciences and in computer science in the last decade has enabled the analysis of high-dimensional biological datasets on standard desktop computers. However, in spite of these technical advances, common properties of the new high-throughput experimental data, like small sample sizes in relation to the number of features, high noise levels and outliers, also pose novel challenges. Ensemble and consensus machine learning techniques and data integration methods can alleviate these issues, but often provide overly complex models which lack generalization capability and interpretability. The goal of this thesis was therefore to develop new approaches to combine algorithms and large-scale biological datasets, including novel approaches to integrate analysis types from different domains (e.g. statistics, topological network analysis, machine learning and text mining), to exploit their synergies in a manner that provides compact and interpretable models for inferring new biological knowledge. Main results: The main contributions of the doctoral project are new ensemble, consensus and cross-domain bioinformatics algorithms, and new analysis pipelines combining these techniques within a general framework. This framework is designed to enable the integrative analysis of both large- scale gene and protein expression data (including the tools ArrayMining, Top-scoring pathway pairs and RNAnalyze) and general gene and protein sets (including the tools TopoGSA , EnrichNet and PathExpand), by combining algorithms for different statistical learning tasks (feature selection, classification and clustering) in a modular fashion. Ensemble and consensus analysis techniques employed within the modules are redesigned such that the compactness and interpretability of the resulting models is optimized in addition to the predictive accuracy and robustness. The framework was applied to real-word biomedical problems, with a focus on cancer biology, providing the following main results: (1) The identification of a novel tumour marker gene in collaboration with the Nottingham Queens Medical Centre, facilitating the distinction between two clinically important breast cancer subtypes (framework tool: ArrayMining) (2) The prediction of novel candidate disease genes for Alzheimer’s disease and pancreatic cancer using an integrative analysis of cellular pathway definitions and protein interaction data (framework tool: PathExpand, collaboration with the Spanish National Cancer Centre) (3) The prioritization of associations between disease-related processes and other cellular pathways using a new rule-based classification method integrating gene expression data and pathway definitions (framework tool: Top-scoring pathway pairs) (4) The discovery of topological similarities between differentially expressed genes in cancers and cellular pathway definitions mapped to a molecular interaction network (framework tool: TopoGSA, collaboration with the Spanish National Cancer Centre) In summary, the framework combines the synergies of multiple cross-domain analysis techniques within a single easy-to-use software and has provided new biological insights in a wide variety of practical settings

    Analysing functional genomics data using novel ensemble, consensus and data fusion techniques

    Get PDF
    Motivation: A rapid technological development in the biosciences and in computer science in the last decade has enabled the analysis of high-dimensional biological datasets on standard desktop computers. However, in spite of these technical advances, common properties of the new high-throughput experimental data, like small sample sizes in relation to the number of features, high noise levels and outliers, also pose novel challenges. Ensemble and consensus machine learning techniques and data integration methods can alleviate these issues, but often provide overly complex models which lack generalization capability and interpretability. The goal of this thesis was therefore to develop new approaches to combine algorithms and large-scale biological datasets, including novel approaches to integrate analysis types from different domains (e.g. statistics, topological network analysis, machine learning and text mining), to exploit their synergies in a manner that provides compact and interpretable models for inferring new biological knowledge. Main results: The main contributions of the doctoral project are new ensemble, consensus and cross-domain bioinformatics algorithms, and new analysis pipelines combining these techniques within a general framework. This framework is designed to enable the integrative analysis of both large- scale gene and protein expression data (including the tools ArrayMining, Top-scoring pathway pairs and RNAnalyze) and general gene and protein sets (including the tools TopoGSA , EnrichNet and PathExpand), by combining algorithms for different statistical learning tasks (feature selection, classification and clustering) in a modular fashion. Ensemble and consensus analysis techniques employed within the modules are redesigned such that the compactness and interpretability of the resulting models is optimized in addition to the predictive accuracy and robustness. The framework was applied to real-word biomedical problems, with a focus on cancer biology, providing the following main results: (1) The identification of a novel tumour marker gene in collaboration with the Nottingham Queens Medical Centre, facilitating the distinction between two clinically important breast cancer subtypes (framework tool: ArrayMining) (2) The prediction of novel candidate disease genes for Alzheimer’s disease and pancreatic cancer using an integrative analysis of cellular pathway definitions and protein interaction data (framework tool: PathExpand, collaboration with the Spanish National Cancer Centre) (3) The prioritization of associations between disease-related processes and other cellular pathways using a new rule-based classification method integrating gene expression data and pathway definitions (framework tool: Top-scoring pathway pairs) (4) The discovery of topological similarities between differentially expressed genes in cancers and cellular pathway definitions mapped to a molecular interaction network (framework tool: TopoGSA, collaboration with the Spanish National Cancer Centre) In summary, the framework combines the synergies of multiple cross-domain analysis techniques within a single easy-to-use software and has provided new biological insights in a wide variety of practical settings

    Towards Efficient Intrusion Detection using Hybrid Data Mining Techniques

    Get PDF
    The enormous development in the connectivity among different type of networks poses significant concerns in terms of privacy and security. As such, the exponential expansion in the deployment of cloud technology has produced a massive amount of data from a variety of applications, resources and platforms. In turn, the rapid rate and volume of data creation in high-dimension has begun to pose significant challenges for data management and security. Handling redundant and irrelevant features in high-dimensional space has caused a long-term challenge for network anomaly detection. Eliminating such features with spectral information not only speeds up the classification process, but also helps classifiers make accurate decisions during attack recognition time, especially when coping with large-scale and heterogeneous data such as network traffic data. Furthermore, the continued evolution of network attack patterns has resulted in the emergence of zero-day cyber attacks, which nowadays has considered as a major challenge in cyber security. In this threat environment, traditional security protections like firewalls, anti-virus software, and virtual private networks are not always sufficient. With this in mind, most of the current intrusion detection systems (IDSs) are either signature-based, which has been proven to be insufficient in identifying novel attacks, or developed based on absolute datasets. Hence, a robust mechanism for detecting intrusions, i.e. anomaly-based IDS, in the big data setting has therefore become a topic of importance. In this dissertation, an empirical study has been conducted at the initial stage to identify the challenges and limitations in the current IDSs, providing a systematic treatment of methodologies and techniques. Next, a comprehensive IDS framework has been proposed to overcome the aforementioned shortcomings. First, a novel hybrid dimensionality reduction technique is proposed combining information gain (IG) and principal component analysis (PCA) methods with an ensemble classifier based on three different classification techniques, named IG-PCA-Ensemble. Experimental results show that the proposed dimensionality reduction method contributes more critical features and reduced the detection time significantly. The results show that the proposed IG-PCA-Ensemble approach has also exhibits better performance than the majority of the existing state-of-the-art approaches
    corecore