1,928 research outputs found

    A Survey On Data Mining Techniques and Applications

    Data Mining (DM) refers to the analysis of observational data sets to find relationships and to summarize the data in ways that are both understandable and useful. Compared with other DM techniques, approaches based on Intelligent Systems (ISs), which include Artificial Neural Networks (ANNs), fuzzy logic, approximate reasoning, and derivative-free optimisation methods such as Genetic Algorithms (GAs), are tolerant of imprecision, uncertainty, partial truth, and approximation. This paper reviews varieties of Data Mining techniques and applications.
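    As a minimal illustration of the graded truth values that make fuzzy logic tolerant of imprecision, the sketch below (not taken from the survey; the set name and breakpoints are illustrative assumptions) evaluates a triangular membership function in Python.

        # Minimal sketch of fuzzy membership: a triangular membership function
        # assigns each input a degree of truth in [0, 1] rather than a hard 0/1.
        # The fuzzy set and its breakpoints are illustrative, not from the survey.

        def triangular(x, a, b, c):
            """Degree to which x belongs to a fuzzy set peaking at b on [a, c]."""
            if x <= a or x >= c:
                return 0.0
            if x <= b:
                return (x - a) / (b - a)
            return (c - x) / (c - b)

        # "Warm" temperature as a fuzzy set peaking at 22 degrees C:
        for t in (10, 18, 22, 26, 34):
            print(t, round(triangular(t, 15, 22, 30), 2))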

    New Anomaly Network Intrusion Detection System in Cloud Environment Based on Optimized Back Propagation Neural Network Using Improved Genetic Algorithm

    Cloud computing is a distributed architecture that provides computing facilities and storage resources as a service over an open environment (the Internet), which leads to various security and privacy concerns. Defending network-accessible cloud resources and services from threats and attacks is therefore of great concern. To address this issue, it is essential to create an efficient and effective Network Intrusion Detection System (NIDS) that detects both outsider and insider intruders with high precision in the cloud environment. The NIDS has become popular as an important component of the network security infrastructure, detecting malicious activities by monitoring network traffic. In this work, we propose to optimize a popular soft computing tool widely used for intrusion detection, the Back Propagation Neural Network (BPNN), using an Improved Genetic Algorithm (IGA). The Genetic Algorithm (GA) is improved through two optimization strategies, Parallel Processing and Fitness Value Hashing, which reduce execution time and convergence time and save processing power. Since the learning rate and the momentum term are among the parameters with the greatest impact on the performance of the BPNN classifier, we employ the IGA to find optimal or near-optimal values of these two parameters that ensure a high detection rate, high accuracy, and a low false alarm rate. The CloudSim 4.0 simulator and DARPA's KDD Cup 1999 dataset are used for simulation. A detailed performance analysis shows that the proposed system, called "ANIDS BPNN-IGA" (Anomaly NIDS based on BPNN and IGA), outperforms several state-of-the-art methods and is well suited to network anomaly detection.
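    The following Python sketch illustrates the tuning idea under stated assumptions: a small genetic algorithm searches over (learning rate, momentum) pairs and memoises fitness evaluations, in the spirit of the paper's Fitness Value Hashing. The fitness function is a toy stand-in, since the paper's real fitness is the detection performance of a BPNN trained on KDD Cup 1999, and the Parallel Processing strategy (concurrent population evaluation) is omitted for brevity.

        import random

        # Hypothetical stand-in for training a BPNN on KDD Cup 99 and returning
        # its detection performance; the real fitness is classifier accuracy.
        def fitness(lr, momentum):
            # Toy surface with an optimum near lr=0.1, momentum=0.9 (illustrative).
            return -((lr - 0.1) ** 2 + (momentum - 0.9) ** 2)

        cache = {}  # "fitness value hashing": memoise repeated genome evaluations

        def evaluate(genome):
            key = (round(genome[0], 4), round(genome[1], 4))
            if key not in cache:
                cache[key] = fitness(*genome)
            return cache[key]

        def mutate(genome, rate=0.2):
            lr, m = genome
            if random.random() < rate:
                lr = min(max(lr + random.gauss(0, 0.02), 1e-4), 1.0)
            if random.random() < rate:
                m = min(max(m + random.gauss(0, 0.05), 0.0), 0.999)
            return (lr, m)

        def crossover(a, b):
            # Exchange one of the two parameters between parents.
            return (a[0], b[1]) if random.random() < 0.5 else (b[0], a[1])

        pop = [(random.uniform(1e-4, 1.0), random.uniform(0.0, 0.999))
               for _ in range(20)]
        for gen in range(30):
            pop.sort(key=evaluate, reverse=True)
            parents = pop[:10]  # truncation selection
            children = [mutate(crossover(random.choice(parents),
                                         random.choice(parents)))
                        for _ in range(10)]
            pop = parents + children

        best = max(pop, key=evaluate)
        print("best learning rate %.3f, momentum %.3f" % best)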

    Cloud-based homomorphic encryption for privacy-preserving machine learning in clinical decision support

    While privacy and security concerns dominate public cloud services, Homomorphic Encryption (HE) is seen as an emerging solution that ensures secure processing of sensitive data via untrusted networks in the public cloud or by third-party cloud vendors. It relies on the fact that some encryption algorithms display the property of homomorphism, which allows data to be manipulated meaningfully while still in encrypted form, although there are major stumbling blocks to overcome before the technology is considered mature for production cloud environments. Such a framework would find particular relevance in Clinical Decision Support (CDS) applications deployed in the public cloud. CDS applications have an important computational and analytical role over confidential healthcare information, with the aim of supporting decision-making in clinical practice. Machine Learning (ML) is employed in CDS applications, which typically learn and can personalise actions based on individual behaviour. A relatively simple-to-implement, common, and consistent framework is sought that can overcome most limitations of Fully Homomorphic Encryption (FHE) in order to offer an expanded and flexible set of HE capabilities. In the absence of a significant breakthrough in FHE efficiency and practical use, a solution relying on client interactions appears to be the best-known approach for meeting the requirements of private CDS-based computation, so long as security is not significantly compromised. A hybrid solution is introduced that intersperses limited two-party interactions amongst the main homomorphic computations, allowing the exchange of both numerical and logical cryptographic contexts in addition to resolving other major FHE limitations. Interactions involve the use of client-based ciphertext decryptions blinded by data obfuscation techniques to maintain privacy. This thesis explores the middle ground whereby HE schemes can provide improved and efficient arbitrary computational functionality over a significantly reduced two-party network interaction model involving data obfuscation techniques. This compromise allows the powerful capabilities of HE to be leveraged, providing a more uniform, flexible, and general approach to privacy-preserving system integration that is suitable for cloud deployment. The proposed platform is uniquely designed to make HE more practical for mainstream clinical application use, equipped with a rich set of capabilities and a potentially very complex depth of HE operations. Such a solution would be suitable for the long-term privacy-preserving processing requirements of a cloud-based CDS system, which would typically require complex combinatorial logic, workflow, and ML capabilities.
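    A minimal sketch of the underlying idea, under stated assumptions: the toy Paillier cryptosystem below (tiny, insecure parameters; not the scheme or API used in the thesis) is additively homomorphic, so a server can sum encrypted clinical values, and a random blinding mask shows how a limited two-party interaction can delegate decryption to the client without revealing the intermediate result.

        import math, random

        # Toy Paillier cryptosystem (additively homomorphic). Parameters are
        # tiny and insecure; this only illustrates computing on ciphertexts
        # plus a blinded client-side decryption, in the spirit of the hybrid
        # interaction scheme described above.

        p, q = 1000003, 1000033   # toy primes; real keys use ~2048-bit primes
        n = p * q
        n2 = n * n
        g = n + 1
        lam = math.lcm(p - 1, q - 1)
        mu = pow(lam, -1, n)      # valid because g = n + 1

        def encrypt(m):
            r = random.randrange(1, n)
            return (pow(g, m, n2) * pow(r, n, n2)) % n2

        def decrypt(c):
            return ((pow(c, lam, n2) - 1) // n) * mu % n

        def add(c1, c2):          # Enc(a) * Enc(b) mod n^2 = Enc(a + b)
            return (c1 * c2) % n2

        # Server-side homomorphic sum of two private clinical values:
        c = add(encrypt(41), encrypt(17))

        # Blinded interaction: the server masks the ciphertext with a random
        # offset before asking the client to decrypt, so the client learns
        # nothing about the true intermediate value.
        mask = random.randrange(1, 10**6)
        blinded = add(c, encrypt(mask))
        plain_blinded = decrypt(blinded)  # performed client-side in practice
        print(plain_blinded - mask)       # server removes the mask: prints 58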

    Configurable data center switch architectures

    In this thesis, we explore alternative architectures for implementing configurable data center switches, along with the advantages such switches can provide. Our first contribution centers on determining switch architectures that can be implemented on a Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks for realistically evaluating the performance of switch architectures in data centers, and we contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small- to medium-sized switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation compared to an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and lightweight congestion control protocol both to accelerate the training time of machine learning jobs by up to 1.34x and to benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x. As our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.
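    The fair-sharing goal can be made concrete with a classical notion of fairness. The Python sketch below is an illustration, not the thesis's protocol; the job names, demands, and link capacity are assumptions. It computes a max-min fair allocation of a single link's bandwidth across competing jobs.

        # Minimal sketch of max-min fair bandwidth allocation on one shared
        # link, the kind of fairness a congestion control protocol targets
        # across competing ML training jobs. Demands/capacity are illustrative.

        def max_min_fair(demands, capacity):
            """Allocate capacity so no job gains without a smaller one losing."""
            alloc = {job: 0.0 for job in demands}
            remaining = dict(demands)
            cap = capacity
            while remaining and cap > 0:
                share = cap / len(remaining)       # equal split of what is left
                satisfied = {j: d for j, d in remaining.items() if d <= share}
                if not satisfied:                  # everyone is bottlenecked
                    for j in remaining:
                        alloc[j] += share
                    return alloc
                for j, d in satisfied.items():     # small demands are met fully
                    alloc[j] += d
                    cap -= d
                    del remaining[j]
            return alloc

        print(max_min_fair({"job_A": 2, "job_B": 8, "job_C": 10}, capacity=12))
        # job_A gets its 2; job_B and job_C split the remaining 10 -> 5 each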

    Frequent Itemset Mining for Big Data

    Traditional data mining tools, developed to extract actionable knowledge from data, have proven inadequate for processing the huge amounts of data produced nowadays. Even the most popular algorithms for Frequent Itemset Mining, an exploratory data analysis technique used to discover frequent item co-occurrences in a transactional dataset, are inefficient on larger and more complex data. As a consequence, many parallel algorithms have been developed, based on modern frameworks able to leverage distributed computation on commodity clusters of machines (e.g., Apache Hadoop, Apache Spark). However, frequent itemset mining parallelization is far from trivial: the search-space exploration, on which all the techniques are based, is not easily partitionable. Hence, distributed frequent itemset mining is a challenging problem and an interesting research topic. In this context, our main contributions consist of (i) an exhaustive theoretical and experimental analysis of the best-in-class approaches, whose outcomes and open issues motivated (ii) the development of a distributed high-dimensional frequent itemset miner; the dissertation also introduces (iii) a data mining framework that relies heavily on distributed frequent itemset mining for the extraction of a specific type of itemset. The theoretical analysis highlights the challenges related to the distribution and preliminary partitioning of the frequent itemset mining problem (i.e., the search-space exploration), describing the most widely adopted distribution strategies. The extensive experimental campaign, in turn, compares the expectations raised by the algorithmic choices against the actual performance of the algorithms. We ran more than 300 experiments to evaluate and discuss the performance of the algorithms with respect to different real-life use cases and data distributions. The outcome of the review is that no algorithm is universally superior, and performance is heavily skewed by the data distribution. Moreover, we identified a concrete gap in frequent pattern extraction for high-dimensional use cases. For this reason, we developed our own distributed high-dimensional frequent itemset miner based on Apache Hadoop. The algorithm splits the search-space exploration into independent sub-tasks. However, since the exploration benefits strongly from full knowledge of the problem, we introduce an interleaved synchronization phase. The result is a trade-off between the benefits of a centralized state and those of the additional computational power afforded by parallelism. Experimental benchmarks, performed on real-life high-dimensional use cases, show the efficiency of the proposed approach in terms of execution time, load balancing, and robustness to memory issues. Finally, the dissertation introduces a data mining framework in which distributed itemset mining is a fundamental component of the processing pipeline. The aim of the framework is the extraction of a new type of itemset, called misleading generalized itemsets.
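    To make the search-space exploration concrete, the following minimal single-machine Apriori sketch (illustrative only; not the dissertation's distributed miner, and the transactions and support threshold are assumptions) shows the level-wise candidate generation and support counting that a distributed algorithm must partition.

        from itertools import combinations

        # Minimal single-machine Apriori sketch. It makes the search-space
        # exploration explicit: level-wise candidate generation and support
        # counting, which is exactly the part that is hard to partition
        # across a cluster. The toy transactions are illustrative.

        def apriori(transactions, min_support):
            transactions = [frozenset(t) for t in transactions]
            items = {i for t in transactions for i in t}
            def support(itemset):
                return sum(itemset <= t for t in transactions)
            frequent = {}
            level = [frozenset([i]) for i in sorted(items)]
            k = 1
            while level:
                survivors = [s for s in level if support(s) >= min_support]
                frequent.update({s: support(s) for s in survivors})
                # candidate generation: join k-itemsets into (k+1)-itemsets
                k += 1
                level = {a | b for a in survivors for b in survivors
                         if len(a | b) == k}
                # prune candidates with an infrequent subset (Apriori property)
                level = [c for c in level
                         if all(frozenset(s) in frequent
                                for s in combinations(c, k - 1))]
            return frequent

        data = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk", "butter"}]
        for itemset, sup in apriori(data, min_support=2).items():
            print(set(itemset), sup)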

    Secure and efficient data storage operations by using intelligent classification technique and RSA algorithm in IoT-based cloud computing

    In mobile cloud services, smartphones may depend on IoT-based cloud infrastructure and storage services to carry out computational tasks such as search, data processing, and collaborative networking. Beyond traditional discovery services, the smart IoT-cloud also extends the conventional ad hoc structure by treating mobile devices as service hubs, e.g., for service discovery. This brings many benefits from the start, along with several significant problems that must be overcome to strengthen the reliability of the cloud environment as the Internet of Things connects to, and improves, the decision support system of the entire network. Similar issues arise around load monitoring, resilience, and other security risks in the cloud. In this work, we evaluate different classification techniques in MATLAB on heart failure data and then protect that data with the RSA algorithm in the mobile cloud. The classifiers evaluated are SVM, RF, DT, NB, and KNN. Based on the results, the classification techniques with the best accuracy on the heart failure test data will be recommended for use on large-scale data, and the collected data will be transferred to the mobile cloud for preservation using the RSA encryption algorithm.
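    A minimal sketch of the pipeline under stated assumptions: scikit-learn stands in for MATLAB, a synthetic dataset stands in for the heart failure data, and textbook RSA with toy key sizes stands in for a production implementation (a vetted library such as `cryptography` should be used in practice).

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score
        from sklearn.svm import SVC
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier

        # Synthetic stand-in for the heart failure dataset used in the paper.
        X, y = make_classification(n_samples=500, n_features=13, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

        models = {"SVM": SVC(), "RF": RandomForestClassifier(),
                  "DT": DecisionTreeClassifier(), "NB": GaussianNB(),
                  "KNN": KNeighborsClassifier()}
        scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
                  for name, m in models.items()}
        best = max(scores, key=scores.get)
        print("accuracies:", scores, "-> recommend", best)

        # Toy textbook RSA for the storage step (insecure key size, no padding).
        p, q, e = 1000003, 1000033, 65537
        n_mod = p * q
        d = pow(e, -1, (p - 1) * (q - 1))
        record = 120_072  # an encoded patient record value, record < n_mod
        cipher = pow(record, e, n_mod)          # encrypt before cloud upload
        assert pow(cipher, d, n_mod) == record  # decrypt on retrieval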

    Performance Evaluation of Smart Decision Support Systems on Healthcare

    Medical activity requires responsibility not only for clinical knowledge and skill but also for the management of an enormous amount of information related to patient care. It is through proper treatment of information that experts can consistently build a sound wellness policy. The primary objective in developing decision support systems (DSSs) is to provide information to specialists when and where it is needed. These systems provide information, models, and data manipulation tools to help experts make better decisions in a variety of situations. Most of the challenges that smart DSSs face come from the great difficulty of dealing with large volumes of information, continuously generated by the most diverse types of devices and equipment and requiring substantial computational resources. This situation makes this type of system liable to fail to retrieve information quickly enough for decision-making. As a result of this adversity, information quality and the provision of an infrastructure capable of promoting integration and articulation among different health information systems (HIS) become promising research topics in the field of electronic health (e-health), and for this reason they are addressed in this research. The work described in this thesis is motivated by the need to propose novel approaches to deal with problems inherent to the acquisition, cleaning, integration, and aggregation of data obtained from different sources in e-health environments, as well as their analysis. To ensure the success of data integration and analysis in e-health environments, it is essential that machine learning (ML) algorithms ensure system reliability. However, in this type of environment, a reliable scenario cannot be guaranteed, which makes smart DSSs susceptible to prediction failures that severely compromise overall system performance. On the other hand, systems can have their performance compromised by information overload. To address some of these problems, this thesis presents several proposals and studies on the impact of ML algorithms on the monitoring and management of hypertensive disorders related to high-risk pregnancy. The primary goal of the proposals presented in this thesis is to improve the overall performance of health information systems. In particular, ML-based methods are exploited to improve prediction accuracy and to optimize the use of monitoring device resources. It is demonstrated that this type of strategy and methodology contributes to a significant increase in the performance of smart DSSs, not only in terms of precision but also in reducing the computational cost of the classification process. The observed results contribute to advancing the state of the art in AI-based methods and strategies that aim to overcome some of the challenges that emerge from the integration and performance of smart DSSs. With AI-based algorithms, it is possible to quickly and automatically analyze a larger volume of complex data and focus on more accurate results, providing high-value predictions for better decision-making in real time and without human intervention.
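    A minimal sketch of how the precision versus computational cost trade-off can be measured, under stated assumptions: the models and synthetic data below are hypothetical stand-ins, not the classifiers or pregnancy-risk dataset used in the thesis.

        import time
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression

        # Hypothetical stand-ins for the thesis's classifiers and data; the
        # point is measuring accuracy against classification cost.
        X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                  random_state=1)

        for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                            ("forest", RandomForestClassifier(n_estimators=200))]:
            model.fit(X_tr, y_tr)
            start = time.perf_counter()
            pred = model.predict(X_te)
            elapsed = time.perf_counter() - start
            print(f"{name}: accuracy={accuracy_score(y_te, pred):.3f}, "
                  f"predict time={elapsed * 1e3:.1f} ms")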

    A Review of Rule Learning Based Intrusion Detection Systems and Their Prospects in Smart Grids
