
    Neural networks to intrusion detection

    Recent research shows many attempts to create an Intrusion Detection System capable of learning and recognizing attacks it faces for the first time. Benchmark datasets were created by the MIT Lincoln Laboratory and by the International Knowledge Discovery and Data Mining (KDD) group. Several competitions were held and many systems were developed. The overall preference was given to expert systems based on decision-tree algorithms. This work is devoted to the use of neural networks for intrusion detection. After investigating multiple techniques and methodologies, we show that properly trained neural networks are capable of fast recognition and classification of different attacks. The approach taken allows us to demonstrate the superiority of neural networks over the systems created by the winners of the KDD Cup competition and by later researchers: they can recognize an attack, differentiate one attack from another (i.e., classify attacks) and, most importantly, detect new attacks that were not included in the training set. The results obtained through simulations indicate that attacks the Intrusion Detection System has never faced before can be recognized at an acceptably high level.
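    The abstract does not specify the network architecture or training setup, but a small feed-forward classifier over KDD-style connection features illustrates the kind of multi-class attack recognition described. In the sketch below the feature count, attack labels and layer sizes are illustrative assumptions, and the data is synthetic.

```python
# Minimal sketch of multi-class attack classification with a neural network,
# in the spirit of the abstract. Architecture, feature count and labels are
# illustrative assumptions, not the authors' actual configuration.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

# Stand-in for preprocessed KDD-style connection records:
# 41 numeric features per connection; labels 0=normal, 1=dos, 2=probe, 3=r2l.
X = rng.normal(size=(5000, 41))
y = rng.integers(0, 4, size=5000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

# Per-class precision/recall shows how well attacks are told apart.
print(classification_report(y_test, clf.predict(scaler.transform(X_test))))
```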

    Advancing efficiency analysis using data envelopment analysis: the case of German health care and higher education sectors

    The main goal of this dissertation is to investigate the advancement of efficiency analysis through DEA, pursued in practice through the case of German health care and higher education organizations. Toward this goal, the dissertation is driven by the following research questions:
    1. How can the quality of different DEA models be evaluated?
    2. How can hospitals' efficiency be reliably measured in light of the pitfalls of DEA applications?
    3. What should be considered when measuring teaching hospital efficiency?
    4. At the crossroads of internationalization, how can we analyze university efficiency?
    Both the higher education and health care industries are characterized by similar missions, organizational structures, and resource requirements. Over the past decade, there has been increasing pressure on universities and health care delivery systems around the world to improve their performance, that is, to bring costs under control while ensuring high-quality services and better public accessibility. Achieving superior performance in higher education and health care is a challenging and intractable issue. Although many statistical methods have been used, DEA is increasingly used by researchers to find best practices and evaluate productivity inefficiencies. By comparing each DMU to observed best practice, DEA produces a best-practice frontier rather than central tendencies, that is, the best results attainable in practice. The dissertation focuses primarily on the advancement of DEA models for use in hospitals and universities. Section 1 thoroughly describes the significance of hospital and university efficiency measurement, as well as the fundamentals of DEA models. After a brief review of the considerations that must be taken into account when employing DEA, the main research questions driving this dissertation are outlined. Section 2 summarizes the four contributions; each contribution is presented in its entirety in the appendices. Based on these contributions, Section 3 answers and critically discusses the research questions posed.
    In the first contribution, a sophisticated data generation process based on a Monte Carlo simulation is developed using the Translog production function. This makes it possible to generate a wide range of diverse scenarios that behave under VRS. Using the artificially generated DMUs, different DEA models are used to calculate DEA efficiency scores. The quality of the efficiency estimates derived from the DEA models is measured by five performance indicators, which are then aggregated into two benchmark-value and benchmark-rank indicators. Several hypothesis tests are also conducted to analyze the distributions of the efficiency scores in each scenario. In this way, a general statement can be made about the parameters that negatively or positively affect the quality of DEA estimations. Compared with the most commonly used BCC model, the AR and SBM DEA models perform much better under VRS, a finding that affects all DEA applications. Indeed, the relevance of these results for university and health care DEA applications is evident in the answers to research questions 2 and 4, where the importance of using sophisticated models is stressed. To handle violations of DEA's assumptions when units operate in different environments, complementary approaches are needed.
    By combining complementary modeling techniques, Contribution 2 aims to develop and evaluate a framework for analyzing hospital performance. Machine learning techniques are developed to perform cluster, heterogeneity, and best-practice analyses. A large dataset of more than 1,100 hospitals in Germany illustrates the applicability of the integrated framework. In addition to predicting the best performance, the framework can be used to determine whether differences in relative efficiency scores are due to heterogeneity in inputs and outputs. The contribution thereby presents an approach to enhancing the reliability of DEA performance analyses of hospital markets, as part of the answer to research question 2.
    In real-world situations, integer-valued amounts and flexible measures pose two principal challenges, neither of which traditional DEA models address. Contribution 3 proposes an extended SBM DEA model that accommodates such data irregularities and complexity. Further, an alternative DEA model is presented that calculates efficiency by directly addressing slacks. The proposed models are applied to 28 university hospitals in Germany. The majority of inefficiencies can be attributed to the "third-party funding income" that university hospitals receive from research-granting agencies. Given that most research-granting organizations prefer to support the university hospitals with the greatest impact, it seems reasonable to conclude that targeting research missions may enhance the efficiency of German university hospitals. This finding contributes to answering research question 3.
    University missions are heavily influenced by internationalization, but the efficacy of this strategy and its relationship to overall university efficiency are largely unknown. Contribution 4 fills this gap by implementing a three-stage mathematical method to explore university internationalization and university business models. The approach is based on SBM DEA methods and regression/correlation analyses and is designed to determine the relative internationalization and relative efficiency of German universities and to analyze the influence of environmental factors on them. Research question 4 can now be answered: German universities are found to be relatively efficient at both levels of analysis, but there is no direct correlation between the two. In addition, the results show that certain locational factors do not significantly affect a university's efficiency.
    For policymakers, it is important to point out that efficiency modeling methodology is highly contested and in its infancy. DEA efficiency results are affected by many technical judgments for which there is little guidance on best practices, and in many cases these judgments have more to do with political than technical aspects (such as output choices). This suggests a need for dialogue between analysts and policymakers. In a nutshell, there is no doubt that DEA models can contribute to any health care or university mission. Despite the limitations discussed above, which must be respected to ensure appropriate use, these methods offer powerful insights into organizational performance. Even so, although widely popular, they are seldom used in real clinical (rather than academic) settings. Analytical tools such as DEA serve only to inform, rather than determine, regulatory judgments; they therefore have to be an essential part of any competent regulator's analytical arsenal.
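    As a rough illustration of the envelopment models the dissertation builds on, the sketch below solves the classic input-oriented BCC model under VRS as a linear program. The hospital inputs and outputs are invented, and the dissertation's AR and SBM variants add constraints not shown here.

```python
# Minimal sketch of input-oriented DEA under VRS (the BCC envelopment form),
# solved as a linear program. Data are toy values, not the dissertation's.
import numpy as np
from scipy.optimize import linprog

# Rows = DMUs (e.g., hospitals); X = inputs, Y = outputs.
X = np.array([[20.0, 300], [30, 200], [40, 100], [20, 200], [10, 400]])  # beds, staff
Y = np.array([[100.0], [80], [60], [90], [70]])                          # treated cases

def bcc_efficiency(o, X, Y):
    n, m = X.shape                                  # DMUs, inputs
    s = Y.shape[1]                                  # outputs
    c = np.r_[1.0, np.zeros(n)]                     # minimise theta
    # Inputs:  sum_j lam_j * x_ij - theta * x_io <= 0
    A_in = np.c_[-X[o].reshape(m, 1), X.T]
    # Outputs: -sum_j lam_j * y_rj <= -y_ro
    A_out = np.c_[np.zeros((s, 1)), -Y.T]
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[o]]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)    # VRS: sum_j lam_j = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.x[0]                                 # theta = efficiency score

for o in range(len(X)):
    print(f"DMU {o}: efficiency = {bcc_efficiency(o, X, Y):.3f}")
```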

    A reduced labeled samples (RLS) framework for classification of imbalanced concept-drifting streaming data.

    Stream processing frameworks are designed to process data that arrives continuously over time, such as the stream of emails a user receives every day. Most real-world data streams are also imbalanced, as in the email example, where spam emails are few compared with legitimate ones. Classifying an imbalanced data stream is challenging for several reasons. First, data streams are huge and cannot be stored in memory for one-time processing. Second, if the data is imbalanced, the accuracy of the majority class tends to dominate the results. Third, data streams change over time, which degrades model performance; the model should therefore be updated when such changes are detected. Finally, the true labels of all samples are not available immediately after classification, and in real-world applications only a fraction of the data can be labeled, because labeling is expensive and time-consuming. This thesis proposes a framework for modeling streaming data whose classes are imbalanced, called Reduced Labeled Samples (RLS). RLS is a chunk-based learning framework that builds a model from a partially labeled data stream as the characteristics of the data change. In RLS, only a fraction of the samples are labeled and used in modeling, and the performance is not significantly different from that of 100% labeling. RLS maintains an ensemble of classifiers to boost performance. It uses the information from labeled data in a supervised fashion and is also extended to use the information from unlabeled data in a semi-supervised fashion. RLS addresses both binary and multi-class partially labeled data streams, and the results show that the basis of RLS is effective even in the context of multi-class classification problems. Overall, RLS is shown to be an effective framework for processing imbalanced and partially labeled data streams.
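    The abstract outlines the mechanics at a high level; a minimal chunk-based ensemble over a partially labeled, drifting stream might look like the sketch below. The chunk size, labeling fraction, base learner and drift model are all assumptions for illustration, not the thesis's actual choices.

```python
# Minimal sketch of chunk-based ensemble learning over a partially labeled,
# imbalanced, drifting stream, in the spirit of the RLS framework.
from collections import deque
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
CHUNK, LABEL_FRAC, MAX_MODELS = 500, 0.2, 5
ensemble = deque(maxlen=MAX_MODELS)   # keep only the most recent base models

def predict(ensemble, X):
    # Unweighted majority vote across ensemble members.
    votes = np.array([m.predict(X) for m in ensemble])
    return (votes.mean(axis=0) >= 0.5).astype(int)

for t in range(20):                   # simulated stream of 20 chunks
    X = rng.normal(size=(CHUNK, 10))
    y = (X[:, 0] + 0.1 * t > 0.8).astype(int)  # imbalanced, slowly drifting concept

    if ensemble:                      # test on the new chunk before training
        acc = (predict(ensemble, X) == y).mean()
        print(f"chunk {t:2d}: accuracy {acc:.2f}")

    # Only a small fraction of each chunk gets labeled, as in RLS.
    idx = rng.choice(CHUNK, size=int(LABEL_FRAC * CHUNK), replace=False)
    model = DecisionTreeClassifier(max_depth=5, class_weight="balanced")
    model.fit(X[idx], y[idx])
    ensemble.append(model)
```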

    Cancer characterization and feature set extraction by discriminative margin clustering

    BACKGROUND: A central challenge in the molecular diagnosis and treatment of cancer is to define a set of molecular features that, taken together, distinguish a given cancer, or type of cancer, from all normal cells and tissues. RESULTS: Discriminative margin clustering is a new technique for analyzing high-dimensional quantitative datasets, especially applicable to gene expression data from microarray experiments related to cancer. The goal of the analysis is to find highly specialized sub-types of a tumor type, each characterized by a small combination of genes that together provide a unique molecular portrait distinguishing the sub-type from any normal cell or tissue. Detection of the products of these genes can then, in principle, provide a basis for the detection and diagnosis of a cancer, and a therapy directed specifically at the distinguishing constellation of molecular features can, in principle, provide a way to eliminate the cancer cells while minimizing toxicity to normal cells. CONCLUSIONS: The new methodology yields highly specialized tumor subtypes that are similar in terms of potential diagnostic markers.
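    The exact discriminative margin clustering procedure is not given in the abstract. As a stand-in that captures the spirit of a "small combination of genes" per sub-type, the sketch below clusters synthetic tumor profiles with k-means and then uses L1-penalised logistic regression to find a sparse marker-gene set separating each cluster from normal tissue; none of this is the paper's actual algorithm.

```python
# Stand-in sketch: cluster tumor profiles, then find a small gene set whose
# linear margin separates each cluster from normal tissue. Data are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_genes = 200
normal = rng.normal(size=(40, n_genes))
tumor = rng.normal(size=(60, n_genes))
tumor[:30, :3] += 3.0    # sub-type A: genes 0-2 overexpressed
tumor[30:, 3:6] += 3.0   # sub-type B: genes 3-5 overexpressed

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tumor)

for c in np.unique(clusters):
    X = np.vstack([tumor[clusters == c], normal])
    y = np.r_[np.ones((clusters == c).sum()), np.zeros(len(normal))]
    # The L1 penalty drives most gene weights to zero, leaving a small marker set.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    markers = np.flatnonzero(clf.coef_[0])
    print(f"sub-type {c}: candidate marker genes {markers.tolist()}")
```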

    Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification

    Detecting faults in electrical power grids is of paramount importance from both the electricity operator's and the consumer's viewpoints. Modern electric power grids (smart grids) are equipped with smart sensors that gather real-time information about the physical status of all the component elements of the infrastructure (e.g., cables and related insulation, transformers, breakers and so on). In real-world smart grid systems, additional information related to the operational status of the grid itself, such as meteorological information, is usually collected as well. Designing a suitable model for recognizing (discriminating) faults in a real-world smart grid system is hence a challenging task. This follows, first, from the heterogeneity of the information that actually determines a typical fault condition. Second, in practice only the conditions of observed faults are usually meaningful for synthesizing a recognition model; a suitable model should therefore be synthesized using the observed fault conditions only. In this paper, we deal with the problem of modeling and recognizing faults in a real-world smart grid system, the one supplying the entire city of Rome, Italy. Fault recognition is addressed through a combined approach of customized multiple dissimilarity measures and one-class classification techniques. We provide an in-depth study of the available data and of the models synthesized by the proposed one-class classifier, and we offer a comprehensive analysis of the fault recognition results by exploiting a fuzzy set-based reliability decision rule.
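    A minimal sketch of the one-class setup described, fitted on observed fault conditions only: a standard RBF one-class SVM stands in for the paper's customized dissimilarity measures, and the sensor features below are invented for illustration.

```python
# One-class fault recognition sketch: fit only on observed fault records,
# then ask whether new grid readings resemble them.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-in fault records: e.g. [cable age, load factor, temperature, humidity].
faults = rng.normal(loc=[30, 0.9, 35, 80], scale=[5, 0.05, 3, 5], size=(300, 4))

scaler = StandardScaler().fit(faults)
model = OneClassSVM(kernel="rbf", nu=0.05).fit(scaler.transform(faults))

# Score unseen grid states: +1 = resembles known fault conditions, -1 = does not.
new = np.array([[32, 0.88, 36, 78],   # close to the fault region
                [10, 0.40, 15, 40]])  # far from it
print(model.predict(scaler.transform(new)))
```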

    Detecting and reducing heterogeneity of error in acoustic classification

    Passive acoustic monitoring can be an effective method for monitoring species, allowing the assembly of large audio datasets, removing logistical constraints in data collection and reducing anthropogenic monitoring disturbances. However, the analysis of large acoustic datasets is challenging, and fully automated machine learning processes are rarely developed or implemented in ecological field studies. One of the greatest uncertainties hindering the development of these methods is spatial generalisability: can an algorithm trained on data from one place be used elsewhere? First, we demonstrate that heterogeneity of error across space is a problem that could go undetected using common classification accuracy metrics. Second, we develop a method to assess the extent of heterogeneity of error in a random forest classification model for six Amazonian bird species. Finally, we propose two complementary ways to reduce heterogeneity of error: (i) accounting for it in the thresholding process and (ii) using a secondary classifier that draws on contextual data. We found that a thresholding approach that accounted for heterogeneity of precision error reduced the coefficient of variation of the precision score from a mean of 0.61 ± 0.17 (SD) to 0.41 ± 0.25, in comparison with the initial classification with threshold selection based on F-score. A secondary, contextual classification with threshold selection accounting for heterogeneity of precision reduced it further still, to 0.16 ± 0.13, significantly lower than the initial classification for all but one species. Mean average precision scores increased from 0.66 ± 0.4 for the initial classification to 0.95 ± 0.19, a significant improvement for all species. We recommend assessing, and if necessary correcting for, heterogeneity of precision error when using automated classification on acoustic data to quantify species presence as a function of an environmental, spatial or temporal predictor variable.
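    One way to implement threshold selection that accounts for heterogeneity of precision is to pick the score threshold that minimizes the coefficient of variation of precision across sites rather than maximizing a global F-score. The sketch below illustrates this on synthetic scores and labels; the paper's random-forest outputs would take their place.

```python
# Threshold selection minimizing the coefficient of variation (CV) of
# precision across recording sites. Scores, labels and sites are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 3000
site = rng.integers(0, 5, size=n)                 # 5 recording sites
labels = rng.random(n) < 0.2                      # true presence/absence
# Classifier scores whose calibration drifts by site -> heterogeneous precision.
scores = np.clip(0.6 * labels + 0.05 * site + rng.normal(0, 0.2, n), 0, 1)

def precision_cv(th):
    per_site = []
    for s in range(5):
        pred = scores[site == s] >= th
        if pred.sum() == 0:
            return np.inf                         # no detections at this site
        per_site.append((labels[site == s] & pred).sum() / pred.sum())
    per_site = np.array(per_site)
    return per_site.std() / per_site.mean()

ths = np.linspace(0.1, 0.9, 81)
best = min(ths, key=precision_cv)
print(f"threshold {best:.2f} gives precision CV {precision_cv(best):.3f}")
```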

    Energy Efficient Geo-Localization for a Wearable Device

    Over the last decade there has been a surge of smart devices on markets around the world. The latest trend is devices that can be worn, so-called wearable devices. As with other mobile devices, effective localization is of great interest for many applications of these devices. However, they are small and usually place high demands on energy efficiency, which makes traditional localization techniques infeasible for them. In this thesis we investigate and succeed in providing a localization solution for a wearable camera that is both accurate and energy efficient. Localization is done through a combination of Wi-Fi and GPS positioning, with a mean accuracy of 27 m. Furthermore, we use an activity recognition algorithm with data from an accelerometer to decide when a new position estimate should be obtained. Our evaluation of the algorithm shows that with this method, 83.2% of position estimates can be avoided with an insignificant loss in accuracy.
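    A minimal sketch of the energy-saving logic, assuming a simple variance test on windows of accelerometer magnitudes to decide when to pay for a new Wi-Fi/GPS fix. The window length, threshold and `request_fix` placeholder are illustrative assumptions, not the thesis's actual parameters.

```python
# Activity-gated positioning sketch: only request an (expensive) position fix
# when the accelerometer indicates the wearer has moved.
import numpy as np

rng = np.random.default_rng(0)
WINDOW, MOVE_THRESHOLD = 50, 0.05    # samples per window, variance threshold

def request_fix():
    # Placeholder for a real Wi-Fi/GPS positioning call (energy-expensive).
    return tuple(rng.normal(size=2))

last_fix, fixes_requested = None, 0
# Simulated accelerometer magnitudes: 20 windows still, then 10 windows walking.
stream = np.r_[rng.normal(1.0, 0.01, 20 * WINDOW), rng.normal(1.0, 0.3, 10 * WINDOW)]

for start in range(0, len(stream), WINDOW):
    window = stream[start:start + WINDOW]
    if last_fix is None or window.var() > MOVE_THRESHOLD:
        last_fix = request_fix()     # device moved: pay for a new fix
        fixes_requested += 1
    # else: reuse last_fix, saving the radio/GPS energy cost

print(f"requested {fixes_requested} fixes for {len(stream) // WINDOW} windows")
```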