7 research outputs found

    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence, a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from large microarray gene expression data. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. We therefore expect that the selected genes can be more helpful for further biological studies.

    Managing malicious transactions in mobile database systems

    Title from PDF of title page, viewed on March 15, 2013. Thesis advisor: Vijay Kumar. Vita. Includes bibliographic references (p. 53-55). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2012.
    Database security is one of the most important issues for any organization, especially for financial institutions such as banks. Protecting a database from external threats is relatively easy, and a number of effective security schemes are available to organizations. Unfortunately, this is not so for threats from insiders. Existing security schemes for such threats are variations of external schemes that cannot provide the desired level of security. As a result, authorized users (insiders) still manage to misuse their privileges to fulfill their malicious intent. Most external security breaches succeed mainly with the help of insiders. An example of insider misuse is the Enron scandal of 2001, which led to the bankruptcy of Enron Corporation. The firm was widely regarded as one of the most innovative, fastest growing and best managed businesses in the United States. When Enron filed for bankruptcy, its share price fell from US$90 to $1, causing a loss of nearly $11 billion to its stakeholders. Financial officers and executives misled outside investors, auditors and Enron's board of directors about the corporation's net income and liabilities. These insiders kept reported income and reported cash flow up, asset values inflated and liabilities off the books to meet Wall Street expectations. Enron's $63.4 billion in assets made it the largest corporate bankruptcy in American history at that time.
Existing security policies are inadequate to prevent attacks from insiders. Current database protection mechanisms do not fully prevent the occurrence of these malicious transactions, and they require human intervention in some form to detect them. In a database, a transaction can affect the execution of subsequent transactions, thereby spreading the damage and making attack recovery more complex. The problem of malicious attacks becomes more pronounced in mobile database systems. This thesis proposes a solution to mitigate insider attacks by identifying such malicious transactions. It develops a formal framework for characterizing mobile transactions by identifying essential components such as order of data access, order of operations and user profile.
Contents: Introduction -- Mobile database system -- Research problem -- Solution and scheme -- Simulation and results -- Future work -- Conclusion

    Cyberthreats, Attacks and Intrusion Detection in Supervisory Control and Data Acquisition Networks

    Supervisory Control and Data Acquisition (SCADA) systems are computer-based process control systems that interconnect and monitor remote physical processes. Many documented real-world incidents and cyber-attacks affecting SCADA systems clearly illustrate critical infrastructure vulnerabilities. These reported incidents demonstrate that cyber-attacks against SCADA systems can cause financial damage and harm to humans and their environment. This dissertation documents four contributions towards increased security for SCADA systems. First, a set of cyber-attacks was developed. Second, each attack was executed against two fully functional SCADA systems in a laboratory environment: a gas pipeline and a water storage tank. Third, signature-based intrusion detection system rules were developed and tested; these rules can be used to generate alerts when the aforementioned attacks are executed against a SCADA system. Fourth, a set of features was developed for a decision-tree-based anomaly intrusion detection system. The features were tested using the datasets developed for this work. This dissertation documents cyber-attacks on both serial-based and Ethernet-based SCADA networks. Four categories of attacks against SCADA systems are discussed: reconnaissance, malicious response injection, malicious command injection and denial of service. To evaluate the performance of data mining and machine learning algorithms for intrusion detection in SCADA systems, a network dataset for benchmarking intrusion detection systems was generated. This network dataset includes different classes of attacks that simulate different attack scenarios on process control systems. This dissertation describes four SCADA network intrusion detection datasets: a full and an abbreviated dataset for each of the gas pipeline and water storage tank systems. Each feature in the dataset is captured from network flow records.
This dataset groups two different categories of features that can be used as input to an intrusion detection system. First, network traffic features describe the communication patterns in a SCADA system. This research developed both signature-based and anomaly-based IDS for the gas pipeline and water storage tank serial-based SCADA systems. The performance of both types of IDS was evaluated by measuring the detection rate and the prevalence of false positives.
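The two evaluation measures named at the end of this abstract can be illustrated with a minimal sketch. It treats the false positive rate as one way to quantify the "prevalence of false positives"; the label encoding (1 = attack, 0 = normal), the function name `detection_metrics` and the sample labels are assumptions for illustration and are not taken from the dissertation.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate) for binary labels.

    Assumed encoding (hypothetical): 1 = attack, 0 = normal traffic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks that raised an alert
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks that were missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal traffic flagged as attack
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic left alone

    # Guard against empty classes so the ratios are always defined.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Made-up labels: four attack records and four normal records.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
alerts = [1, 0, 1, 0, 1, 0, 0, 1]
dr, fpr = detection_metrics(truth, alerts)
print(f"detection rate = {dr:.2f}, false positive rate = {fpr:.2f}")
```

Reporting both numbers together matters because an IDS that alerts on everything reaches a perfect detection rate while being unusable in practice.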

    Multimodal Journey Planning and Assignment in Public Transportation Networks

    Étude exploratoire d'outils pour le Data Mining [Exploratory study of tools for data mining]

    Étude comparative des algorithmes dédiés à la classification [Comparative study of algorithms dedicated to classification]

    A new framework for clustering

    The difficulty of clustering and the variety of clustering methods suggest the need for a theoretical study of clustering. Following the idea of a standard statistical framework, we propose a new framework for clustering. For a well-defined clustering goal, we assume that the data to be clustered come from an underlying distribution, and we aim to find a high-density cluster tree. We regard this tree as a parameter of interest for the underlying distribution. However, it is not obvious how to determine a connected subset in a discrete distribution whose support is located in a Euclidean space. Building a cluster tree for such a distribution is an open problem and presents interesting conceptual and computational challenges. We solve this problem using graph-based approaches and further parameterize clustering using the high-density cluster tree and its extension. Motivated by the connection between clustering outcomes and graphs, we propose a graph family framework, which plays an important role in our clustering framework. A direct application of the graph family framework is a new cluster-tree distance measure. This distance measure can be written as an inner product or kernel, which allows our clustering framework to perform statistical assessment of clustering via simulation. Other applications, such as a method for integrating partitions into a cluster tree and methods for cluster tree averaging and bagging, are also derived from the graph family framework.
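The idea of reading high-density clusters off a graph can be illustrated with a generic sketch. This is not the method of the thesis: it assumes a k-nearest-neighbor density proxy and an eps-neighborhood graph, both chosen here only for illustration, and the names `knn_density`, `level_set_clusters`, `k`, `eps` and the sample points are hypothetical.

```python
import math


def knn_density(points, k):
    """Crude density proxy: inverse distance to the k-th nearest neighbor."""
    density = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        d_k = dists[k - 1]
        density.append(1.0 / d_k if d_k > 0 else math.inf)
    return density


def level_set_clusters(points, density, level, eps):
    """Connected components of the eps-neighborhood graph, restricted to
    the points whose density proxy is at least `level`."""
    unvisited = {i for i, d in enumerate(density) if d >= level}
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, component = [seed], {seed}
        while stack:
            i = stack.pop()
            # Unvisited kept points within eps of point i join its component.
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(sorted(component))
    return sorted(clusters)


# Made-up data: two tight groups and one sparse point between them.
points = [(0, 0), (1, 0), (2, 0), (50, 0), (51, 0), (52, 0), (25, 0)]
density = knn_density(points, k=2)

low = level_set_clusters(points, density, level=0.0, eps=30)
high = level_set_clusters(points, density, level=0.1, eps=30)
print(low)   # one component at the lowest level (the root of the tree)
print(high)  # the sparse bridge point drops out and the set splits
```

Sweeping `level` upward and recording how components split gives the nested structure that a cluster tree summarizes; the choice of `k` and `eps` strongly affects the result.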