Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications
Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence, a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from large microarray gene expression data sets. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. Hence, we expect that the selected genes can be more helpful for further biological studies.
Managing malicious transactions in mobile database systems
Title from PDF of title page, viewed on March 15, 2013
Thesis advisor: Vijay Kumar
Vita
Includes bibliographic references (p. 53-55)
Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2012
Database security is one of the most important issues for any organization,
especially for financial institutions such as banks. Protecting a database from external
threats is relatively easy, and a number of effective security schemes are available to
organizations. Unfortunately, this is not the case for threats from insiders. Existing
security schemes for such threats are variations of external schemes that cannot
provide the desired level of security. As a result, authorized users (insiders)
still manage to misuse their privileges to fulfill malicious intent. It is a fact that
most external security breaches succeed mainly with the help of insiders. An example
of an insider attack is the Enron scandal of 2001, which led to the bankruptcy of Enron
Corporation. The firm was widely regarded as one of the most innovative, fastest
growing, and best managed businesses in the United States. When Enron filed for
bankruptcy, its share price had fallen to less than US$1, and its nearly $63.4 billion in assets
made it the largest corporate bankruptcy in American history at that time.
Existing security policies are inadequate to prevent attacks from insiders.
Current database protection mechanisms do not fully prevent the occurrence of these
malicious transactions. Detecting them requires human intervention in some form.
In a database, a transaction can affect the execution of
subsequent transactions, thereby spreading the damage and making attack
recovery more complex. The problem of malicious attacks becomes more pronounced
in mobile database systems.
This thesis proposes a solution to mitigate insider attacks by identifying such
malicious transactions. It develops a formal framework for characterizing mobile
transactions by identifying essential components such as the order of data access, the order of
operations, and the user profile.
Introduction -- Mobile database system -- Research problem -- Solution and scheme -- Simulation and results -- Future work -- Conclusion
Cyberthreats, Attacks and Intrusion Detection in Supervisory Control and Data Acquisition Networks
Supervisory Control and Data Acquisition (SCADA) systems are computer-based process control systems that interconnect and monitor remote physical processes. Many documented real-world incidents and cyber-attacks affecting SCADA systems clearly illustrate critical infrastructure vulnerabilities. These reported incidents demonstrate that cyber-attacks against SCADA systems can cause financial damage and harm to humans and their environment. This dissertation documents four contributions towards increased security for SCADA systems. First, a set of cyber-attacks was developed. Second, each attack was executed against two fully functional SCADA systems in a laboratory environment: a gas pipeline and a water storage tank. Third, signature-based intrusion detection system rules were developed and tested; these rules can be used to generate alerts when the aforementioned attacks are executed against a SCADA system. Fourth, a set of features was developed for an anomaly-based intrusion detection system built on decision trees. The features were tested using the datasets developed for this work. This dissertation documents cyber-attacks on both serial-based and Ethernet-based SCADA networks. Four categories of attacks against SCADA systems are discussed: reconnaissance, malicious response injection, malicious command injection, and denial of service. To evaluate the performance of data mining and machine learning algorithms for intrusion detection in SCADA systems, a network dataset for benchmarking intrusion detection systems was generated. This network dataset includes different classes of attacks that simulate different attack scenarios on process control systems. This dissertation describes four SCADA network intrusion detection datasets: a full and an abbreviated dataset for each of the gas pipeline and water storage tank systems. Each feature in the dataset is captured from network flow records.
This dataset groups two different categories of features that can be used as input to an intrusion detection system. First, network traffic features describe the communication patterns in a SCADA system. This research developed both signature-based and anomaly-based IDS for the gas pipeline and water storage tank serial-based SCADA systems. The performance of both types of IDS was evaluated by measuring detection rate and the prevalence of false positives.
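As a minimal illustration of the two evaluation measures named in this abstract, the sketch below computes a detection rate and a false positive rate from labeled records. The function name, the boolean input format, and the choice to report false positives as a rate over normal records are assumptions made for this example; the dissertation's own evaluation procedure may differ.

```python
def detection_metrics(labels, alerts):
    """Compute (detection_rate, false_positive_rate) for an IDS.

    labels: iterable of bool, True where the record really is an attack.
    alerts: iterable of bool, True where the IDS raised an alert.
    Both inputs are hypothetical formats chosen for this sketch.
    """
    pairs = list(zip(labels, alerts))
    true_pos = sum(1 for is_attack, alerted in pairs if is_attack and alerted)
    false_neg = sum(1 for is_attack, alerted in pairs if is_attack and not alerted)
    false_pos = sum(1 for is_attack, alerted in pairs if not is_attack and alerted)
    true_neg = sum(1 for is_attack, alerted in pairs if not is_attack and not alerted)

    attacks = true_pos + false_neg   # all records that are attacks
    normals = false_pos + true_neg   # all records that are normal traffic

    # Guard against empty classes so the function never divides by zero.
    detection_rate = true_pos / attacks if attacks else 0.0
    false_positive_rate = false_pos / normals if normals else 0.0
    return detection_rate, false_positive_rate
```

Reporting the two numbers together matters because an IDS that alerts on every record reaches a detection rate of 1.0 while also producing the worst possible false positive rate.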
A new framework for clustering
The difficulty of clustering and the variety of clustering methods suggest the need for a theoretical study of clustering. Using the idea of a standard statistical framework, we propose a new framework for clustering.
For a well-defined clustering goal we assume that the data to be clustered come from an underlying distribution and we aim to find a high-density cluster tree. We regard this tree as a parameter of interest for the underlying distribution. However, it is not obvious how to determine a connected subset in a discrete distribution whose support is located in a Euclidean space. Building a cluster tree for such a distribution is an open problem and presents interesting conceptual and computational challenges. We solve this problem using graph-based approaches and further parameterize clustering using the high-density cluster tree and its extension.
Motivated by the connection between clustering outcomes and graphs, we propose a graph family framework. This framework plays an important role in our clustering framework. A direct application of the graph family framework is a new cluster-tree distance measure. This distance measure can be written as an inner product or kernel, which enables our clustering framework to perform statistical assessment of clustering via simulation. Other applications, such as a method for integrating partitions into a cluster tree and methods for cluster tree averaging and bagging, are also derived from the graph family framework.
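The abstract above notes that connectedness is not obvious for a finite sample in Euclidean space and that graph-based approaches are used to address this. The sketch below shows one common way to make the idea concrete, and it is not the dissertation's construction: density is approximated by a neighbor count, points above a density threshold are kept, and kept points within a fixed radius are linked so that connected components of the graph stand in for high-density clusters. The function name and the `radius` and `min_neighbors` parameters are assumptions introduced for illustration.

```python
import math


def level_set_clusters(points, radius, min_neighbors):
    """Return connected groups of high-density points as sorted index lists.

    A point counts as dense when at least `min_neighbors` other points lie
    within `radius` of it. Dense points closer than `radius` are linked, and
    the connected components of that graph are returned.
    """
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= radius

    # Crude density estimate: number of other points inside the radius.
    counts = [sum(1 for j in range(n) if j != i and close(i, j)) for i in range(n)]
    dense = [i for i in range(n) if counts[i] >= min_neighbors]

    # Union-find over the dense points only.
    parent = {i: i for i in dense}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in dense:
        for b in dense:
            if a < b and close(a, b):
                parent[find(a)] = find(b)

    groups = {}
    for i in dense:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Running the function over an increasing sequence of `min_neighbors` values gives nested families of components, which is the intuition behind reading the result as levels of a cluster tree.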