619 research outputs found
A scalable approach to fuzzy rough nearest neighbour classification with ordered weighted averaging operators
Fuzzy rough sets have been successfully applied in classification tasks, in particular in combination with OWA operators. There has been a lot of research into adapting algorithms for use with Big Data through parallelisation, but no concrete strategy exists to design a Big Data fuzzy rough sets based classifier. Existing Big Data approaches use fuzzy rough sets for feature and prototype selection, and have often not involved very large datasets. We fill this gap by presenting the first Big Data extension of an algorithm that uses fuzzy rough sets directly to classify test instances, a distributed implementation of FRNN-OWA in Apache Spark. Through a series of systematic tests involving generated datasets, we demonstrate that it can achieve a speedup effectively equal to the number of computing cores used, meaning that it can scale to arbitrarily large datasets
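As a rough illustration of the classification rule this abstract refers to, the following single-machine sketch scores each class by averaging an OWA-softened lower and upper approximation. It does not reproduce the distributed Spark implementation. The similarity measure, the additive weight vector, the neighbour count `k`, and all function names are assumptions chosen for illustration, and the sketch assumes at least two classes in the training data.

```python
import numpy as np

def owa_weights(n):
    """Additive OWA weights (n, n-1, ..., 1), normalised to sum to 1."""
    w = np.arange(n, 0, -1, dtype=float)
    return w / w.sum()

def soft_max(values):
    """OWA soft maximum: the largest weights go to the largest values."""
    v = np.sort(values)[::-1]
    return float(v @ owa_weights(len(v)))

def soft_min(values):
    """OWA soft minimum: the largest weights go to the smallest values."""
    v = np.sort(values)
    return float(v @ owa_weights(len(v)))

def frnn_owa_predict(X_train, labels, y, k=3):
    """Return (predicted class, per-class scores) for one test instance y."""
    X_train = np.asarray(X_train, dtype=float)
    labels = np.asarray(labels)
    ranges = np.ptp(X_train, axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero on constant attributes
    # Assumed similarity: mean over attributes of 1 - |x - y| / range.
    sim = np.mean(1.0 - np.abs(X_train - np.asarray(y, dtype=float)) / ranges, axis=1)
    sim = np.clip(sim, 0.0, 1.0)
    scores = {}
    for c in np.unique(labels):
        in_c = sim[labels == c]
        out_c = sim[labels != c]
        # Upper approximation: soft maximum of similarity to the k nearest members of c.
        upper = soft_max(np.sort(in_c)[::-1][:k])
        # Lower approximation: soft minimum of 1 - similarity to the k nearest non-members.
        lower = soft_min(np.sort(1.0 - out_c)[:k])
        scores[c] = (lower + upper) / 2.0
    return max(scores, key=scores.get), scores

X = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]]
pred, scores = frnn_owa_predict(X, [0, 0, 1, 1], [0.05, 0.05])
print(pred, scores)
```

Because each test instance is scored independently of the others, this kind of computation is a natural candidate for partitioning across workers, which is consistent with the near-linear speedup the abstract reports.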
An Intelligent Decision Support System for Business IT Security Strategy
Cyber threat intelligence (CTI) is an emerging approach to improving the cyber security of
business IT environments. It carries information about an affected business IT context. CTI
sharing tools are available for subscribers, and CTI feeds are increasingly available.
If another business IT context is similar to a CTI feed context, the threat described
in the CTI feed might also take place in the business IT context. Businesses can
take proactive defensive actions if relevant CTI is identified. However, a challenge is
how to develop an effective connection strategy for CTI onto business IT contexts.
Businesses are still insufficiently using CTI because not all of them have sufficient
knowledge from domain experts. Moreover, business IT contexts vary over time.
When the business IT contextual states have changed, the relevant CTI might be no
longer appropriate and applicable. Another challenge is how a connection strategy
has the ability to adapt to the business IT contextual changes.
To fill the gap, in this Ph.D. project, a dynamic connection strategy for CTI onto
business IT contexts is proposed and the strategy is instantiated to be a dynamic
connection rule assembly system. The system can identify relevant CTI for a business
IT context and can modify its internal configurations and structures to adapt
to the business IT contextual changes.
This thesis introduces the system development phases from design to delivery,
and the contributions to knowledge are explained as follows.
A hybrid representation of the dynamic connection strategy is proposed to generalise
and interpret the problem domain and the system development. The representation
uses selected computational intelligence models and software development
models.
In terms of the computational intelligence models, a CTI feed context and a
business IT context are generalised to be the same type, i.e., context object. Grey
number model is selected to represent the attribute values of context objects. Fuzzy
sets are used to represent the context objects, and linguistic densities of the attribute
values of context objects are reasoned. To assemble applicable connection
knowledge, the system constructs a set of connection objects based on the context
objects and uses rough set operations to extract applicable connection objects that
contain the connection knowledge.
Furthermore, to adapt to contextual changes, a rough set based incremental
updating approach with multiple operations is developed to incrementally update
the approximations. A set of propositions are proposed to describe how the system
changes based on the previous states and internal structures of the system, and their
complexities and efficiencies are analysed.
In terms of the software development models, some unified modelling language
(UML) models are selected to represent the system in the design phase. An activity diagram
is used to represent the business process of the system. A use case diagram is used to
represent the human interactions with the system. A class diagram is used to represent
the internal components and relationships between them. Using the representation,
developers can develop a prototype of the system rapidly.
Using the representation, an application of the system is developed using mainstream
software development techniques. RESTful software architecture is used
for the communication of the business IT contextual information and the analysis
results using CTI between the server and the clients. A script based method is
deployed in the clients to collect the contextual information. Observer pattern and
a timer are used for the design and development of the monitor-trigger mechanism.
In summary, the representation generalises real-world cases in the problem domain
and interprets the system data. A specific business can initialise an instance of
the representation to be a specific system based on its IT context and CTI feeds, and
the knowledge assembled by the system can be used to identify relevant CTI feeds.
From the relevant CTI data, the system locates and retrieves the useful information
that can inform security decisions and then sends it to the client users. When the
system needs to modify itself to adapt to the business IT contextual changes, the
system can invoke the corresponding incremental updating functions and avoid a
time-consuming re-computation. With this updating strategy, the application can
provide its users in the client side with timely support and useful information that
can inform security decisions using CTI.
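The rough set operations mentioned in the abstract above rest on lower and upper approximations of a target set. The sketch below shows only those two textbook operations on a toy table. The attribute names, object names, and the `approximations` helper are hypothetical, and the thesis's own connection objects and incremental updating propositions are not reproduced here.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target`.

    objects:    dict mapping object name -> dict of attribute values
    attributes: attributes used to decide which objects are indiscernible
    target:     set of object names to approximate
    """
    # Group objects that share the same values on the chosen attributes.
    blocks = defaultdict(set)
    for name, values in objects.items():
        blocks[tuple(values[a] for a in attributes)].add(name)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

# Hypothetical context objects described by two attributes.
contexts = {
    "a": {"os": "linux", "port": 22},
    "b": {"os": "linux", "port": 22},
    "c": {"os": "windows", "port": 80},
}
lower, upper = approximations(contexts, ["os", "port"], {"a", "c"})
print(lower, upper)
```

Here `a` and `b` are indiscernible, so `a` cannot be placed in the lower approximation even though it belongs to the target. An incremental scheme of the kind the abstract describes would update `lower` and `upper` when objects or attribute values change, rather than rebuilding every block.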
Active Sample Selection Based Incremental Algorithm for Attribute Reduction with Rough Sets
Attribute reduction with rough sets is an effective technique for obtaining a compact and informative attribute set from a given dataset. However, traditional algorithms have no explicit provision for handling dynamic datasets where data present themselves in successive samples. Incremental algorithms for attribute reduction with rough sets have been recently introduced to handle dynamic datasets with large samples, though they have high complexity in time and space. To address the time/space complexity issue of the algorithms, this paper presents a novel incremental algorithm for attribute reduction with rough sets based on the adoption of an active sample selection process and an insight into the attribute reduction process. This algorithm first decides whether each incoming sample is useful with respect to the current dataset by the active sample selection process. A useless sample is discarded while a useful sample is selected to update a reduct. At the arrival of a useful sample, the attribute reduction process is then employed to guide how to add and/or delete attributes in the current reduct. The two processes thus constitute the theoretical framework of our algorithm. The proposed algorithm is finally experimentally shown to be efficient in time and space
EEG-Based Biometric Authentication Modelling Using Incremental Fuzzy-Rough Nearest Neighbour Technique
This paper proposes an Incremental Fuzzy-Rough Nearest Neighbour (IncFRNN) technique for biometric authentication modelling using features extracted from visual evoked potentials. Only a small training set is needed for model initialisation. The embedded heuristic update method adjusts the knowledge granules incrementally to maintain all representative electroencephalogram (EEG) signal patterns and eliminate those rarely used. It reshapes the personalized knowledge granules through insertion and deletion of a test object, based on similarity measures. A predefined window size can be used to reduce the overall processing time. The proposed algorithm was verified with test data from 37 healthy subjects. Signal pre-processing steps of segmentation, filtering and artefact rejection were carried out to improve the data quality before model building. The experimental paradigm was designed with three different conditions to evaluate the authentication performance of the IncFRNN technique against the benchmark incremental K-Nearest Neighbour (KNN) technique. The performance was measured in terms of accuracy, area under the Receiver Operating Characteristic (ROC) curve (AUC) and Cohen's Kappa coefficient. The proposed IncFRNN technique is shown to be statistically better than the KNN technique in the controlled window size environment. Future work will focus on the use of dynamic data features to improve the robustness of the proposed model.
Selecting Informative Features with Fuzzy-Rough Sets and its Application for Complex Systems Monitoring
One of the main obstacles facing current intelligent pattern recognition applications is that of dataset dimensionality. To enable these systems to be effective, a redundancy-removing step is usually carried out beforehand. Rough Set Theory (RST) has been used as such a dataset pre-processor with much success; however, it relies on a crisp dataset, so important information may be lost when the underlying numerical features are quantized. This paper proposes a feature selection technique that employs a hybrid variant of rough sets, fuzzy-rough sets, to avoid this information loss. The current work retains dataset semantics, allowing for the creation of clear, readable fuzzy models. Experimental results from applying the present work to complex systems monitoring show that fuzzy-rough selection is more powerful than conventional entropy-based, PCA-based and random-based methods. Key words: feature selection; feature dependency; fuzzy-rough sets; reduct search; rule induction; systems monitoring.
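To make the idea of dependency-driven feature selection concrete, here is a minimal sketch of a greedy reduct search guided by a fuzzy-rough dependency degree. It uses a similarity-relation formulation with the minimum t-norm and the Kleene-Dienes implicator, which is an assumption and may differ from the fuzzy equivalence classes used in the paper itself. The function names and the toy data are hypothetical.

```python
import numpy as np

def fuzzy_similarity(X, attrs):
    """Pairwise similarity: min over attrs of 1 - |a(x) - a(y)| / range(a)."""
    sub = X[:, attrs]
    ranges = np.ptp(sub, axis=0)
    ranges[ranges == 0] = 1.0  # constant attributes contribute full similarity
    diff = np.abs(sub[:, None, :] - sub[None, :, :]) / ranges
    return (1.0 - diff).min(axis=2)

def dependency(X, labels, attrs):
    """Fuzzy-rough dependency degree of the (crisp) decision on attrs."""
    R = fuzzy_similarity(X, attrs)
    other_class = labels[:, None] != labels[None, :]
    # Membership of x in the lower approximation of its own class:
    # the infimum over objects y of another class of 1 - R(x, y).
    positive = np.where(other_class, 1.0 - R, 1.0).min(axis=1)
    return float(positive.mean())

def quickreduct(X, labels):
    """Greedily add attributes until the full-set dependency is reached."""
    n_attrs = X.shape[1]
    target = dependency(X, labels, list(range(n_attrs)))
    chosen, best = [], 0.0
    while best < target - 1e-12:
        gains = {a: dependency(X, labels, chosen + [a])
                 for a in range(n_attrs) if a not in chosen}
        a = max(gains, key=gains.get)
        chosen.append(a)
        best = gains[a]
    return chosen, best

# Toy data: attribute 0 separates the classes, attribute 1 is mostly noise.
X = np.array([[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]])
labels = np.array([0, 0, 1, 1])
print(quickreduct(X, labels))
```

No discretisation step appears anywhere in this sketch: the real-valued attributes enter the similarity relation directly, which is the information-preserving property the abstract emphasises.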