Image Superresolution Reconstruction via Granular Computing Clustering
This paper addresses the problem of generating a superresolution (SR) image from a single low-resolution (LR) input image via granular computing clustering. First, the training images are regarded as SR images and partitioned into SR patches, which are resized into LR patches; the training set is composed of the SR patches and the corresponding LR patches. Second, granular computing (GrC) clustering is proposed, based on the hypersphere representation of a granule and a fuzzy inclusion measure defined through the operation between two granules. Third, the granule set (GS), which includes hypersphere granules of different granularities, is induced by GrC and used to form the relation between the LR image and the SR image by lasso. Experimental results showed that GrC achieved the lowest root mean square error between the reconstructed SR image and the original image compared with bicubic interpolation, sparse representation, and NNLasso.
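The abstract does not define the operation between two granules. As a minimal sketch only, assuming that a hypersphere granule is a (centre, radius) pair and that the operation is the smallest hypersphere enclosing both granules, it could look like the following. The function name join and this choice of operation are illustrative assumptions, not the paper's definitions.

```python
import math

def join(g1, g2):
    """Smallest hypersphere granule enclosing granules g1 and g2.

    Each granule is a (centre, radius) pair; centre is a tuple of floats.
    This enclosing-sphere operation is an assumed stand-in for the
    paper's (unspecified) operation between two granules.
    """
    (c1, r1), (c2, r2) = g1, g2
    d = math.dist(c1, c2)
    # If one granule already contains the other, it is the join.
    if d + r2 <= r1:
        return g1
    if d + r1 <= r2:
        return g2
    # Otherwise the enclosing sphere spans both far sides.
    r = (d + r1 + r2) / 2
    t = (r - r1) / d  # fraction of the way from c1 towards c2
    centre = tuple(a + t * (b - a) for a, b in zip(c1, c2))
    return (centre, r)

coarse = join(((0.0, 0.0), 1.0), ((4.0, 0.0), 1.0))
nested = join(((0.0, 0.0), 5.0), ((1.0, 0.0), 1.0))
```

Repeatedly joining nearby granules under a radius threshold would yield granules of different granularities, which is one way to read the granule set described above.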
Machine learning based data pre-processing for the purpose of medical data mining and decision support
Building an accurate and reliable prediction model for different application domains is one of the most significant challenges in knowledge discovery and data mining. Sometimes, improved data quality is itself the goal of the analysis, usually to improve processes in a production database and the design of decision support. As medicine moves forward, there is a need for sophisticated decision support systems that make use of data mining to support more orthodox knowledge engineering and Health Informatics practice. However, real-life medical data rarely complies with the requirements of various data mining tools. It is often inconsistent, noisy, in an unsuitable format, contains redundant attributes and missing values, and is imbalanced with regard to the outcome class label.
Many real-life data sets are incomplete, with missing values. In medical data mining, the problem of missing values has become a challenging issue. In many clinical trials, the medical report pro-forma allows some attributes to be left blank, because they are inappropriate for some class of illness or because the person providing the information feels that it is not appropriate to record the values of some attributes. The research reported in this thesis explored the use of machine learning techniques as missing value imputation methods. The thesis also proposed a new way of imputing missing values by supervised learning. A classifier was used to learn the data patterns from a complete data sub-set, and the model was later used to predict the missing values for the full dataset. The proposed machine learning based missing value imputation was applied to the thesis data and the results were compared with traditional Mean/Mode imputation. Experimental results show that all the machine learning methods explored outperformed the statistical method (Mean/Mode).
The class imbalance problem has been found to hinder the performance of learning systems.
In fact, most medical datasets are found to be highly imbalanced in their class label. The solution to this problem is to reduce the gap between the minority class samples and the majority class samples. Over-sampling can be applied to increase the number of minority class samples to balance the data. The alternative to over-sampling is under-sampling, where the size of the majority class sample is reduced. The thesis proposed a cluster based under-sampling technique to reduce the gap between the majority and minority samples. Different under-sampling and over-sampling techniques were explored as ways to balance the data. The experimental results show that, for the thesis data, the newly proposed modified cluster based under-sampling technique performed better than other class balancing techniques.
In further research it was found that the class imbalance problem not only affects classification performance but also has an adverse effect on feature selection. The thesis proposed a new framework for feature selection for class imbalanced datasets. The research found that, using the proposed framework, the classifier needs fewer attributes to show high accuracy, and more attributes are needed if the data is highly imbalanced.
The research described in the thesis contains the following four novel main contributions.
a) Improved data mining methodology for mining medical data
b) Machine learning based missing value imputation method
c) Cluster Based semi-supervised class balancing method
d) Feature selection framework for class imbalance datasets
The performance analysis and comparative study show that the use of the proposed missing value imputation method, class balancing and feature selection framework can provide an effective approach to data preparation for building medical decision support.
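The imputation idea in this abstract (train a classifier on the complete rows, then predict the missing entries) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the classifier is a simple nearest-neighbour vote, the other attributes are assumed numeric and complete, and the toy data and the function names mode_impute and knn_impute are hypothetical.

```python
from collections import Counter

def mode_impute(rows, col):
    """Baseline: replace None in column `col` with the most common observed value."""
    observed = [r[col] for r in rows if r[col] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [r[:col] + [mode] + r[col + 1:] if r[col] is None else list(r)
            for r in rows]

def knn_impute(rows, col, k=1):
    """Learn from the complete rows, then predict missing values of `col`.

    Assumes every attribute other than `col` is numeric and present.
    """
    complete = [r for r in rows if r[col] is not None]

    def features(r):
        return [v for i, v in enumerate(r) if i != col]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    out = []
    for r in rows:
        if r[col] is None:
            # Vote among the k complete rows closest in the other attributes.
            nearest = sorted(complete,
                             key=lambda c: sq_dist(features(c), features(r)))[:k]
            label = Counter(c[col] for c in nearest).most_common(1)[0][0]
            r = r[:col] + [label] + r[col + 1:]
        out.append(list(r))
    return out

# Toy data (hypothetical): two numeric attributes and one categorical attribute.
rows = [
    [1.0, 1.1, "low"],
    [0.9, 1.0, "low"],
    [1.2, 0.8, "low"],
    [5.0, 5.2, "high"],
    [5.1, 4.9, "high"],
    [4.8, 5.0, None],   # value to impute
]
by_mode = mode_impute(rows, 2)
by_model = knn_impute(rows, 2)
```

On this toy data the mode baseline fills the gap with the globally most common value, while the learned model uses the row's other attributes, which is the distinction the abstract draws between Mean/Mode and machine learning based imputation.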
A fuzzy probabilistic inference methodology for constrained 3D human motion classification
Enormous uncertainties in unconstrained human motions lead to a fundamental challenge that many recognition algorithms have to face in practice: efficient and correct motion recognition is a demanding task, especially when human kinematic motions are subject to variations of execution in the spatial and temporal domains, heavily overlap with each other, and are occluded. Due to the lack of a good solution to these problems, many existing methods tend to be either effective but computationally intensive or efficient but vulnerable to misclassification. This thesis presents a novel inference engine for recognising occluded 3D human motion assisted by the recognition context. First, uncertainties are wrapped into a fuzzy membership function via a novel method called Fuzzy Quantile Generation, which employs metrics derived from the probabilistic quantile function. Then, time-dependent and context-aware rules are produced via genetic programming to smooth the qualitative outputs represented by fuzzy membership functions. Finally, occlusion in motion recognition is handled by introducing new procedures for feature selection and feature reconstruction. Experimental results demonstrate the effectiveness of the proposed framework on motion capture data from real boxers in terms of fuzzy membership generation, context-aware rule generation, and motion occlusion. Future work might involve the extension of Fuzzy Quantile Generation in order to automate the choice of a probability distribution, the enhancement of temporal pattern recognition with probabilistic paradigms, the optimisation of the occlusion module, and the adaptation of the present framework to different application domains.
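The abstract does not give the details of Fuzzy Quantile Generation. As a rough illustration of deriving a fuzzy membership function from quantiles, the sketch below builds a trapezoid whose core is the interquartile range and whose support is the observed range. This particular shape, and the name quantile_trapezoid, are assumptions for illustration and may differ from the thesis's method.

```python
import statistics

def quantile_trapezoid(samples):
    """Return a trapezoidal membership function built from sample quantiles.

    Core (membership 1.0): first to third quartile.
    Support (membership > 0): strictly between the sample minimum and maximum.
    """
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    lo, hi = min(samples), max(samples)

    def membership(x):
        if x < lo or x > hi:
            return 0.0
        if q1 <= x <= q3:
            return 1.0
        if x < q1:
            return (x - lo) / (q1 - lo)   # rising edge
        return (hi - x) / (hi - q3)       # falling edge

    return membership

# Hypothetical feature values observed for one motion class.
mu = quantile_trapezoid([0, 1, 2, 3, 4, 5, 6, 7, 8])
```

Here mu(4) is 1.0 because 4 lies in the interquartile range, mu(1) and mu(7) are 0.5 on the sloping edges, and values outside the observed range get membership 0.0.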
Relative-fuzzy: a novel approach for handling complex ambiguity for software engineering of data mining models
There are two main defined classes of uncertainty, namely fuzziness and ambiguity, where ambiguity is a ‘one-to-many’ relationship between the syntax and semantics of a proposition. This definition appears to ignore the ‘many-to-many’ relationship type of ambiguity. In this thesis, we use the term complex-uncertainty for the many-to-many relationship ambiguity type of uncertainty.
This research proposes a new approach for handling the complex ambiguity type of uncertainty that may exist in data, for software engineering of predictive Data Mining (DM) classification models. The proposed approach is based on Relative-Fuzzy Logic (RFL), a novel type of fuzzy logic. RFL defines a new formulation of the problem of ambiguity type of uncertainty in terms of States Of Proposition (SOP). RFL describes its membership (semantic) value by using the new definition of Domain of Proposition (DOP), which is based on the relativity principle as defined by possible-worlds logic.
To propose RFL, one question needs to be answered: how can these two approaches, i.e. fuzzy logic and possible-worlds logic, be combined to produce a new membership value set (and later a logic) that is able to handle fuzziness and multiple viewpoints at the same time? This goal is achieved by giving possible-worlds logic the ability to quantify multiple viewpoints, to model fuzziness in each of these viewpoints, and to express both in a new set of membership values.
Furthermore, a new architecture of Hierarchical Neural Network (HNN) called ML/RFL-Based Net has been developed in this research, along with a new learning algorithm and a new recalling algorithm. The architecture, learning algorithm and recalling algorithm of ML/RFL-Based Net follow the principles of RFL. This new type of HNN is considered to be an RFL computation machine.
The ability of the Relative Fuzzy-based DM prediction model to tackle the problem of the complex ambiguity type of uncertainty has been tested. A special-purpose Integrated Development Environment (IDE), called RFL4ASR, which generates a DM prediction model for speech recognition, has also been developed in this research. This special-purpose IDE is an extension of the definition of the traditional IDE.
Using multiple sets of TIMIT speech data, the prediction model of type ML/RFL-Based Net achieved a classification accuracy of 69.2308%. This accuracy is higher than the best results of the WEKA data mining machines given the same speech data.
Theory and Applications for Advanced Text Mining
Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, which can be expected to contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract this knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research fields.
Automatic detection and classification of leukaemia cells
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
Today, a substantial number of software and research groups focus on the development of image processing software to extract useful information from medical images, in order to assist and improve patient diagnosis. The work presented in this thesis is centred on the processing of images of blood and bone marrow smears of patients suffering from leukaemia, a common type of cancer. In general, cancer is due to aberrant gene expression, which is caused by either mutations or epigenetic changes in DNA. Poor diet and unhealthy lifestyle may trigger or contribute to these changes, although the underlying mechanism is often unknown. Importantly, many cancer types including leukaemia are curable, and patient survival and treatment can be improved, subject to prompt diagnosis. In particular, this study focuses on Acute Myeloid Leukaemia (AML), which can be of eight distinct types (M0 to M7), with the main objective of developing a methodology to automatically detect and classify leukaemia cells into one of the above types. The data was collected from the Department of Haematology, Universiti Sains Malaysia, in Malaysia. Three main methods are used, namely Cellular Automata, Heuristic Search and classification using Neural Networks. In the case of Cellular Automata, an improved method based on the 8-neighbourhood and a set of rules was developed to remove noise from images and estimate the radius of the potential blast cells contained in them. The proposed methodology selects the starting points, corresponding to potential blast cells, for the subsequent seeded heuristic search. The Seeded Heuristic employs a new fitness function for blast cell detection. Furthermore, the WEKA software is utilised for classification of blast cells, and hence images, into AML subtypes. As a result, an accuracy of 97.22% was achieved in the classification of blasts into M3 and other AML subtypes.
Finally, these algorithms are integrated into an automated system for image processing. In brief, the research presented in this thesis involves the use of advanced computational techniques for processing and classification of medical images, that is, images of blood samples from patients suffering from leukaemia.
The Institute of Higher Education of Malaysia and the Universiti Sains Islam Malaysia (USIM)
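The thesis's cellular automaton rules are not given in the abstract. As a minimal sketch of noise removal with an 8-neighbourhood (Moore neighbourhood) rule on a binary image, one step could look like this. The thresholds (remove a foreground pixel with at most one foreground neighbour, fill a background pixel with at least seven) and the name ca_denoise_step are illustrative assumptions.

```python
def ca_denoise_step(img):
    """Apply one synchronous cellular-automaton step to a binary image.

    img is a list of rows of 0/1 values. Each cell looks at its up to
    eight neighbours; cells outside the image are ignored.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            n = sum(
                img[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w
            )
            if img[y][x] == 1 and n <= 1:
                out[y][x] = 0   # isolated speck: treat as noise
            elif img[y][x] == 0 and n >= 7:
                out[y][x] = 1   # pinhole inside a blob: fill it
    return out

noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
clean = ca_denoise_step(noisy)
```

In this toy image the lone pixel in the top-left corner is removed and the hole in the centre of the 3x3 blob is filled, while the blob's boundary is left unchanged.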
Agents in the market place: an exploratory study on using intelligent agents to trade financial instruments
Doctoral thesis in Informatics.
This dissertation documents our exploratory research aimed at investigating the utilization of
intelligent agents in the development of automated financial trading strategies. In order to
demonstrate this potential use for agent technology, we propose a hybrid cognitive architecture
meant for the creation of autonomous agents capable of trading different types of financial
instruments. This architecture was used to implement 10 currency trading agents and 25 stock
trading agents. Their overall performance, evaluated according to the cumulative return and the
maximum drawdown metrics, was found to be acceptable in a reasonably long simulation period. In
order to improve this performance, we defined negotiation protocols that allowed the integration of
the 35 trading agents in a multi-agent system, which proved to be better suited for withstanding
sudden market events, due to the diversification of the investments. This system obtained very
promising results, and remains open to many obvious improvements. Our findings lead us to
conclude that there is indeed a place for intelligent agents in the financial industry; in particular,
they hold the potential to be employed in the establishment of investment companies where
software agents make all the trading decisions, with human intervention being relegated to simple
administrative tasks.
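The two evaluation metrics named in this abstract, cumulative return and maximum drawdown, can be computed from an equity curve as in the sketch below. The equity values are made-up numbers for illustration, and the definitions used are the standard ones (relative gain from first to last value, and largest peak-to-trough decline relative to the peak), which are assumed to match the dissertation's usage.

```python
def cumulative_return(equity):
    """Relative gain from the first to the last value of the equity curve."""
    return equity[-1] / equity[0] - 1.0

def max_drawdown(equity):
    """Largest decline from a running peak, as a fraction of that peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Hypothetical account value at the end of each simulated period.
equity = [100.0, 120.0, 90.0, 110.0, 130.0, 104.0]
total = cumulative_return(equity)
drawdown = max_drawdown(equity)
```

For this curve the cumulative return is about 4% and the maximum drawdown is 25% (the fall from 120 to 90), which shows why the two metrics are reported together: a modest final return can hide a large interim loss.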
A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium
When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
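For readers unfamiliar with the CAR structure mentioned above, the sketch below builds the precision matrix of the common "proper CAR" parameterisation, Q = τ(D − ρW), where W is the site adjacency matrix and D is the diagonal matrix of neighbour counts. Whether the paper uses exactly this parameterisation is an assumption; the three-site graph and the function name car_precision are hypothetical. The DAGAR precision matrix is not shown.

```python
def car_precision(adjacency, rho, tau=1.0):
    """Precision matrix tau * (D - rho * W) of a proper CAR model.

    adjacency is a symmetric 0/1 matrix (list of lists) with a zero diagonal.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    return [
        [tau * ((degree[i] if i == j else 0.0) - rho * adjacency[i][j])
         for j in range(n)]
        for i in range(n)
    ]

# Three sites on a line: site 1 neighbours sites 0 and 2.
W = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
Q = car_precision(W, rho=0.5)
```

In this parameterisation ρ scales the off-diagonal entries of the precision matrix rather than the correlations themselves, which is why it lacks the direct reading as an average neighbour pair correlation that the abstract attributes to DAGAR.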
A Statistical Approach to the Alignment of fMRI Data
Multi-subject functional Magnetic Resonance Imaging (fMRI) studies are critical. Because anatomical and functional structure varies across subjects, image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, such as the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, anatomical information is embedded in the estimation of the parameters, i.e., by penalizing the combination of spatially distant voxels. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods.