451 research outputs found

    Fuzzy rough and evolutionary approaches to instance selection

    U-Scores for Multivariate Data In Sports

    In many sports competitions, athletes, teams, or countries are evaluated based on several variables. The strong assumptions underlying traditional ‘linear weight’ scoring systems (that the relative importance, interactions and linearizing transformations of the variables are known) often cannot be justified on theoretical grounds, and empirical ‘validation’ of weights, interactions and transformations is problematic when a ‘gold standard’ is lacking. With μ-scores (u-scores for multivariate data) one can integrate information even if the variables have different scales and unknown interactions, or if the events counted are not directly comparable, as long as the variables have an ‘orientation’. Using baseball as an example, we discuss how measures based on μ-scores can complement the existing measures for ‘performance’ (which may depend on the situation) by providing the first multivariate measures for ‘ability’ (which should be independent of the situation). Recently, μ-scores have been extended to situations where count variables are graded by importance or relevance, such as medals in the Olympics (Wittkowski 2003) or Tour-de-France jerseys (Cherchye and Vermeulen 2006, 2007). Here, we present extensions to ‘censored’ variables (life-time achievements of active athletes), penalties (counting a win more than two ties) and hierarchically structured variables (Nordic, alpine, outdoor, and indoor Olympic events). The methods presented are not restricted to sports. Other applications include medicine (adverse events), finance (risk analysis), social choice theory (voting), and economics (long-term profit).
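
    The scoring idea in this abstract can be illustrated with a minimal sketch. It assumes the common pairwise-ordering formulation of u-scores (for each subject, the number of subjects it dominates minus the number that dominate it) and that every variable is oriented so that higher is better. The function names `dominates` and `mu_scores` and the sample data are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every variable and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def mu_scores(rows: List[Sequence[float]]) -> List[int]:
    """Score each row as (# rows it dominates) - (# rows that dominate it).

    Pairs that cannot be ordered (better on one variable, worse on another)
    contribute nothing, so no weights or common scale are needed.
    """
    scores = []
    for a in rows:
        worse = sum(dominates(a, b) for b in rows)
        better = sum(dominates(b, a) for b in rows)
        scores.append(worse - better)
    return scores


# Hypothetical athletes measured on two oriented variables.
athletes = [(3, 10), (2, 8), (4, 5), (1, 1)]
print(mu_scores(athletes))
```

    Because only the ordering within each variable is used, rescaling any variable monotonically leaves the scores unchanged.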

    Machine Learning Methods for Effectively Discovering Complex Relationships in Graph Data

    Graphs are extensively employed in many systems due to their capability to capture the interactions (edges) among data (nodes) in many real-life scenarios. Social networks, biological networks and molecular graphs are some of the domains where data have inherent graph structural information. Built graphs can be used to make predictions in Machine Learning (ML) such as node classifications, link predictions, graph classifications, etc. But, existing ML algorithms hold a core assumption that data instances are independent of each other and hence prevent incorporating graph information into ML. This irregular and variable sized nature of non-Euclidean data makes learning underlying patterns of the graph more sophisticated. One approach is to convert the graph information into a lower dimensional space and use traditional learning methods on the reduced space. Meanwhile, Deep Learning has better performance than ML due to convolutional layers and recurrent layers which consider simple correlations in spatial and temporal data, respectively. This proves the importance of taking data interrelationships into account and Graph Convolutional Networks (GCNs) are inspired by this fact to exploit the structure of graphs to make better inference in both node-centric and graph-centric applications. In this dissertation, the graph based ML prediction is addressed in terms of both node classification and link prediction tasks. At first, GCN is thoroughly studied and compared with other graph embedding methods specific to biological networks. Next, we present several new GCN algorithms to improve the prediction performance related to biomedical networks and medical imaging tasks. A circularRNA (circRNA) and disease association network is modeled for both node classification and link prediction tasks to predict diseases relevant to circRNAs to demonstrate the effectiveness of graph convolutional learning. 
A GCN based chest X-ray image classification outperforms state-of-the-art transfer learning methods. Next, the graph representation is used to analyze the feature dependencies of data and select an optimal feature subset which respects the original data structure. Finally, the usability of this algorithm is discussed in identifying disease specific genes by exploiting gene-gene interactions
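
    As a rough illustration of how a graph convolution uses edge structure, the sketch below implements a single layer with the widely used symmetric normalization, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W). This propagation rule is an assumption; the abstract does not state which GCN variant the dissertation uses, and the helper names `matmul` and `gcn_layer` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj: Matrix, feats: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution layer: normalize, aggregate neighbours, transform, ReLU."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Two connected nodes with one feature each and an identity weight.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0], [3.0]]
print(gcn_layer(adj, feats, [[1.0]]))
```

    Each output row mixes a node's features with those of its neighbours, which is how the layer drops the independence assumption mentioned in the abstract.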

    Evaluation of the coordination between China’s technology and economy using a grey multivariate coupling model

    As extremely complex interactions exist in the process of economic research and development, a novel grey multivariable coupling model called CFGM(1,N) is proposed to evaluate the coordination degree between China’s technology and economy with limited information. The proposed model improves the aggregation in the GM(1,N) model through the Choquet integral with a λ-fuzzy measure, which can reflect interactions among factor indexes. It estimates the coordinate parameters via the whale optimization algorithm and obtains the coupling coordination degree in combination with grey comentropy. To verify the proposed model, a case study using a dataset from China’s technology and economic system is conducted. The CFGM(1,N) model performs better in convergence and interpretability than three heuristic algorithms and two classical approaches. Our findings suggest that China’s technology and economic system are still relatively coordinated. Results also reveal strong negative cooperation between the comprehensive human input and the comprehensive capital investment in this system. First published online 19 November 202

    An Intelligent System for Induction Motor Health Condition Monitoring

    Induction motors (IMs) are commonly used in both industrial applications and household appliances. An IM online condition monitoring system is very useful to identify the IM fault at its initial stage, in order to prevent machinery malfunction, decreased productivity and even catastrophic failures. Although a series of research efforts have been conducted over decades for IM fault diagnosis using various approaches, it still remains a challenging task to accurately diagnose the IM fault due to the complex signal transmission path and environmental noise. The objective of this thesis is to develop a novel intelligent system for more reliable IM health condition monitoring. The developed intelligent monitor consists of two stages: feature extraction and decision-making. In feature extraction, a spectrum synch technique is proposed to extract representative features from collected stator current signals for fault detection in IM systems. The local bands related to IM health conditions are synchronized to enhance fault characteristic features; a central kurtosis method is suggested to extract representative information from the resulting spectrum and to formulate an index for fault diagnosis. In diagnostic pattern classification, an innovative selective boosting technique is proposed to effectively classify representative features into different IM health condition categories. On the other hand, IM health conditions can also be predicted by applying appropriate prognostic schemes. In system state forecasting, two forecasting techniques, a model-based pBoost predictor and a data-driven evolving fuzzy neural predictor, are proposed to forecast future states of the fault indices, which can be employed to further improve the accuracy of IM health condition monitoring. A novel fuzzy inference system is developed to integrate information from both the classifier and the predictor for IM health condition monitoring. 
The effectiveness of the proposed techniques and integrated monitor is verified through simulations and experimental tests corresponding to different IM states, such as IMs with broken rotor bars and with a bearing outer race defect. The developed techniques (the selective boosting classifier, the pBoost predictor and the evolving fuzzy neural predictor) are effective tools that can be employed in a much wider range of applications. In order to select the most reliable technique in each processing module so as to provide a more positive assessment of IM health conditions, some more techniques are also proposed for each processing purpose. A conjugate Levenberg-Marquardt method and a Laplace particle swarm technique are proposed for model parameter training, whereas a mutated particle filter technique is developed for system state prediction. The tools developed in this work could also be applied to fault diagnosis and other applications.

    Multifractal techniques for analysis and classification of emphysema images

    This thesis proposes, develops and evaluates different multifractal methods for detection, segmentation and classification of medical images. This is achieved by studying the structures of the image and extracting the statistical self-similarity measures characterized by the Holder exponent, and using them to develop texture features for segmentation and classification. The theoretical framework for fulfilling these goals is based on the efficient computation of fractal dimension, which has been explored and extended in this work. This thesis investigates different ways of computing the fractal dimension of digital images and validates the accuracy of each method with fractal images with predefined fractal dimension. The box counting and the Higuchi methods are used for the estimation of fractal dimensions. A prototype system of the Higuchi fractal dimension of the computed tomography (CT) image is used to identify and detect some of the regions of the image with the presence of emphysema. The box counting method is also used for the development of the multifractal spectrum and applied to detect and identify the emphysema patterns. We propose a multifractal based approach for the classification of emphysema patterns by calculating the local singularity coefficients of an image using four multifractal intensity measures. One of the primary statistical measures of self-similarity used in the processing of tissue images is the Holder exponent (α-value) that represents the power law, which the intensity distribution satisfies in the local pixel neighbourhoods. The fractal dimension corresponding to each α-value gives a multifractal spectrum f(α) that was used as a feature descriptor for classification. A feature selection technique is introduced and implemented to extract some of the important features that could increase the discriminating capability of the descriptors and generate the maximum classification accuracy of the emphysema patterns. 
We propose to further improve the classification accuracy of emphysema CT patterns by combining the features extracted from the alpha-histograms and the multifractal descriptors to generate a new descriptor. The performances of the classifiers are measured by using the error matrix and the area under the receiver operating characteristic curve (AUC). The results at this stage demonstrated the proposed cascaded approach significantly improves the classification accuracy. Another multifractal based approach using a direct determination approach is investigated to demonstrate how multifractal characteristic parameters could be used for the identification of emphysema patterns in HRCT images. This further analysis reveals the multi-scale structures and characteristic properties of the emphysema images through the generalized dimensions. The results obtained confirm that this approach can also be effectively used for detecting and identifying emphysema patterns in CT images. Two new descriptors are proposed for accurate classification of emphysema patterns by hybrid concatenation of the local features extracted from the local binary patterns (LBP) and the global features obtained from the multifractal images. The proposed combined feature descriptors of the LBP and f(α) produced a very good performance with an overall classification accuracy of 98%. These performances outperform other state-of-the-art methods for emphysema pattern classification and demonstrate the discriminating power and robustness of the combined features for accurate classification of emphysema CT images. Overall, experimental results have shown that the multifractal could be effectively used for the classifications and detections of emphysema patterns in HRCT images
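
    The box counting estimate mentioned in this abstract can be sketched as follows, assuming a binary image given as a set of (row, column) pixel coordinates and a least-squares fit of log N(s) against log(1/s). The function names and the choice of box sizes are hypothetical; the thesis's own implementation and its Higuchi and multifractal variants are not reproduced here.

```python
import math
from typing import Iterable, Sequence, Tuple

Pixel = Tuple[int, int]


def box_count(pixels: Iterable[Pixel], size: int) -> int:
    """Number of size x size grid boxes that contain at least one set pixel."""
    return len({(r // size, c // size) for r, c in pixels})


def box_counting_dimension(pixels: Iterable[Pixel], sizes: Sequence[int]) -> float:
    """Slope of log N(s) versus log(1/s), fitted by ordinary least squares."""
    pts = list(pixels)
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pts, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


sizes = [1, 2, 4, 8, 16]
filled = {(r, c) for r in range(64) for c in range(64)}  # solid square
line = {(i, i) for i in range(64)}                       # diagonal line
print(box_counting_dimension(filled, sizes))
print(box_counting_dimension(line, sizes))
```

    Shapes with a known dimension, such as the solid square and the line above, give a quick sanity check in the spirit of the validation against images with predefined fractal dimension that the abstract describes.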

    Uncertain Multi-Criteria Optimization Problems

    Most real-world search and optimization problems naturally involve multiple criteria as objectives. Generally, symmetry, asymmetry, and anti-symmetry are basic characteristics of binary relationships used when modeling optimization problems. Moreover, the notion of symmetry has appeared in many articles about uncertainty theories that are employed in multi-criteria problems. Different solutions may produce trade-offs (conflicting scenarios) among different objectives. A better solution with respect to one objective may compromise other objectives. There are various factors that need to be considered to address the problems in multidisciplinary research, which is critical for the overall sustainability of human development and activity. In this regard, in recent decades, decision-making theory has been the subject of intense research activities due to its wide applications in different areas. The decision-making theory approach has become an important means to provide real-time solutions to uncertainty problems. Theories such as probability theory, fuzzy set theory, type-2 fuzzy set theory, rough set, and uncertainty theory, available in the existing literature, deal with such uncertainties. Nevertheless, the uncertain multi-criteria characteristics in such problems have not yet been explored in depth, and there is much left to be achieved in this direction. Hence, different mathematical models of real-life multi-criteria optimization problems can be developed in various uncertain frameworks with special emphasis on optimization problems

    Archives of Data Science, Series A. Vol. 1,1: Special Issue: Selected Papers of the 3rd German-Polish Symposium on Data Analysis and Applications

    The first volume of Archives of Data Science, Series A is a special issue containing a selection of contributions originally presented at the 3rd Bilateral German-Polish Symposium on Data Analysis and Its Applications (GPSDAA 2013). All selected papers fit into the emerging field of data science, consisting of the mathematical sciences (computer science, mathematics, operations research, and statistics) and an application domain (e.g. marketing, biology, economics, engineering).

    Dataset shift in land-use classification for optical remote sensing

    Multimodal dataset shifts consisting of both concept and covariate shifts are addressed in this study to improve texture-based land-use classification accuracy for optical panchromatic and multispectral remote sensing. Multitemporal and multisensor variances between train and test data are caused by atmospheric, phenological, sensor, illumination and viewing geometry differences, which cause supervised classification inaccuracies. The first dataset shift reduction strategy involves input modification through shadow removal before feature extraction with gray-level co-occurrence matrix and local binary pattern features. Components of a Rayleigh quotient-based manifold alignment framework are investigated to reduce multimodal dataset shift at the input level of the classifier through unsupervised classification, followed by manifold matching to transfer classification labels by finding across-domain cluster correspondences. The ability of weighted hierarchical agglomerative clustering to partition poorly separated feature spaces is explored, and weight-generalized internal validation is used for unsupervised cardinality determination. Manifold matching uses the Hungarian algorithm with a cost matrix featuring geometric similarity measurements that assume the preservation of intrinsic structure across the dataset shift. Local neighborhood geometric co-occurrence frequency information is recovered and a novel integration thereof is shown to improve matching accuracy. A final strategy for addressing multimodal dataset shift is multiscale feature learning, which is used within a convolutional neural network to obtain optimal hierarchical feature representations instead of engineered texture features that may be sub-optimal. Feature learning is shown to produce features that are robust against multimodal acquisition differences in a benchmark land-use classification dataset. 
A novel multiscale input strategy is proposed for an optimized convolutional neural network that improves classification accuracy to a competitive level for the UC Merced benchmark dataset and outperforms single-scale input methods. All the proposed strategies for addressing multimodal dataset shift in land-use image classification have resulted in significant accuracy improvements for various multitemporal and multimodal datasets. Thesis (PhD), University of Pretoria, 2016. National Research Foundation (NRF). University of Pretoria (UP). Electrical, Electronic and Computer Engineering. PhD. Unrestricte

    Socio-Cognitive and Affective Computing

    Social cognition focuses on how people process, store, and apply information about other people and social situations, and on the role that cognitive processes play in social interactions. The term cognitive computing, on the other hand, is generally used to refer to new hardware and/or software that mimics the functioning of the human brain and helps to improve human decision-making. In this sense, it is a type of computing with the goal of discovering more accurate models of how the human brain/mind senses, reasons, and responds to stimuli. Socio-Cognitive Computing should be understood as a set of theoretical interdisciplinary frameworks, methodologies, methods and hardware/software tools to model how the human brain mediates social interactions. In addition, Affective Computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects, a fundamental aspect of socio-cognitive neuroscience. It is an interdisciplinary field spanning computer science, electrical engineering, psychology, and cognitive science. Physiological Computing is a category of technology in which electrophysiological data recorded directly from human activity are used to interface with a computing device. This technology becomes even more relevant when computing can be integrated pervasively in everyday life environments. Thus, Socio-Cognitive and Affective Computing systems should be able to adapt their behavior according to the Physiological Computing paradigm. This book integrates proposals from researchers who use signals from the brain and/or body to infer people's intentions and psychological state in smart computing systems. The design of such systems combines knowledge and methods of ubiquitous and pervasive computing, as well as physiological data measurement and processing, with those of socio-cognitive and affective computing.
