293 research outputs found

    Next challenges for adaptive learning systems

    Learning from evolving streaming data has become a 'hot' research topic in the last decade, and many adaptive learning algorithms have been developed. This research was stimulated by rapidly growing amounts of industrial, transactional, sensor and other business data that arrive in real time and need to be mined in real time. Under such circumstances, constant manual adjustment of models is inefficient and, with increasing amounts of data, is becoming infeasible. Nevertheless, adaptive learning models are still rarely employed in business applications in practice. In the light of rapidly growing structurally rich 'big data', a new generation of parallel computing solutions and cloud computing services, as well as recent advances in portable computing devices, this article aims to identify the current key research directions to be taken to bring adaptive learning closer to application needs. We identify six forthcoming challenges in designing and building adaptive learning (prediction) systems: making adaptive systems scalable, dealing with realistic data, improving usability and trust, integrating expert knowledge, taking into account various application needs, and moving from adaptive algorithms towards adaptive tools. Those challenges are critical for the evolving stream settings, as the process of model building needs to be fully automated and continuous.

    Set-Codes with Small Intersections and Small Discrepancies

    We are concerned with the problem of designing large families of subsets over a common labeled ground set that have small pairwise intersections and the property that the maximum discrepancy of the label values within each of the sets is less than or equal to one. Our results, based on transversal designs, factorizations of packings and Latin rectangles, show that by jointly constructing the sets and labeling scheme, one can achieve optimal family sizes for many parameter choices. Probabilistic arguments akin to those used for pseudorandom generators lead to significantly suboptimal results when compared to the proposed combinatorial methods. The design problem considered is motivated by applications in molecular data storage and theoretical computer science.

    Unsupervised Ensembles Techniques for Visualization

    In this paper we introduce two unsupervised techniques for visualization based on ensemble methods. Unsupervised techniques, which are often quite sensitive to the presence of outliers, are combined with ensemble approaches in order to overcome the influence of outliers. The first technique is based on Principal Component Analysis; the second is known for its topology-preserving characteristics and is based on the combination of the Scale Invariant Map and Maximum Likelihood Hebbian learning. To show the advantage of these novel ensemble-based techniques, the results of some experiments carried out on artificial and real data sets are included.

    Toward Digital Twin Oriented Modeling of Complex Networked Systems and Their Dynamics: A Comprehensive Survey

    This paper aims to provide a comprehensive critical overview of how entities and their interactions in Complex Networked Systems (CNS) are modelled across disciplines as they approach their ultimate goal of creating a Digital Twin (DT) that perfectly matches reality. We propose four complexity dimensions for the network representation and five generations of models for the dynamics modelling to describe the increasing complexity level of the CNS that will be developed towards achieving DT (e.g. CNS dynamics modelled offline in the 1st generation vs. CNS dynamics modelled simultaneously with a two-way real-time feedback between reality and the CNS in the 5th generation). Based on that, we propose a new framework to conceptually compare diverse existing modelling paradigms from different perspectives and create unified assessment criteria to evaluate their respective capabilities of reaching such an ultimate goal. Using the proposed criteria, we also appraise how far the reviewed current state-of-the-art approaches are from the idealised DTs. Finally, we identify and propose potential directions and ways of building a DT-orientated CNS based on the convergence and integration of CNS and DT, utilising a variety of cross-disciplinary techniques.

    Incremental Information Gain Analysis of Input Attribute Impact on RBF-Kernel SVM Spam Detection

    The massive increase of spam is posing a very serious threat to email and SMS, which have become an important means of communication. Not only does spam annoy users, but it also becomes a security threat. Machine learning techniques have been widely used for spam detection. Email spam can be detected through the sender's behaviour, the contents of an email, its subject and source address, etc., while SMS spam detection is usually based on the tokens or features of messages because of their short content. However, a comprehensive analysis of email/SMS content may provide cues that make users aware of email/SMS spam. We cannot completely depend on automatic tools to identify all spam. In this paper, we propose an analysis approach based on information entropy and incremental learning to see how various features affect the performance of an RBF-based SVM spam detector, so as to increase our awareness of spam by sensing its features. The experiments were carried out on the spambase and SMSSpamCollection databases in the UCI machine learning repository. The results show that some features have significant impacts on spam detection, of which users should be aware, and that there exists a feature space that achieves Pareto efficiency in True Positive Rate and True Negative Rate.
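The entropy-based feature analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it only computes the standard information gain of a discrete feature with respect to a spam/ham label and ranks features by it. The toy data and the feature names `has_free` and `has_hello` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 1 = spam, 0 = ham.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "has_free": [1, 1, 1, 0, 0, 0],   # separates the two classes perfectly
    "has_hello": [1, 0, 1, 0, 1, 0],  # only weakly related to the label
}

# Rank features from most to least informative.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
print(ranking)
```

Features could then be added to a classifier one at a time in this order to observe how each affects detection rates; that incremental step is left out here.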

    Practicing, Materialising and Contesting Environmental Data (Introduction to Special Issue)

    While there are now an increasing number of studies that critically and rigorously engage with Big Data discourses and practices, these analyses often focus on social media and other forms of online data typically generated about users. This introduction discusses how environmental Big Data is emerging as a parallel area of investigation within studies of Big Data. New practices, technologies, actors and issues are concretising that are distinct and specific to the operations of environmental data. Situating these developments in relation to the seven contributions to this special collection, the introduction outlines significant characteristics of environmental data practices, data materialisations and data contestations. In these contributions, it becomes evident that processes for validating, distributing and acting on environmental data become key sites of materialisation and contestation, where new engagements with environmental politics and citizenship are worked through and realised.

    A Robust Comparative Analysis of Graph Neural Networks on Dynamic Link Prediction

    Graph neural networks (GNNs) are rapidly becoming the dominant way to learn on graph-structured data. Link prediction is a near-universal benchmark for new GNN models. Many advanced models such as dynamic graph neural networks (DGNNs) specifically target dynamic graphs. However, these models, particularly DGNNs, are rarely compared to each other or to existing heuristics. Different works evaluate their models in different ways, so their evaluation metrics and results cannot be compared directly. Motivated by this, we perform a comprehensive comparison study. We compare link prediction heuristics, GNNs, discrete DGNNs, and continuous DGNNs on the dynamic link prediction task. In total we summarize the results of over 3200 experimental runs (≈ 1.5 years of computation time). We find that simple link prediction heuristics perform better than GNNs and DGNNs, that different sliding window sizes greatly affect performance, and that, of all examined graph neural networks, DGNNs consistently outperform static GNNs. This work is a continuation of our previous work, a foundation of dynamic networks and theoretical review of DGNNs. In combination with our survey, we provide both a theoretical and empirical comparison of DGNNs.
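As an illustration of what a simple link prediction heuristic looks like, here is a minimal sketch of the common-neighbours score. The abstract does not say which heuristics were evaluated, so the choice of this one is an assumption, as are the toy edge list and the idea that `window_edges` holds the edges seen in the most recent sliding window.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets built from a list of (u, v) edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def common_neighbors_scores(edges):
    """Score every currently unconnected node pair by its number of shared neighbours."""
    adjacency = build_adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, so not a candidate for prediction
        scores[(u, v)] = len(adjacency[u] & adjacency[v])
    return scores


# Hypothetical edges observed in the most recent time window.
window_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbors_scores(window_edges)

# Pairs with the highest score are predicted as the most likely future links.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

Changing which edges fall into `window_edges` changes the scores, which is one simple way to see why window size can matter for this kind of heuristic.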