
    What May Visualization Processes Optimize?

    In this paper, we present an abstract model of visualization and inference processes and describe an information-theoretic measure for optimizing such processes. In order to obtain such an abstraction, we first examined six classes of workflows in data analysis and visualization, and identified four levels of typical visualization components, namely disseminative, observational, analytical and model-developmental visualization. We noticed a common phenomenon at different levels of visualization, that is, the transformation of data spaces (referred to as alphabets) usually corresponds to the reduction of maximal entropy along a workflow. Based on this observation, we establish an information-theoretic measure of cost-benefit ratio that may be used as a cost function for optimizing a data visualization process. To demonstrate the validity of this measure, we examined a number of successful visualization processes in the literature, and showed that the information-theoretic measure can mathematically explain the advantages of such processes over possible alternatives.
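    The entropy bookkeeping behind this observation is easy to illustrate. Below is a minimal Python sketch (not from the paper; the workflow and alphabet sizes are invented for illustration) of how the maximal entropy log2|alphabet| shrinks across successive data-space transformations, the quantity the proposed cost-benefit measure weighs against distortion and cost.

```python
import math

def max_entropy(alphabet_size: int) -> float:
    """Maximal Shannon entropy of an alphabet (uniform distribution), in bits."""
    return math.log2(alphabet_size)

# A hypothetical three-stage workflow: raw measurements (1,000,000 possible
# values) are binned to 256 levels, then mapped to 8 visual classes. The
# maximal entropy drops at each transformation, mirroring the paper's
# observation that alphabets shrink along a visualization workflow.
alphabets = [1_000_000, 256, 8]
for a, b in zip(alphabets, alphabets[1:]):
    compression = max_entropy(a) - max_entropy(b)  # bits removed by this step
    print(f"{a:>9} -> {b:>3}: alphabet compression = {compression:5.2f} bits")
```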

    Big data analytics: Computational intelligence techniques and application areas

    Big Data has a significant impact on developing functional smart cities and supporting modern societies. In this paper, we investigate the importance of Big Data in modern life and economy, and discuss challenges arising from Big Data utilization. Different computational intelligence techniques have been considered as tools for Big Data analytics. We also explore the powerful combination of Big Data and Computational Intelligence (CI) and identify a number of areas where novel applications for real-world smart city problems can be developed by utilizing these powerful tools and techniques. We present a case study for intelligent transportation in the context of a smart city, and a novel data modelling methodology based on a biologically inspired universal generative modelling approach called the Hierarchical Spatial-Temporal State Machine (HSTSM). We further discuss various implications of policy, protection, valuation and commercialization related to Big Data, its applications and deployment.

    Edge Assignment and Data Valuation in Federated Learning

    Federated Learning (FL) is a recent Machine Learning method for training with private data that is stored separately on local machines, without gathering it into one place for central learning. It was born to address two challenges that arise when applying Machine Learning in practice: (1) communication cost: most real-world data useful for training is collected locally, and bringing it all to one place for central learning can be expensive, especially in real-time learning applications where time is of the essence, for example, predicting the next word when texting on a smartphone; and (2) privacy protection: many applications, such as those in the healthcare field, must protect data privacy; the private data can only be seen by its local owner, so the learning may only use a content-hiding representation of this data, which is much less informative. To fulfill FL’s promise, this dissertation addresses three problems regarding the need for good training data, system scalability, and robustness to uncertainty:
    1. The effectiveness of FL depends critically on the quality of the local training data. We should not only incentivize participants who have good training data but also minimize the effect of bad training data on the overall learning procedure. The first problem of my research is to determine a score that values a participant’s contribution. My approach computes such a score based on the Shapley Value (SV), a concept from cooperative game theory for profit allocation in a coalition game (a minimal sketch of this baseline appears after this list). The main challenge in this direction is the exponential time complexity of the SV computation, which is further complicated by the iterative manner of the FL learning algorithm. I propose a fast and effective valuation method that overcomes this challenge.
    2. On scalability, FL depends on a central server for repeated aggregation of local training models, which is prone to becoming a performance bottleneck. A reasonable approach is to combine FL with Edge Computing: introduce a layer of edge servers, each serving as a regional aggregator that offloads the main server. Scalability is thus improved, but at the cost of learning accuracy. The second problem of my research is to optimize this tradeoff. This dissertation shows that the cost can be alleviated with a proper choice of edge server assignment: which edge servers should aggregate the training models from which local machines. Specifically, I propose an assignment solution that is especially useful for non-IID training data, which is well known to hinder today’s FL performance.
    3. FL participants may decide on their own which devices they run on, their computing capabilities, and how often they communicate the training model with the aggregation server. The workloads they incur are therefore time-varying and unpredictable. Server capacities are finite and can vary too. The third problem of my research is to compute an edge server assignment that is robust to such dynamics and uncertainties. I propose a stochastic approach to solving this problem.
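    The abstract does not detail the fast valuation method itself, but the baseline it accelerates is standard: estimating Shapley values by Monte Carlo permutation sampling instead of enumerating all 2^n coalitions. Here is a minimal sketch under assumed conditions (the participant names and toy utility function are invented; in real FL the utility would be the accuracy of a model trained on the coalition’s combined data, which is exactly the expensive part):

```python
import random

def shapley_monte_carlo(participants, utility, num_samples=200, seed=0):
    """Estimate each participant's Shapley value by sampling random
    permutations and averaging marginal contributions. `utility` maps a
    frozenset of participants to the value of the coalition."""
    rng = random.Random(seed)
    values = {p: 0.0 for p in participants}
    for _ in range(num_samples):
        order = list(participants)
        rng.shuffle(order)
        coalition, prev = frozenset(), utility(frozenset())
        for p in order:
            coalition = coalition | {p}
            curr = utility(coalition)
            values[p] += curr - prev  # marginal contribution of p in this order
            prev = curr
    return {p: v / num_samples for p, v in values.items()}

# Toy utility: each participant adds a fixed amount of "accuracy";
# participant C holds bad/noisy data and contributes nothing.
quality = {"A": 0.05, "B": 0.02, "C": 0.0}
toy_utility = lambda s: sum(quality[p] for p in s)
print(shapley_monte_carlo(list(quality), toy_utility))
```

    With an additive utility like this toy one, the estimate matches the exact Shapley values; in general, the cost is dominated by the number of utility evaluations (num_samples times the number of participants), which motivates faster valuation methods such as the one the dissertation proposes.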

    Implicit Study of Techniques and Tools for Data Analysis of Complex Sensory Data

    The utility and contribution of applications in Wireless Sensor Networks (WSN) have been experienced by users for more than a decade. Over time, however, data generation has grown massively even in WSNs. Small sensors with limited battery life and minimal computational capability cannot process such a massive stream of complex data efficiently. Although various mining techniques are practiced today, such tools and techniques cannot efficiently analyze this complex and massively growing data. This paper therefore discusses the generation of large data volumes and the shortcomings of existing research techniques, reviewing the literature and frequently used tools. The study concludes by outlining the significant research gap that calls for data-analytics tools capable of extracting knowledge from complex sensory data.

    Towards Mobility Data Science (Vision Paper)

    Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline with the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, survey the current state of the art, and describe open challenges for the research community in the coming years.
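    As a concrete illustration of the cleaning stage of such a pipeline, here is a minimal Python sketch of one standard step: speed-based outlier filtering of a GPS trajectory (the GPSPoint data model and the 70 m/s threshold are assumptions for illustration, not taken from the paper):

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class GPSPoint:
    object_id: str
    t: float    # Unix timestamp, seconds
    lat: float
    lon: float

def haversine_m(a: GPSPoint, b: GPSPoint) -> float:
    """Great-circle distance between two points, in metres."""
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))

def clean_trajectory(points, max_speed_mps=70.0):
    """Drop points that imply a physically implausible speed relative to
    the last accepted point -- a common mobility-data cleaning step."""
    points = sorted(points, key=lambda p: p.t)
    kept = points[:1]
    for p in points[1:]:
        dt = p.t - kept[-1].t
        if dt > 0 and haversine_m(kept[-1], p) / dt <= max_speed_mps:
            kept.append(p)
    return kept
```

    Analogous building blocks exist for the other pipeline stages (e.g., map matching for analysis, spatial indexing for management, and location obfuscation for privacy), which is where the paper argues mobility data science departs from general data science.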