332 research outputs found

    AAPOR Report on Big Data

    In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so-called Big Data. The term Big Data covers a variety of data, as explained in the report, many of them characterized not just by their large volume, but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inference from them. The changes in the nature of these new types of data, their availability, and the way in which they are collected and disseminated are fundamental; they constitute a paradigm shift for survey research. There is great potential in Big Data, but there are fundamental challenges that have to be resolved before its full potential can be realized. In this report we give examples of different types of Big Data and their potential for survey research. We also describe the Big Data process and discuss its main challenges.

    Overview of Some Intelligent Control Structures and Dedicated Algorithms

    Automatic control refers to the use of a control device to make the controlled object run automatically or keep its state unchanged without human participation. The guiding idea of intelligent control is to draw on the human way of thinking and problem-solving ability in order to address control problems that currently require human intelligence. We already know that the complexity of the controlled object includes model uncertainty, high nonlinearity, distributed sensors/actuators, dynamic mutations, multiple time scales, complex information patterns, big-data processes, and strict performance indicators. In addition, the complexity of the environment manifests itself in uncertainty and unpredictable change. Based on this, various lines of research suggest that the main methods of intelligent control include expert control, fuzzy control, neural network control, hierarchical intelligent control, anthropomorphic intelligent control, integrated intelligent control, combined intelligent control, chaos control, wavelet theory, etc. However, it is difficult to cover all intelligent control methods in a single chapter, so this chapter focuses on intelligent control based on fuzzy logic, intelligent control based on neural networks, expert control and human-like intelligent control, and hierarchical intelligent control and learning control, and provides relevant, practical programs for readers to practice with.
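    The fuzzy-logic approach the chapter focuses on can be sketched in a few lines. The controller below is a minimal illustration, not the chapter's own program: it fuzzifies an error signal with triangular membership functions, applies three hand-written rules, and defuzzifies by weighted average (Sugeno-style). All membership-function breakpoints and rule consequents are illustrative assumptions.

    ```python
    # Minimal fuzzy control sketch: one input (error), one output (control action).
    # Membership functions and rule consequents are illustrative assumptions.

    def tri(x, a, b, c):
        """Triangular membership function rising from a, peaking at b, falling to c."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def fuzzy_step(error):
        # Fuzzify: degree of membership in "negative", "zero", "positive" error.
        mu = {
            "neg": tri(error, -2.0, -1.0, 0.0),
            "zero": tri(error, -1.0, 0.0, 1.0),
            "pos": tri(error, 0.0, 1.0, 2.0),
        }
        # Rule base: negative error -> decrease output, zero -> hold, positive -> increase.
        consequent = {"neg": -1.0, "zero": 0.0, "pos": 1.0}
        # Defuzzify by weighted average of rule consequents (Sugeno-style).
        num = sum(mu[k] * consequent[k] for k in mu)
        den = sum(mu.values()) or 1.0
        return num / den

    print(fuzzy_step(0.5))  # -> 0.5 for this symmetric rule base
    ```

    Real fuzzy controllers add more inputs (e.g. error derivative) and richer rule bases, but the fuzzify-infer-defuzzify pipeline is the same.
    
    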

    Big Data on Decision Making in Energetic Management of Copper Mining

    Indexed in: Web of Science; Scopus. We propose an analysis of the variables related to energy consumption in the copper concentration process, specifically in ball and SAG mills. The methodology considers the analysis of large volumes of data, which allows the variables of interest (tonnage, temperature, and power) to be identified in order to arrive at an improvement plan for energy efficiency. Correct processing of the large volume of data from the copper milling process, after imputation of null, unreported, and out-of-range values, integrated into a decision support system, provides clear, online information for decision making. As a result, it is established that a correlation exists between the energy consumption of the ball and SAG mills and the east and west winding temperatures. Nevertheless, no correlation is observed between the energy consumption of the ball and SAG mills and the feed tonnage of the SAG mill. In addition, from the experimental design, a similarity in behavior between two groups of different mills in the process lines was determined, and a difference in energy consumption between mills of the same group was found. This approach modifies the method presented in [1]. (a) http://www.univagora.ro/jour/index.php/ijccc/article/view/2784/106
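    The impute-then-correlate workflow described above can be sketched as follows. The data here is synthetic and the variable names are assumptions; the point is the pipeline shape: replace invalid readings, then test each candidate variable against mill power draw.

    ```python
    import numpy as np

    # Illustrative sketch of the reported workflow: impute invalid readings,
    # then check correlation between mill power draw and candidate variables.
    # All arrays are synthetic; names and magnitudes are assumptions.

    rng = np.random.default_rng(0)
    power = rng.normal(10.0, 1.0, 200)                    # mill power (MW), synthetic
    temperature = 0.8 * power + rng.normal(0, 0.3, 200)   # deliberately correlated
    tonnage = rng.normal(2500, 100, 200)                  # deliberately independent

    # Simulate null readings and impute them with the column mean.
    power[:5] = np.nan
    power = np.where(np.isnan(power), np.nanmean(power), power)

    r_temp = np.corrcoef(power, temperature)[0, 1]
    r_ton = np.corrcoef(power, tonnage)[0, 1]
    print(f"power~temperature r={r_temp:.2f}, power~tonnage r={r_ton:.2f}")
    ```

    On this synthetic data the temperature correlation comes out strong and the tonnage correlation near zero, mirroring the pattern the abstract reports for the real mill data.
    
    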

    Process-oriented Iterative Multiple Alignment for Medical Process Mining

    Adapted from biological sequence alignment, trace alignment is a process mining technique used to visualize and analyze workflow data. Any analysis done with this method, however, is affected by the alignment quality. The best existing trace alignment techniques use progressive guide-trees to heuristically approximate the optimal alignment in O(N²L²) time. These algorithms are heavily dependent on the selected guide-tree metric, often return sum-of-pairs-score-reducing errors that interfere with interpretation, and are computationally intensive for large datasets. To alleviate these issues, we propose process-oriented iterative multiple alignment (PIMA), which contains specialized optimizations to better handle workflow data. We demonstrate that PIMA is a flexible framework capable of achieving a better sum-of-pairs score than existing trace alignment algorithms in only O(NL²) time. We applied PIMA to the analysis of medical workflow data, showing how iterative alignment can better represent the data and facilitate the extraction of insights from data visualization.Comment: accepted at ICDMW 201
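    The pairwise building block that multiple-alignment methods like PIMA iterate on is classical dynamic-programming alignment (Needleman-Wunsch). The sketch below aligns two activity traces, inserting gap symbols where one workflow has a step the other lacks; the scoring values and the example traces are illustrative assumptions, not PIMA's actual parameters.

    ```python
    # Pairwise trace alignment via Needleman-Wunsch dynamic programming.
    # Scoring values here are illustrative assumptions.

    def align(a, b, match=1, mismatch=-1, gap=-1):
        n, m = len(a), len(b)
        # dp[i][j] = best score aligning the prefixes a[:i] and b[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                dp[i][j] = max(dp[i - 1][j - 1] + s,      # substitute/match
                               dp[i - 1][j] + gap,        # gap in b
                               dp[i][j - 1] + gap)        # gap in a
        # Traceback to recover the aligned traces, '-' marking gaps.
        out_a, out_b = [], []
        i, j = n, m
        while i > 0 or j > 0:
            s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
            if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + s:
                out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
            elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
                out_a.append(a[i - 1]); out_b.append('-'); i -= 1
            else:
                out_a.append('-'); out_b.append(b[j - 1]); j -= 1
        return ''.join(reversed(out_a)), ''.join(reversed(out_b))

    # Hypothetical clinical traces: Triage, Exam, Lab, Discharge vs. no lab step.
    print(align("TELD", "TED"))  # -> ('TELD', 'TE-D')
    ```

    A single pairwise alignment runs in O(L²) for traces of length L; aligning N traces pairwise is what drives the O(N²L²) cost of progressive methods that PIMA improves on.
    
    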

    Distributed Graph Clustering using Modularity and Map Equation

    We study large-scale, distributed graph clustering. Given an undirected graph, our objective is to partition the nodes into disjoint sets called clusters. A cluster should contain many internal edges while being sparsely connected to other clusters. In the context of a social network, a cluster could be a group of friends. Modularity and map equation are established formalizations of this internally-dense-externally-sparse principle. We present two versions of a simple distributed algorithm to optimize both measures. They are based on Thrill, a distributed big data processing framework that implements an extended MapReduce model. The algorithms for the two measures, DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality measures is straightforward. We conduct an extensive experimental study on real-world graphs and on synthetic benchmark graphs with up to 68 billion edges. Our algorithms are fast while detecting clusterings similar to those detected by other sequential, parallel and distributed clustering algorithms. Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is up to an order of magnitude faster and achieves better quality.Comment: 14 pages, 3 figures; v3: Camera ready for Euro-Par 2018, more details, more results; v2: extended experiments to include comparison with competing algorithms, shortened for submission to Euro-Par 201
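    The modularity measure that DSLM-Mod optimizes has a compact form: for each cluster c, take the fraction of edges inside c minus the square of the fraction of edge endpoints touching c, i.e. Q = Σ_c [e_c/m − (deg_c/2m)²]. The sketch below computes it for a toy graph; the graph and clustering are illustrative, not from the paper.

    ```python
    # Sketch of the modularity measure: Q = sum_c (e_c/m - (deg_c/(2m))^2),
    # where e_c = intra-cluster edges of c, deg_c = total degree of c's nodes,
    # and m = total edge count. Graph and clustering below are illustrative.

    from collections import defaultdict

    def modularity(edges, cluster_of):
        m = len(edges)
        intra = defaultdict(int)   # intra-cluster edge counts e_c
        degree = defaultdict(int)  # summed node degrees deg_c per cluster
        for u, v in edges:
            degree[cluster_of[u]] += 1
            degree[cluster_of[v]] += 1
            if cluster_of[u] == cluster_of[v]:
                intra[cluster_of[u]] += 1
        return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

    # Two triangles joined by one bridge edge: a clear two-cluster structure.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(modularity(edges, clusters))  # -> 5/14 ≈ 0.357
    ```

    A distributed optimizer like DSLM-Mod repeatedly moves nodes between clusters to increase Q, evaluating these per-cluster terms with MapReduce-style aggregations instead of a single pass over an in-memory edge list.
    
    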