
    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing alongside the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not yet been recognized, and its own methodology has not yet been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and the science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing. Comment: 59 pages

    Interests Diffusion in Social Networks

    Understanding cultural phenomena on Social Networks (SNs) and exploiting the implicit knowledge about their members are attracting the interest of research communities on both the academic and the business side. The complexity science community is devoting significant effort to defining laws, models, and theories which, based on acquired knowledge, are able to predict future observations (e.g. the success of a product). Meanwhile, the semantic web community aims at engineering a new generation of advanced services by defining constructs, models and methods that add a semantic layer to SNs. In this context, a leap forward is expected from a hybrid approach that merges these disciplines. Along this line, this work focuses on the propagation of individual interests in social networks. The proposed framework consists of the following main components: a method to gather information about the members of the social networks; methods to perform semantic analysis of the Domain of Interest; a procedure to infer members' interests; and an interests evolution theory to predict how the interests propagate in the network. The result is an analytic tool to measure individual features, such as members' susceptibilities and authorities. Although the approach applies to any type of social network, here it has been tested on the computer science research community. The DBLP (Digital Bibliography and Library Project) database was chosen as the test case since it provides the most comprehensive list of scientific production in this field. Comment: 30 pages, 13 figures, 4 tables
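The abstract does not give the equations of the interests evolution theory. As a rough illustration only, the sketch below propagates a single scalar interest level over a directed network using a hypothetical update rule: each member's susceptibility `s` controls how far it moves toward the authority-weighted average interest of the members it follows. The function `diffuse_interest`, the graph encoding, and the update rule are assumptions for illustration, not the authors' model.

```python
from typing import Dict, List

# node -> list of members it follows (its in-neighbours)
Graph = Dict[str, List[str]]


def diffuse_interest(
    graph: Graph,
    interest: Dict[str, float],
    susceptibility: Dict[str, float],
    authority: Dict[str, float],
    steps: int = 1,
) -> Dict[str, float]:
    """Propagate one scalar interest level through the network.

    Hypothetical rule (illustrative, not the paper's theory):
        x_i <- (1 - s_i) * x_i + s_i * (authority-weighted mean of followed x_j)
    """
    x = dict(interest)
    for _ in range(steps):
        nxt: Dict[str, float] = {}
        for node, followed in graph.items():
            s = susceptibility.get(node, 0.0)
            total_auth = sum(authority.get(j, 0.0) for j in followed)
            if total_auth == 0.0:
                # No (authoritative) sources: interest stays unchanged.
                nxt[node] = x[node]
                continue
            influence = sum(authority.get(j, 0.0) * x[j] for j in followed) / total_auth
            nxt[node] = (1.0 - s) * x[node] + s * influence
        x = nxt
    return x


# Usage: interest starts at "a" and flows along a -> b -> c.
g = {"a": [], "b": ["a"], "c": ["b"]}
result = diffuse_interest(
    g,
    interest={"a": 1.0, "b": 0.0, "c": 0.0},
    susceptibility={"a": 0.5, "b": 0.5, "c": 0.5},
    authority={"a": 1.0, "b": 1.0, "c": 1.0},
    steps=2,
)
print(result)  # {'a': 1.0, 'b': 0.75, 'c': 0.25}
```

In this toy rule, a member with high susceptibility adopts its sources' interests quickly, while a member with high authority pulls its followers' averages toward its own level.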

    A Fuzzy Clustering Algorithm for High Dimensional Streaming Data

    In this paper we propose a dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD). The algorithm can be used for high-dimensional datasets with streaming behavior. Such datasets arise in sensor networks, web click streams, and internet traffic flow monitoring. These data have two special properties that set them apart from other datasets: a) they arrive as a stream and b) they are high-dimensional. Optimized fuzzy clustering algorithms have already been proposed for datasets that are either streaming or high-dimensional. However, to our knowledge, no optimized fuzzy clustering algorithm has been proposed for datasets with both properties, i.e., data that are high-dimensional and also arrive continuously as a stream. Experimental analysis shows that the proposed algorithm (sWFCM-HD) improves performance in terms of both memory consumption and execution time.
    Keywords: K-Means, Fuzzy C-Means, Weighted Fuzzy C-Means, Dimension Reduction, Clustering
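A minimal sketch of the streaming weighted fuzzy c-means idea this work builds on, assuming the common chunk-based scheme: each chunk is clustered with weighted FCM, and the resulting centroids are carried into the next chunk as points whose weights equal their accumulated fuzzy membership. The dimension-reduction step of sWFCM-HD is not described in the abstract and is omitted here. The function names, the deterministic farthest-point initialization, and the defaults (`m=2.0`, `iters=50`) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_point_init(points: List[Point], c: int) -> List[List[float]]:
    # Deterministic seeding: start at the first point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < c:
        far = max(points, key=lambda p: min(_dist2(p, v) for v in centroids))
        centroids.append(list(far))
    return centroids


def weighted_fcm(points, weights, c, m=2.0, iters=50):
    """Weighted fuzzy c-means; returns (centroids, weight carried by each centroid)."""
    centroids = _farthest_point_init(points, c)
    dim = len(points[0])
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)),
        # written with squared distances, hence the exponent 1/(m-1).
        u = []
        for p in points:
            d = [max(_dist2(p, v), 1e-12) for v in centroids]
            u.append([1.0 / sum((d[k] / d[j]) ** (1.0 / (m - 1.0)) for j in range(c))
                      for k in range(c)])
        # Centroid update weighted by w_i * u_ik^m.
        for k in range(c):
            num = [0.0] * dim
            den = 0.0
            for p, w, row in zip(points, weights, u):
                coef = w * row[k] ** m
                den += coef
                for t in range(dim):
                    num[t] += coef * p[t]
            centroids[k] = [v / den for v in num]
    carried = [sum(w * row[k] for w, row in zip(weights, u)) for k in range(c)]
    return centroids, carried


def streaming_wfcm(chunks, c, m=2.0, iters=50):
    """Cluster a stream chunk by chunk, carrying weighted centroids forward."""
    centroids: List[List[float]] = []
    carried: List[float] = []
    for chunk in chunks:
        pts = [tuple(v) for v in centroids] + [tuple(p) for p in chunk]
        ws = carried + [1.0] * len(chunk)
        centroids, carried = weighted_fcm(pts, ws, c, m, iters)
    return centroids, carried


stream = [[(0, 0), (0, 1), (10, 10), (10, 11)], [(1, 0), (11, 10)]]
centroids, carried = streaming_wfcm(stream, c=2)
# carried weights sum to about 6: one unit for every point seen so far
print(sorted(centroids), sum(carried))
```

Only the current chunk plus `c` weighted centroids are held in memory at any time, which is where the memory savings of streaming FCM variants come from.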

    Detection and Generalization of Spatio-temporal Trajectories for Motion Imagery

    In today's world of vast information availability, users often confront large, unorganized amounts of data with limited tools for managing them. Motion imagery datasets have become an increasingly popular means of exposing and disseminating information. Commonly, moving objects are of primary interest in modeling such datasets. Users may require different levels of detail, mainly for visualization and further processing, depending on the application at hand. In this thesis we exploit the geometric attributes of objects for dataset summarization using a series of image processing and neural network tools. To form data summaries, we select representative time instances by segmenting an object's spatio-temporal trajectory lines. Instances of high movement variation are selected through a new hybrid self-organizing map (SOM) technique to describe a single spatio-temporal trajectory. Multiple objects move in diverse yet classifiable patterns. To group corresponding trajectories, we use an abstraction mechanism that investigates vague movement relevance between the data in space and time. We thus introduce the spatio-temporal neighborhood unit as a variable generalization surface; altering the unit's dimensions achieves scaled generalization. Common complications in tracking applications, including occlusion, noise, information gaps and unconnected segments of data sequences, are addressed through the hybrid-SOM analysis. However, entangled data sequences, in which there is no information about which trajectory each data entry belongs to, are frequently encountered. A multidimensional classification technique that combines a geometric approach with a backpropagation neural network is used to separate such trajectory data. Furthermore, modeling and summarizing two-dimensional phenomena evolving in time introduces the novel concept of spatio-temporal helixes as compact event representations.
The phenomena models consist of SOM movement nodes (spines) and cardinality shape-change descriptors (prongs). While we focus on the analysis of motion imagery (MI) datasets, the framework can be generalized to other types of spatio-temporal datasets. Multiple-scale generalization follows a dynamic, significance-based scale rather than a constant one. The constructed summaries are not just a visualization product; they also support further processing for metadata creation, indexing, and querying. Experiments, comparisons and error estimates for each technique support the analyses discussed.
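To make the idea of summarizing a trajectory with SOM nodes concrete, here is a sketch of a plain one-dimensional Kohonen SOM (a chain of nodes) fitted to (x, y, t) samples, with each node snapped to its nearest sample to yield representative time instances. This is a standard SOM, not the thesis's hybrid SOM: node placement follows the distribution of samples and does not specifically target high movement variation. The function names and the learning-rate and radius schedules are assumptions.

```python
import math
from typing import Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_chain_som(points, n_nodes, epochs=200, lr0=0.5):
    """Fit a 1-D chain of SOM nodes (n_nodes >= 2) to trajectory samples (x, y, t)."""
    n = len(points)
    # Initialise nodes at evenly spaced samples along the trajectory.
    nodes = [list(points[round(i * (n - 1) / (n_nodes - 1))]) for i in range(n_nodes)]
    radius0 = n_nodes / 2.0
    total = epochs * n
    step = 0
    for _ in range(epochs):
        for p in points:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = min(range(n_nodes), key=lambda k: _dist2(nodes[k], p))
            for k in range(n_nodes):
                # Gaussian neighbourhood measured along the chain, not in space.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(p)):
                    nodes[k][d] += lr * h * (p[d] - nodes[k][d])
            step += 1
    return nodes


def representative_instances(points, n_nodes):
    """Indices of the samples closest to each SOM node, in time order."""
    nodes = fit_chain_som(points, n_nodes)
    nearest = {min(range(len(points)), key=lambda i: _dist2(points[i], v)) for v in nodes}
    return sorted(nearest)


# L-shaped track: 10 samples moving east, then 9 moving north; t is the sample index.
track = [(float(x), 0.0, float(x)) for x in range(10)]
track += [(9.0, float(y), float(9 + y)) for y in range(1, 10)]
print(representative_instances(track, n_nodes=4))
```

Changing `n_nodes` gives a coarser or finer summary of the same track, loosely mirroring the scaled generalization described above.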

    Cultural consequences of computing technology

    Computing technology is clearly a technical revolution, but it will most probably bring about a cultural revolution as well. The effects of this technology on human culture will be dramatic and far-reaching. Yet computers and electronic networks are but the latest development in a long history of cognitive tools, such as writing and printing. We will examine this history, which exhibits long-term trends toward an increasing democratization of culture, before turning to today's technology. Within this framework, we will analyze the probable effects of computing on culture: dynamical representations, generalized networking, constant modification and reproduction. To address the problems posed by this new technical environment, we will suggest possible remedies. In particular, the role of social institutions will be discussed, and we will outline the shape of new electronic institutions able to deal with the information flow on the internet.

    Efficient Snapshot Retrieval over Historical Graph Data

    We address the problem of managing historical data for large evolving information networks, such as social networks or citation networks, with the goal of enabling temporal and evolutionary queries and analysis. We present the design and architecture of a distributed graph database system that stores the entire history of a network and provides support for efficient retrieval of multiple graphs from arbitrary time points in the past, in addition to maintaining the current state for ongoing updates. Our system exposes a general programmatic API to process and analyze the retrieved snapshots. We introduce DeltaGraph, a novel, extensible, highly tunable, and distributed hierarchical index structure that compactly records the historical information and supports efficient retrieval of historical graph snapshots for single-site or parallel processing. Along with the original graph data, DeltaGraph can also maintain and index auxiliary information; this functionality can be used to extend the structure to efficiently execute queries like subgraph pattern matching over historical data. We develop analytical models for both the storage space needed and the snapshot retrieval times to aid in choosing the right parameters for a specific scenario. In addition, we present strategies for materializing portions of the historical graph state in memory to further speed up the retrieval process. We also present an in-memory graph data structure called GraphPool that can maintain hundreds of historical graph instances in main memory in a non-redundant manner. Finally, we present a comprehensive experimental evaluation that illustrates the effectiveness of our proposed techniques at managing historical graph information.
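The abstract does not detail DeltaGraph's hierarchical index. As a much simpler illustration of the underlying trade-off between storage and retrieval time, the sketch below keeps a time-ordered log of edge events plus a full checkpoint every `checkpoint_every` events, and rebuilds the snapshot at time `t` by replaying deltas from the nearest earlier checkpoint. `SnapshotStore` and its event format are hypothetical; this is not DeltaGraph.

```python
from bisect import bisect_right
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]
Event = Tuple[int, str, Edge]  # (timestamp, "add" or "del", edge)


class SnapshotStore:
    """Hypothetical checkpoint-plus-delta store (not DeltaGraph)."""

    def __init__(self, events: List[Event], checkpoint_every: int = 100):
        self.events = sorted(events, key=lambda e: e[0])
        self.times = [e[0] for e in self.events]
        # Each checkpoint is (number of events already applied, full edge set).
        self.checkpoints: List[Tuple[int, FrozenSet[Edge]]] = [(0, frozenset())]
        edges: Set[Edge] = set()
        for i, (_, op, edge) in enumerate(self.events, start=1):
            self._apply(edges, op, edge)
            if i % checkpoint_every == 0:
                self.checkpoints.append((i, frozenset(edges)))
        self._applied_counts = [c[0] for c in self.checkpoints]

    @staticmethod
    def _apply(edges: Set[Edge], op: str, edge: Edge) -> None:
        if op == "add":
            edges.add(edge)
        elif op == "del":
            edges.discard(edge)
        else:
            raise ValueError(f"unknown op {op!r}")

    def snapshot(self, t: int) -> Set[Edge]:
        """Edge set of the graph as of time t (all events with timestamp <= t)."""
        n = bisect_right(self.times, t)
        # Latest checkpoint that does not go past event n.
        pos = bisect_right(self._applied_counts, n) - 1
        applied, base = self.checkpoints[pos]
        edges = set(base)
        for _, op, edge in self.events[applied:n]:
            self._apply(edges, op, edge)
        return edges


events = [
    (1, "add", ("a", "b")),
    (2, "add", ("b", "c")),
    (3, "add", ("c", "d")),
    (4, "del", ("a", "b")),
    (5, "add", ("d", "e")),
]
store = SnapshotStore(events, checkpoint_every=2)
print(sorted(store.snapshot(4)))  # [('b', 'c'), ('c', 'd')]
```

Smaller `checkpoint_every` values use more storage but replay fewer events per query; this is the kind of parameter choice that analytical storage and retrieval-time models are meant to guide.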