Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing. Comment: 59 pages
Interests Diffusion in Social Networks
Understanding cultural phenomena on Social Networks (SNs) and exploiting the
implicit knowledge about their members is attracting the interest of different
research communities both from the academic and the business side. The
community of complexity science is devoting significant efforts to define laws,
models, and theories, which, based on acquired knowledge, are able to predict
future observations (e.g. success of a product). In the mean time, the semantic
web community aims at engineering a new generation of advanced services by
defining constructs, models and methods, adding a semantic layer to SNs. In
this context, a leapfrog is expected to come from a hybrid approach merging the
disciplines above. Along this line, this work focuses on the propagation of
individual interests in social networks. The proposed framework consists of the
following main components: a method to gather information about the members of
the social networks; methods to perform some semantic analysis of the Domain of
Interest; a procedure to infer members' interests; and an interests evolution
theory to predict how the interests propagate in the network. As a result, one
achieves an analytic tool to measure individual features, such as members'
susceptibilities and authorities. Although the approach applies to any type of social network, here it has been tested against the computer science research community.
The DBLP (Digital Bibliography and Library Project) database has been elected
as test-case since it provides the most comprehensive list of scientific
production in this field. Comment: 30 pages, 13 figures, 4 tables
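The framework above infers members' interests and predicts how they propagate through the network, quantifying each member's susceptibility and authority. A minimal cascade along those lines can be sketched as follows; the update rule (a member adopts an interest with probability equal to the neighbor's authority times the member's own susceptibility) and the node attributes are illustrative assumptions, not the paper's fitted model:

```python
import random

def propagate_interest(adj, authority, susceptibility, seeds, n_rounds=5, rng_seed=0):
    """Toy interest cascade: in each round, every interested member u may
    pass the interest to a neighbor v with probability
    authority[u] * susceptibility[v].  Returns the final interested set."""
    rng = random.Random(rng_seed)
    interested = set(seeds)
    for _ in range(n_rounds):
        new = set()
        for u in interested:
            for v in adj.get(u, []):
                if v not in interested and rng.random() < authority[u] * susceptibility[v]:
                    new.add(v)
        if not new:                     # cascade has died out
            break
        interested |= new
    return interested
```

With authorities and susceptibilities of 1.0, the interest deterministically sweeps a chain `a -> b -> c`; lowering either attribute dampens the diffusion, which is the lever such models fit against observed data.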
A Fuzzy Clustering Algorithm for High Dimensional Streaming Data
In this paper we propose a dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD). The algorithm can be used for high-dimensional datasets with streaming behavior, such as those arising from sensor networks, web click streams, and internet traffic flows. These datasets have two special properties that separate them from other datasets: (a) they exhibit streaming behavior, and (b) they have high dimensionality. Optimized fuzzy clustering algorithms have already been proposed for datasets that are either streaming or high-dimensional, but to the best of our knowledge no optimized fuzzy clustering algorithm has been proposed for datasets with both properties, i.e., high dimensionality combined with continuously arriving streaming data. Experimental analysis shows that our proposed algorithm (sWFCM-HD) improves performance in terms of both memory consumption and execution time.
Keywords: K-Means, Fuzzy C-Means, Weighted Fuzzy C-Means, Dimension Reduction, Clustering
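The abstract does not give the sWFCM-HD update rules, but the algorithm builds on standard fuzzy c-means, whose alternating membership/center updates can be sketched as below; the weighting and dimension-reduction steps that distinguish sWFCM-HD are omitted because the abstract does not specify them:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: alternate between updating cluster centers
    (membership-weighted means) and memberships (inverse-distance rule).
    X: (n_points, n_dims) array.  Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                        # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m                           # fuzzified memberships
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                 # guard against division by zero
        U = 1.0 / (d ** (2.0 / (m - 1.0)))    # inverse-distance membership update
        U /= U.sum(axis=0)
    return centers, U
```

A streaming variant would apply these updates to fixed-size chunks of arriving data, carrying the centers forward between chunks, which is what keeps memory consumption bounded.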
Detection and Generalization of Spatio-temporal Trajectories for Motion Imagery
In today's world of vast information availability, users often confront large unorganized amounts of data with limited tools for managing them. Motion imagery datasets have become an increasingly popular means for exposing and disseminating information. Commonly, moving objects are of primary interest in modeling such datasets. Users may require different levels of detail, mainly for visualization and further processing purposes, according to the application at hand. In this thesis we exploit the geometric attributes of objects for dataset summarization by using a series of image processing and neural network tools. In order to form data summaries, we select representative time instances through the segmentation of an object's spatio-temporal trajectory lines. High movement variation instances are selected through a new hybrid self-organizing map (SOM) technique to describe a single spatio-temporal trajectory. Multiple objects move in diverse yet classifiable patterns. In order to group corresponding trajectories, we utilize an abstraction mechanism that investigates a vague moving relevance between the data in space and time. Thus, we introduce the spatio-temporal neighborhood unit as a variable generalization surface. By altering the unit's dimensions, scaled generalization is accomplished. Common complications in tracking applications, including occlusion, noise, information gaps, and unconnected segments of data sequences, are addressed through the hybrid-SOM analysis. Nevertheless, entangled data sequences, where there is no information on which data entry belongs to which trajectory, are frequently evident. A multidimensional classification technique that combines geometric and backpropagation neural network implementations is used to distinguish between trajectory data. Furthermore, modeling and summarization of two-dimensional phenomena evolving in time brings forward the novel concept of spatio-temporal helixes as compact event representations.
The phenomena models are composed of SOM movement nodes (spines) and cardinality shape-change descriptors (prongs). While we focus on the analysis of MI datasets, the framework can be generalized to function with other types of spatio-temporal datasets. Multiple-scale generalization is allowed in a dynamic, significance-based scale rather than a constant one. The constructed summaries are not just a visualization product; they support further processing for metadata creation, indexing, and querying. Experimentation, comparisons, and error estimations for each technique support the analyses discussed.
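The thesis's hybrid SOM is not specified in the abstract, but the core idea of fitting a chain of SOM nodes to trajectory samples so that the fitted nodes serve as representative instances can be sketched with a textbook 1-D SOM; the node count, learning-rate schedule, and neighborhood width below are illustrative choices:

```python
import numpy as np

def som_1d(points, n_nodes=5, n_iter=200, lr0=0.5, sigma0=2.0, seed=0):
    """Textbook 1-D self-organizing map fitted to trajectory samples,
    e.g. rows of (t, x, y).  Each iteration picks a random sample, finds
    the best-matching node, and pulls that node and its chain neighbors
    toward the sample.  Returns the (n_nodes, n_dims) fitted node chain."""
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(len(points), n_nodes, replace=False))
    nodes = points[idx].astype(float)              # init from data samples
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3       # shrinking neighborhood
        p = points[rng.integers(len(points))]
        bmu = np.argmin(np.linalg.norm(nodes - p, axis=1))  # best-matching unit
        h = np.exp(-((np.arange(n_nodes) - bmu) ** 2) / (2 * sigma ** 2))
        nodes += lr * h[:, None] * (p - nodes)     # pull neighborhood toward sample
    return nodes
```

In a trajectory-summarization setting, the fitted nodes would then be the candidate representative time instances, with high-variation segments attracting denser node placement.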
Cultural consequences of computing technology
Computing technology is clearly a technical revolution, but will most probably bring about a cultural revolution as well. The effects of this technology on human culture will be dramatic and far-reaching. Yet, computers and electronic networks are but the latest development in a long history of cognitive tools, such as writing and printing. We will examine this history, which exhibits long-term trends toward an increasing democratization of culture, before turning to today's technology. Within this framework, we will analyze the probable effects of computing on culture: dynamical representations, generalized networking, constant modification and reproduction. To address the problems posed by this new technical environment, we will suggest possible remedies. In particular, the role of social institutions will be discussed, and we will outline the shape of new electronic institutions able to deal with the information flow on the internet.
Efficient Snapshot Retrieval over Historical Graph Data
We address the problem of managing historical data for large evolving
information networks like social networks or citation networks, with the goal
to enable temporal and evolutionary queries and analysis. We present the design
and architecture of a distributed graph database system that stores the entire
history of a network and provides support for efficient retrieval of multiple
graphs from arbitrary time points in the past, in addition to maintaining the
current state for ongoing updates. Our system exposes a general programmatic
API to process and analyze the retrieved snapshots. We introduce DeltaGraph, a
novel, extensible, highly tunable, and distributed hierarchical index structure
that enables compactly recording the historical information, and that supports
efficient retrieval of historical graph snapshots for single-site or parallel
processing. Along with the original graph data, DeltaGraph can also maintain
and index auxiliary information; this functionality can be used to extend the
structure to efficiently execute queries like subgraph pattern matching over
historical data. We develop analytical models for both the storage space needed
and the snapshot retrieval times to aid in choosing the right parameters for a
specific scenario. In addition, we present strategies for materializing
portions of the historical graph state in memory to further speed up the
retrieval process. Secondly, we present an in-memory graph data structure
called GraphPool that can maintain hundreds of historical graph instances in
main memory in a non-redundant manner. We present a comprehensive experimental
evaluation that illustrates the effectiveness of our proposed techniques at
managing historical graph information.
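DeltaGraph's hierarchical index and GraphPool's non-redundant in-memory instances are beyond an abstract-level sketch, but the underlying contract, recording timestamped edge deltas and reconstructing the graph as of any past time, can be illustrated with a toy linear-replay store; DeltaGraph replaces this linear replay with materialized interior snapshots plus small deltas to make retrieval efficient:

```python
import bisect

class HistoricalGraph:
    """Toy log-of-deltas store: keeps timestamped edge add/remove events
    and rebuilds the edge set as of any past time by replaying the log.
    Purely illustrative; not the DeltaGraph index itself."""

    def __init__(self):
        self.events = []            # (time, 'add'|'del', edge), kept sorted by time

    def add_edge(self, t, u, v):
        bisect.insort(self.events, (t, 'add', (u, v)))

    def del_edge(self, t, u, v):
        bisect.insort(self.events, (t, 'del', (u, v)))

    def snapshot(self, t):
        """Return the set of edges present at time t (inclusive)."""
        edges = set()
        for ts, op, e in self.events:
            if ts > t:
                break               # log is sorted; later events irrelevant
            edges.add(e) if op == 'add' else edges.discard(e)
        return edges
```

Replay cost here is linear in history length per query; the point of a hierarchical delta index is to cut that to roughly the distance from the queried time to the nearest materialized snapshot.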