Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenges have not yet been
fully recognized, and a methodology of its own has not yet been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and the scientific paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Comment: 59 pages
Exploiting Emergence of New Topics via Anomaly Detection: A Survey
Detecting and generating new concepts has attracted much attention in data mining. The emergence of new topics in news data is a major challenge, and the problem can be framed as "finding breaking news". Years ago, the emergence of new stories was detected and followed up by domain experts, but manually reading stories and identifying irregularities is a critical and time-consuming task, and mapping those irregularities to the relevant stories requires deep knowledge of the news and of existing concepts. Automatically modeling breaking news is therefore of great interest in data mining. Anomalies in published news articles are the basic clues for inferring the emergence of a new story: they are keywords or phrases that do not match the overall concept of the article. These anomalies are then processed and mapped to the stories in which they do not behave as anomalies. After this mapping, the topics linked by a shared anomaly can be combined into a new concept, which can in turn be modeled as an emerging story. We survey techniques that can be used to model such new concepts efficiently: news classification, anomaly detection, and concept detection and generation, which together can form the basis of modeling breaking news. We further discuss data sources that can be processed and used as input stories or news for modeling the emergence of new stories.
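As a hedged illustration of the keyword-anomaly idea at the heart of these techniques, the sketch below scores each term of an incoming article by its in-article frequency weighted by its rarity in a background corpus; the corpus, the scoring formula and the cut-off are assumptions for exposition, not any surveyed system's actual method.

```python
# Illustrative keyword-anomaly scoring: terms prominent in a new article
# but rare in the background corpus are candidate clues for an emerging
# story. Corpus, formula and top_k are assumptions, not a surveyed system.
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def anomalous_terms(article: str, background: list[str], top_k: int = 5):
    """Rank article terms by term frequency times background rarity."""
    tf = Counter(tokenize(article))
    df = Counter()                       # document frequency per term
    for doc in background:
        df.update(set(tokenize(doc)))
    n = len(background)
    scores = {t: c * math.log((n + 1) / (df[t] + 1)) for t, c in tf.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

background = [
    "the council approved the city budget for roads and schools",
    "local elections were held and the budget debate continued",
]
article = "sudden flooding closed roads as river levels rose overnight"
print(anomalous_terms(article, background))
```

Terms that score highest are candidate anomalies to be mapped against other stories, as described above.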
Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications (Extended Version)
Although the "scale-free" literature is large and growing, it gives neither
a precise definition of scale-free graphs nor rigorous proofs of many of their
claimed properties. In fact, it is easily shown that the existing theory has
many inherent contradictions and verifiably false claims. In this paper, we
propose a new, mathematically precise, and structural definition of the extent
to which a graph is scale-free, and prove a series of results that recover many
of the claimed properties while suggesting the potential for a rich and
interesting theory. With this definition, scale-free (or its opposite,
scale-rich) is closely related to other structural graph properties such as
various notions of self-similarity (or respectively, self-dissimilarity).
Scale-free graphs are also shown to be the likely outcome of random
construction processes, consistent with the heuristic definitions implicit in
existing random graph approaches. Our approach clarifies much of the confusion
surrounding the sensational qualitative claims in the scale-free literature,
and offers rigorous and quantitative alternatives.
Comment: 44 pages, 16 figures. The primary version is to appear in Internet Mathematics (2005).
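One concrete candidate for such a structural measure, from this same line of work, is the s-metric s(g) = Σ_{(i,j)∈E} d_i d_j, which is largest when high-degree nodes attach to one another; reading the abstract's definition as exactly this metric is an assumption of the sketch below.

```python
# s(g) = sum over edges (i, j) of d_i * d_j: high-degree nodes attaching
# to each other drive the value up. Treating this as the paper's exact
# structural definition is an assumption of this sketch.
from collections import defaultdict

def s_metric(edges: list[tuple[int, int]]) -> int:
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree[u] * degree[v] for u, v in edges)

# A hub-and-spoke star vs. a chain on the same five nodes (note the two
# graphs have different degree sequences, so this is only illustrative).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(s_metric(star))   # 16: each edge pairs the degree-4 hub with a leaf
print(s_metric(chain))  # 12: degrees along the path are 1, 2, 2, 2, 1
```

A normalized variant divides s(g) by the maximum s over graphs with the same degree sequence, so the raw values above are only illustrative.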
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
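As a hedged illustration of one class in such a taxonomy, the sketch below implements a simple distance-based technique, scoring each numeric observation by the distance to its k-th nearest neighbour; the sample data, the choice of k and the interpretation of the scores are assumptions for exposition only.

```python
# Distance-based outlier scoring: each point is scored by the distance
# to its k-th nearest neighbour; isolated points get large scores.
# Data, k and the reading of the scores are assumptions for exposition.
import math

def knn_outlier_scores(points: list[tuple[float, ...]], k: int = 2) -> list[float]:
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scores = []
    for i, p in enumerate(points):
        dists = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])      # distance to k-th nearest neighbour
    return scores

data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 0.8), (8.0, 8.0)]
scores = knn_outlier_scores(data, k=2)
print(max(zip(scores, data)))            # the isolated point scores highest
```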
The inner and inter construct associations of the quality of data warehouse customer relationship data for problem enactment
This thesis was submitted for the degree of Doctor of Business Administration and awarded by Brunel University on behalf of Henley Management College.
The literature identifies perceptions of data quality as a key factor influencing a wide
range of attitudes and behaviors related to data in organizational settings (e.g.
decision confidence). In particular, there is an overwhelming consensus that effective
customer relationship management (CRM) depends on the quality of customer data.
Data warehouses, if properly implemented, enable data integration which is a key
attribute of data quality. The literature highlights the relevance of formulating
problem statements because this will determine the course of action. CRM managers
formulate problem statements through a cognitive process known as enactment.
The literature on data quality is fragmented. It posits that this construct is of a
higher-order nature (it is dimensional), that it is contextual and situational, and that it is
closely linked to a utilitarian value. This study addresses these disparate views of the
nature of data quality from a holistic perspective. Social cognitive theory (SCT) is the
backbone for studying data quality in terms of information search behavior and
enhancements in formulating problem statements.
The main objective of this study is to explore the nature of a data warehouse's
customer relationship data quality in situations where there is a need for
understanding a customer relationship problem. The research question is: What are the
inner and inter construct associations of the quality of data warehouse customer
relationship data for problem enactment?
To reach this objective, a positivist approach was adopted, complemented with
qualitative interventions throughout the research process. Observations were gathered
with a survey, and scales were adjusted using a construct-based approach. The research
findings confirm that data quality is a higher-order construct with a contextual
dimension and a situational dimension. Problem sense-making enhancement is a dependent
variable of data quality, with a confirmed positive association between the two
constructs. Problem sense-making enhancement is also a higher-order construct, with a
mastery-experience dimension and a self-efficacy dimension. Behavioral patterns were
identified for information search mode (scanning-mode orientation vs. focus-mode
orientation) and for information search heuristic (template-heuristic orientation vs.
trial-and-error-heuristic orientation); focus is the predominant information search
mode orientation and template the predominant information search heuristic orientation.
Overall, the research findings support the associations advocated by SCT. The
self-efficacy dimension of problem sense-making enhancement discriminates between
information search mode orientations (focus vs. scanning), and the contextual dimension
of data quality (i.e. data task utility) discriminates between information search
heuristics (template vs. trial-and-error).
A data quality cognitive metamodel and a data quality for problem enactment model
are suggested for research in the areas of data quality, information search behavior,
and cognitive enhancements.
Data Mining Algorithms for Internet Data: from Transport to Application Layer
Nowadays we live in a data-driven world. Advances in data generation, collection and storage technology have enabled organizations to gather data sets of massive size. Data mining is a discipline that blends traditional data analysis methods with sophisticated algorithms to handle the challenges posed by these new types of data sets.
The Internet is a complex and dynamic system in which new protocols and applications arise at a constant pace. All these characteristics make the Internet a valuable and challenging data source and application domain for research, both at the Transport layer, analyzing network traffic flows, and up at the Application layer, focusing on the ever-growing next generation of web services: blogs, micro-blogs, on-line social networks, photo sharing services and many other applications (e.g., Twitter, Facebook, Flickr, etc.).
In this thesis work we focus on the study, design and development of novel algorithms and frameworks to support large-scale data mining activities over huge and heterogeneous data volumes, with a particular focus on Internet data as a data source, targeting network traffic classification, on-line social network analysis, recommendation systems, cloud services and Big Data.
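As a hedged illustration of the traffic-classification task named above, the sketch below aggregates packets into 5-tuple flows, derives simple per-flow statistics, and applies a toy size-based labeling rule; all field names, values and the rule itself are assumptions for exposition, not the thesis's actual algorithms.

```python
# Toy flow-level traffic classification: group packets into 5-tuple
# flows, compute simple statistics, and label by mean payload size.
# Every field, threshold and label here is an assumption for exposition.
from collections import defaultdict
from statistics import mean

# (src_ip, dst_ip, src_port, dst_port, proto, payload_bytes)
packets = [
    ("10.0.0.1", "10.0.0.9", 51000, 443, "tcp", 1400),
    ("10.0.0.1", "10.0.0.9", 51000, 443, "tcp", 1380),
    ("10.0.0.2", "10.0.0.9", 51001, 53, "udp", 72),
]

flows = defaultdict(list)
for *five_tuple, size in packets:
    flows[tuple(five_tuple)].append(size)

for flow, sizes in flows.items():
    features = {"packets": len(sizes), "mean_size": mean(sizes)}
    # Illustrative rule: large average payloads suggest bulk transfer.
    label = "bulk-transfer" if features["mean_size"] > 1000 else "interactive"
    print(flow, features, label)
```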
Layered evaluation of interactive adaptive systems: framework and formative methods