Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenges have not yet been
fully recognized, and a methodology of its own has not yet been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and the scientific paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Comment: 59 pages
Exploiting Emergence of New Topics via Anomaly Detection: A Survey
Detecting and generating new concepts has attracted much attention in data mining. The emergence of new topics in news data is a major challenge, and the problem can be framed as "finding breaking news". Years ago, the emergence of new stories was detected and followed up by domain experts, but manually reading stories and identifying irregularities is a critical and time-consuming task, and mapping those irregularities to the relevant stories requires deep knowledge of the news and of existing concepts. Automatically modeling breaking news is therefore of great interest in data mining. Anomalies in published news articles are the basic clues for inferring the emergence of a new story: they are keywords or phrases that do not match the overall concept of the article. These anomalies are then processed and mapped to the stories in which they do not behave as anomalies. After this mapping, the topics linked by a shared anomaly can be combined into a new concept, which can in turn be modeled as an emerging story. We survey techniques that can be used to model such new concepts efficiently: news classification, anomaly detection, and concept detection and generation, which together can form the basis of modeling breaking news. We further discuss data sources that can be processed and used as input stories or news for modeling the emergence of new stories.
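As a hedged illustration of the keyword-anomaly idea at the heart of these techniques, the sketch below scores each term of an incoming article by its in-article frequency weighted by its rarity in a background corpus; the corpus, the scoring formula and the cut-off are assumptions for exposition, not any surveyed system's actual method.

```python
# Illustrative keyword-anomaly scoring: terms prominent in a new article
# but rare in the background corpus are candidate clues for an emerging
# story. Corpus, formula and top_k are assumptions, not a surveyed system.
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def anomalous_terms(article: str, background: list[str], top_k: int = 5):
    """Rank article terms by term frequency times background rarity."""
    tf = Counter(tokenize(article))
    df = Counter()                       # document frequency per term
    for doc in background:
        df.update(set(tokenize(doc)))
    n = len(background)
    scores = {t: c * math.log((n + 1) / (df[t] + 1)) for t, c in tf.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

background = [
    "the council approved the city budget for roads and schools",
    "local elections were held and the budget debate continued",
]
article = "sudden flooding closed roads as river levels rose overnight"
print(anomalous_terms(article, background))
```

Terms that score highest are candidate anomalies to be mapped against other stories, as described above.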
Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications (Extended Version)
Although the "scale-free" literature is large and growing, it gives neither
a precise definition of scale-free graphs nor rigorous proofs of many of their
claimed properties. In fact, it is easily shown that the existing theory has
many inherent contradictions and verifiably false claims. In this paper, we
propose a new, mathematically precise, and structural definition of the extent
to which a graph is scale-free, and prove a series of results that recover many
of the claimed properties while suggesting the potential for a rich and
interesting theory. With this definition, scale-free (or its opposite,
scale-rich) is closely related to other structural graph properties such as
various notions of self-similarity (or respectively, self-dissimilarity).
Scale-free graphs are also shown to be the likely outcome of random
construction processes, consistent with the heuristic definitions implicit in
existing random graph approaches. Our approach clarifies much of the confusion
surrounding the sensational qualitative claims in the scale-free literature,
and offers rigorous and quantitative alternatives.
Comment: 44 pages, 16 figures. The primary version is to appear in Internet Mathematics (2005).
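One concrete candidate for such a structural measure, from this same line of work, is the s-metric s(g) = Σ_{(i,j)∈E} d_i d_j, which is largest when high-degree nodes attach to one another; reading the abstract's definition as exactly this metric is an assumption of the sketch below.

```python
# s(g) = sum over edges (i, j) of d_i * d_j: high-degree nodes attaching
# to each other drive the value up. Treating this as the paper's exact
# structural definition is an assumption of this sketch.
from collections import defaultdict

def s_metric(edges: list[tuple[int, int]]) -> int:
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree[u] * degree[v] for u, v in edges)

# A hub-and-spoke star vs. a chain on the same five nodes (note the two
# graphs have different degree sequences, so this is only illustrative).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(s_metric(star))   # 16: each edge pairs the degree-4 hub with a leaf
print(s_metric(chain))  # 12: degrees along the path are 1, 2, 2, 2, 1
```

A normalized variant divides s(g) by the maximum s over graphs with the same degree sequence, so the raw values above are only illustrative.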
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
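As a hedged illustration of one class in such a taxonomy, the sketch below implements a simple distance-based technique, scoring each numeric observation by the distance to its k-th nearest neighbour; the sample data, the choice of k and the interpretation of the scores are assumptions for exposition only.

```python
# Distance-based outlier scoring: each point is scored by the distance
# to its k-th nearest neighbour; isolated points get large scores.
# Data, k and the reading of the scores are assumptions for exposition.
import math

def knn_outlier_scores(points: list[tuple[float, ...]], k: int = 2) -> list[float]:
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scores = []
    for i, p in enumerate(points):
        dists = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])      # distance to k-th nearest neighbour
    return scores

data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 0.8), (8.0, 8.0)]
scores = knn_outlier_scores(data, k=2)
print(max(zip(scores, data)))            # the isolated point scores highest
```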
The inner and inter construct associations of the quality of data warehouse customer relationship data for problem enactment
This thesis was submitted for the degree of Doctor of Business Administration and awarded by Brunel University on behalf of Henley Management College.
The literature identifies perceptions of data quality as a key factor influencing a wide
range of attitudes and behaviors related to data in organizational settings (e.g.
decision confidence). In particular, there is an overwhelming consensus that effective
customer relationship management (CRM) depends on the quality of customer data.
Data warehouses, if properly implemented, enable data integration which is a key
attribute of data quality. The literature highlights the relevance of formulating
problem statements because this will determine the course of action. CRM managers
formulate problem statements through a cognitive process known as enactment.
The literature on data quality is fragmented. It posits that this construct is of a
higher-order nature (it is dimensional), that it is contextual and situational, and that it is
closely linked to a utilitarian value. This study addresses these disparate views of the
nature of data quality from a holistic perspective. Social cognitive theory (SCT) is the
backbone for studying data quality in terms of information search behavior and
enhancements in formulating problem statements.
The main objective of this study is to explore the nature of a data warehouse's
customer relationship data quality in situations where there is a need for
understanding a customer relationship problem. The research question is: What are the
inner and inter construct associations of the quality of data warehouse customer
relationship data for problem enactment?
To reach this objective, a positivist approach was adopted, complemented with
qualitative interventions throughout the research process. Observations were gathered
with a survey, and scales were adjusted using a construct-based approach. The research
findings confirm that data quality is a higher-order construct with a contextual
dimension and a situational dimension. Problem sense-making enhancement is a dependent
variable of data quality, with a confirmed positive association between the two
constructs. Problem sense-making enhancement is also a higher-order construct, with a
mastery-experience dimension and a self-efficacy dimension. Behavioral patterns were
identified for information search mode (scanning-mode orientation vs. focus-mode
orientation) and for information search heuristic (template-heuristic orientation vs.
trial-and-error-heuristic orientation); focus is the predominant information search
mode orientation and template the predominant information search heuristic orientation.
Overall, the research findings support the associations advocated by SCT. The
self-efficacy dimension of problem sense-making enhancement discriminates between
information search mode orientations (focus vs. scanning), and the contextual dimension
of data quality (i.e. data task utility) discriminates between information search
heuristics (template vs. trial-and-error).
A data quality cognitive metamodel and a data quality for problem enactment model
are suggested for research in the areas of data quality, information search behavior,
and cognitive enhancements.
Data Mining Algorithms for Internet Data: from Transport to Application Layer
Nowadays we live in a data-driven world. Advances in data generation, collection and storage technology have enabled organizations to gather data sets of massive size. Data mining is a discipline that blends traditional data analysis methods with sophisticated algorithms to handle the challenges posed by these new types of data sets.
The Internet is a complex and dynamic system in which new protocols and applications arise at a constant pace. All these characteristics make the Internet a valuable and challenging data source and application domain for research, both at the Transport layer, analyzing network traffic flows, and up at the Application layer, focusing on the ever-growing next generation of web services: blogs, micro-blogs, on-line social networks, photo sharing services and many other applications (e.g., Twitter, Facebook, Flickr, etc.).
In this thesis work we focus on the study, design and development of novel algorithms and frameworks to support large-scale data mining activities over huge and heterogeneous data volumes, with a particular focus on Internet data as a data source, targeting network traffic classification, on-line social network analysis, recommendation systems, cloud services and Big Data.
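As a hedged illustration of the traffic-classification task named above, the sketch below aggregates packets into 5-tuple flows, derives simple per-flow statistics, and applies a toy size-based labeling rule; all field names, values and the rule itself are assumptions for exposition, not the thesis's actual algorithms.

```python
# Toy flow-level traffic classification: group packets into 5-tuple
# flows, compute simple statistics, and label by mean payload size.
# Every field, threshold and label here is an assumption for exposition.
from collections import defaultdict
from statistics import mean

# (src_ip, dst_ip, src_port, dst_port, proto, payload_bytes)
packets = [
    ("10.0.0.1", "10.0.0.9", 51000, 443, "tcp", 1400),
    ("10.0.0.1", "10.0.0.9", 51000, 443, "tcp", 1380),
    ("10.0.0.2", "10.0.0.9", 51001, 53, "udp", 72),
]

flows = defaultdict(list)
for *five_tuple, size in packets:
    flows[tuple(five_tuple)].append(size)

for flow, sizes in flows.items():
    features = {"packets": len(sizes), "mean_size": mean(sizes)}
    # Illustrative rule: large average payloads suggest bulk transfer.
    label = "bulk-transfer" if features["mean_size"] > 1000 else "interactive"
    print(flow, features, label)
```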
Layered evaluation of interactive adaptive systems: framework and formative methods