86 research outputs found
Deep Data Analysis on the Web
Search engines are well known to people all over the world. People prefer searching by keyword to typing URLs when they want to open a website or retrieve information. Collecting finite sequences of keywords that represent important concepts within a set of authors is therefore important; in other words, we need knowledge mining. We use a simplicial concept method to speed up concept mining. A previous CS 298 project studied this approach under Dr. Lin. The method is very fast: for example, to mine the concepts from a database with 1257 columns and 65k rows, FP-growth takes 876 seconds, whereas the simplicial complex method takes only 5 seconds. The collection of such concepts can be interpreted geometrically as a simplicial complex, which can be construed as the knowledge base of this set of documents. Furthermore, we use homology theory to analyze this knowledge base (deep data analysis). For example, in mining market basket data with items {a, b, c, d}, we find the frequent item sets {abc, abd, acd, bcd} and the homology group H2 = Z (the Abelian group of integers), which implies that very few customers buy all four items {abcd} together; we may then analyze possible causes.
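The market basket example can be checked with a short computation. The sketch below is illustrative only and is not the project's implementation: it assumes the frequent item sets are closed under taking subsets, builds the boundary matrices of the resulting simplicial complex, and computes Betti numbers over the rationals. A Betti number of 1 in dimension 2 corresponds to H2 = Z here, although rational ranks would not detect torsion in general. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def closure(maximal_sets):
    """All non-empty faces of the given item sets, grouped by dimension."""
    faces = {}
    for itemset in maximal_sets:
        items = sorted(itemset)
        for size in range(1, len(items) + 1):
            for face in combinations(items, size):
                faces.setdefault(size - 1, set()).add(face)
    return {dim: sorted(fs) for dim, fs in faces.items()}


def boundary_matrix(k_faces, lower_faces):
    """Matrix of the boundary map from k-faces to (k-1)-faces."""
    index = {face: i for i, face in enumerate(lower_faces)}
    matrix = [[Fraction(0)] * len(k_faces) for _ in lower_faces]
    for col, face in enumerate(k_faces):
        for i in range(len(face)):
            sub = face[:i] + face[i + 1:]  # drop the i-th vertex
            matrix[index[sub]][col] = Fraction((-1) ** i)
    return matrix


def rank(matrix):
    """Rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            if factor != 0:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def betti_numbers(maximal_sets):
    """Betti numbers b_k = dim C_k - rank d_k - rank d_(k+1)."""
    faces = closure(maximal_sets)
    top = max(faces)
    ranks = {0: 0, top + 1: 0}  # boundary of a vertex and of "nothing" is zero
    for k in range(1, top + 1):
        ranks[k] = rank(boundary_matrix(faces[k], faces[k - 1]))
    return [len(faces[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)]


hollow = betti_numbers(["abc", "abd", "acd", "bcd"])  # four triangles, no {abcd}
solid = betti_numbers(["abcd"])                       # {abcd} is itself frequent
print(hollow, solid)
```

With only the four triples frequent, the complex is a hollow tetrahedron and the dimension-2 Betti number is 1; once {abcd} is frequent as well, the hole is filled and that Betti number drops to 0.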
Concept Based Semantic Search Engine
Today, search engines are the most relied-upon and critical means of finding information on the World Wide Web (W3). With the arrival of Big Data, traditional search engines are becoming inadequate at returning relevant pages. It has become increasingly difficult to locate meaningful results in the overwhelming list of returns typical of search queries. Keywords alone often cannot capture the intended concept with high precision. These and associated issues with current search engines call for a more powerful and holistic search capability. The current project presents a new approach to this widely relevant problem: a concept-based search engine. It is known that a collection of concepts naturally forms a polyhedron. Combinatorial topology is thus used to manipulate the polyhedron of concepts that are mined from W3. Based on this triangulated polyhedron, the concepts are clustered around primitive concepts, which are, geometrically, simplexes of maximal dimension. Such clustering differs from conventional clustering in that the clusters of the proposed model may overlap. Based on this clustering, the search results can be categorized, and users can select the category best suited to their needs. The results displayed follow this categorization, leading to more sharply focused and thus semantically related, relevant information.
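The overlapping clustering described above can be illustrated with a small sketch. It is an assumption-laden reading, not the project's code: each concept is taken to be a set of keywords, a "primitive concept" is read as a simplex that is not a face of any other simplex, and each such maximal simplex collects every concept contained in it. The keyword data and function names are hypothetical.

```python
def maximal_simplexes(concepts):
    """Concepts (keyword sets) that are not proper subsets of another concept."""
    unique = {frozenset(c) for c in concepts}
    return [s for s in unique if not any(s < other for other in unique)]


def cluster_by_maximal(concepts):
    """Map each maximal simplex to the set of concepts that are faces of it."""
    unique = {frozenset(c) for c in concepts}
    return {
        top: {c for c in unique if c <= top}
        for top in maximal_simplexes(concepts)
    }


# Hypothetical mined concepts, each a set of co-occurring keywords.
concepts = [
    {"street"},
    {"wall", "street"},
    {"wall", "street", "journal"},
    {"street", "map"},
]
clusters = cluster_by_maximal(concepts)
for top, members in clusters.items():
    print(sorted(top), "->", sorted(sorted(m) for m in members))
```

Here the concept {street} is a face of both maximal simplexes, so it lands in both clusters, which is the overlap that distinguishes this scheme from a conventional partition.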
Scaling Up Network Analysis and Mining: Statistical Sampling, Estimation, and Pattern Discovery
Network analysis and graph mining play a prominent role in providing insights and studying phenomena across various domains, including social, behavioral, biological, transportation, communication, and financial domains. Across all these domains, networks arise as a natural and rich representation for data. Studying these real-world networks is crucial for solving numerous problems that lead to high-impact applications. For example, identifying the behavior and interests of users in online social networks (e.g., viral marketing), monitoring and detecting virus outbreaks in human contact networks, predicting protein functions in biological networks, and detecting anomalous behavior in computer networks. A key characteristic of these networks is that their complex structure is massive and continuously evolving over time, which makes it challenging and computationally intensive to analyze, query, and model these networks in their entirety. In this dissertation, we propose sampling as well as fast, efficient, and scalable methods for network analysis and mining in both static and streaming graphs
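As a generic illustration of sampling from a streaming graph, the sketch below applies classic reservoir sampling (Algorithm R) to a stream of edges. This is a standard textbook technique swapped in for illustration; the abstract does not state that the dissertation uses it, and the function name and parameters are hypothetical.

```python
import random


def reservoir_sample_edges(edge_stream, k, seed=None):
    """Keep a uniform random sample of k edges from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, edge in enumerate(edge_stream):
        if i < k:
            # Fill the reservoir with the first k edges.
            reservoir.append(edge)
        else:
            # Replace a stored edge with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = edge
    return reservoir


# Hypothetical stream: the edges of a path graph on 1001 nodes.
stream = ((n, n + 1) for n in range(1000))
sample = reservoir_sample_edges(stream, k=10, seed=42)
print(len(sample))
```

The sample uses memory proportional to k regardless of how long the stream runs, which is the property that makes this family of methods attractive for graphs too large to store in full.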
A cloud-based smart metering infrastructure for distribution grid services and automation
© 2017 The Authors. The evolution of power systems towards the smart grid paradigm is strictly dependent on the modernization of distribution grids. To achieve this target, new infrastructures, technologies and applications are increasingly required. This paper presents a smart metering infrastructure that unlocks a large set of possible services aimed at the automation and management of distribution grids. The proposed architecture is based on a cloud solution, which communicates with the smart meters on one side and provides the needed interfaces to the distribution grid services on the other. While a large number of applications can be designed on top of the cloud, in this paper the focus is on a real-time distributed state estimation algorithm that enables the automatic reconfiguration of the grid. The paper presents the key role of the cloud solution in obtaining scalability, interoperability and flexibility, and in enabling the integration of different services for the automation of the distribution system. The distributed state estimation algorithm and the automatic network reconfiguration are presented as an example of coordinated operation of different distribution grid services through the cloud.
Distributed k-core view materialization and maintenance for large dynamic graphs
In graph theory, k-core is a key metric used to identify subgraphs of high cohesion, also known as the 'dense' regions of a graph. As real-world graphs such as social network graphs grow in size, their contents get richer and their topologies change dynamically, we are challenged not only to materialize k-core subgraphs once but also to maintain them in order to keep up with continuous updates. Adding to the challenge, real-world data sets are outgrowing the capacity of a single server and its main memory. These challenges inspired us to propose a new set of distributed algorithms for k-core view construction and maintenance on a horizontally scaling storage and computing platform. Our algorithms execute against the partitioned graph data in parallel and take advantage of k-core properties to aggressively prune unnecessary computation. Experimental evaluation results demonstrated orders-of-magnitude speedups and the advantages of maintaining k-core views incrementally and in batch windows over complete reconstruction. Our algorithms thus enable practitioners to create and maintain many k-core views on different topics in rich social network content simultaneously.
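For readers unfamiliar with the metric, a k-core is the largest subgraph in which every node has at least k neighbours inside the subgraph. The sketch below is the standard single-machine peeling procedure, shown only to make the definition concrete; it is not the distributed or incremental algorithm the abstract proposes, and the function name is hypothetical.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    # Repeatedly peel off nodes whose remaining degree is below k.
    queue = [n for n in adj if len(adj[n]) < k]
    removed = set()
    while queue:
        node = queue.pop()
        if node in removed:
            continue
        removed.add(node)
        for nb in adj[node]:
            adj[nb].discard(node)
            if nb not in removed and len(adj[nb]) < k:
                queue.append(nb)
        adj[node].clear()

    return {n for n in adj if n not in removed}


# Hypothetical graph: a triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(sorted(k_core(edges, 2)))
```

Removing one low-degree node can push its neighbours below the threshold, which is why the peeling cascades; the same locality is what incremental maintenance schemes exploit when edges are added or deleted.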
Learning in Dynamic Data-Streams with a Scarcity of Labels
Analysing data in real time is a natural and necessary progression from traditional data mining. However, real-time analysis presents additional challenges compared with batch analysis; along with strict time and memory constraints, change is a major consideration. In a dynamic stream there is an assumption that the underlying process generating the stream is non-stationary and that concepts within the stream will drift and change over time. Adopting a false assumption that a stream is stationary will result in non-adaptive models degrading and eventually becoming obsolete. The challenge of recognising and reacting to change in a stream is compounded by the scarcity-of-labels problem. This refers to the very realistic situation in which the true class label of an incoming point is not immediately available (or will never be available), or in which manually labelling incoming points is prohibitively expensive. The goal of this thesis is to evaluate unsupervised learning as the basis for online classification in dynamic data-streams with a scarcity of labels. To realise this goal, a novel stream clustering algorithm based on the collective behaviour of ants (Ant Colony Stream Clustering (ACSC)) is proposed. This algorithm is shown to be faster and more accurate than comparable peer stream-clustering algorithms while requiring fewer sensitive parameters. The principles of ACSC are extended in a second stream-clustering algorithm named Multi-Density Stream Clustering (MDSC). This algorithm has adaptive parameters and, crucially, can track clusters and monitor their dynamic behaviour over time. A novel technique called a Dynamic Feature Mask (DFM) is proposed to "sit on top" of these stream-clustering algorithms and can be used to observe and track change at the feature level in a data stream. This Feature Mask acts as an unsupervised feature selection method, allowing high-dimensional streams to be clustered.
Finally, data-stream clustering is evaluated as an approach to one-class classification and a novel framework (named COCEL: Clustering and One class Classification Ensemble Learning) for classification in dynamic streams with a scarcity of labels is described. The proposed framework can identify and react to change in a stream and hugely reduces the number of required labels (typically less than 0.05% of the entire stream)
Modelling and Design of Resilient Networks under Challenges
Communication networks, in particular the Internet, face a variety of challenges that can disrupt our daily lives, resulting in the loss of human lives and significant financial costs in the worst cases. We define challenges as external events that trigger faults that eventually result in service failures. Understanding these challenges is therefore essential for improving current networks and for designing Future Internet architectures. This dissertation presents a taxonomy of challenges that can help evaluate design choices for the current and Future Internet. Graph models to analyse critical infrastructures are examined and a multilevel graph model is developed to study interdependencies between different networks. Furthermore, graph-theoretic heuristic optimisation algorithms are developed. These heuristic algorithms add links to increase the resilience of networks in the least costly manner, and they are computationally less expensive than an exhaustive search algorithm. The performance of networks under random failures, targeted attacks, and correlated area-based challenges is evaluated by the challenge simulation module that we developed. The GpENI Future Internet testbed is used to conduct experiments to evaluate the performance of the heuristic algorithms developed.
Applied Metaheuristic Computing
For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning, among others. This is partly because the classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, by contrast, guides the course of low-level heuristics to search beyond the local optimality that impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications that drive the advances of AMC.