9 research outputs found

    IMPLEMENTATION OF DYNAMIC AND FAST MINING ALGORITHMS ON INCREMENTAL DATASETS TO DISCOVER QUALITATIVE RULES

    Association rule mining is an important field in knowledge mining that yields the association rules needed for decision making. Mining frequent items is difficult on huge datasets: as the dataset grows, uncovering the rules takes more time and imposes more overhead. In this paper, techniques for reducing this time and overhead with an IPOC (Incremental Pre-ordered code) tree structure were examined. To mine frequent items from a database, these techniques require highly efficient data structures. FIN (Frequent itemset-Nodeset) employs a Nodeset, a novel data structure, to extract frequent items, and an IPOC tree to store frequent data incrementally. Different methods were modified to analyze and assess time and memory use on different datasets. The strategies proposed and implemented show improved time and efficiency when producing rules.
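    To make the terms in this abstract concrete, the following is a minimal sketch of support and confidence for association rules. It is a brute-force illustration only: it does not implement FIN, Nodesets, or the IPOC tree, and the function names, thresholds, and toy transactions are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map each itemset to its support, keeping those at or above min_support.

    Brute force over every subset of every transaction; fine for toy data only.
    """
    n = len(transactions)
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is itself frequent,
                # so the lookup below always succeeds.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, support, confidence))
    return found


# Hypothetical toy transactions, not taken from the paper.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
frequent = frequent_itemsets(baskets, min_support=0.5)
for antecedent, consequent, support, confidence in association_rules(frequent, 0.8):
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

    The brute-force enumeration grows exponentially with transaction length, which is the cost that specialised structures such as the ones named in the abstract aim to avoid.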

    Spatial association discovery process using frequent subgraph mining

    Spatial associations are one of the most relevant kinds of patterns used by business intelligence regarding spatial data. Due to the characteristics of this particular type of information, different approaches have been proposed for spatial association mining. This wide variety of methods has entailed the need for a process to integrate the activities for association discovery, one that is easy to implement and flexible enough to be adapted to any particular situation, particularly for small and medium-size projects to guide the useful pattern discovery process. Thus, this work proposes an adaptable knowledge discovery process that uses graph theory to model different spatial relationships from multiple scenarios, and frequent subgraph mining to discover spatial associations. A proof of concept is presented using real data

    Discovering Dense Correlated Subgraphs in Dynamic Networks

    Given a dynamic network, where edges appear and disappear over time, we are interested in finding sets of edges that have similar temporal behavior and form a dense subgraph. Formally, we define the problem as the enumeration of the maximal subgraphs that satisfy specific density and similarity thresholds. To measure the similarity of the temporal behavior, we use the correlation between the binary time series that represent the activity of the edges. For the density, we study two variants based on the average degree. For these problem variants we enumerate the maximal subgraphs and compute a compact subset of subgraphs that have limited overlap. We propose an approximate algorithm that scales well with the size of the network while achieving high accuracy. We evaluate our framework on both real and synthetic datasets. The results on the synthetic data demonstrate the high accuracy of the approximation and show the scalability of the framework. Comment: Full version of the paper included in the proceedings of the PAKDD 2021 conference
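    The two thresholds this abstract mentions can be illustrated with a short sketch. It assumes Pearson correlation as the correlation measure, the minimum pairwise correlation as the similarity of an edge set, and 2|E|/|V| as the average degree; the abstract does not spell out these definitions, so they are illustrative choices, and all names and data below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def average_degree(edges):
    """Average degree 2|E| / |V| of the subgraph formed by the given edges."""
    nodes = {v for edge in edges for v in edge}
    return 2 * len(edges) / len(nodes) if nodes else 0.0


def min_pairwise_correlation(activity, edges):
    """Smallest correlation between the activity series of any two edges."""
    return min(pearson(activity[e], activity[f]) for e, f in combinations(edges, 2))


# Hypothetical activity: 1 if the edge is present at a time step, else 0.
activity = {
    ("a", "b"): [1, 1, 0, 0],
    ("b", "c"): [1, 1, 0, 0],
    ("a", "c"): [1, 0, 0, 0],
}
edges = list(activity)
print(average_degree(edges))
print(min_pairwise_correlation(activity, edges))
```

    A candidate edge set would then be kept only if both values clear their respective thresholds; enumerating maximal such sets is the hard part that the paper addresses and this sketch does not.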

    Sketch-Based Streaming Anomaly Detection in Dynamic Graphs

    Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges and subgraphs in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? For example, in intrusion detection, existing work seeks to detect either anomalous edges or anomalous subgraphs, but not both. In this paper, we first extend the count-min sketch data structure to a higher-order sketch. This higher-order sketch has the useful property of preserving the dense subgraph structure (dense subgraphs in the input turn into dense submatrices in the data structure). We then propose four online algorithms that utilize this enhanced data structure, which (a) detect both edge and graph anomalies; (b) process each edge and graph in constant memory and constant update time per newly arriving edge; and (c) outperform state-of-the-art baselines on four real-world datasets. Our method is the first streaming approach that incorporates dense subgraph search to detect graph anomalies in constant memory and time.
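    For readers unfamiliar with the starting point of this work, the following is a minimal sketch of a plain count-min sketch applied to edge counts. It is not the higher-order sketch the paper proposes; the class name, the default sizes, and the salted BLAKE2b hashing are assumptions chosen for this illustration.

```python
import hashlib


class CountMinSketch:
    """Plain count-min sketch: approximate counts in fixed memory."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row; blake2b keeps results stable across runs.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Constant work per update: one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum never undercounts.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Hypothetical edge stream.
sketch = CountMinSketch()
for edge in [("u", "v"), ("u", "v"), ("x", "y"), ("u", "v")]:
    sketch.add(edge)
print(sketch.estimate(("u", "v")))  # at least 3
```

    Memory stays at `width * depth` counters regardless of stream length, which is the property that makes the structure attractive for streaming settings like the one described above.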

    A survey on the development status and application prospects of knowledge graph in smart grids

    With the advent of the electric power big data era, semantic interoperability and interconnection of power data have received extensive attention. Knowledge graph technology is a new method for describing the complex relationships between concepts and entities in the objective world, and it has attracted wide interest because of its robust knowledge inference ability. Especially with the proliferation of measurement devices and the exponential growth of electric power data, the electric power knowledge graph provides new opportunities to resolve the contradictions between massive power resources and the continuously increasing demands for intelligent applications. To fulfil the potential of knowledge graphs, address the various challenges faced, and obtain insights for business applications of smart grids, this work first presents a holistic study of knowledge-driven intelligent application integration. Specifically, a detailed overview of electric power knowledge mining is provided. Then, an overview of the knowledge graph in smart grids is introduced. Moreover, the architecture of the big knowledge graph platform for smart grids and its critical technologies are described. Furthermore, this paper comprehensively elaborates on the application prospects of knowledge graphs for smart grids, power consumer service, decision-making in dispatching, and operation and maintenance of power equipment. Finally, issues and challenges are summarised. Comment: IET Generation, Transmission & Distribution

    Formulación de procesos para una ingeniería de explotación de información espacial

    Knowledge discovery processes provide a common framework for the execution of activities aimed at retrieving knowledge from datasets to support strategic decision-making. This knowledge retrieval is performed by implementing knowledge discovery processes, which define a logical sequence of activities for data manipulation in order to find patterns that contribute to solving problems in particular domains. The data types that can be mined include spatially referenced data. This kind of data has a set of properties and characteristics that are not taken into account by previously defined knowledge discovery processes. These characteristics include the multiplicity of spatial data types; the variety of spatial relationships relevant for decision-making that are implicitly present among the instances of spatial objects; and spatial autocorrelation and spatial heterogeneity, two properties that affect the trust in spatial data mining results. Furthermore, the number of software tools for spatial data processing and data mining is limited.
    Because of this, this work also aims to enable pattern discovery by means of flexible tools that are available on multiple platforms. In this context, this doctoral thesis proposes a set of four processes for spatial knowledge discovery: a process for the discovery of spatial groups, a process for the discovery of spatial outliers, a process for the discovery of spatial associations, and a process for the discovery of co-location patterns. These processes have been designed taking into consideration three non-functional requirements: utility, which refers to users' perceptions of their use for solving business problems in different contexts; interpretability, related to how easily the results can be interpreted; and adaptability, which considers different ways of implementing them on several platforms. Lastly, each proposal was demonstrated using real-world datasets, and the aforementioned requirements were validated through expert judgment. As a result, a set of new strategies for solving business problems that involve spatially referenced data was consolidated, taking into account all the particularities that arise from the contextual characteristics of such data. Scientific advisor: Hernán Merlino. Facultad de Informática