
    Mining Frequent Neighborhood Patterns in Large Labeled Graphs

    Over the years, frequent subgraphs have been an important class of targeted patterns in the pattern mining literature, where most works deal with databases holding a number of graph transactions, e.g., chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to ensure efficient pruning of candidate patterns. When switching to the emerging scenario of single-graph databases such as the Google Knowledge Graph and the Facebook social graph, the traditional support measure turns out to be trivial (either 0 or 1). However, to the best of our knowledge, all attempts to redefine a single-graph support have resulted in measures that either lose the DCP or are no longer semantically intuitive. This paper targets mining patterns in the single-graph setting. We resolve the "DCP-intuitiveness" dilemma by shifting the mining target from frequent subgraphs to frequent neighborhoods. A neighborhood is a specific topological pattern in which a vertex is embedded, and the pattern is frequent if it is shared by a large portion (above a given threshold) of vertices. We show that the new patterns not only maintain the DCP but also carry semantics as significant as those of subgraph patterns. Experiments on real-life datasets demonstrate the feasibility of our algorithms on relatively large graphs, as well as their capability of mining interesting knowledge not discovered in prior works.
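    The vertex-based support behind this idea can be made concrete with a small sketch. The following is a minimal illustration (not the paper's algorithm), assuming a toy labeled graph and patterns restricted to 1-hop neighbor-label multisets; it shows why such a support is anti-monotone under pattern extension, i.e., why the DCP holds.

```python
# Minimal sketch: vertex-based support for neighborhood patterns on a
# labeled graph. The graph, labels, and pattern shape are illustrative.
from collections import Counter

# Toy labeled graph: vertex -> label, plus an adjacency list.
labels = {1: "A", 2: "B", 3: "B", 4: "A", 5: "C"}
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}

def matches(v, root_label, neighbor_labels):
    """True if vertex v has the root label and its 1-hop neighborhood
    contains at least the required multiset of neighbor labels."""
    if labels[v] != root_label:
        return False
    have = Counter(labels[u] for u in adj[v])
    need = Counter(neighbor_labels)
    return all(have[lbl] >= cnt for lbl, cnt in need.items())

def support(root_label, neighbor_labels):
    """Fraction of vertices that embed the neighborhood pattern."""
    hits = sum(matches(v, root_label, neighbor_labels) for v in labels)
    return hits / len(labels)

# Extending a pattern can only shrink the set of matching vertices,
# so this support is anti-monotone (the DCP holds):
assert support("A", ["B"]) >= support("A", ["B", "B"])
print(support("A", ["B"]), support("A", ["B", "B"]))
```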

    Framework for Automatic Bug Classification in Bug Triage System

    A software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result. When bugs arise they must be fixed, which is not easy: most companies spend around 40% of their cost on fixing bugs. The process of assigning an incoming bug to a developer is called bug triage, and triaging incoming reports manually is error-prone and time-consuming. In this paper we classify bugs to determine the class to which each bug belongs, and after classification we assign each bug to the appropriate developer to fix it, which is more efficient. We use a combination of two classification techniques, naive Bayes (NB) and k-nearest neighbors (KNN). While modern companies use automatic bug triage systems, where the traditional manual triage process is still in use it is inefficient and slow. Triaging a bug requires the bug's details, which are stored in a bug repository. We also reduce the bug dataset, because a large dataset full of unused information makes bug assignment harder; to this end we apply instance selection and feature selection to reduce the bug data. This paper describes the whole bug allotment procedure from start to end, and the results are presented as a graph showing the most likely class to which each bug belongs.
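    To make the NB + KNN combination concrete, here is a minimal sketch, assuming scikit-learn and a tiny inline dataset (not the paper's data or exact setup), that classifies bug reports with a soft-voting combination of the two classifiers over TF-IDF features.

```python
# Minimal sketch of combining naive Bayes and KNN for bug classification.
# The toy reports and the soft-voting combination are assumptions.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier

# Toy bug reports and their classes (e.g., the component they belong to).
reports = [
    "null pointer exception when saving file",
    "button misaligned on settings dialog",
    "crash on startup after update",
    "dark theme colors unreadable in editor",
]
classes = ["core", "ui", "core", "ui"]

# TF-IDF features feed both classifiers; soft voting averages their
# predicted class probabilities.
model = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("knn", KNeighborsClassifier(n_neighbors=3)),
        ],
        voting="soft",
    ),
)
model.fit(reports, classes)
print(model.predict(["exception thrown while saving settings"]))
```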

    Towards Effective Bug Triage with Software Data Reduction Techniques

    Software companies spend over 45 percent of their cost on dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost of manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce the data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports of two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging data-processing techniques to form reduced, high-quality bug data in software development and maintenance.
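    The following minimal sketch illustrates the two reduction dimensions, assuming an illustrative bag-of-words dataset and a fixed feature-selection-then-instance-selection order; the paper's predictive model for choosing the order, and its exact IS/FS algorithms, are not reproduced here.

```python
# Sketch: feature selection (word dimension) followed by an ENN-style
# instance selection (bug dimension). Data and the FS->IS order are
# illustrative assumptions.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 1000)).astype(float)  # word counts
y = rng.integers(0, 4, size=200)                        # developer ids

# Feature selection: keep the k words most associated with the labels.
fs = SelectKBest(chi2, k=100)
X_fs = fs.fit_transform(X, y)

# Instance selection (edited-nearest-neighbor style): drop bug reports
# whose nearest neighbors mostly disagree with their own label.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_fs, y)
agree = knn.predict(X_fs) == y
X_red, y_red = X_fs[agree], y[agree]

print(X.shape, "->", X_red.shape)  # reduced on both dimensions
```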

    Modeling Scholar Profile in Expert Recommendation based on Multi-Layered Bibliographic Graph

    A recommendation system requires researcher profiles, here called scholar profiles, to make suggestions based on expertise. This dissertation contributes a model of an unbiased scholar profile that provides more objective evidence of expertise by considering interest changes and relying less on citations. Interest changes lead to diverse topics and make expertise levels differ across topics. The scholar profile is expected to capture expertise in terms of productivity, which is often signified by the volume of publications and citations. We include researcher behavior in publishing articles to avoid misleading citation counts. The expertise level of a researcher on a topic is therefore influenced by interest evolution, productivity, dynamicity, and behavior, all extracted from the bibliographic data of published scholarly articles. As the output of this dissertation, the scholar profile model is employed within a recommendation system for recommending productive researchers who can provide academic guidance. The scholar profile is generated from multiple layers of bibliographic data, such as an author layer, a topic layer, and the relations between those layers, which together represent an academic social network. In a cold-start situation there is no predefined topic information, so topic mapping procedures are necessary. Features of productivity, dynamicity, and researcher behavior within those layers are then taken from several observed years to accommodate the behavioral aspect. We experimented with the AMiner dataset, often used in related bibliographic-data studies, to empirically investigate: (a) topic mapping strategies to obtain researchers' interests, (b) a feature extraction model for the productivity, dynamicity, and behavior aspects based on the mapped topics, and (c) an expertise ranking that considers interest changes and relies less on citations from the scholar profile. To ensure the validity of the results, our experiments worked on a standard expert list of AMiner researchers. We selected the Natural Language Processing and Information Extraction (NLP-IE) domains because their familiarity and interrelated context make it easier to introduce cases of interest changes. Using the mapped topics, we also made minor contributions on transformation procedures for visualizing researchers on maps of Scopus subjects and investigating possible conflicts of interest.
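    As a purely illustrative sketch of the kind of features involved, the following assumes toy yearly publication counts and invented feature definitions; it is not the dissertation's actual model, only an example of ranking researchers on a topic from productivity and interest-trend signals rather than citations.

```python
# Illustrative sketch: per-topic productivity and dynamicity features from
# yearly publication counts, and a citation-free expertise ranking.
# Feature definitions here are assumptions made for illustration.

# (author, topic, year) -> number of publications, from bibliographic data.
pubs = {
    ("alice", "NLP", 2019): 3, ("alice", "NLP", 2020): 5, ("alice", "IE", 2020): 1,
    ("bob",   "NLP", 2019): 6, ("bob",   "NLP", 2020): 1,
}

def features(author, topic, years):
    counts = [pubs.get((author, topic, y), 0) for y in years]
    productivity = sum(counts)           # total output on the topic
    dynamicity = counts[-1] - counts[0]  # trend: growing or fading interest
    return productivity, dynamicity

def rank(topic, authors, years=(2019, 2020)):
    # Score favors productive researchers whose interest is not fading.
    score = {a: sum(features(a, topic, years)) for a in authors}
    return sorted(authors, key=score.get, reverse=True)

print(rank("NLP", ["alice", "bob"]))  # alice's rising interest outranks bob's fading one
```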

    On the application of graph neural networks for indoor positioning systems

    Due to the inability of GPS (Global Positioning System) or other GNSS (Global Navigation Satellite System) methods to provide satisfactory precision in indoor localization scenarios, indoor positioning systems resort to other signals already available on-site, typically Wi-Fi given its ubiquity. However, instead of relying on an error-prone propagation model as ranging methods do, the popular fingerprinting technique takes a more direct, data-driven approach to the problem. First, the area of interest is divided into zones, and then a machine learning algorithm is trained to map, for instance, power measurements from Access Points (APs) to the localization zone, effectively turning the problem into a classification one. However, although the positioning problem is a geometrical one, virtually all methods proposed in the literature disregard the underlying structure of the data and use generic machine learning algorithms. In this work we instead consider a graph-based learning method, Graph Neural Networks (GNNs), a paradigm that has emerged in the last few years and constitutes the state of the art for several problems. After presenting the pertinent theoretical background, we discuss two possibilities for constructing the underlying graph for the positioning problem. We then perform a thorough evaluation of both possibilities and compare them with some of the most popular machine learning alternatives. The main conclusion is that these graph-based methods systematically obtain better results, particularly with regard to practical aspects (e.g., gracefully tolerating faulty APs), which makes them a serious candidate to consider when deploying positioning systems.
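    To give a flavor of the approach, the sketch below shows one possible graph construction (not necessarily either of the two the paper studies): APs as nodes, edges between APs whose RSSI readings correlate across the training set, and a simple graph convolution over the fingerprint features for zone classification. All data, names, and thresholds are toy assumptions.

```python
# Illustrative GNN-style zone classifier over an AP graph built from
# RSSI correlations. Everything here is a toy assumption, not the paper's model.
import torch
import torch.nn as nn

n_aps, n_zones, n_samples = 8, 4, 256
rssi = torch.randn(n_samples, n_aps)             # toy RSSI fingerprints
zones = torch.randint(0, n_zones, (n_samples,))  # toy zone labels

# Build the AP graph: connect APs with strongly correlated readings.
corr = torch.corrcoef(rssi.T)
A = (corr.abs() > 0.1).float()
A_hat = A / A.sum(dim=1, keepdim=True)           # row-normalized adjacency

class GNNPositioner(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.w1 = nn.Linear(1, hidden)           # per-AP input: its RSSI value
        self.w2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(n_aps * hidden, n_zones)

    def forward(self, x):                        # x: (batch, n_aps)
        h = torch.relu(self.w1(x.unsqueeze(-1)))  # (batch, n_aps, hidden)
        h = torch.relu(A_hat @ self.w2(h))        # aggregate over AP neighbors
        return self.out(h.flatten(1))             # zone logits

model = GNNPositioner()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(rssi), zones)
    loss.backward()
    opt.step()
print("final training loss:", loss.item())
```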