13 research outputs found

    Online Spectral Clustering on Network Streams

    Graphs are an extremely useful representation of a wide variety of practical systems in data analysis. Recently, with the fast accumulation of stream data from various types of networks, significant research interest has arisen in spectral clustering for network streams (or evolving networks). Compared with the general spectral clustering problem, data analysis for this new type of problem may have additional requirements, such as short processing time, scalability in distributed computing environments, and temporal variation tracking. Designing a spectral clustering method that satisfies these requirements is non-trivial. There are three major challenges for the new algorithm design. The first challenge is online clustering computation. Most existing spectral methods for evolving networks are off-line methods that use standard eigensystem solvers such as the Lanczos method, so they need to recompute solutions from scratch at each time point. The second challenge is the parallelization of algorithms. Parallelizing such algorithms is non-trivial, since standard eigensolvers are iterative and the number of iterations cannot be predetermined. The third challenge is the very limited existing work. In addition, the existing method has multiple limitations, such as computational inefficiency on large similarity changes, the lack of a sound theoretical basis, and the lack of an effective way to handle accumulated approximation errors and large data variations over time. In this thesis, we proposed a new online spectral graph clustering approach with a family of three novel spectrum approximation algorithms. Our algorithms incrementally update the eigenpairs in an online manner to improve computational performance. Our approaches outperformed the existing method in computational efficiency and scalability while retaining competitive or even better clustering accuracy.
We derived our spectrum approximation techniques GEPT and EEPT through formal theoretical analysis. Well-established matrix perturbation theory forms a solid theoretical foundation for our online clustering method. We equipped our clustering method with a new metric to track accumulated approximation errors and measure short-term temporal variation. The metric not only provides a balance between computational efficiency and clustering accuracy, but also offers a useful tool to adapt the online algorithm to unexpected drastic noise. In addition, we discussed our preliminary work on approximate graph mining with evolutionary processes, non-stationary Bayesian network structure learning from non-stationary time series data, and Bayesian network structure learning with text priors imposed by non-parametric hierarchical topic modeling.
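
The abstract does not define GEPT or EEPT, so the following is only a minimal sketch of the generic idea it appeals to: textbook first-order matrix perturbation theory for a symmetric matrix. The function name `first_order_update` and its arguments are hypothetical and are not taken from the thesis. Given the current eigenpairs and a small symmetric change `delta`, it estimates the updated eigenpairs without running an eigensolver again.

```python
import math


def first_order_update(eigvals, eigvecs, delta):
    """First-order perturbation estimate of the eigenpairs of A + delta.

    eigvals[i], eigvecs[i] are the tracked eigenpairs of a symmetric matrix A
    (unit-length eigenvectors); delta is a small symmetric change given as a
    list of rows. This is a generic textbook update, not the thesis's method.
    """
    k = len(eigvals)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def matvec(m, v):
        return [dot(row, v) for row in m]

    # delta applied to each tracked eigenvector
    dv = [matvec(delta, v) for v in eigvecs]

    # Eigenvalue update: lambda_i + v_i^T delta v_i
    new_vals = [eigvals[i] + dot(eigvecs[i], dv[i]) for i in range(k)]

    # Eigenvector update: v_i + sum_{j != i} (v_j^T delta v_i) / (lambda_i - lambda_j) * v_j
    new_vecs = []
    for i in range(k):
        v = list(eigvecs[i])
        for j in range(k):
            gap = eigvals[i] - eigvals[j]
            if j == i or abs(gap) < 1e-12:
                continue  # skip self and (near-)degenerate pairs
            coeff = dot(eigvecs[j], dv[i]) / gap
            v = [a + coeff * b for a, b in zip(v, eigvecs[j])]
        norm = math.sqrt(dot(v, v))
        new_vecs.append([a / norm for a in v])
    return new_vals, new_vecs


# Example: A = diag(1, 2) with a small symmetric change.
vals, vecs = first_order_update(
    [1.0, 2.0],
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.02, 0.01], [0.01, 0.0]],
)
print(vals, vecs)
```

The estimate is accurate only while `delta` is small relative to the gaps between eigenvalues, and the error grows as updates accumulate. That is consistent with the abstract's stated need for a metric that tracks accumulated approximation error.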

    Bayesian Networks in Decision Support Systems

    The book offers a detailed treatment of modern approaches to modeling processes of arbitrary nature using Bayesian networks (BNs) and decision trees. A Bayesian network is a probabilistic model represented as a directed acyclic graph whose vertices are the variables of the process under study. BNs are a powerful modern tool for modeling processes and objects that operate in the presence of uncertainties of arbitrary nature. They are used successfully for forecasting, prediction, medical and technical diagnostics, managerial decision making, automatic control, and so on. The theory of constructing Bayesian networks is covered, including the problems of learning the network structure and performing probabilistic inference on it. Practical techniques are given for constructing (estimating) the network structure from statistical data and expert assessments. The corresponding algorithmic procedures are described in detail. The use of discrete and continuous variables is treated separately, as is the possibility of building a hybrid network. Several methods for computing probabilistic inference with a constructed network are presented, including methods for exact and approximate inference. Examples of solving practical problems with Bayesian networks are examined in detail, in particular problems of modeling, forecasting, and pattern recognition. A list of well-known software products for building and applying Bayesian networks, together with their vendors, is provided; some of these products are fully available for use on the Internet. Some systems can be extended with new software modules. The book is recommended as a textbook for students, graduate students, and instructors, as well as for engineers who specialize in probabilistic mathematical modeling, forecasting, prediction, and pattern recognition for processes of arbitrary nature, information about which is represented by statistical data and expert assessments.

    Learning Bayesian network equivalence classes using ant colony optimisation

    Bayesian networks have become an indispensable tool in the modelling of uncertain knowledge. Conceptually, they consist of two parts: a directed acyclic graph called the structure, and conditional probability distributions attached to each node, known as the parameters. As a result of their expressiveness, understandability and rigorous mathematical basis, Bayesian networks have become one of the first methods investigated when faced with an uncertain problem domain. However, a recurring problem persists in specifying a Bayesian network. Both the structure and parameters can be difficult for experts to conceive, especially if their knowledge is tacit. To counteract these problems, research has been ongoing on learning both the structure and parameters of Bayesian networks from data. Whilst there are simple methods for learning the parameters, learning the structure has proved harder. Part of this stems from the NP-hardness of the problem and the super-exponential space of possible structures. To help solve this task, this thesis seeks to employ a relatively new technique that has had much success in tackling NP-hard problems. This technique is called ant colony optimisation. Ant colony optimisation is a metaheuristic based on the behaviour of ants acting together in a colony. It uses the stochastic activity of artificial ants to find good solutions to combinatorial optimisation problems. In the current work, this method is applied to the problem of searching through the space of equivalence classes of Bayesian networks, in order to find a good match against a set of data. The system uses operators that evaluate potential modifications to a current state. Each of the modifications is scored and the results are used to inform the search. In order to facilitate these steps, other techniques are also devised to speed up the learning process.
The techniques are tested by sampling data from gold standard networks and learning structures from this sampled data. These structures are analysed using various goodness-of-fit measures to see how well the algorithms perform. The measures include structural similarity metrics and Bayesian scoring metrics. The results are compared in depth against systems that also use ant colony optimisation and against other methods, including evolutionary programming and greedy heuristics. Comparisons are also made to well-known state-of-the-art algorithms, and a study is performed on a real-life data set. The results show favourable performance compared to the other methods and on modelling the real-life data.

    Self-Confidence Measures of a Decision Support System Based on Bayesian Networks

    A prominent formalism used in decision support is decision theory, which relies on probability theory to model uncertainty about unknown information. A decision support system relying on this theory produces a conditional probability as its response. The quality of a decision support system's response depends on three key factors: the amount of data available to train the model, the amount of information about the case at hand, and the adequacy of the system's model for the case at hand. In this dissertation, I investigate different approaches to measuring the confidence of decision support systems based on Bayesian networks, addressing the three key factors mentioned above. Some such confidence measures of the system response have already been proposed. I propose and discuss other measures based on analysis of the joint probability distribution encoded by a Bayesian network. The main contribution of this dissertation is an analysis of whether the discussed measures provide useful information about the performance of a Bayesian network model. I start the analysis with an investigation of interactions among these measures. Then, I investigate whether confidence measures help us predict an erroneous response of a classifier based on Bayesian networks when applied to a particular case. The results suggest that the discussed measures may be considered indicators of possible mistakes in classification. Further, I conduct an experiment to check how confidence measures perform when used as weights to combine the models' outputs in an ensemble of classifiers. Based on the findings presented in this dissertation, I conclude that the confidence measures may enrich the decision support system's output by serving as indicators of the applicability of the model and its advice to a given case.