
    Clustering based on Random Graph Model embedding Vertex Features

    Large datasets with interactions between objects are common to numerous scientific fields (e.g. social science, the internet, biology). The interactions naturally define a graph, and a common way to explore or summarize such a dataset is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, ignoring the information in the vertex features. In this paper, we provide a clustering algorithm that exploits both types of data, based on a statistical model with latent structure that characterizes each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and also evaluate our method on real datasets based on hyper-textual documents. We find that our algorithm successfully exploits whatever information is found both in the connectivity pattern and in the features.

    Bayesian Methods and Machine Learning for Processing Text and Image Data

    Classification and clustering form an important class of unstructured data processing problems. Classification (supervised, semi-supervised, and unsupervised) aims to discover clusters and group similar data into categories for information organization and knowledge discovery. My work focuses on using Bayesian methods and machine learning techniques to classify free-text and image data, and addresses how to overcome the limitations of traditional methods. The Bayesian approach allows more kinds of variables (numerical or categorical) to be used and estimates probabilities instead of relying on explicit rules, which helps in ambiguous cases. MAP (maximum a posteriori) estimation is used to deal with the local maximum problems for which the ML (maximum likelihood) method gives inaccurate estimates. The EM (expectation-maximization) algorithm can be applied with MAP estimation to incomplete or missing data problems. Our proposed framework can be used in both supervised and unsupervised classification. For natural language processing (NLP), we applied machine learning techniques to sentence/text classification. For 3D CT image segmentation, a MAP EM clustering approach is proposed to automatically detect the number of objects in a 3D CT luggage image, with the prior knowledge and constraints in MAP estimation used to avoid or mitigate local maximum problems. The algorithm can automatically determine the number of classes and find the optimal parameters for each class. As a result, it can automatically detect the number of objects and produce a better segmentation of each object in the image. For segmented object recognition, we applied machine learning techniques to classify each object as a target or non-target. We achieved good results, with 90% PD (probability of detection) and 6% PFA (probability of false alarm). For image restoration in X-ray imaging, scatter can produce noise, artifacts, and decreased contrast. In practice, hardware such as an anti-scatter grid is often used to reduce scatter. However, the remaining scatter can still be significant, and additional software-based correction is desirable. Furthermore, good software solutions can potentially reduce the amount of anti-scatter hardware needed, thereby reducing cost. In this work, scatter correction is formulated as a Bayesian MAP problem with a non-local prior, which leads to better preservation of textural detail during scatter reduction. The efficacy of our algorithm is demonstrated through experimental and simulation results.

    Bayesian Item Response Modeling in R with brms and Stan

    Item Response Theory (IRT) is widely applied in the human sciences to model persons' responses on a set of items measuring one or more latent constructs. While several R packages have been developed that implement IRT models, each tends to be restricted to its own prespecified class of models. Further, most implementations are frequentist, while the availability of Bayesian methods remains comparatively limited. We demonstrate how to use the R package brms together with the probabilistic programming language Stan to specify and fit a wide range of Bayesian IRT models using flexible and intuitive multilevel formula syntax. Further, item and person parameters can be related in either a linear or a non-linear manner. Various distributions for categorical, ordinal, and continuous responses are supported. Users may even define their own custom response distribution for use in the presented framework. Common IRT model classes that can be specified natively in the presented framework include 1PL and 2PL logistic models, optionally also containing guessing parameters, graded response and partial credit ordinal models, as well as drift diffusion models of response times coupled with binary decisions. Posterior distributions of item and person parameters can be conveniently extracted and post-processed. Model fit can be evaluated and compared using Bayes factors and efficient cross-validation procedures. Comment: 54 pages, 16 figures, 3 tables
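For readers unfamiliar with the model classes named in this abstract, the logistic IRT models can be written in their usual textbook form, as sketched here. The symbols are assumptions chosen for illustration and are not taken from the paper; brms may parameterize the same models differently.

```latex
% Standard textbook forms; symbol names are assumptions, not taken from the paper.
% \theta_p: ability of person p;  b_i: difficulty of item i;
% a_i: discrimination of item i;  c_i: guessing parameter of item i.
% \operatorname{logit}^{-1}(x) = 1 / (1 + e^{-x}).
\begin{align}
\text{1PL:}\quad & P(y_{pi} = 1) = \operatorname{logit}^{-1}(\theta_p - b_i) \\
\text{2PL:}\quad & P(y_{pi} = 1) = \operatorname{logit}^{-1}\bigl(a_i(\theta_p - b_i)\bigr) \\
\text{2PL with guessing:}\quad & P(y_{pi} = 1) = c_i + (1 - c_i)\,\operatorname{logit}^{-1}\bigl(a_i(\theta_p - b_i)\bigr)
\end{align}
```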

    A unified formal framework for factorial and probabilistic topic modelling

    Topic modelling has become a highly popular technique for extracting knowledge from texts. It encompasses various method families, including Factorial methods, Probabilistic methods, and Natural Language Processing methods. This paper introduces a unified conceptual framework for Factorial and Probabilistic methods by identifying shared elements and representing them using a homogeneous notation. The paper presents 12 different methods within this framework, enabling easy comparative analysis to assess the flexibility of each approach and how realistic its assumptions are. This establishes the initial stage of a broader analysis aimed at relating all method families to this common framework, comprehensively understanding their strengths and weaknesses, and establishing general application guidelines. Also, an experimental setup reinforces the convenience of having a harmonized notational schema. The paper concludes with a discussion of the presented methods and outlines future research directions. Peer Reviewed. Postprint (published version)

    Availability, outage, and capacity of spatially correlated, Australasian free-space optical networks

    Network capacity and reliability for free space optical communication (FSOC) are strongly driven by ground station availability, which is dominated by local cloud cover causing outages, and by how availability relations between stations produce network diversity. We combine remote sensing data and novel methods to provide a generalised framework for assessing and optimising optical ground station networks. This work is guided by an example network of eight Australian and New Zealand optical communication ground stations which would span approximately 60° in longitude and 20° in latitude. Utilising time-dependent cloud cover data from five satellites, we present a detailed analysis determining the availability and diversity of the network, finding that the Australasian region is well-suited for an optical network, with a 69% average site availability and low spatial cloud cover correlations. Employing methods from computational neuroscience, we provide a Monte Carlo method for sampling the joint probability distribution of site availabilities for an arbitrarily sized and point-wise correlated network of ground stations. Furthermore, we develop a general heuristic for site selection under availability and correlation optimisations, and combine this with orbital propagation simulations to compare the data capacity between optimised networks and the example network. We show that the example network may be capable of providing tens of terabits per day to a LEO satellite, and up to 99.97% reliability to GEO satellites. We therefore use the Australasian region to demonstrate novel, generalised tools for assessing and optimising FSOC ground station networks, and additionally, the suitability of the region for hosting such a network. Comment: Accepted in Journal of Optical Communications and Networking. 16 pages, 16 figures
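The abstract does not say which sampling method is used. As a rough illustration of what sampling correlated site availabilities can look like, the sketch below assumes a dichotomized Gaussian model, one technique of this kind used in computational neuroscience: draw a correlated Gaussian vector and threshold each component so that every site has the desired marginal availability. The function names, the latent correlation value, and the two-site setup are all hypothetical. Note that the correlation of the latent Gaussian is not the same as the correlation of the resulting binary availabilities.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_availability(p_available, latent_corr, n_samples, seed=0):
    """Sample 0/1 availability vectors with given marginals (assumed model).

    p_available: marginal availability of each site, each strictly in (0, 1).
    latent_corr: correlation matrix (unit diagonal) of the latent Gaussian.
    Returns a list of n_samples lists; 1 means the site is available.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Site i is available when its latent value falls below this threshold,
    # which happens with probability p_available[i].
    thresholds = [std_normal.inv_cdf(p) for p in p_available]
    lower = cholesky(latent_corr)
    n = len(p_available)
    samples = []
    for _ in range(n_samples):
        white = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Multiply by the Cholesky factor to introduce the latent correlation.
        latent = [sum(lower[i][k] * white[k] for k in range(i + 1)) for i in range(n)]
        samples.append([1 if latent[i] < thresholds[i] else 0 for i in range(n)])
    return samples


if __name__ == "__main__":
    # 0.69 is the average site availability quoted in the abstract;
    # the latent correlation of 0.2 is a made-up illustrative value.
    draws = sample_availability([0.69, 0.69], [[1.0, 0.2], [0.2, 1.0]], 10000)
    outage = sum(1 for d in draws if sum(d) == 0) / len(draws)
    print("estimated probability that both sites are clouded out:", outage)
```

With more sites, the same function takes a longer availability list and a larger correlation matrix; the fraction of samples in which every site is unavailable gives a Monte Carlo estimate of the network outage probability under this assumed model.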