Clustering based on Random Graph Model embedding Vertex Features
Large datasets with interactions between objects are common to numerous
scientific fields (e.g. social science, the internet, biology). The interactions
naturally define a graph, and a common way to explore or summarize such a dataset
is graph clustering. Most techniques for clustering graph vertices use only the
topology of connections, ignoring the information in the vertex features. In this
paper, we provide a clustering algorithm that exploits both types of data, based on
a statistical model with latent structure characterizing each vertex both by a
vector of features and by its connectivity. We perform simulations to
compare our algorithm with existing approaches, and also evaluate our method
with real datasets based on hyper-textual documents. We find that our algorithm
successfully exploits whatever information is found both in the connectivity
pattern and in the features.
Bayesian Methods and Machine Learning for Processing Text and Image Data
Classification and clustering form an important class of unstructured data processing problems. Classification (supervised, semi-supervised, and unsupervised) aims to discover clusters and group similar data into categories for information organization and knowledge discovery. My work focuses on using Bayesian methods and machine learning techniques to classify free-text and image data, and addresses how to overcome the limitations of traditional methods. The Bayesian approach allows more kinds of variables (numerical or categorical) to be used and estimates probabilities instead of explicit rules, which helps in ambiguous cases. MAP (maximum a posteriori) estimation is used to deal with local-maximum problems, where the ML (maximum likelihood) method gives inaccurate estimates. The EM (expectation-maximization) algorithm can be applied with MAP estimation to problems with incomplete or missing data. Our proposed framework can be used in both supervised and unsupervised classification. For natural language processing (NLP), we applied machine learning techniques to sentence and text classification. For 3D CT image segmentation, a MAP EM clustering approach is proposed to automatically detect the number of objects in a 3D CT luggage image, and the prior knowledge and constraints in MAP estimation are used to avoid or mitigate local-maximum problems. The algorithm can automatically determine the number of classes and find the optimal parameters for each class. As a result, it can automatically detect the number of objects and produce a better segmentation of each object in the image. For segmented object recognition, we applied machine learning techniques to classify each object as a target or non-target. We achieved good results, with 90% PD (probability of detection) and 6% PFA (probability of false alarm). For image restoration, in X-ray imaging, scatter can produce noise, artifacts, and decreased contrast.
In practice, hardware such as an anti-scatter grid is often used to reduce scatter. However, the remaining scatter can still be significant, and additional software-based correction is desirable. Furthermore, good software solutions can potentially reduce the amount of anti-scatter hardware needed, thereby reducing cost. In this work, scatter correction is formulated as a Bayesian MAP (maximum a posteriori) problem with a non-local prior, which leads to better preservation of textural detail in scatter reduction. The efficacy of our algorithm is demonstrated through experimental and simulation results.
Bayesian Item Response Modeling in R with brms and Stan
Item Response Theory (IRT) is widely applied in the human sciences to model
persons' responses on a set of items measuring one or more latent constructs.
While several R packages have been developed that implement IRT models, they
tend to be restricted to respective prespecified classes of models. Further,
most implementations are frequentist while the availability of Bayesian methods
remains comparably limited. We demonstrate how to use the R package brms
together with the probabilistic programming language Stan to specify and fit a
wide range of Bayesian IRT models using flexible and intuitive multilevel
formula syntax. Further, item and person parameters can be related in either a
linear or non-linear manner. Various distributions for categorical, ordinal,
and continuous responses are supported. Users may even define their own custom
response distribution for use in the presented framework. Common IRT model
classes that can be specified natively in the presented framework include 1PL
and 2PL logistic models optionally also containing guessing parameters, graded
response and partial credit ordinal models, as well as drift diffusion models
of response times coupled with binary decisions. Posterior distributions of
item and person parameters can be conveniently extracted and post-processed.
Model fit can be evaluated and compared using Bayes factors and efficient
cross-validation procedures.
Comment: 54 pages, 16 figures, 3 tables
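As a rough illustration of the logistic model classes named in this abstract, the sketch below evaluates the standard item response function for a 1PL or 2PL model with an optional guessing parameter. It is plain Python and does not use brms or Stan; the function name `irt_probability` and its parameter names are assumptions made for this example, not part of the package.

```python
import math


def irt_probability(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct response under a logistic IRT model.

    theta          -- person ability (latent trait)
    difficulty     -- item difficulty
    discrimination -- item slope; fixing it at 1.0 gives the 1PL model
    guessing       -- lower asymptote; 0.0 gives a pure 1PL/2PL model
    All names are hypothetical and chosen for this sketch only.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# A person whose ability equals the item difficulty answers correctly
# half of the time when no guessing is allowed.
p_plain = irt_probability(theta=0.0, difficulty=0.0)
# With a guessing parameter, the probability can never drop below it.
p_guess = irt_probability(theta=-10.0, difficulty=0.0, guessing=0.25)
```

In a Bayesian fit, the ability, difficulty, discrimination, and guessing terms would each receive a prior and be estimated jointly; the function above only shows the likelihood's mean structure.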
A unified formal framework for factorial and probabilistic topic modelling
Topic modelling has become a highly popular technique for extracting knowledge from texts. It encompasses various method families, including Factorial methods, Probabilistic methods, and Natural Language Processing methods. This paper introduces a unified conceptual framework for Factorial and Probabilistic methods by identifying shared elements and representing them using a homogeneous notation. The paper presents 12 different methods within this framework, enabling easy comparative analysis of each approach's flexibility and of how realistic its assumptions are. This establishes the initial stage of a broader analysis aimed at relating all method families to this common framework, comprehensively understanding their strengths and weaknesses, and establishing general application guidelines. In addition, an experimental setup reinforces the convenience of having a harmonized notational schema. The paper concludes with a discussion of the presented methods and outlines future research directions.
Peer Reviewed. Postprint (published version)
Availability, outage, and capacity of spatially correlated, Australasian free-space optical networks
Network capacity and reliability for free space optical communication (FSOC)
are strongly driven by ground station availability, which is dominated by outages
caused by local cloud cover, and by how availability relations between stations
produce network diversity. We combine remote sensing data and novel methods to
provide a generalised framework for assessing and optimising optical ground
station networks. This work is guided by an example network of eight Australian
and New Zealand optical communication ground stations which would span
approximately in longitude and in latitude. Utilising
time-dependent cloud cover data from five satellites, we present a detailed
analysis determining the availability and diversity of the network, finding the
Australasian region is well-suited for an optical network with a 69% average
site availability and low spatial cloud cover correlations. Employing methods
from computational neuroscience, we provide a Monte Carlo method for sampling
the joint probability distribution of site availabilities for an arbitrarily
sized and point-wise correlated network of ground stations. Furthermore, we
develop a general heuristic for site selection under availability and
correlation optimisations, and combine this with orbital propagation
simulations to compare the data capacity between optimised networks and the
example network. We show that the example network may be capable of providing
tens of terabits per day to a LEO satellite, and up to 99.97% reliability to
GEO satellites. We therefore use the Australasian region to demonstrate novel,
generalised tools for assessing and optimising FSOC ground station networks,
and additionally, the suitability of the region for hosting such a network.
Comment: Accepted in Journal of Optical Communications and Networking. 16
pages, 16 figures
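The abstract does not say which sampling scheme the authors use, so the following is only a minimal sketch of one common way to draw correlated binary availabilities: threshold a correlated Gaussian vector so that each site has the requested marginal availability. The function names, the two-site setup, and the latent correlation of 0.2 are assumptions for illustration; only the 69% average site availability comes from the abstract. Note that the correlation of the latent Gaussians is not the same as the correlation of the resulting binary outcomes.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_availability(site_availability, latent_corr, n_samples, seed=0):
    """Draw joint availability samples (True means the site is cloud-free).

    site_availability -- marginal availability of each site, strictly in (0, 1)
    latent_corr       -- correlation matrix of the latent Gaussian variables
    Both names are hypothetical and used only in this sketch.
    """
    rng = random.Random(seed)
    n = len(site_availability)
    lower = cholesky(latent_corr)
    # A standard normal falls below inv_cdf(p) with probability p.
    thresholds = [NormalDist().inv_cdf(p) for p in site_availability]
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        latent = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        samples.append([latent[i] < thresholds[i] for i in range(n)])
    return samples


def network_availability(samples):
    """Fraction of samples in which at least one site is available."""
    return sum(any(row) for row in samples) / len(samples)


# Two hypothetical sites, each with 69% availability and weak latent correlation.
samples = sample_availability([0.69, 0.69], [[1.0, 0.2], [0.2, 1.0]], 20000, seed=1)
site_one = sum(row[0] for row in samples) / len(samples)
any_site = network_availability(samples)
```

Lower correlation between sites pushes `any_site` further above the single-site availability, which is the diversity effect the abstract describes.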