3,868 research outputs found
Semantic distillation: a method for clustering objects by their contextual specificity
Techniques for data-mining, latent semantic analysis, contextual search of
databases, etc. have long ago been developed by computer scientists working on
information retrieval (IR). Experimental scientists, from all disciplines,
having to analyse large collections of raw experimental data (astronomical,
physical, biological, etc.) have developed powerful methods for their
statistical analysis and for clustering, categorising, and classifying objects.
Finally, physicists have developed a theory of quantum measurement, unifying
the logical, algebraic, and probabilistic aspects of queries into a single
formalism. The purpose of this paper is twofold: first to show that when
formulated at an abstract level, problems from IR, from statistical data
analysis, and from physical measurement theories are very similar and hence can
profitably be cross-fertilised, and, secondly, to propose a novel method of
fuzzy hierarchical clustering, termed \textit{semantic distillation} --
strongly inspired by the theory of quantum measurement -- which we developed
to analyse raw data coming from various types of experiments on DNA arrays.
We illustrate the method by analysing DNA array experiments and clustering
the genes of the array according to their specificity.
Comment: Accepted for publication in Studies in Computational Intelligence, Springer-Verlag.
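The abstract does not spell out the semantic distillation procedure itself, but the underlying idea of fuzzy clustering -- each object receiving a graded membership in every cluster rather than a hard assignment -- can be illustrated with standard fuzzy c-means, used here purely as a classical stand-in for the authors' method; the 1-D toy data, parameter values, and function name are illustrative assumptions:

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Standard fuzzy c-means on 1-D data: each point gets a graded
    membership degree in every cluster instead of a hard assignment."""
    lo, hi = min(points), max(points)
    # spread the initial centers across the data range
    centers = [lo + (hi - lo) * i / (c - 1) for i in range(c)]
    u = []
    for _ in range(iters):
        # membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        u = []
        for x in points:
            d = [abs(x - v) or 1e-12 for v in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c)) for i in range(c)])
        # center update: membership-weighted mean with weights u^m
        centers = [sum(u[k][i] ** m * x for k, x in enumerate(points))
                   / sum(u[k][i] ** m for k in range(len(points)))
                   for i in range(c)]
    return centers, u

# Usage: two well-separated 1-D groups
centers, memberships = fuzzy_c_means([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
```

Because memberships are graded, every point contributes (with some weight) to every center, which is what lets such methods express contextual specificity rather than forcing a single label per gene.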
Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network
Bibliographic analysis considers the author's research areas, the citation
network and the paper content among other things. In this paper, we combine
these three in a topic model that produces a bibliographic model of authors,
topics and documents, using a nonparametric extension of a combination of the
Poisson mixed-topic link model and the author-topic model. This gives rise to
the Citation Network Topic Model (CNTM). We propose a novel and efficient
inference algorithm for the CNTM to explore subsets of research publications
from CiteSeerX. The publication datasets are organised into three corpora,
totalling about 168k publications with about 62k authors. The queried
datasets are made available online. In three publicly available corpora in
addition to the queried datasets, our proposed model demonstrates an improved
performance in both model fitting and document clustering, compared to several
baselines. Moreover, our model allows extraction of additional useful knowledge
from the corpora, such as the visualisation of the author-topics network.
Additionally, we propose a simple method to incorporate supervision into topic
modelling to achieve further improvement on the clustering task.
Comment: Preprint for the journal Machine Learning.
Artificial Immune Systems - Models, algorithms and applications
Copyright © 2010 Academic Research Publishing Agency. This article has been made available through the Brunel Open Access Publishing Fund. Artificial Immune Systems (AIS) are computational paradigms that belong to the computational intelligence family and are inspired by the biological immune system. During the past decade, they have attracted a lot of interest from researchers aiming to develop immune-based models and techniques to solve complex computational or engineering problems. This work presents a survey of existing AIS models and algorithms with a focus on the last five years.
Current Studies and Applications of Krill Herd and Gravitational Search Algorithms in Healthcare
Nature-Inspired Computing (NIC) is a relatively young field that seeks new
methods of computing by studying how natural phenomena work in order to solve
complicated problems in many contexts. As a consequence, ground-breaking
research has been conducted in a variety of domains, including artificial
immune systems, neural networks, swarm intelligence, and evolutionary
computing. NIC techniques are used in biology, physics, engineering,
economics, and management. Meta-heuristic algorithms are successful,
efficient, and resilient in real-world classification, optimization,
forecasting, and clustering, as well as in engineering and science problems.
Two active NIC paradigms are the Gravitational Search Algorithm (GSA) and the
Krill Herd (KH) algorithm. This publication gives a worldwide and historical
review of the use of KH and GSA in medicine and healthcare. Comprehensive
surveys have been conducted on KH, GSA, and other nature-inspired algorithms,
but no survey has focused on KH and GSA in the healthcare field. This work
therefore conducts a thorough review of KH and GSA to assist researchers in
applying them in diverse domains or hybridizing them with other popular
algorithms, and provides an in-depth examination of the various versions of
the KH and GSA algorithms and their healthcare applications in terms of
application, modification, and hybridization. The goal of the study is to
offer a viewpoint on GSA and KH, particularly for academics interested in
investigating the capabilities and performance of these algorithms in the
healthcare and medical domains.
Comment: 35 pages
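As background for readers unfamiliar with GSA, its core update rule (following Rashedi et al.'s 2009 formulation) can be sketched as below. This is a generic minimal version, not any healthcare-specific variant from the survey; the parameter values, bounds, and the sphere test function are illustrative assumptions:

```python
import random

def gsa(fitness, dim, n_agents=20, iters=100, g0=5.0, seed=1):
    """Minimal Gravitational Search Algorithm (after Rashedi et al., 2009):
    agents are masses; better fitness means a heavier mass and a stronger
    gravitational pull on the rest of the swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_f, best_x = float("inf"), None
    for t in range(iters):
        fits = [fitness(p) for p in pos]
        for p, f in zip(pos, fits):
            if f < best_f:
                best_f, best_x = f, p[:]
        lo, hi = min(fits), max(fits)
        # normalised masses: the best (lowest-fitness) agent is heaviest
        raw = [(f - hi) / (lo - hi + -1e-12) for f in fits] if lo != hi \
            else [1.0] * n_agents
        total = sum(raw)
        mass = [r / total for r in raw]
        g = g0 * (1.0 - t / iters)        # gravitational "constant" decays
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in range(n_agents):
                if i == j:
                    continue
                r = sum((pos[i][d] - pos[j][d]) ** 2
                        for d in range(dim)) ** 0.5
                for d in range(dim):
                    # force ~ G * M_j * (x_j - x_i)/(R + eps); acc = force/M_i
                    acc[d] += rng.random() * g * mass[j] * \
                        (pos[j][d] - pos[i][d]) / (r + 1e-12)
            for d in range(dim):
                vel[i][d] = rng.random() * vel[i][d] + acc[d]
                pos[i][d] += vel[i][d]
    return best_x

# Usage: minimise the 2-D sphere function
best = gsa(lambda x: sum(v * v for v in x), dim=2)
```

The decaying gravitational constant shifts the swarm from exploration to exploitation, which is the property the surveyed healthcare applications typically rely on.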
A Novel Quantum Algorithm for Ant Colony Optimization
Quantum ant colony optimization (QACO) has drawn much attention since it
combines the advantages of quantum computing and ant colony optimization
(ACO) algorithms and overcomes some limitations of the traditional ACO
algorithm. However, due to the hardware resource limitations of currently
available quantum computers, such as the limited number of qubits, the lack
of high-fidelity gate operations, and low noise tolerance, the practical
application of QACO is quite challenging. In this paper, we introduce a
hybrid quantum-classical algorithm that combines a clustering algorithm with
the QACO algorithm, so that this extended QACO can handle large-scale
optimization problems; this makes the practical application of QACO possible
with the quantum computation resources available today. To verify the
effectiveness and performance of the algorithm, we tested the developed QACO
algorithm with the Travelling Salesman Problem (TSP) as a benchmark. The
developed QACO algorithm shows good performance on multiple data sets. In
addition, the developed QACO algorithm is robust to noise in the calculation
process, which is typically a major barrier to the practical application of
quantum computers. Our work shows that combining a clustering algorithm with
QACO effectively extends the application scenarios of QACO in the current
NISQ era of quantum computing.
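The classical ACO component that QACO extends can be sketched as follows. This is textbook ant colony optimisation on a toy TSP instance, not the paper's quantum circuit or its clustering stage; all parameter values and names are illustrative:

```python
import random

def aco_tsp(dist, n_ants=10, iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=2):
    """Classical ant colony optimisation for the TSP: ants build tours
    probabilistically, biased by pheromone (tau) and inverse distance."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]           # pheromone on each edge
    best_tour, best_len = None, float("inf")
    for _ in range(iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                cand = [j for j in range(n) if j not in tour]
                # next city ~ pheromone^alpha * (1/distance)^beta
                w = [tau[i][j] ** alpha / dist[i][j] ** beta for j in cand]
                tour.append(rng.choices(cand, weights=w)[0])
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((length, tour))
            if length < best_len:
                best_len, best_tour = length, tour
        # evaporate, then deposit pheromone proportional to tour quality
        tau = [[p * (1.0 - rho) for p in row] for row in tau]
        for length, tour in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len

# Usage: four cities on a unit square; the optimal tour has length 4.0
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dist = [[((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 for (bx, by) in pts]
        for (ax, ay) in pts]
tour, length = aco_tsp(dist)
```

The hybrid approach in the abstract clusters the cities first so that each subproblem is small enough for the quantum part to handle on NISQ hardware.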
A Survey on Evolutionary Computation for Computer Vision and Image Analysis: Past, Present, and Future Trends
Computer vision (CV) is a large and important field in artificial
intelligence covering a wide range of applications. Image analysis is a major
task in CV, aiming to extract, analyse, and understand the visual content of
images. However, image-related tasks are very challenging due to many
factors, e.g., high variation across images, high dimensionality, domain
expertise requirements, and image distortions. Evolutionary computation (EC)
approaches have been widely used for image analysis with significant
achievements. However, there is no comprehensive survey of existing EC
approaches to image analysis. To fill this gap, this paper provides a
comprehensive survey covering all essential EC approaches to important image
analysis tasks including edge detection, image segmentation, image feature
analysis, image classification, object detection, and others. This survey
aims to provide a better understanding of evolutionary computer vision (ECV)
by discussing the contributions of different approaches and exploring how and
why EC is used for CV and image analysis. The applications, challenges,
issues, and trends associated with this research field are also discussed and
summarised to provide further guidelines and opportunities for future
research.
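As a minimal illustration of how EC applies to an image-analysis task like segmentation, the sketch below evolves a binarisation threshold with a (1+1) evolution strategy against the Otsu between-class-variance criterion. It is a toy example under assumed data, not any specific method from the survey:

```python
import random

def otsu_score(pixels, t):
    """Between-class variance of a threshold t (the Otsu criterion)."""
    lo = [p for p in pixels if p < t]
    hi = [p for p in pixels if p >= t]
    if not lo or not hi:
        return 0.0
    w0, w1 = len(lo) / len(pixels), len(hi) / len(pixels)
    m0, m1 = sum(lo) / len(lo), sum(hi) / len(hi)
    return w0 * w1 * (m0 - m1) ** 2

def evolve_threshold(pixels, iters=200, sigma=10.0, seed=3):
    """(1+1) evolution strategy: perturb the threshold with Gaussian noise
    and keep the mutant if the segmentation objective does not get worse."""
    rng = random.Random(seed)
    t = sum(pixels) / len(pixels)          # start at the mean intensity
    for _ in range(iters):
        cand = min(255.0, max(0.0, t + rng.gauss(0.0, sigma)))
        if otsu_score(pixels, cand) >= otsu_score(pixels, t):
            t = cand
    return t

# Usage: a bimodal "image" of dark (10) and bright (200) pixels
pixels = [10] * 50 + [200] * 50
t = evolve_threshold(pixels)
```

Real ECV systems evolve far richer representations (filters, trees, network topologies), but the select-and-mutate loop is the same.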
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.
Comment: 24 pages, 8 figures, 1 table. Accepted for publication in "Advances in Astronomy", special issue "Robotic Astronomy".
Prediction of progression in idiopathic pulmonary fibrosis using CT scans at baseline: A quantum particle swarm optimization - random forest approach
Idiopathic pulmonary fibrosis (IPF) is a fatal lung disease characterized by an unpredictable progressive decline in lung function. The natural history of IPF is unknown and the prediction of disease progression at the time of diagnosis is notoriously difficult. High resolution computed tomography (HRCT) has been used for the diagnosis of IPF, but not generally for monitoring purposes. The objective of this work is to develop a novel predictive model for the radiological progression pattern at the voxel-wise level using only baseline HRCT scans. There are two main challenges: (a) obtaining a data set of features for regions of interest (ROIs) on baseline HRCT scans and their follow-up status; and (b) simultaneously selecting important features from a high-dimensional space and optimizing the prediction performance. We resolved the first challenge by implementing a study design and having an expert radiologist contour ROIs on baseline scans depending on their progression status in follow-up visits. For the second challenge, we integrated feature selection with prediction by developing an algorithm using a wrapper method that combines quantum particle swarm optimization, to select a small number of features, with random forest, to classify early patterns of progression. We applied our proposed algorithm to analyze anonymized HRCT images from 50 IPF subjects from a multi-center clinical trial. We showed that it yields a parsimonious model with 81.8% sensitivity, 82.2% specificity and an overall accuracy rate of 82.1% at the ROI level. These results are superior to other popular feature selection and classification methods, in that our method produces higher accuracy in the prediction of progression, and more balanced sensitivity and specificity, with a smaller number of selected features. Our work is the first to show that it is possible to use only baseline HRCT scans to predict progressive ROIs at 6-month to 1-year follow-ups using artificial intelligence.
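The wrapper idea -- a swarm searches over feature subsets while a classifier scores each candidate subset -- can be sketched with a standard binary PSO and a simple nearest-centroid classifier. This is a classical stand-in for the paper's quantum PSO and random forest; the toy data, penalty term, and all parameters are assumptions:

```python
import math
import random

def accuracy(data, labels, mask):
    """Nearest-centroid training accuracy using only the masked features."""
    feats = [i for i, keep in enumerate(mask) if keep]
    if not feats:
        return 0.0
    cent = {}
    for c in set(labels):
        rows = [x for x, y in zip(data, labels) if y == c]
        cent[c] = [sum(r[i] for r in rows) / len(rows) for i in feats]
    hits = 0
    for x, y in zip(data, labels):
        pred = min(cent, key=lambda c: sum((x[i] - v) ** 2
                                           for i, v in zip(feats, cent[c])))
        hits += pred == y
    return hits / len(labels)

def bpso_select(data, labels, iters=40, swarm=12, penalty=0.01, seed=4):
    """Binary PSO wrapper: each particle is a 0/1 feature mask scored by
    classifier accuracy minus a small per-feature penalty (parsimony)."""
    rng = random.Random(seed)
    d = len(data[0])
    score = lambda mk: accuracy(data, labels, mk) - penalty * sum(mk)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(swarm)]
    vel = [[0.0] * d for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=score)[:]
    for _ in range(iters):
        for k in range(swarm):
            for i in range(d):
                vel[k][i] = (0.7 * vel[k][i]
                             + 1.5 * rng.random() * (pbest[k][i] - pos[k][i])
                             + 1.5 * rng.random() * (gbest[i] - pos[k][i]))
                # sigmoid transfer: velocity -> probability the bit is 1
                p1 = 1.0 / (1.0 + math.exp(-vel[k][i]))
                pos[k][i] = 1 if rng.random() < p1 else 0
            if score(pos[k]) > score(pbest[k]):
                pbest[k] = pos[k][:]
            if score(pbest[k]) > score(gbest):
                gbest = pbest[k][:]
    return gbest

# Usage: feature 0 separates the classes; features 1-3 are noise
rng = random.Random(0)
data = [[10.0 * (i % 2)] + [rng.uniform(-1, 1) for _ in range(3)]
        for i in range(20)]
labels = [i % 2 for i in range(20)]
mask = bpso_select(data, labels)
```

The per-feature penalty is what pushes the wrapper toward the small, parsimonious feature sets the abstract reports.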
Detecting Multiple Communities Using Quantum Annealing on the D-Wave System
A very important problem in combinatorial optimization is partitioning a
network into communities of densely connected nodes; where the connectivity
between nodes inside a particular community is large compared to the
connectivity between nodes belonging to different ones. This problem is known
as community detection, and has become very important in various fields of
science including chemistry, biology and social sciences. The problem of
community detection is a twofold problem that consists of determining the
number of communities and, at the same time, finding those communities. This
drastically increases the solution space for heuristics to work on, compared to
traditional graph partitioning problems. In many of the scientific domains in
which graphs are used, there is the need to have the ability to partition a
graph into communities with the ``highest quality'' possible since the presence
of even small isolated communities can become crucial to explain a particular
phenomenon. We have explored community detection using the power of quantum
annealers, and in particular the D-Wave 2X and 2000Q machines. It turns out
that the problem of detecting at most two communities naturally fits into the
architecture of a quantum annealer with almost no need of reformulation. This
paper addresses a systematic study of detecting two or more communities in a
network using a quantum annealer.
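Two-community modularity maximisation maps naturally onto the ±1 spin variables an annealer works with: maximise Q = (1/4m) Σ_ij (A_ij − k_i k_j / 2m) s_i s_j with s_i ∈ {−1, +1}. The sketch below builds that modularity matrix and, in place of the D-Wave hardware, brute-forces the tiny instance classically; the graph and function names are illustrative:

```python
import itertools

def modularity_split(edges, n):
    """Two-community detection as an Ising problem: maximise the modularity
    Q = (1/4m) * sum_ij (A_ij - k_i*k_j/(2m)) * s_i * s_j, s_i in {-1, +1}.
    An annealer would minimise -Q; here we enumerate all 2^n spin states."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    k = [sum(row) for row in A]           # node degrees
    m = len(edges)
    B = [[A[i][j] - k[i] * k[j] / (2.0 * m) for j in range(n)]
         for i in range(n)]               # modularity matrix
    best_q, best_s = float("-inf"), None
    for s in itertools.product((-1, 1), repeat=n):
        q = sum(B[i][j] * s[i] * s[j]
                for i in range(n) for j in range(n)) / (4.0 * m)
        if q > best_q:
            best_q, best_s = q, s
    return best_q, best_s

# Usage: two triangles joined by a single bridge edge
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q, s = modularity_split(edges, 6)
```

The optimal spins separate the two triangles, which is why the two-community case fits a quantum annealer with almost no reformulation; detecting more than two communities requires the recursive or multi-level encodings the paper studies.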