Unsupervised Detection of Emergent Patterns in Large Image Collections
With the advent of modern image acquisition and sharing technologies, billions of images are added to the Internet every day. This huge repository contains useful information, but it is very hard to analyze. If labeled information is available for this data, then supervised learning techniques can be used to extract useful information. Visual pattern mining approaches provide a way to discover visual structures and patterns in an image collection without the need for any supervision.
The Internet contains images of various objects, scenes, patterns, and shapes. The majority of approaches for visual pattern discovery, on the other hand, find patterns that are related to object or scene categories. Emergent pattern mining techniques provide a way to extract generic, complex, and hidden structures in images.
This thesis describes research, experiments, and analysis conducted to explore various approaches to mining emergent patterns from image collections in an unsupervised way. These approaches are based on itemset mining and graph-theoretic strategies. The itemset mining strategy uses frequent itemset mining and rare itemset mining techniques to discover patterns. The mining is performed on a transactional dataset obtained from the bag-of-words (BoW) representation of images. The graph-based approach represents visual word co-occurrences obtained from images in a co-occurrence graph. Emergent patterns form dense clusters in this graph, which are extracted using normalized cuts. The patterns discovered using the itemset mining approaches are: stripes and parallel lines; dots and checks; bright dots; single lines; intersections; and frames. The graph-based approach revealed various interesting patterns, including some that are related to object categories.
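The itemset mining strategy described above can be sketched on a toy transactional dataset, where each transaction is the set of visual-word IDs present in one image's BoW representation. The data, support threshold, and naive counting scheme below are illustrative assumptions, not the thesis's actual algorithm:

```python
from itertools import combinations
from collections import Counter

# Toy transactional dataset: each transaction is the set of visual-word IDs
# (hypothetical) occurring in one image's bag-of-words representation.
transactions = [
    {1, 2, 3}, {1, 2, 4}, {1, 2, 3, 5}, {2, 3}, {1, 2},
]

def frequent_itemsets(transactions, min_support=0.4, max_size=2):
    """Naive frequent-itemset mining: count every candidate itemset up to
    max_size and keep those whose support meets the threshold."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(t), size):
                counts[itemset] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

patterns = frequent_itemsets(transactions)
```

Rare itemset mining proceeds analogously, keeping itemsets whose support falls below a threshold rather than above it.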
Image similarity in medical images
Recent experiments have indicated a strong influence of the substrate grain orientation on the self-ordering in anodic porous alumina. Anodic porous alumina with straight pore channels grown in a stable, self-ordered manner is formed on (001) oriented Al grain, while a disordered porous pattern with tilted pore channels growing in an unstable manner is formed on (101) oriented Al grain. In this work, numerical simulation of the pore growth process is carried out to understand this phenomenon. The rate-determining step of the oxide growth is assumed to be the Cabrera-Mott barrier at the oxide/electrolyte (o/e) interface, while the substrate is assumed to determine the ratio β between the ionization and oxidation reactions at the metal/oxide (m/o) interface. By numerically solving the electric field inside a growing porous alumina during anodization, the migration rates of the ions and hence the evolution of the o/e and m/o interfaces are computed. The simulated results show that pore growth is more stable when β is higher. A higher β corresponds to more Al ionized and migrating away from the m/o interface rather than being oxidized, and hence a higher retained O:Al ratio in the oxide. Experimentally measured oxygen content in the self-ordered porous alumina on (001) Al is indeed found to be about 3% higher than that in the disordered alumina on (101) Al, in agreement with the theoretical prediction. The results therefore suggest that ionization on the (001) Al substrate is relatively easier than on (101) Al, and this leads to the more stable growth of the pore channels on (001) Al.
Topological Feature Selection: A Graph-Based Filter Feature Selection Approach
In this paper, we introduce a novel unsupervised, graph-based filter feature selection technique which exploits the power of topologically constrained network representations. We model dependency structures among features using a family of chordal graphs (the Triangulated Maximally Filtered Graph), and we maximise the likelihood of features' relevance by studying their relative position inside the network. Such an approach presents three aspects that are particularly satisfactory compared to its alternatives: (i) it is highly tunable and easily adaptable to the nature of the input data; (ii) it is fully explainable, maintaining, at the same time, a remarkable level of simplicity; (iii) it is computationally cheaper than its alternatives. We test our algorithm on 16 benchmark datasets from different applicative domains, showing that it outperforms or matches the current state of the art under heterogeneous evaluation conditions.
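As a rough illustration of the graph-based filter idea (not the paper's TMFG construction, which restricts the network to a chordal, triangulated topology), one can build a feature-dependency graph from absolute pairwise correlations and score each feature by its position in the network. The data, correlation threshold, and degree-centrality score below are all assumptions for illustration:

```python
import numpy as np

# Synthetic data: six features, with feature 1 strongly dependent on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)

# Feature-dependency graph: nodes are features, edges connect strongly
# correlated pairs (threshold chosen for illustration only).
corr = np.abs(np.corrcoef(X, rowvar=False))
np.fill_diagonal(corr, 0.0)
adjacency = corr >= 0.5

# Score each feature by degree centrality and rank features by that score.
centrality = adjacency.sum(axis=0)
ranking = np.argsort(-centrality)
```

A full filter method would then keep the top-ranked features; the paper's contribution lies in using a topologically constrained network rather than a simple thresholded correlation graph.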
M3DISEEN: A Novel Machine Learning Approach for Predicting the 3D Printability of Medicines
Artificial intelligence (AI) has the potential to reshape pharmaceutical formulation development through its ability to analyze and continuously monitor large datasets. Fused deposition modeling (FDM) 3-dimensional printing (3DP) has made significant advancements in the field of oral drug delivery, with personalized drug-loaded formulations being designed, developed, and dispensed for the needs of the patient. However, the optimization of the fabrication parameters is a time-consuming, empirical trial-and-error process requiring expert knowledge. Here, M3DISEEN, a web-based pharmaceutical software, was developed to accelerate FDM 3D printing, which includes producing filaments by hot melt extrusion (HME), using AI machine learning techniques (MLTs). In total, 614 drug-loaded formulations were designed from a comprehensive list of 145 different pharmaceutical excipients, 3D printed, and assessed in-house. To build the predictive tool, a dataset was constructed and models were trained and tested at a ratio of 75:25. Significantly, the AI models predicted key fabrication parameters with accuracies of 76% and 67% for the printability and the filament characteristics, respectively. Furthermore, the AI models predicted the HME and FDM processing temperatures with mean absolute errors of 8.9 °C and 8.3 °C, respectively. Strikingly, the AI models achieved high levels of accuracy by solely inputting the pharmaceutical excipient trade names. Therefore, AI provides an effective holistic modeling technology and software to streamline and advance 3DP as a significant technology within drug development. M3DISEEN is available at http://m3diseen.com/predictions/.
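The evaluation protocol described above (a 75:25 train/test split, accuracy for the categorical printability target, MAE for the processing temperatures) can be sketched on synthetic data with stand-in baseline predictors. None of the values below come from the paper; they only show how the two metrics are computed:

```python
import numpy as np

# Synthetic stand-ins for the paper's data: a binary printability label and a
# continuous processing temperature per formulation (values are invented).
rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(n, 5))
y_class = rng.integers(0, 2, size=n)      # e.g. printable / not printable
y_temp = rng.uniform(150, 220, size=n)    # e.g. HME temperature in deg C

# 75:25 train/test split, as in the paper's protocol.
split = int(0.75 * n)
train_idx, test_idx = np.arange(split), np.arange(split, n)

# Trivial baseline "models": majority class and mean temperature.
pred_class = np.full(len(test_idx), np.bincount(y_class[train_idx]).argmax())
pred_temp = np.full(len(test_idx), y_temp[train_idx].mean())

# The two reported metrics: classification accuracy and mean absolute error.
accuracy = (pred_class == y_class[test_idx]).mean()
mae = np.abs(pred_temp - y_temp[test_idx]).mean()
```

The paper's actual models are trained on excipient features rather than these baselines; the point here is only the split-and-score structure of the evaluation.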
Handbook of Vascular Biometrics
This open access handbook provides the first comprehensive overview of biometrics exploiting the shape of human blood vessels for biometric recognition, i.e. vascular biometrics, including finger vein recognition, hand/palm vein recognition, retina recognition, and sclera recognition. After an introductory chapter summarizing the state of the art and the availability of commercial systems and open datasets/open-source software, individual chapters focus on specific aspects of one of the biometric modalities, including questions of usability, security, and privacy. The book features contributions from both academia and major industrial manufacturers.
Learning to hash for large scale image retrieval
This thesis is concerned with improving the effectiveness of nearest neighbour search. Nearest neighbour search is the problem of finding the most similar data-points to a query in a database, and is a fundamental operation that has found wide applicability in many fields. In this thesis the focus is placed on hashing-based approximate nearest neighbour search methods that generate similar binary hashcodes for similar data-points. These hashcodes can be used as indices into the buckets of hashtables for fast search. This work explores how the quality of search can be improved by learning task-specific binary hashcodes.
The generation of a binary hashcode comprises two main steps carried out sequentially: projection of the image feature vector onto the normal vectors of a set of hyperplanes partitioning the input feature space, followed by a quantisation operation that uses a single threshold to binarise the resulting projections to obtain the hashcodes. The degree to which these operations preserve the relative distances between the data-points in the input feature space has a direct influence on the effectiveness of using the resulting hashcodes for nearest neighbour search. In this thesis I argue that the retrieval effectiveness of existing hashing-based nearest neighbour search methods can be increased by learning the thresholds and hyperplanes based on the distribution of the input data.
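The two-step pipeline described above can be sketched with random (unlearnt) hyperplanes and a single threshold at zero, i.e. standard sign-based hashing; the dimensions and data are illustrative:

```python
import numpy as np

# Step 0: some image feature vectors (synthetic stand-ins).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 32))       # four 32-dimensional feature vectors

# Random hyperplane normals; an 8-bit hashcode uses 8 hyperplanes.
# (The thesis learns these from data; random normals are the baseline.)
hyperplanes = rng.normal(size=(32, 8))

projections = features @ hyperplanes       # step 1: projection
hashcodes = (projections > 0).astype(int)  # step 2: quantisation at threshold 0
```

Points on the same side of every hyperplane receive identical hashcodes, so each code indexes one bucket of a hashtable.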
The first contribution is a model for learning multiple quantisation thresholds. I demonstrate that the best threshold positioning is projection specific and introduce a novel clustering algorithm for threshold optimisation. The second contribution extends this algorithm by learning the optimal allocation of quantisation thresholds per hyperplane. In doing so I argue that some hyperplanes are naturally more effective than others at capturing the distribution of the data and should therefore attract a greater allocation of quantisation thresholds. The third contribution focuses on the complementary problem of learning the hashing hyperplanes. I introduce a multi-step iterative model that, in the first step, regularises the hashcodes over a data-point adjacency graph, which encourages similar data-points to be assigned similar hashcodes. In the second step, binary classifiers are learnt to separate opposing bits with maximum margin. This algorithm is extended to learn hyperplanes that can generate similar hashcodes for similar data-points in two different feature spaces (e.g. text and images). Individually, the performance of these algorithms is often superior to competitive baselines. I unify my contributions by demonstrating that learning hyperplanes and thresholds as part of the same model can yield an additive increase in retrieval effectiveness.
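The multi-threshold idea in the first contribution can be illustrated by clustering the values projected onto a single hyperplane and placing thresholds at the midpoints between cluster centres. The plain 1-D k-means below is a stand-in for the thesis's clustering algorithm, and the bimodal data is invented:

```python
import numpy as np

# Projected values along one hyperplane: a synthetic bimodal distribution,
# the case where a single threshold at zero quantises poorly.
rng = np.random.default_rng(1)
projections = np.concatenate([rng.normal(-3, 0.5, 50), rng.normal(2, 0.5, 50)])

# Simple 1-D k-means (Lloyd iterations) with two clusters.
centres = np.array([projections.min(), projections.max()])
for _ in range(20):
    assign = np.abs(projections[:, None] - centres[None, :]).argmin(axis=1)
    centres = np.array([projections[assign == k].mean() for k in range(2)])
centres.sort()

# Place quantisation thresholds at midpoints between adjacent centres.
thresholds = (centres[:-1] + centres[1:]) / 2
```

With k clusters this yields k - 1 data-driven thresholds per projected dimension, in place of the single fixed threshold of sign-based hashing.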