Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs
Graphs are widely used to encapsulate a variety of data formats, but
real-world networks often involve complex node relations that go beyond pairwise
ones. While hypergraphs and hierarchical graphs have been developed and
employed to account for the complex node relations, they cannot fully represent
these complexities in practice. Additionally, though many Graph Neural Networks
(GNNs) have been proposed for representation learning on higher-order graphs,
they are usually only evaluated on simple graph datasets. Therefore, there is a
need for a unified modelling of higher-order graphs, and a collection of
comprehensive datasets with an accessible evaluation framework to fully
understand the performance of these algorithms on complex graphs. In this
paper, we introduce the concept of hybrid graphs, a unified definition for
higher-order graphs, and present the Hybrid Graph Benchmark (HGB). HGB contains
23 real-world hybrid graph datasets across various domains such as biology,
social media, and e-commerce. Furthermore, we provide an extensible evaluation
framework and a supporting codebase to facilitate the training and evaluation
of GNNs on HGB. Our empirical study of existing GNNs on HGB reveals various
research opportunities and gaps, including (1) evaluating the actual
performance improvement of hypergraph GNNs over simple graph GNNs; (2)
comparing the impact of different sampling strategies on hybrid graph learning
methods; and (3) exploring ways to integrate simple graph and hypergraph
information. We make our source code and full datasets publicly available at
https://zehui127.github.io/hybrid-graph-benchmark/. Comment: Preprint. Under review. 16 pages, 5 figures, 11 tables.
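To make the idea of combining simple-graph and hypergraph information concrete, here is a minimal sketch. It assumes a hybrid graph is modelled as one node set carrying both ordinary pairwise edges and hyperedges stored as node sets. The class and method names are hypothetical and are not the HGB codebase's API. Clique expansion, shown here, is one common way to flatten hyperedges into pairwise edges so that a simple-graph GNN can consume them.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class HybridGraph:
    """Hypothetical container: pairwise edges and hyperedges over one node set."""

    num_nodes: int
    edges: set = field(default_factory=set)         # simple (pairwise) edges as (u, v), u < v
    hyperedges: list = field(default_factory=list)  # higher-order relations as frozensets

    def add_edge(self, u: int, v: int) -> None:
        # Store undirected edges in canonical (min, max) order to avoid duplicates.
        self.edges.add((min(u, v), max(u, v)))

    def add_hyperedge(self, nodes) -> None:
        self.hyperedges.append(frozenset(nodes))

    def clique_expansion(self) -> set:
        """Simple edges plus every pairwise edge induced inside each hyperedge."""
        expanded = set(self.edges)
        for he in self.hyperedges:
            for u, v in combinations(sorted(he), 2):
                expanded.add((u, v))
        return expanded


g = HybridGraph(num_nodes=5)
g.add_edge(1, 0)
g.add_edge(3, 4)
g.add_hyperedge([1, 2, 3])
print(sorted(g.clique_expansion()))
# [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
```

Clique expansion is lossy. A single hyperedge {1, 2, 3} and three separate pairwise edges among nodes 1, 2 and 3 produce the same expanded edge set, so a model that sees only the expanded graph cannot tell them apart.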
Beyond Flatland: exploring graphs in many dimensions
Societies, technologies, economies, ecosystems, organisms, . . . Our world is composed of complex networks—systems with many elements that interact in nontrivial ways. Graphs are natural models of these systems, and scientists have made tremendous progress in developing tools for their analysis. However, research has long focused on relatively simple graph representations and problem specifications, often discarding valuable real-world information in the process. In recent years, the limitations of this approach have become increasingly apparent, but we are just starting to comprehend how more intricate data representations and problem formulations might benefit our understanding of relational phenomena. Against this background, our thesis sets out to explore graphs in five dimensions: descriptivity, multiplicity, complexity, expressivity, and responsibility. Leveraging tools from graph theory, information theory, probability theory, geometry, and topology, we develop methods to (1) descriptively compare individual graphs, (2) characterize similarities and differences between groups of multiple graphs, (3) critically assess the complexity of relational data representations and their associated scientific culture, (4) extract expressive features from and for hypergraphs, and (5) responsibly mitigate the risks induced by graph-structured content recommendations. Thus, our thesis is naturally situated at the intersection of graph mining, graph learning, and network analysis.
Indoor Scene Recognition for Micro Aerial Vehicles Navigation using Enhanced-GIST Descriptors
An indoor scene recognition algorithm that combines GIST features with a histogram of horizontal and vertical directional morphological gradient (HODMG) features is proposed in this paper. The new visual descriptor is called enhanced-GIST. Three classifiers, a k-nearest neighbour (kNN) classifier, a Naïve Bayes classifier and a support vector machine (SVM), are employed to classify indoor scenes as corridor, staircase or room. The evaluation was performed on two indoor scene datasets. The scene recognition algorithm consists of a training phase and a testing phase. In the training phase, GIST, CENTRIST, LBP, HODMG and enhanced-GIST feature vectors are extracted for all training images in the datasets, and the classifiers are trained on these feature vectors and the image labels (corridor = 1, staircase = 2, room = 3). In the testing phase, the same GIST, CENTRIST, LBP, HODMG and enhanced-GIST feature vectors are extracted for each unknown test image, and classification is performed using the trained scene recognition model. The experimental results show that the algorithm employing an SVM with enhanced-GIST descriptors achieves recognition rates of 97.22% and 99.33% on dataset-1 and dataset-2 respectively, higher than those obtained with the kNN and Naïve Bayes classifiers. In addition to its accuracy and robustness, the algorithm is suitable for real-time operation.
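The abstract does not spell out how HODMG is computed, so this is only a hedged sketch. It assumes that HODMG is built from grey-level morphological gradients (dilation minus erosion) taken with horizontal and vertical line structuring elements and histogrammed separately. It also assumes that enhanced-GIST is the concatenation of a precomputed GIST vector with that histogram. The function names, the line length `k`, the bin count and the 8-bit intensity range are all assumptions, and GIST extraction itself is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def directional_gradient(img: np.ndarray, axis: int, k: int = 3) -> np.ndarray:
    """Grey-level morphological gradient (dilation minus erosion) using a 1-D line
    structuring element of odd length k along `axis` (1 = horizontal, 0 = vertical)."""
    pad = k // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(img, pad_width, mode="edge")
    windows = sliding_window_view(padded, k, axis=axis)  # window values land on the last axis
    return windows.max(axis=-1) - windows.min(axis=-1)


def hodmg(img, bins: int = 16, k: int = 3) -> np.ndarray:
    """Concatenated, normalised histograms of horizontal and vertical gradients.
    Assumes 8-bit intensities in [0, 255]."""
    img = np.asarray(img, dtype=np.float64)
    feats = []
    for axis in (1, 0):  # horizontal first, then vertical
        grad = directional_gradient(img, axis, k)
        hist, _ = np.histogram(grad, bins=bins, range=(0.0, 255.0))
        feats.append(hist / grad.size)  # relative frequencies
    return np.concatenate(feats)


def enhanced_gist(gist_vec, img, bins: int = 16) -> np.ndarray:
    """Hypothetical fusion: precomputed GIST vector followed by the HODMG histogram."""
    return np.concatenate([np.asarray(gist_vec, dtype=np.float64), hodmg(img, bins)])
```

Under these assumptions, one such vector per image would then be passed, together with the labels 1, 2 and 3, to a classifier such as an SVM.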
Towards a unified approach
"Decision-making in the presence of uncertainty is a pervasive computation.
Latent variable decoding—inferring hidden causes underlying visible
effects—is commonly observed in nature, and it is an unsolved challenge
in modern machine learning.
On many occasions, animals need to base their choices on uncertain
evidence; for instance, when deciding whether to approach or avoid an
obfuscated visual stimulus that could be either a prey or a predator. Yet,
their strategies are, in general, poorly understood.
In simple cases, these problems admit an optimal, explicit solution.
However, in more complex real-life scenarios, it is difficult to determine the
best possible behavior. The most common approach in modern machine
learning relies on artificial neural networks—black boxes that map each
input to an output. This input-output mapping depends on a large number
of parameters, the weights of the synaptic connections, which are optimized
during learning.(...)
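For the simplest version of the prey-or-predator example, the optimal explicit solution is Bayes' rule followed by an expected-cost comparison. The sketch below assumes two Gaussian hypotheses with known means, a shared noise level, a known prior and illustrative costs. All of these numbers and names are hypothetical and are not taken from the thesis.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def posterior_predator(x, prior_predator=0.5, mu_prey=-1.0, mu_predator=1.0, sigma=1.0):
    """P(predator | x) via Bayes' rule for two Gaussian hypotheses."""
    joint_pred = gaussian_pdf(x, mu_predator, sigma) * prior_predator
    joint_prey = gaussian_pdf(x, mu_prey, sigma) * (1.0 - prior_predator)
    return joint_pred / (joint_pred + joint_prey)


def decide(x, cost_eaten=10.0, cost_missed_prey=1.0, **model):
    """Pick the action with the lower expected cost under the posterior."""
    p = posterior_predator(x, **model)
    expected_cost_approach = p * cost_eaten             # approaching a predator is costly
    expected_cost_avoid = (1.0 - p) * cost_missed_prey  # avoiding prey forgoes a meal
    return "approach" if expected_cost_approach < expected_cost_avoid else "avoid"
```

With equal variances, the log posterior odds are linear in `x`, so the optimal rule reduces to a threshold on the observation. Asymmetric costs shift that threshold toward caution: here the animal avoids whenever P(predator | x) exceeds 1/11.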
Machine learning with neuroimaging data to identify autism spectrum disorder: a systematic review and meta-analysis
Purpose: Autism Spectrum Disorder (ASD) is diagnosed through observation or interview assessments, which are time-consuming, subjective, and of questionable validity and reliability. We therefore aimed to evaluate the role of machine learning (ML) with neuroimaging data in providing a reliable classification of ASD. Methods: A systematic search of PubMed, Scopus, and Embase was conducted to identify relevant publications. The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess study quality. A bivariate random-effects meta-analysis was employed to evaluate the pooled sensitivity, the pooled specificity, and the diagnostic performance of ML with neuroimaging data in classifying ASD, the latter through the hierarchical summary receiver operating characteristic (HSROC) curve. Meta-regression was also performed. Results: Forty-four studies (5697 individuals with ASD and 6013 typically developing [TD] individuals in total) were included in the quantitative analysis. The pooled sensitivity for differentiating ASD from TD individuals was 86.25% (95% confidence interval [CI]: 81.24, 90.08), while the pooled specificity was 83.31% (95% CI: 78.12, 87.48), with a combined area under the HSROC curve (AUC) of 0.889. Higgins' I² (> 90%) and Cochran's Q (p < 0.0001) suggest a high degree of heterogeneity. In the bivariate meta-regression, a higher pooled specificity was observed in studies not using a brain atlas (90.91%, 95% CI: 80.67, 96.00; p = 0.032). In addition, a greater pooled sensitivity was seen in studies recruiting both males and females (89.04%, 95% CI: 83.84, 92.72; p = 0.021) and in studies combining imaging modalities (94.12%, 95% CI: 85.43, 97.76; p = 0.036). Conclusion: ML with neuroimaging data is an exciting prospect for detecting individuals with ASD, but further studies are required to improve its reliability for use in clinical practice.
Classification of Cognitive States using Task-Specific Connectivity Features
Functional MRI (fMRI) research produces human brain activity maps that describe the average level of engagement of various brain regions during a specific task. Functional connectivity describes the interrelationships, integrated performance, and organization of these brain regions. This study uses functional connectivity to quantify the interactions between brain regions that are engaged concurrently in a specific task. Its key focus is to introduce and demonstrate task-specific functional connectivity among brain regions using fMRI data, and to decode cognitive states with a novel classifier built on connectivity features. Two connectivity models were considered: a graph-based task-specific functional connectivity model and a Granger causality-transfer entropy framework. The connectivity strengths obtained among brain regions were used for cognitive state classification: the nodal and global graph-analysis parameters from the graph-based framework and the transfer entropy values from the causal connectivity model served as features. The proposed model achieved an average accuracy of 95% on the StarPlus fMRI dataset, an improvement of 5% over the existing Tensor-SVD classification algorithm.
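The graph-based branch can be sketched in three steps. First, estimate a region-by-region connectivity matrix. Second, binarise it. Third, read off nodal and global graph measures as a feature vector. This minimal sketch makes several assumptions: Pearson correlation as the connectivity estimate, a hypothetical fixed threshold, node degree as the nodal measure and edge density as the global measure. It does not reproduce the study's task-specific connectivity definition or its transfer-entropy features.

```python
import numpy as np


def connectivity_features(ts: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """ts has shape (time points, regions). Returns per-region degree plus global density."""
    corr = np.corrcoef(ts, rowvar=False)             # region-by-region Pearson correlation
    adj = (np.abs(corr) >= threshold).astype(float)  # binarise the connectivity matrix
    np.fill_diagonal(adj, 0.0)                       # ignore self-connections
    degree = adj.sum(axis=1)                         # nodal measure: links per region
    n = adj.shape[0]
    density = adj.sum() / (n * (n - 1))              # global measure: fraction of possible links
    return np.concatenate([degree, [density]])
```

Stacking one such vector per task block would give a design matrix for any standard classifier.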
Nonlinear opinion models and other networked systems
Networks play a critical role in many physical, biological, and social systems. In this thesis, we investigate tools to model and analyze networked systems. We first examine some of the ways in which we can model social dynamics that take place on networks. We then study two recently developed data-analysis methods that employ a network framework and explore new ways in which they can be used to find meaningful signals in large data sets. In the first half of the thesis, we study opinion dynamics on networks. We begin by examining a class of opinion models, known as coevolving voter models (CVMs), that couple the mechanisms of opinion formation and changing social connections. We then propose a version of CVMs that incorporates nonlinearity. In our models, we assume that individuals strive to achieve harmony and avoid disagreement, both by changing their social connections to reflect their opinions and by changing their opinions to reflect their social connections. By taking a minimalist approach to modeling social dynamics, we hope to gain a deeper understanding of how these two mechanisms can give rise to social phenomena such as the "majority illusion". Comparing several versions of CVMs, we find that seemingly small changes in update rules can lead to strikingly different behaviors. A particularly interesting feature of our nonlinear CVMs is that, under certain conditions, the opinion initially held by a minority of the nodes can spread to almost every node in a network if the minority nodes view themselves as the majority. We then discuss an ongoing project that involves another class of opinion models called bounded-confidence models. Specifically, we examine extensions of bounded-confidence models on hypergraphs and discuss some preliminary findings. In the second half of the thesis, we study problems in data analysis. We begin by considering topological structures as a tool to study integrated circuit (IC) devices. 
In particular, we examine a problem in the design and manufacturing of IC devices using topological data analysis (TDA), which is based on network structures called simplicial complexes. Failures in IC devices generally occur near the tolerance limits of photolithography systems, such as at the minimum separation distance between adjacent electronic components. However, for complex arrangements of electronic components, simply ensuring minimal separation is insufficient to guarantee that one can manufacture an IC design accurately and reliably. We apply tools from TDA to compare data from IC designs. Without using any domain knowledge, we are able to infer several results about the IC design-manufacturing process. Finally, we discuss an ongoing project in the analysis of network data. Specifically, we explore applications of a recently developed algorithm called network dictionary learning (NDL) and discuss problems of network reconstruction and denoising using NDL on both synthetic and real-world networks.
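The coevolving voter models studied in the first half couple opinion adoption with edge rewiring. This is a hedged sketch of a standard linear baseline, not the thesis's nonlinear variant. It assumes a "rewire-to-same" rule. On each step, a random discordant edge (u, v) is chosen. With probability `p`, node u drops it and links to a non-neighbour that shares its opinion. Otherwise, u copies v's opinion. The function names and the dict-of-sets graph representation are illustrative choices.

```python
import random


def cvm_step(adj: dict, opinion: dict, p: float, rng: random.Random) -> bool:
    """One update of a linear coevolving voter model (rewire-to-same variant).
    Returns False once no discordant edge remains."""
    discordant = [(u, v) for u in adj for v in adj[u] if opinion[u] != opinion[v]]
    if not discordant:
        return False
    u, v = rng.choice(discordant)  # ordered pair, so u is a random endpoint
    if rng.random() < p:
        # Rewire: replace (u, v) with (u, w) for a like-minded non-neighbour w, if any exists.
        candidates = [w for w in adj if w != u and w not in adj[u] and opinion[w] == opinion[u]]
        if candidates:
            w = rng.choice(candidates)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    else:
        # Adopt: u copies its neighbour's opinion.
        opinion[u] = opinion[v]
    return True


def run_cvm(adj: dict, opinion: dict, p: float, max_steps: int = 10_000, seed: int = 0):
    rng = random.Random(seed)
    for _ in range(max_steps):
        if not cvm_step(adj, opinion, p, rng):
            break
    return adj, opinion
```

Rewiring conserves the number of edges, so the process ends either in consensus or in a network split into like-minded components. The nonlinear versions in the thesis modify these update rules, and this sketch does not attempt to reproduce them.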
On the role of metaheuristic optimization in bioinformatics
Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted the most interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, protein structure prediction, phylogenetic inference, and various string problems. References to other relevant optimization problems are also given, including those related to medical imaging and gene selection for classification. Based on this analysis, the paper identifies research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.
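To make the idea concrete for one of the string problems mentioned, the sketch below applies simulated annealing, a classic metaheuristic, to the closest string problem. The goal is to find a string that minimises the maximum Hamming distance to a set of equal-length sequences. The choice of problem, the single-character move, the geometric cooling schedule and all parameter values are illustrative assumptions, not taken from the paper.

```python
import math
import random


def max_hamming(candidate, strings) -> int:
    """Objective: the largest Hamming distance from candidate to any input string."""
    return max(sum(a != b for a, b in zip(candidate, s)) for s in strings)


def closest_string_sa(strings, alphabet="ACGT", iters=20_000, t0=2.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    current = list(strings[0])  # start from one of the inputs
    cur_cost = max_hamming(current, strings)
    best, best_cost = current[:], cur_cost
    t = t0
    for _ in range(iters):
        i = rng.randrange(len(current))
        old = current[i]
        current[i] = rng.choice(alphabet)  # move: mutate one position
        cost = max_hamming(current, strings)
        delta = cost - cur_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur_cost = cost
            if cost < best_cost:
                best, best_cost = current[:], cost
        else:
            current[i] = old  # reject the move
        t = max(t * cooling, 1e-6)  # geometric cooling with a floor
    return "".join(best), best_cost
```

Accepting occasional uphill moves at high temperature lets the search escape local optima. As the temperature falls, the search becomes increasingly greedy.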