Graph Convolutional Networks for Road Networks
Machine learning techniques for road networks hold the potential to
facilitate many important transportation applications. Graph Convolutional
Networks (GCNs) are neural networks that are capable of leveraging the
structure of a road network by utilizing information of, e.g., adjacent road
segments. While state-of-the-art GCNs target node classification tasks in
social, citation, and biological networks, machine learning tasks in road
networks differ substantially from such tasks. In road networks, prediction
tasks concern edges representing road segments, and many tasks involve
regression. In addition, road networks differ substantially from the networks
assumed in the GCN literature in terms of the attribute information available
and the network characteristics. Many implicit assumptions of GCNs therefore
do not apply. We introduce the notion of Relational Fusion Network (RFN), a novel
type of GCN designed specifically for machine learning on road networks. In
particular, we propose methods that outperform state-of-the-art GCNs on both a
road segment regression task and a road segment classification task by 32-40%
and 21-24%, respectively. In addition, we provide experimental evidence of the
shortcomings of state-of-the-art GCNs in the context of road networks: unlike
our method, they cannot effectively leverage the road network structure for
road segment classification and fail to outperform a regular multi-layer
perceptron.
Comment: Ten-page pre-print version of a four-page ACM SIGSPATIAL 2019 poster paper
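The idea of using information from adjacent road segments can be illustrated with a minimal sketch. It is not the RFN method from the abstract, only a generic mean-aggregation step of the kind GCNs build on, applied to road segments. The function name, the toy network, and the single speed-limit attribute are all hypothetical.

```python
from typing import Dict, List


def mean_aggregate(features: Dict[str, List[float]],
                   adjacent: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Average each segment's feature vector with those of its adjacent segments."""
    out = {}
    for seg, feat in features.items():
        # The segment itself plus every adjacent segment contributes equally.
        group = [feat] + [features[n] for n in adjacent.get(seg, [])]
        out[seg] = [sum(col) / len(group) for col in zip(*group)]
    return out


# Hypothetical toy network: three road segments in a chain a - b - c,
# each with one attribute (assumed here to be a speed limit in km/h).
features = {"a": [50.0], "b": [80.0], "c": [110.0]}
adjacent = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
smoothed = mean_aggregate(features, adjacent)
```

A real GCN layer would follow this aggregation with a learned linear map and a non-linearity; the sketch leaves those out to keep the neighbourhood step visible.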
Hybrid Spatio-Temporal Graph Convolutional Network: Improving Traffic Prediction with Navigation Data
Traffic forecasting has recently attracted increasing interest due to the
popularity of online navigation services, ridesharing and smart city projects.
Owing to the non-stationary nature of road traffic, forecasting accuracy is
fundamentally limited by the lack of contextual information. To address this
issue, we propose the Hybrid Spatio-Temporal Graph Convolutional Network
(H-STGCN), which is able to "deduce" future travel time by exploiting the data
of upcoming traffic volume. Specifically, we propose an algorithm to acquire
the upcoming traffic volume from an online navigation engine. Taking advantage
of the piecewise-linear flow-density relationship, a novel transformer
structure converts the upcoming volume into its equivalent in travel time. We
combine this signal with the commonly-utilized travel-time signal, and then
apply graph convolution to capture the spatial dependency. In particular, we
construct a compound adjacency matrix which reflects the innate traffic
proximity. We conduct extensive experiments on real-world datasets. The results
show that H-STGCN remarkably outperforms state-of-the-art methods in various
metrics, especially for the prediction of non-recurring congestion.
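To make the piecewise-linear flow-density relationship concrete, here is a small sketch based on the triangular fundamental diagram from traffic flow theory. It is not the learned conversion structure described in the abstract. The function names and the parameter values (free-flow speed, backward wave speed, jam density) are assumptions chosen for illustration.

```python
def flow_from_density(k: float, v_free: float, w: float, k_jam: float) -> float:
    """Triangular fundamental diagram: flow (veh/h) as a function of density (veh/km)."""
    k_crit = w * k_jam / (v_free + w)  # density at which the two linear branches meet
    if k <= k_crit:
        return v_free * k              # free-flow branch
    return w * (k_jam - k)             # congested branch


def travel_time_minutes(length_km: float, k: float, v_free: float,
                        w: float, k_jam: float) -> float:
    """Travel time over a segment, using speed = flow / density."""
    if k <= 0:
        return 60.0 * length_km / v_free  # empty road: free-flow speed
    speed = flow_from_density(k, v_free, w, k_jam) / k
    return 60.0 * length_km / speed


# Assumed parameters: 60 km/h free-flow speed, 20 km/h wave speed, 120 veh/km jam density.
params = dict(v_free=60.0, w=20.0, k_jam=120.0)
t_free = travel_time_minutes(1.0, 20.0, **params)       # density below critical
t_congested = travel_time_minutes(1.0, 60.0, **params)  # density above critical
```

With these assumed parameters both densities carry the same flow of 1200 veh/h, yet the travel times differ, which is why volume alone has to be combined with knowledge of the traffic regime.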
cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate
Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1 − FDR) and recall for both gains and losses in all benchmark datasets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
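The mixture-of-Poissons idea can be sketched as a posterior over integer copy numbers for a single read count. This is an illustrative simplification and not the cn.MOPS package or its API: the rate scaling (copy number divided by two, times a base rate for the normal diploid state), the small `eps` rate for copy number zero, and the prior weights are assumptions made for this example.

```python
import math
from typing import List


def poisson_log_pmf(x: int, rate: float) -> float:
    """Log of the Poisson probability mass function."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def copy_number_posterior(count: int, base_rate: float,
                          priors: List[float], eps: float = 0.05) -> List[float]:
    """Posterior over copy numbers 0..len(priors)-1 for one read count.

    Assumes the expected count for copy number cn is (cn / 2) * base_rate,
    with copy number 0 given a small rate (eps / 2) * base_rate so the
    Poisson rate stays positive. All priors must be greater than zero.
    """
    logs = []
    for cn, prior in enumerate(priors):
        rate = (cn if cn > 0 else eps) / 2.0 * base_rate
        logs.append(math.log(prior) + poisson_log_pmf(count, rate))
    top = max(logs)                      # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: 100 reads expected at copy number 2, priors favouring copy number 2.
priors = [0.05, 0.1, 0.7, 0.1, 0.05]
post_loss = copy_number_posterior(52, 100.0, priors)
post_normal = copy_number_posterior(100, 100.0, priors)
```

In this toy setting a count near half the base rate shifts the posterior towards copy number 1 despite the prior, while a count near the base rate keeps it at copy number 2.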
A data science roadmap for open science organizations engaged in early-stage drug discovery
The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, like many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary, while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows, openly and at scale, within design-make-test-analyze (DMTA) cycles and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design.