STWalk: Learning Trajectory Representations in Temporal Graphs
Analyzing the temporal behavior of nodes in time-varying graphs is useful for
many applications such as targeted advertising, community evolution and outlier
detection. In this paper, we present a novel approach, STWalk, for learning
trajectory representations of nodes in temporal graphs. The proposed framework
makes use of structural properties of graphs at current and previous time-steps
to learn effective node trajectory representations. STWalk performs random
walks on a graph at a given time step (called space-walk) as well as on graphs
from past time-steps (called time-walk) to capture the spatio-temporal behavior
of nodes. We propose two variants of STWalk to learn trajectory
representations. In one algorithm, we perform space-walk and time-walk as part
of a single step. In the other variant, we perform space-walk and time-walk
separately and combine the learned representations to get the final trajectory
embedding. Extensive experiments on three real-world temporal graph datasets
validate the effectiveness of the learned representations when compared to
three baseline methods. We also show the goodness of the learned trajectory
embeddings for change point detection, as well as demonstrate that arithmetic
operations on these trajectory representations yield interesting and
interpretable results.
Comment: 10 pages, 5 figures, 2 tables
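The space-walk and time-walk described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the temporal graph is taken to be a list of adjacency dictionaries (one snapshot per time step), the space-walk is a uniform random walk on one snapshot, and the time-walk is approximated by stepping back one snapshot at a time and hopping to a random neighbor there. The names space_walk, time_walk, snapshots, and window are hypothetical.

```python
import random


def space_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes on a single snapshot.

    `adj` maps each node to a list of its neighbors (assumed representation).
    """
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk


def time_walk(snapshots, start, t, window, rng):
    """Walk backwards through up to `window` snapshots before time `t`.

    Assumed rule (not taken from the paper): at each earlier snapshot, move
    to a random neighbor of the current node there, or stay put if the node
    has no neighbors in that snapshot. Returns a list of (time, node) pairs.
    """
    walk = [(t, start)]
    node = start
    for past in range(t - 1, max(t - window, 0) - 1, -1):
        neighbors = snapshots[past].get(node, [])
        if neighbors:
            node = rng.choice(neighbors)
        walk.append((past, node))
    return walk


# Toy temporal graph with three snapshots (hypothetical data).
snapshots = [
    {"a": ["b"], "b": ["a"]},
    {"a": ["b", "c"], "b": ["a"], "c": ["a"]},
    {"a": ["c"], "c": ["a", "d"], "d": ["c"]},
]

rng = random.Random(0)
current = len(snapshots) - 1
print(space_walk(snapshots[current], "a", 5, rng))
print(time_walk(snapshots, "a", current, 2, rng))
```

In a full pipeline, walks like these would typically be fed to a skip-gram style model to learn embeddings; that step is omitted here because the abstract does not specify it.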
Scalable system for smart urban transport management
Efficient management of smart transport systems requires the integration of various sensing technologies, as well as fast processing of a high volume of heterogeneous data, in order to perform smart analytics of urban networks in real time. However, dynamic response that relies on intelligent demand-side transport management is particularly challenging due to the increasing flow of transmitted sensor data. In this work, a novel smart service-driven, adaptable middleware architecture is proposed to acquire, store, manipulate, and integrate information from heterogeneous data sources in order to deliver smart analytics aimed at supporting strategic decision-making. The architecture offers adaptive and scalable data integration services for acquiring and processing dynamic data, delivering fast response time, and offering data mining and machine learning models for real-time prediction, combined with advanced visualisation techniques. The proposed solution has been implemented and validated, demonstrating its ability to provide real-time performance on the existing, operational, and large-scale bus network of a European capital city.
IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
India has a rich linguistic landscape with languages from 4 major language
families spoken by over a billion people. The 22 of these languages listed in
the Constitution of India (referred to as scheduled languages) are the focus of
this work. Given the linguistic diversity, high-quality and accessible Machine
Translation (MT) systems are essential in a country like India. Prior to this
work, there was (i) no parallel training data spanning all the 22 languages,
(ii) no robust benchmarks covering all these languages and containing content
relevant to India, and (iii) no existing translation models which support all
the 22 scheduled languages of India. In this work, we aim to address this gap
by focusing on the missing pieces required for enabling wide, easy, and open
access to good machine translation systems for all 22 scheduled Indian
languages. We identify four key areas of improvement: curating and creating
larger training datasets, creating diverse and high-quality benchmarks,
training multilingual models, and releasing models with open access. Our first
contribution is the release of the Bharat Parallel Corpus Collection (BPCC),
the largest publicly available parallel corpora for Indic languages. BPCC
contains a total of 230M bitext pairs, of which a total of 126M were newly
added, including 644K manually translated sentence pairs created as part of
this work. Our second contribution is the release of the first n-way parallel
benchmark covering all 22 Indian languages, featuring diverse domains,
Indian-origin content, and source-original test sets. Next, we present
IndicTrans2, the first model to support all 22 languages, surpassing existing
models on multiple existing and new benchmarks created as a part of this work.
Lastly, to promote accessibility and collaboration, we release our models and
associated data with permissive licenses at
https://github.com/ai4bharat/IndicTrans2