Model of mobility demands for future short distance public transport systems
Short-distance public transport faces major challenges, even though it is central to a sustainable transport system that reduces traffic emissions. Revenues and subsidies are decreasing, and in rural regions in particular the service offer is steadily shrinking. New approaches to public transport are urgently needed to prevent traffic gridlock in urban and rural areas and to guarantee a basic level of mobility services for everyone. This work introduces a demand-centred approach to dynamic public transport planning that relies on regional traffic data. The approach is based on a demand model represented as a dynamic, undirected, attributed graph. Demands are logged through traffic sensors and sustainability-focused traveler information systems.
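The following is a rough illustration of how a demand model stored as a dynamic, undirected, attributed graph might be organised. Undirected edges are keyed by an unordered pair of stops, and time-binned demand counts are attached as edge attributes. The class name `DemandGraph`, the hourly binning, and the idea of one logged event per observed demand are assumptions made for this sketch. They are not the authors' data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DemandGraph:
    """Hypothetical undirected attributed graph of travel demand between stops."""
    # Node attributes, e.g. {"A": {"rural": True}}.
    nodes: dict = field(default_factory=dict)
    # Edge key -> {hour_of_day: number of logged demands}; the time bins make it "dynamic".
    edges: dict = field(default_factory=lambda: defaultdict(lambda: defaultdict(int)))

    @staticmethod
    def _key(a, b):
        # Undirected: (a, b) and (b, a) map to the same edge.
        return frozenset((a, b))

    def add_stop(self, stop_id, **attrs):
        """Create a stop if needed and merge in its attributes."""
        self.nodes.setdefault(stop_id, {}).update(attrs)

    def log_demand(self, origin, destination, hour):
        """Record one observed demand (e.g. from a sensor or a traveler app)."""
        for stop in (origin, destination):
            self.nodes.setdefault(stop, {})
        self.edges[self._key(origin, destination)][hour] += 1

    def demand(self, a, b, hour=None):
        """Total demand on the edge, or only the demand in one hourly bin."""
        bins = self.edges.get(self._key(a, b), {})  # .get does not create missing edges
        return sum(bins.values()) if hour is None else bins.get(hour, 0)
```

Logging a trip from A to B and another from B to A in the same hour increments a single edge. This matches the undirected reading of the model.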
Towards Building Real-Time, Convenient Route Recommendation System for Public Transit
Public transportation is essential for the sustainable and economic development of cities. Several transport organizations aim to provide service information to commuters through web and mobile apps. This information includes possible routes between two stations, estimated travel and arrival times, and real-time updates about traffic conditions. However, it is currently not personalized according to commuter preferences. In this work, we emphasize the need for personalized transit service information and present our vision for work in this direction. Our final goal is a fully functional personalized route recommendation system for public transit commuters. This involves identifying commuter preferences and suitable recommendation techniques, and developing a platform to communicate this information to commuters. We identify the requirements for this platform and propose an architecture for our system. As a proof of concept, we present MetroCognition, an Android participatory sensing application that collects feedback on the convenience commuters experience in public transit.
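One simple way to personalise route suggestions is to score each candidate route with commuter-specific weights over a few attributes, then sort by the weighted cost. This sketch rests on assumptions of its own: the attribute names (`travel_min`, `transfers`, `crowding`) and the linear scoring are hypothetical. The paper leaves the choice of recommendation technique open and does not prescribe this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    travel_min: float   # estimated travel time in minutes
    transfers: int      # number of interchanges
    crowding: float     # hypothetical 0.0 (empty) .. 1.0 (packed), e.g. from commuter feedback


def rank_routes(routes, weights):
    """Return routes sorted from most to least preferred for one commuter.

    `weights` maps an attribute name to a penalty per unit; a higher weight means
    the commuter dislikes that attribute more. Lower weighted cost ranks first.
    """
    def cost(route):
        return sum(w * getattr(route, attr) for attr, w in weights.items())
    return sorted(routes, key=cost)
```

A commuter who only cares about time gets the faster route first. With weights `{"travel_min": 1.0, "transfers": 5.0, "crowding": 20.0}`, a 28-minute direct, quiet route (cost 32) outranks a 20-minute crowded route with one transfer (cost 43).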
Reconstructing individual mobility from smart card transactions: A space alignment approach
Smart card transactions capture rich information about human mobility and urban dynamics and are therefore of particular interest to urban planners and location-based service providers. However, because most transaction systems are designed only for billing, fine-grained location information, such as the exact boarding and alighting stops of a bus trip, is typically only partially available or missing altogether. This prevents deep exploitation of this valuable data at the individual level. This paper presents a "space alignment" framework to reconstruct individual mobility histories from a large-scale smart card transaction dataset from a metropolitan city. Specifically, we show that by carefully aligning the monetary space and the geospatial space with the temporal space, we can derive a series of critical domain-specific constraints. These constraints are then incorporated into a semi-supervised conditional random field to infer the exact boarding and alighting stops of all transit routes with high accuracy. For example, given only 10% of trips with known boarding/alighting stops, we successfully inferred more than 78% of the boarding and alighting stops in all unlabeled trips. In addition, we demonstrate that smart card data enriched by the proposed approach substantially improves the performance of a conventional method for identifying users' home and work places: home detection improves by 88% and workplace detection by 35%. The proposed method makes it possible to mine individual mobility from common public transit transactions. It also shows how uncertain data can be combined with domain knowledge and constraints to support cross-application data mining tasks.
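Aligning the monetary space with the geospatial space can be made concrete with a simple case. Under a distance-based fare, the amount charged for a trip limits which alighting stops are possible. The sketch below assumes a hypothetical linear fare (`base + per_km * distance`) and a single route with known stop positions. It only lists the candidates consistent with the fare. The paper instead feeds such constraints into a semi-supervised conditional random field.

```python
import math


def candidate_alighting_stops(stop_km, boarding, fare_paid, base=1.0, per_km=0.1, tol=0.01):
    """Return alighting stops whose fare from `boarding` matches `fare_paid`.

    stop_km:  ordered list of (stop_id, km_from_route_start) along one route.
    boarding: stop_id where the passenger tapped in.
    base, per_km: parameters of a hypothetical distance-based fare, not a real tariff.
    """
    positions = dict(stop_km)
    start = positions[boarding]
    matches = []
    for stop_id, km in stop_km:
        if stop_id == boarding:
            continue
        fare = base + per_km * abs(km - start)
        # Compare with a tolerance because fares are floats.
        if math.isclose(fare, fare_paid, abs_tol=tol):
            matches.append(stop_id)
    return matches
```

A fare can match stops in both directions, as with a passenger boarding at km 3 who pays for 3 km. This ambiguity is why further constraints (temporal order, direction of travel, the next tap-in) are needed to pick a single stop.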
Determinants of continuance intention of user on smartphone-based traveller information systems in the greater Klang Valley
Today, mobile traveller information services are pivotal to the efficient and effective running of an urban area's transportation system. In urban transportation planning, urban facilities managers are responsible for developing a plan that provides drivers with real-time traveller information services, enabling regional economic growth and transition. Existing research on mobile traveller information services has not investigated in depth the determinants of continuance intention to use smartphone-based traveller information systems (STIS). This study addresses that gap by investigating STIS users' continuance intention in the post-adoption phase. It develops and validates an extended framework based on the expectation-confirmation model (ECM). A total of 280 STIS users from the Klang Valley's highways and major streets participated in the study. The extended ECM results revealed that STIS users' continuance intention is determined by perceived enjoyment and perceived usefulness of continued STIS use, followed by satisfaction with STIS use. Satisfaction and perceived usefulness are determined primarily by confirmation of expectations from participants' previous use, whereas perceived enjoyment is not. The findings have implications for transportation sectors planning strategies to increase users' continuance intention to use STIS services. Most of the current literature on mobile information services has focused only on pre-adoption and paid little attention to users' continuance intention. This gap is especially pronounced for smartphone apps and electronic information in transportation services. This study fills theoretical and practical gaps by focusing on the post-adoption phase and developing an extended ECM-based framework to explain STIS continuance intention.
In addition, the topic is timely, as mobile information services have been flourishing in transportation services worldwide.
Mining Public Transport Usage for Personalised Intelligent Transport Systems
Traveller information, route planning, and service updates have become essential components of public transport systems: they help people navigate built environments by providing information about delays and service disruptions. However, these systems lack a way of tailoring the information they offer, for example to provide personalised trip time estimates and relevant notifications to each traveller. Mining each user's travel history, collected by automated ticketing systems, has the potential to address this gap. In this work, we analyse one such travel-history dataset from the London Underground. We then propose and evaluate methods to (a) predict personalised trip times for system users and (b) rank stations based on future mobility patterns. Ranking identifies the subset of stations of greatest interest to each user, so that useful travel updates can be provided. © 2010 IEEE
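The two tasks can be illustrated with deliberately simple baselines. A traveller's trip time is predicted as the median of their own past durations on the same origin-destination pair, falling back to all travellers' median when they have no history. Stations are ranked by how often they appear in that traveller's history. These baselines, the class name `TravelHistory`, and the record layout `(user, origin, destination, minutes)` are assumptions for illustration. They are not the methods evaluated in the paper.

```python
from collections import Counter, defaultdict
from statistics import median


class TravelHistory:
    """Hypothetical per-user store of tap-in/tap-out trips."""

    def __init__(self, trips):
        # trips: iterable of (user, origin, destination, minutes)
        self.by_user_od = defaultdict(list)
        self.by_od = defaultdict(list)
        self.station_counts = defaultdict(Counter)
        for user, origin, dest, minutes in trips:
            self.by_user_od[(user, origin, dest)].append(minutes)
            self.by_od[(origin, dest)].append(minutes)
            self.station_counts[user].update([origin, dest])

    def predict_minutes(self, user, origin, dest):
        """Personal median if the user has made this trip before, else the global median."""
        personal = self.by_user_od.get((user, origin, dest))
        if personal:
            return median(personal)
        overall = self.by_od.get((origin, dest))
        return median(overall) if overall else None

    def top_stations(self, user, k=3):
        """Stations the user visits most often, as candidates for targeted service updates."""
        return [station for station, _ in self.station_counts[user].most_common(k)]
```

The fallback to the population median keeps predictions available for new travellers. A real system would likely also condition on time of day.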
Optimising the Loading Diversity of Rail Passenger Crowding using On-Board Occupancy Data
Crowded conditions on trains can lower passenger satisfaction, discourage rail travel, cause negative economic impacts, and contribute to a number of health and safety hazards. In the UK there is an annual survey of rail passenger crowding, but its measures reflect neither coach-by-coach variations nor variations across the peak period.
In this MPhil thesis I investigated the use of weight-based automatic passenger counting data to achieve more even loadings on trains through new real-time and static solutions. I also investigated the potential benefits of such solutions in terms of reduced dwell times and reduced crowding. The overall concept was to make the most of the existing available capacity, for example so that no one is standing while seats are available. By analysing a large sample of air suspension data, I identified station-specific trends in which some coaches were over capacity while others had spare capacity. I also conducted a critical review of academic research into on-train crowding and into solutions that seek to optimise 'loading diversity'.
This study contributes to this emerging subject area in several ways. First, I propose two new metrics to describe inter-coach loading diversity that, unlike existing metrics, contain information relative to capacity. Second, I reveal a link between the inter-coach loading diversity metrics and estimated boarding times. Trains classified as 'very uneven' on departure typically had dwell times approximately five to ten seconds longer than services classified as 'even' with a similar total number of passengers on board. Finally, I apply supervised classification techniques to predict the load factor for a given service, and these predictors improved on the historical average.
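The two proposed metrics are not defined in this abstract. The sketch below therefore uses a generic stand-in to show what a capacity-relative measure of inter-coach loading diversity can look like: the coefficient of variation of per-coach load factors (passengers divided by coach capacity). The metric choice, the 75 kg average passenger mass used to turn air-suspension weight into a passenger estimate, and the 0.25 even/uneven threshold are all assumptions. They are not the author's definitions.

```python
from statistics import mean, pstdev


def passengers_from_weight(measured_kg, tare_kg, avg_passenger_kg=75.0):
    """Rough passenger estimate from air-suspension load; 75 kg per person is an assumed average."""
    return max(0.0, (measured_kg - tare_kg) / avg_passenger_kg)


def load_factors(passengers, capacities):
    """Per-coach load factor = passengers / capacity (1.0 means exactly full)."""
    return [p / c for p, c in zip(passengers, capacities)]


def loading_diversity(passengers, capacities):
    """Coefficient of variation of load factors: 0.0 means perfectly even across coaches."""
    lf = load_factors(passengers, capacities)
    avg = mean(lf)
    return pstdev(lf) / avg if avg > 0 else 0.0


def classify(passengers, capacities, threshold=0.25):
    # The 0.25 cut-off is illustrative only.
    return "uneven" if loading_diversity(passengers, capacities) > threshold else "even"
```

Because the measure works on load factors rather than raw counts, 40 passengers in each of two 80-seat coaches scores 0.0 ("even"). The same 40 + 40 split across a 40-seat and an 80-seat coach scores about 0.33 ("uneven"), which is the kind of capacity information the thesis says existing metrics lack.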
Forestogram: Biclustering Visualization Framework with Applications in Public Transport and Bioinformatics
In many statistical modeling problems, data are expressed as a matrix with subjects (or samples) in rows and attributes (or features) in columns. Traditional clustering methods group the subjects (rows) according to their similarity, so the resulting groups share similarities in their attributes, but nothing is guaranteed about the grouping of the attributes (columns) themselves. In some applications, simultaneous grouping of rows and columns, known as biclustering of the data matrix, is desired. We design and develop a new framework called Forestogram for fast computation and hierarchical illustration of biclusters. Grouping rows and columns simultaneously in a hierarchical manner helps practitioners better understand how clusters evolve. Forestogram, a novel computational and visualization tool, can be thought of as a 3D extension of the dendrogram with an extended orthogonal merge. Each bicluster consists of a group of rows (or samples) that exhibits a highly correlated pattern with its corresponding group of columns (or attributes). Instead of performing two-way clustering independently on each side, we propose a hierarchical biclustering algorithm that considers rows and columns at the same time to determine the biclusters. Furthermore, we develop a model-based information criterion that, under mild assumptions, provides an estimated number of biclusters across the set of hierarchical configurations within the forestogram.
We study the framework in two applied settings, one in public transit and one in bioinformatics. First, we investigate users' behavior in public transit based on two distinct kinds of information gathered from smart cards: temporal data and spatial coordinates. In many cities worldwide, public transit companies use smart card systems to manage fare collection, and analysing this information provides comprehensive insight into users' influence on the interactive public transit network. Temporal data describing the time of entry into the network are considered the most substantial component of the data gathered from smart cards. Classical distance-based techniques are not always suitable for analysing such time series data. We therefore propose a novel projection, with an intuitive visual map, from a higher dimension into a three-dimensional clock-like space to reveal the underlying temporal pattern of transit users. This projection retains the temporal distance between any pair of time-stamped data points while providing a meaningful visualization. The projected data are then fed into a hierarchical clustering algorithm to segment the data and discover user patterns. Finally, the time of usage is taken into account as a latent variable so that the Euclidean metric becomes appropriate for extracting the spatial pattern through our forestogram.
As a second application, the forestogram is tested on a multiomics dataset combining different biological measurements. The aim is to study how patients and the corresponding biological modalities evolve hierarchically within each bicluster over the course of pregnancy. The maintenance of pregnancy relies on a finely tuned balance between tolerance to the fetal allograft and protective mechanisms against invading pathogens. Development during the early months of pregnancy has a well-established impact on long-term outcomes. Despite this, the interactions between the various biological mechanisms that govern the progression of pregnancy have not been studied in detail. Establishing the chronology of these adaptations in term pregnancy provides the framework for future studies of the deviations implicated in pregnancy-related pathologies, including preterm birth and preeclampsia. We perform a multiomics analysis of 51 samples from 17 pregnant women delivering at term. The datasets include measurements of the immunome, transcriptome, microbiome, proteome, and metabolome from samples obtained simultaneously from the same patients. Multivariate predictive modeling with the Elastic Net algorithm is used to measure the ability of each dataset to predict gestational age, and stacked generalization combines the datasets into a single model. This model not only significantly increases predictive power by combining all datasets but also reveals novel interactions between different biological modalities. Furthermore, the forestogram provides an unsupervised model that complements gestational age at the time of sampling as a guide. It shows how much supervised information is needed in each trimester to effectively characterize pregnancy-induced changes in the microbiome, transcriptome, genome, exposome, and immunome responses.
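The abstract does not spell out the clock-like projection, but one common encoding has the property it describes. Placing each time of day on the unit circle puts 23:50 and 00:10 close together. The Euclidean (chord) distance between two such points is then 2·sin(π·Δ/1440) for a circular gap of Δ minutes, and it grows monotonically with Δ up to 12 hours. This two-dimensional encoding is a generic stand-in, not the thesis's three-dimensional projection. The names `clock_point`, `circular_gap` and `chord` are hypothetical.

```python
import math

DAY_MIN = 24 * 60  # minutes in a day


def clock_point(minute_of_day):
    """Map a time of day onto the unit circle, like a 24-hour clock face."""
    angle = 2 * math.pi * (minute_of_day % DAY_MIN) / DAY_MIN
    return (math.cos(angle), math.sin(angle))


def circular_gap(a, b):
    """Shortest gap in minutes between two times of day, wrapping at midnight."""
    d = abs(a - b) % DAY_MIN
    return min(d, DAY_MIN - d)


def chord(a, b):
    """Euclidean distance between the two clock points."""
    return math.dist(clock_point(a), clock_point(b))
```

The chord distance increases with the circular gap, so the projected points can be passed to a Euclidean clustering routine (for example, a hierarchical one) without midnight artificially separating late-evening and early-morning trips.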
Méthodes spatio-temporelles de fouilles des données de cartes à puce en transport urbain (Spatio-temporal methods for mining smart card data in urban transit)
Transit smart card data are useful for understanding the behavior of transit users. Considerable research has already been conducted on (1) the use of smart card data, (2) data mining techniques, and (3) the application of data mining to smart card data. In that prior research, user behavior is classified from trips, with temporal and spatial classification treated as separate processes. Our research partners expressed the wish to examine user behavior by considering the spatial and temporal dimensions simultaneously. In this thesis, we develop methods, based on users' daily behavior, that take both spatial and temporal behavior into account. The methodology for classifying the behavior of smart card users draws on the cross-correlation distance (CCD), dynamic time warping (DTW), hierarchical clustering, and sampling. A density-based method is also addressed.
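Dynamic time warping, one of the distances the thesis compares, aligns two series by letting one stretch or compress in time before summing point-wise differences. A morning peak shifted by an hour can therefore still match. Below is a minimal sketch of the textbook recurrence, using absolute difference as the local cost. It is not the thesis's adjusted spatial or spatio-temporal variant.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] is the cheapest alignment of x[:i] with y[:j]; each step may
    advance one sequence, the other, or both.
    """
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # x advances, y repeats
                                     cost[i][j - 1],      # y advances, x repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

With illustrative hourly tap-in profiles `[0, 1, 0, 0]` and `[0, 0, 1, 0]`, the function returns 0.0, even though a point-by-point comparison differs in two hours. That tolerance to time shifts is what makes DTW attractive for daily travel profiles. Note that the thesis found CCD to perform better for its temporal classification.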
The thesis comprises four articles plus additional results presented in a separate chapter. (1) As a starting point for temporal classification, CCD and DTW are compared to choose the better metric, and a method for classifying time series with hierarchical clustering is developed. CCD proved better in this case. With this method, a subset of temporal behaviors can be classified. (2) To perform temporal classification on big data, a sampling method for processing large volumes of transit smart card data is proposed, together with a calibration indicator. The sampling method makes it possible to classify all users' temporal behaviors in a public transport network, and the indicator guides the choice of the algorithm's parameters. (3) To classify the spatial and spatio-temporal behavior of transit users, spatial and spatio-temporal classification methods are developed by adapting the DTW algorithm. A visualization method based on a three-dimensional spatio-temporal graph is also developed, and the resulting visualizations show the effectiveness of both methods. (4) To test whether a classification method developed in one city applies to another, we develop a method to recognize and compare user behaviors in two cities, one in Canada and one in Chile. The results show that about 66% of temporal behaviors can be recognized from a one-day transaction profile, with a recognition accuracy of about 70%. (5) For a deeper analysis of the spatial and spatio-temporal classification results, further analyses are carried out, including the share of metro use and the mean and deviation of space-time trajectories. These analyses identify differences in demand between the resulting clusters.
(6) In addition, density-based methods for classifying geographic zones are developed to measure changes in user behavior. To test these methods, large datasets from the automated fare collection systems of the Société de transport de l'Outaouais (STO) in Gatineau and of TranSantiago in Santiago de Chile are used. The proposed methods are implemented in Python. The results not only group transit users' profiles into a few clusters and characterize each one, but also provide a series of visualization approaches through which data can be processed automatically into graphs. With these graphs, transit authorities can translate automatically collected data into a picture of travel demand. The authors hope these contributions will help authorities plan public transit that better meets citizens' demands.
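The abstract does not name the density-based method used for geographic classification. DBSCAN is a standard density-based algorithm and illustrates the idea: dense groups of tap-in locations become clusters, while isolated ones are labelled noise. The sketch below is a plain, quadratic-time DBSCAN over projected (x, y) coordinates expressed in the same unit as `eps`. Choosing DBSCAN, the coordinates, and the parameter values are assumptions of this example.

```python
import math


def dbscan(points, eps, min_pts):
    """Label 2-D points with cluster ids (0, 1, ...) or -1 for noise.

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps`; clusters grow outward from core points.
    """
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:
                queue.extend(more)  # j is itself a core point: keep expanding
        cluster += 1
    return labels
```

One possible way to quantify a change in where users board, among others, is to run the function separately on two periods and compare the resulting clusters.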