Search CORE

379 research outputs found

Technical support for Life Sciences communities on a production grid infrastructure

Author: Glatard Tristan
Michel Franck
Montagnat Johan
Publication venue
Publication date: 11/03/2012
Field of study

Production operation of large distributed computing infrastructures (DCI) still requires a lot of human intervention to reach acceptable quality of service. This may be achievable for scientific communities with solid IT support, but it remains a show-stopper for others. Some application execution environments are used to hide runtime technical issues from end users. But they mostly aim at fault-tolerance rather than incident resolution, and their operation still requires substantial manpower. A longer-term support activity is thus needed to ensure sustained quality of service for Virtual Organisations (VO). This paper describes how the biomed VO has addressed this challenge by setting up a technical support team. Its organisation, tooling, daily tasks, and procedures are described. Results are shown in terms of resource usage by end users, amount of reported incidents, and developed software tools. Based on our experience, we suggest ways to measure the impact of the technical support, perspectives to decrease its human cost and make it more community-specific.Comment: HealthGrid'12, Amsterdam : Netherlands (2012

arXiv.org e-Print Archive

High-Resolution Road Vehicle Collision Prediction for the City of Montreal

Author: Glatard Tristan
Guédon Timothée
Hébert Antoine
Jaumard Brigitte
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/11/2019
Field of study

Road accidents are an important issue of our modern societies, responsible for millions of deaths and injuries every year in the world. In Quebec only, in 2018, road accidents are responsible for 359 deaths and 33 thousands of injuries. In this paper, we show how one can leverage open datasets of a city like Montreal, Canada, to create high-resolution accident prediction models, using big data analytics. Compared to other studies in road accident prediction, we have a much higher prediction resolution, i.e., our models predict the occurrence of an accident within an hour, on road segments defined by intersections. Such models could be used in the context of road accident prevention, but also to identify key factors that can lead to a road accident, and consequently, help elaborate new policies. We tested various machine learning methods to deal with the severe class imbalance inherent to accident prediction problems. In particular, we implemented the Balanced Random Forest algorithm, a variant of the Random Forest machine learning algorithm in Apache Spark. Interestingly, we found that in our case, Balanced Random Forest does not perform significantly better than Random Forest. Experimental results show that 85% of road vehicle collisions are detected by our model with a false positive rate of 13%. The examples identified as positive are likely to correspond to high-risk situations. In addition, we identify the most important predictors of vehicle collisions for the area of Montreal: the count of accidents on the same road segment during previous years, the temperature, the day of the year, the hour and the visibility

arXiv.org e-Print Archive

Crossref

Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

Author: Barghi Soudabeh
Glatard Tristan
Salari Ali
Scaria Lalet
Publication venue
Publication date: 25/09/2018
Field of study

Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets, a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)" is able to predict computational reproducibility with a good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speedup reproducibility evaluations substantially, with a reduced accuracy loss

arXiv.org e-Print Archive

Crossref

Scipedia

Mondrian Forest for Data Stream Classification Under Memory Constraints

Author: Glatard Tristan
Khannouz Martin
Publication venue
Publication date: 16/09/2022
Field of study

Supervised learning algorithms generally assume the availability of enough memory to store their data model during the training and test phases. However, in the Internet of Things, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with reduced amounts of memory. In this paper, we adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams. In particular, we design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached. Moreover, we design trimming mechanisms to make Mondrian trees more robust to concept drifts under memory constraints. We evaluate our algorithms on a variety of real and simulated datasets, and we conclude with recommendations on their use in different situations: the Extend Node strategy appears as the best out-of-memory strategy in all configurations, whereas different trimming mechanisms should be adopted depending on whether a concept drift is expected. All our methods are implemented in the OrpailleCC open-source library and are ready to be used on embedded systems and connected objects

arXiv.org e-Print Archive

Implementation of Turing machines with the Scufl data-flow language

Author: Glatard Tristan
Montagnat Johan
Publication venue: HAL CCSD
Publication date: 01/05/2008
Field of study

International audienceIn this paper, the expressiveness of the simple Scufl data-flow language is studied by showing how it can be used to implement Turing machines. To do that, several non trivial Scufl patterns such as self-looping or sub-workflows are required and we precisely explicit them. The main result of this work is to show how a complex workflow can be implemented using a very simple data-flow language. Beyond that, it shows that Scufl is a Turing complete language, given some restrictions that we discuss

Crossref

HAL-UNICE

INRIA a CCSD electronic archive server