
    BrowserFlow: imprecise data flow tracking to prevent accidental data disclosure

    With the use of external cloud services such as Google Docs or Evernote in an enterprise setting, the loss of control over sensitive data becomes a major concern for organisations. It is typical for regular users to violate data disclosure policies accidentally, e.g. when sharing text between documents in browser tabs. Our goal is to help such users comply with data disclosure policies: we want to alert them about potentially unauthorised data disclosure from trusted to untrusted cloud services. This is particularly challenging when users can modify data in arbitrary ways, they employ multiple cloud services, and cloud services cannot be changed. To track the propagation of text data robustly across cloud services, we introduce imprecise data flow tracking, which identifies data flows implicitly by detecting and quantifying the similarity between text fragments. To reason about violations of data disclosure policies, we describe a new text disclosure model that, based on similarity, associates text fragments in web browsers with security tags and identifies unauthorised data flows to untrusted services. We demonstrate the applicability of imprecise data flow tracking through BrowserFlow, a browser-based middleware that alerts users when they expose potentially sensitive text to an untrusted cloud service. Our experiments show that BrowserFlow can robustly track data flows and manage security tags for documents with no noticeable performance impact.
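
    The core mechanism here is the similarity check between text fragments. As a minimal sketch, assuming Jaccard similarity over character shingles, the Python below shows how fragments observed in a trusted service could be tagged and later matched against text headed for an untrusted one; the threshold and all class and function names are illustrative assumptions, not BrowserFlow's actual interface.

    ```python
    # Sketch of imprecise data flow tracking: fragments are matched by n-gram
    # (shingle) similarity rather than by exact, byte-level taint tracking.
    # Tracker, SIMILARITY_THRESHOLD and both helpers are hypothetical names.

    def shingles(text: str, n: int = 5) -> set:
        """Character n-grams of a whitespace-normalised, lower-cased fragment."""
        text = " ".join(text.lower().split())
        return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

    def similarity(a: str, b: str) -> float:
        """Jaccard similarity between the shingle sets of two fragments."""
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    SIMILARITY_THRESHOLD = 0.6  # assumed tuning parameter

    class Tracker:
        def __init__(self):
            self.tagged = []  # (fragment, security tag) pairs from trusted tabs

        def observe_trusted(self, fragment: str, tag: str):
            """Record text seen in a trusted service together with its tag."""
            self.tagged.append((fragment, tag))

        def check_untrusted(self, fragment: str) -> list:
            """Return tags of all tracked fragments that this text resembles."""
            return [tag for src, tag in self.tagged
                    if similarity(src, fragment) >= SIMILARITY_THRESHOLD]

    tracker = Tracker()
    tracker.observe_trusted("Q3 revenue forecast: 4.2M GBP", tag="confidential")
    leaks = tracker.check_untrusted("q3 revenue forecast: 4.2m gbp (draft)")
    if leaks:
        print(f"warning: possible disclosure of {leaks} data to untrusted service")
    ```

    Because matching is by similarity rather than identity, the alert still fires after the user edits or reformats the copied text, which is the point of tracking data flows imprecisely.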

    Crossbow: scaling deep learning with small batch sizes on multi-GPU servers

    Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model. To fully utilise all GPUs, systems must increase the batch size, which hinders statistical efficiency. Users tune hyper-parameters such as the learning rate to compensate for this, which is complex and model-specific. We describe CROSSBOW, a new single-server multi-GPU system for training deep learning models that enables users to freely choose their preferred batch size, however small, while scaling to multiple GPUs. CROSSBOW uses many parallel model replicas and avoids reduced statistical efficiency through a new synchronous training method. We introduce SMA, a synchronous variant of model averaging in which replicas independently explore the solution space with gradient descent, but adjust their search synchronously based on the trajectory of a globally-consistent average model. CROSSBOW achieves high hardware efficiency with small batch sizes by potentially training multiple model replicas per GPU, automatically tuning the number of replicas to maximise throughput. Our experiments show that CROSSBOW improves the training time of deep learning models on an 8-GPU server by 1.3–4× compared to TensorFlow.
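
    The SMA update can be illustrated on a toy problem: replicas independently take small-batch gradient steps, then synchronously pull towards the globally-consistent average model. The Python sketch below assumes a 1-D quadratic loss and made-up values for the replica count, learning rate and correction strength alpha; it is a simplified take on that synchronous adjustment, not the paper's exact update rule.

    ```python
    # Toy sketch of synchronous model averaging (SMA) on a 1-D least-squares
    # problem; all hyper-parameter values here are illustrative assumptions.
    import numpy as np

    def grad(w, batch):
        """Gradient of the toy loss 0.5 * mean((w - x)^2) over a batch."""
        return np.mean(w - batch)

    rng = np.random.default_rng(0)
    data = rng.normal(loc=3.0, scale=1.0, size=10_000)  # optimum near w = 3

    n_replicas, lr, alpha, batch_size = 4, 0.1, 0.1, 8  # small per-replica batches
    replicas = rng.normal(size=n_replicas)              # independent starting points

    for step in range(200):
        # Each replica independently explores with its own small-batch step ...
        for i in range(n_replicas):
            batch = rng.choice(data, size=batch_size)
            replicas[i] -= lr * grad(replicas[i], batch)
        # ... then all replicas synchronously adjust towards the average model.
        average = replicas.mean()
        replicas += alpha * (average - replicas)

    print(f"average model after training: {replicas.mean():.3f}")  # close to 3.0
    ```

    Note that the synchronous correction preserves the mean of the replicas, so it steers each replica's search without perturbing the average model itself; each replica keeps its small batch size regardless of how many replicas run.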

    Meta-dataflows: efficient exploratory dataflow jobs

    Distributed dataflow systems such as Apache Spark and Apache Flink are used to derive new insights from large datasets. While they efficiently execute concrete data processing workflows, expressed as dataflow graphs, they lack generic support for exploratory workflows: if a user is uncertain about the correct processing pipeline, e.g. in terms of data cleaning strategy or choice of model parameters, they must repeatedly submit modified jobs to the system. This, however, misses out on optimisation opportunities for exploratory workflows, both in terms of scheduling and memory allocation. We describe meta-dataflows (MDFs), a new model to effectively express exploratory workflows and efficiently execute them on compute clusters. With MDFs, users specify a family of dataflows using two primitives: (a) an explore operator automatically considers choices in a dataflow; and (b) a choose operator assesses the result quality of explored dataflow branches and selects a subset of the results. We propose optimisations to execute MDFs: a system can (i) avoid redundant computation when exploring branches by reusing intermediate results and discarding results from underperforming branches; and (ii) consider future data access patterns in the MDF when allocating cluster memory. Our evaluation shows that MDFs improve the runtime of exploratory workflows by up to 90% compared to sequential execution.
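
    The explore/choose pattern and the reuse of intermediate results can be sketched in a few lines of Python. In the toy example below, explore enumerates a family of dataflows over a cleaning strategy and a model parameter, a cache stands in for the system's reuse of shared upstream results, and choose keeps the best-scoring branch; only the operator names follow the paper, while the function bodies, caching scheme and quality metric are assumptions for illustration.

    ```python
    # Toy sketch of MDF-style explore/choose over a two-stage pipeline.
    from functools import lru_cache

    @lru_cache(maxsize=None)        # shared upstream results are computed once
    def clean(strategy: str) -> tuple:
        print(f"cleaning data with strategy={strategy}")  # runs twice, not four times
        raw = (1.0, 2.0, None, 4.0)
        if strategy == "drop":
            return tuple(x for x in raw if x is not None)
        return tuple(x if x is not None else 0.0 for x in raw)  # "impute"

    def train(data: tuple, regularisation: float) -> float:
        """Stand-in for a model-training stage; returns a quality score."""
        return sum(data) / (1.0 + regularisation)

    def explore(choices: dict):
        """Yield every branch of the dataflow family (Cartesian product)."""
        for strategy in choices["cleaning"]:
            for reg in choices["regularisation"]:
                yield (strategy, reg), train(clean(strategy), reg)

    def choose(branches, k: int = 1):
        """Keep the k best-scoring branches, discarding the rest."""
        return sorted(branches, key=lambda b: b[1], reverse=True)[:k]

    best = choose(explore({"cleaning": ["drop", "impute"],
                           "regularisation": [0.0, 0.1]}))
    print("best branch:", best)
    ```

    Even in this toy form, the cleaning stage runs once per strategy rather than once per branch, which mirrors the reuse of intermediate results that MDFs exploit at cluster scale.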