COINSTAC: A Privacy Enabled Model and Prototype for Leveraging and Processing Decentralized Brain Imaging Data
The field of neuroimaging has embraced the need for sharing and collaboration. Data sharing mandates from public funding agencies and major journal publishers have spurred the development of data repositories and neuroinformatics consortia. However, efficient and effective data sharing still faces several hurdles. For example, open data sharing is on the rise, but it is not suitable for sensitive data, such as genetic data. Current approaches can be cumbersome (such as negotiating multiple data sharing agreements), and there are significant data transfer, organization, and computational challenges. Centralized repositories only partially address these issues. We propose a dynamic, decentralized platform for large-scale analyses called the Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation (COINSTAC). The COINSTAC solution can include data missing from central repositories, allows pooling of both open and "closed" repositories by developing privacy-preserving versions of widely used algorithms, and incorporates these tools within an easy-to-use platform that enables distributed computation. We present an initial prototype system, which we demonstrate on two multi-site data sets without aggregating the data. In addition, by iterating across sites, the COINSTAC model enables meta-analytic solutions to converge to "pooled-data" solutions (i.e., as if the entire data set were in hand). More advanced approaches, such as feature generation, matrix factorization models, and preprocessing, can be incorporated into such a model. In sum, COINSTAC enables access to many currently unavailable data sets, a user-friendly privacy-enabled interface for decentralized analysis, and a powerful solution that complements existing data sharing solutions.
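The iterate-across-sites idea can be illustrated with a toy decentralized gradient-descent loop: each site computes a gradient for a shared linear model on its own private data, only the gradients are exchanged and averaged, and the iteration converges to the "pooled-data" least-squares solution. This is a minimal sketch of the general principle, not COINSTAC's actual protocol or API; all names and data here are invented.

```python
# Toy sketch of decentralized iteration: data never leaves a site,
# only model gradients are shared and averaged each round.

def local_gradient(X, y, w):
    """Gradient of mean squared error on one site's private data."""
    n = len(y)
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
        for j, xj in enumerate(xi):
            g[j] += 2.0 * err * xj / n
    return g

def decentralized_fit(sites, dim, lr=0.05, iters=2000):
    """Average per-site gradients each round; converges to the pooled fit."""
    w = [0.0] * dim
    for _ in range(iters):
        grads = [local_gradient(X, y, w) for X, y in sites]
        avg = [sum(g[j] for g in grads) / len(grads) for j in range(dim)]
        w = [wj - lr * gj for wj, gj in zip(w, avg)]
    return w

# Two "sites", each holding samples of y = 2*x + 1 (features: [x, 1]).
site_a = ([[0.0, 1.0], [1.0, 1.0]], [1.0, 3.0])
site_b = ([[2.0, 1.0], [3.0, 1.0]], [5.0, 7.0])
w = decentralized_fit([site_a, site_b], dim=2)
print([round(v, 2) for v in w])  # close to [2.0, 1.0]
```

With equal site sizes, the averaged gradient equals the gradient of the pooled objective, which is why the iteration reaches the same answer as if all data were in one place.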
Knowledge Discovery and Data Mining (KDDM) survey report.
The large number of government and industry activities supporting the Unit of Action (UA), with their attendant documents, reports, and briefings, can overwhelm decision-makers with an overabundance of information that hampers their ability to make quick decisions, often resulting in a form of gridlock. In particular, the large and rapidly increasing amounts of data and data formats stored on UA Advanced Collaborative Environment (ACE) servers have led to the realization that it has become impractical, and even impossible, to perform manual analysis leading to timely decisions. UA Program Management (PM UA) has recognized the need to implement a Decision Support System (DSS) on UA ACE. The objective of this document is to research the commercial Knowledge Discovery and Data Mining (KDDM) market and publish the results in a survey. Furthermore, a ranking mechanism based on UA ACE-specific criteria has been developed and applied to a representative set of commercially available KDDM solutions. In addition, an overview of four R&D areas identified as critical to the implementation of a DSS on ACE is provided. Finally, a comprehensive database containing detailed information on the surveyed KDDM tools has been developed and is available upon customer request.
Mechanism for Change Detection in HTML Web Pages as XML Documents
Change detection of web pages is an important aspect of web monitoring. Automated web monitoring can be used to collect specific information, for example to detect public announcements, news posts, and price changes. If we store the HTML code of a page, we can compare the current and previous code when we revisit the page and find the differences between them. HTML code can be compared using plain-text comparison methods, but this risks losing information about the structure of the page. HTML is tree-like in structure, and preserving this property when detecting changes is desirable. In this work we describe a mechanism that transforms previously collected HTML pages into XML documents and compares them as XML trees. We give a general list of the components needed for this task, describe our implementation, which uses NutchWAX, NekoHTML, XMLUnit, Jena, and MongoDB, and show the results of applying the program to a dataset. We analyse measurements collected when running our program on 1.1 million HTML pages. To our knowledge, this mechanism has not been tested in previous work. We show that the mechanism is usable on real-world data.
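The tree-based comparison step can be sketched as a recursive walk over two XML documents, assuming the HTML has already been cleaned into well-formed XML (the role NekoHTML plays in the authors' pipeline). Python's standard-library ElementTree stands in for XMLUnit here; the change-report format is illustrative only.

```python
# Minimal sketch: compare two XML trees and report text changes,
# attribute changes, and added/removed children with their paths.
import xml.etree.ElementTree as ET

def diff_trees(old, new, path=""):
    """Recursively compare two XML elements, returning a list of changes."""
    changes = []
    here = path + "/" + old.tag
    if old.tag != new.tag:
        changes.append(("renamed", here, new.tag))
        return changes
    if (old.text or "").strip() != (new.text or "").strip():
        changes.append(("text", here, (new.text or "").strip()))
    if old.attrib != new.attrib:
        changes.append(("attrs", here, new.attrib))
    # Pair children positionally; unmatched ones count as added/removed.
    for o, n in zip(old, new):
        changes.extend(diff_trees(o, n, here))
    for extra in list(new)[len(list(old)):]:
        changes.append(("added", here + "/" + extra.tag, None))
    for gone in list(old)[len(list(new)):]:
        changes.append(("removed", here + "/" + gone.tag, None))
    return changes

before = ET.fromstring("<html><body><h1>News</h1><p>Old price: 10</p></body></html>")
after = ET.fromstring("<html><body><h1>News</h1><p>New price: 8</p><p>Sale!</p></body></html>")
for change in diff_trees(before, after):
    print(change)
```

A production diff (as in XMLUnit) also handles reordered and moved nodes; positional pairing is the simplest baseline.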
Simulations of amphiphilic fluids using mesoscale lattice-Boltzmann and lattice-gas methods
We compare two recently developed mesoscale models of binary immiscible and ternary amphiphilic fluids. We describe and compare the algorithms in detail and discuss their stability properties. The simulation results for the cases of self-assembly of ternary droplet phases and binary water-amphiphile sponge phases are compared and discussed. Both models require parallel implementation and deployment on large-scale parallel computing resources in order to achieve reasonable simulation times for three-dimensional models. The parallelisation strategies and performance on two distinct parallel architectures are compared and discussed. Large-scale three-dimensional simulations of multiphase fluids require the extensive use of high-performance visualisation techniques to enable the large quantities of complex data to be interpreted. We report on our experiences with two commercial visualisation products: AVS and VTK. We also discuss the application and use of novel computational steering techniques for the more efficient utilisation of high-performance computing resources. We close the paper with some suggestions for the future development of both models.
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, due in part to the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code from victim users, facilitating the use of social engineering techniques to infect their machines. Research has shown that machine-learning algorithms provide effective detection mechanisms against such threats, but the existence of an arms race in adversarial settings has recently challenged such systems. In this work, we focus on malware embedded in PDF files as a representative case of such an arms race. We start by providing a comprehensive taxonomy of the different approaches used to generate PDF malware and of the corresponding learning-based detection systems. We then categorize threats specifically targeted against learning-based PDF malware detectors, using a well-established framework in the field of adversarial machine learning. This framework allows us to categorize known vulnerabilities of learning-based PDF malware detectors and to identify novel attacks that may threaten such systems, along with the potential defense mechanisms that can mitigate the impact of such threats. We conclude the paper by discussing how these findings highlight promising research directions towards tackling the more general challenge of designing robust malware detectors in adversarial settings.
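The evasion side of this arms race can be illustrated with a toy example: a linear detector scores a PDF-like file by counting structural keywords, and an attacker lowers the score below the flagging threshold by injecting benign-looking objects (a feature-addition attack) without touching the malicious payload. The feature set, weights, and threshold are invented for illustration; real detectors and attacks are far richer.

```python
# Toy linear detector over keyword counts, and a greedy evasion attack
# that only ADDS benign features (it never removes the payload).

WEIGHTS = {"/JavaScript": 2.0, "/OpenAction": 1.5, "/Launch": 1.8,
           "/Font": -0.4, "/Image": -0.3, "/Page": -0.2}
THRESHOLD = 2.0  # score above this => flagged as malicious

def score(features):
    """Linear score: weighted sum of keyword counts."""
    return sum(WEIGHTS.get(k, 0.0) * v for k, v in features.items())

def evade(features, budget=20):
    """Greedily inject benign objects until the file scores under threshold."""
    feats = dict(features)
    benign = [k for k, w in WEIGHTS.items() if w < 0]
    for _ in range(budget):
        if score(feats) <= THRESHOLD:
            break
        best = min(benign, key=lambda k: WEIGHTS[k])  # most negative weight
        feats[best] = feats.get(best, 0) + 1
    return feats

malicious = {"/JavaScript": 1, "/OpenAction": 1}  # score 3.5 -> flagged
adversarial = evade(malicious)
print(score(malicious) > THRESHOLD, score(adversarial) <= THRESHOLD)
```

This is exactly the vulnerability class that motivates defenses such as monotonic classifiers, where adding content can never decrease the maliciousness score.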
Toward Real-Time Video-Enhanced Augmented Reality for Medical Visualization and Simulation
In this work we demonstrate two separate forms of augmented reality environments for use with minimally-invasive surgical techniques. Chapter 2 demonstrates how a video feed from a webcam, which can mimic a laparoscopic or endoscopic camera used during an interventional procedure, can be used to identify the pose of the camera with respect to the viewed scene and to augment the video feed with computer-generated information, such as renderings of internal anatomy not visible beyond the image surface, resulting in a simple augmented reality environment. Chapter 3 details our implementation of a similar system, albeit with an external tracking system.
Additionally, we discuss the challenges and considerations involved in expanding this system to support an external tracking system, specifically the Polaris Spectra optical tracker. Because the tracking origin is relocated to a point other than the camera center, an additional registration step is necessary to establish the position of all components within the scene. This modification is expected to increase the accuracy and robustness of the system.
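The extra registration step can be sketched with homogeneous transforms: an external tracker reports poses in its own coordinate frame, so a fixed registration transform (found once during calibration) must be composed with each tracked pose to express it in the scene frame. The 4x4 matrices and numbers below are purely illustrative, not the system's actual calibration values.

```python
# Sketch of frame composition: tool pose (tracker frame) composed with a
# one-time registration transform (tracker frame -> scene frame).

def matmul4(a, b):
    """Multiply two 4x4 homogeneous transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(t, p):
    """Apply a 4x4 transform to a 3D point."""
    v = [p[0], p[1], p[2], 1.0]
    return tuple(sum(t[i][k] * v[k] for k in range(4)) for i in range(3))

# Registration: tracker frame -> scene frame (here a pure translation).
T_reg = [[1, 0, 0, -100.0],
         [0, 1, 0,   50.0],
         [0, 0, 1,    0.0],
         [0, 0, 0,    1.0]]

# Tracked tool pose as reported by the tracker (tool frame -> tracker frame).
T_tool = [[1, 0, 0, 120.0],
          [0, 1, 0, -50.0],
          [0, 0, 1,  10.0],
          [0, 0, 0,   1.0]]

# Tool pose in the scene frame is the composition of the two transforms.
T_scene = matmul4(T_reg, T_tool)
print(apply(T_scene, (0.0, 0.0, 0.0)))  # tool origin in scene coordinates
```

In a real system both transforms also carry rotations, and the registration is typically estimated by point-based rigid registration against known fiducials.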
Utilising the grid for augmented reality
Traditionally, registration and tracking within Augmented Reality (AR) applications have been built around specific markers which are added into the user's viewpoint, allowing their position to be tracked and their orientation to be estimated in real time. Attempts to implement AR without specific markers have increased the computational requirements, and some information about the environment is still needed in order to match the registration between the real world and the virtual artifacts. This thesis describes a novel method that not only provides a generic platform for AR but also seamlessly deploys High Performance Computing (HPC) resources to deal with the additional computational load, as part of the distributed High Performance Visualization (HPV) pipeline used to render the virtual artifacts. The developed AR framework is then applied to a real-world application of a marker-less AR interface for Transcranial Magnetic Stimulation (TMS), named BART (Bangor Augmented Reality for TMS). Three prototypes of BART are presented, along with a discussion of the limitations and solutions of each. First, by using a proprietary tracking system it is possible to achieve accurate tracking, but with the limitations of having to use bold markers and being unable to render the virtual artifacts in real time. Second, BART v2 implements a novel tracking system using computer vision techniques: repeatable feature points are extracted from the user's viewpoint to build a description of the object or plane with which the virtual artifact is aligned; then, as each frame is updated, the changing positions of the feature points are used to estimate how the object has moved. Third, the e-Viz framework is used to autonomously deploy HPV resources to ensure that the virtual objects are rendered in real time. e-Viz also enables the allocation of remote HPC resources to handle the computational requirements of the object tracking and pose estimation.
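The feature-point update idea in BART v2 can be reduced to its simplest form: match feature points between consecutive frames and use their average displacement to move the virtual artifact. This translation-only sketch stands in for full pose estimation (which would also recover rotation and scale, e.g. via a homography); all coordinates are invented.

```python
# Minimal sketch: estimate frame-to-frame motion from matched feature
# points, then shift the virtual artifact's anchor accordingly.

def estimate_shift(prev_pts, curr_pts):
    """Estimate 2D translation as the mean displacement of matched points."""
    n = len(prev_pts)
    dx = sum(c[0] - p[0] for p, c in zip(prev_pts, curr_pts)) / n
    dy = sum(c[1] - p[1] for p, c in zip(prev_pts, curr_pts)) / n
    return dx, dy

def update_artifact(anchor, shift):
    """Move the virtual artifact's anchor by the estimated shift."""
    return (anchor[0] + shift[0], anchor[1] + shift[1])

prev_pts = [(10.0, 10.0), (50.0, 12.0), (30.0, 40.0)]
curr_pts = [(13.0, 11.0), (53.0, 13.0), (33.0, 41.0)]  # scene moved right and down
shift = estimate_shift(prev_pts, curr_pts)
print(update_artifact((20.0, 20.0), shift))  # artifact follows the scene
```

Averaging displacements is robust only when all matches are correct; real trackers add outlier rejection (e.g. RANSAC) before estimating the motion model.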
Opportunities and obstacles for deep learning in biology and medicine
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solving problems in these fields. We examine applications of deep learning to a variety of biomedical problems, including patient classification, fundamental biological processes, and the treatment of patients, and discuss whether deep learning will be able to transform these tasks or whether the biomedical sphere poses unique challenges. Following an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made in linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside, with the potential to transform several areas of biology and medicine.