
163 research outputs found

MetisFL: An Embarrassingly Parallelized Controller for Scalable & Efficient Federated Learning Workflows

Author: Ambite Jose Luis
Anastasiou Chrysovalantis
Asghar Armaghan
Stripelis Dimitris
Toral Patrick
Publication date: 13/11/2023

A Federated Learning (FL) system typically consists of two core processing entities: the federation controller and the learners. The controller is responsible for managing the execution of FL workflows across learners and the learners for training and evaluating federated models over their private datasets. While executing an FL workflow, the FL system has no control over the computational resources or data of the participating learners. Still, it is responsible for other operations, such as model aggregation, task dispatching, and scheduling. These computationally heavy operations generally need to be handled by the federation controller. Even though many FL systems have been recently proposed to facilitate the development of FL workflows, most of these systems overlook the scalability of the controller. To meet this need, we designed and developed a novel FL system called MetisFL, where the federation controller is the first-class citizen. MetisFL re-engineers all the operations conducted by the federation controller to accelerate the training of large-scale FL workflows. By quantitatively comparing MetisFL against other state-of-the-art FL systems, we empirically demonstrate that MetisFL leads to a 10-fold wall-clock time execution boost across a wide range of challenging FL workflows with increasing model sizes and federation sites.
Comment: 15 pages, 11 figures, Accepted at DistributedML '2

arXiv.org e-Print Archive

Exploiting data semantics to discover, extract, and model web sources

Author: Anon Plangprasopchok
Cenk Gazen
Craig A. Knoblock
José Luis Ambite
Kristina Lerman
Mark Carman
Steven Minton
Thomas Russ
Publication date: 01/01/2008

We describe DEIMOS, a system that automatically discovers and models new sources of information. The system exploits four core technologies developed by our group that make an end-to-end solution to this problem possible. First, given an example source, DEIMOS finds other similar sources online. Second, it invokes and extracts data from these sources. Third, given the syntactic structure of a source, DEIMOS maps its inputs and outputs to semantic types. Finally, it infers the source’s semantic definition, i.e., the function that maps the inputs to the outputs. DEIMOS is able to successfully automate these steps by exploiting a combination of background knowledge and data semantics. We describe the challenges in integrating separate components into a unified approach to discovering, extracting and modeling new online sources. We provide an end-to-end validation of the system in two information domains to show that it can successfully discover and model new data sources in those domains.

CiteSeerX

Crossref

NeuroBridge ontology: computable provenance metadata to give the long tail of neuroimaging data a FAIR chance for secondary use

Author: Ambite Jose Luis
Appaji Abhishek
Lander Howard M.
Rajasekar Arcot
Sahoo Satya S.
Turner Jessica A.
Turner Matthew D.
Wang Lei
Wang Yue
Publication venue: Frontiers Media SA
Publication date: 01/01/2023

Background: Despite the efforts of the neuroscience community, there are many published neuroimaging studies with data that are still not findable or accessible. Users face significant challenges in reusing neuroimaging data due to the lack of provenance metadata, such as experimental protocols, study instruments, and details about the study participants, which is also required for interoperability. To implement the FAIR guidelines for neuroimaging data, we have developed an iterative ontology engineering process and used it to create the NeuroBridge ontology. The NeuroBridge ontology is a computable model of provenance terms to implement FAIR principles and, together with an international effort to annotate full-text articles with ontology terms, the ontology enables users to locate relevant neuroimaging datasets.
Methods: Building on our previous work in metadata modeling, and in concert with an initial annotation of a representative corpus, we modeled diagnosis terms (e.g., schizophrenia, alcohol usage disorder), magnetic resonance imaging (MRI) scan types (T1-weighted, task-based, etc.), clinical symptom assessments (PANSS, AUDIT), and a variety of other assessments. We used the feedback of the annotation team to identify missing metadata terms, which were added to the NeuroBridge ontology, and we restructured the ontology to support both the final annotation of the corpus of neuroimaging articles by a second, independent set of annotators and the functionalities of the NeuroBridge search portal for neuroimaging datasets.
Results: The NeuroBridge ontology consists of 660 classes, 49 properties, and 3,200 axioms. The ontology includes mappings to existing ontologies, enabling the NeuroBridge ontology to be interoperable with other domain-specific terminological systems. Using the ontology, we annotated 186 neuroimaging full-text articles describing the participant types, scanning, and clinical and cognitive assessments.
Conclusion: The NeuroBridge ontology is the first computable metadata model that represents the types of data available in recent neuroimaging studies in schizophrenia and substance use disorders research; it can be extended to include more granular terms as needed. This metadata ontology is expected to form the computational foundation both to help investigators make their data FAIR compliant and to support users in conducting reproducible neuroimaging research.

Carolina Digital Repository

NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding.

Author: Alachram Halima
Ambite José Luis
Ananiadou Sophia
Beißbarth Tim
Chambers Brendan
Christopoulou Fenia
Evans James A
Galstyan Aram
Gao Xin
Garg Sahil
Hermjakob Ulf
Khomtchouk Bohdan B
King Ross
Li Maolin
Li Yu
Marcu Daniel
Matthew Joel
Pan Weidi
Rzhetsky Andrey
Schoene Annika M
Sheng Emily
Soldatova Larisa
Stevens Robert
Wang Kanix
Wingender Edgar
Publication venue: NPJ Syst Biol Appl
Publication date: 01/01/2021

Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in MR have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet [4] was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.

Goldsmiths Research Online

Directory of Open Access Journals

Chalmers Research

Apollo (Cambridge)