Annotating Synapses in Large EM Datasets
Reconstructing neuronal circuits at the level of synapses is a central
problem in neuroscience and becoming a focus of the emerging field of
connectomics. To date, electron microscopy (EM) is the most proven technique
for identifying and quantifying synaptic connections. As advances in EM make
acquiring larger datasets possible, subsequent manual synapse identification
(i.e., proofreading) for deciphering a connectome becomes a major time
bottleneck. Here we introduce a large-scale, high-throughput, and
semi-automated methodology to efficiently identify synapses. We successfully
applied our methodology to the Drosophila medulla optic lobe, annotating many
more synapses than previous connectome efforts. Our approaches are extensible
and will make the often complicated process of synapse identification
accessible to a wider community of potential proofreaders.
Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy
Data collection for scientific applications is increasing exponentially and
is forecasted to soon reach peta- and exabyte scales. Applications which
process and analyze scientific data must be scalable and focus on execution
performance to keep pace. In the field of radio astronomy, in addition to
increasingly large datasets, tasks such as the identification of transient
radio signals from extrasolar sources are computationally expensive. We present
a scalable approach to radio pulsar detection written in Scala that
parallelizes candidate identification to take advantage of in-memory task
processing using Apache Spark on a YARN distributed system. Furthermore, we
introduce a novel automated multiclass supervised machine learning technique
that we combine with feature selection to reduce the time required for
candidate classification. Experimental testing on a Beowulf cluster with 15
data nodes shows that the parallel implementation of the identification
algorithm offers up to a 5X speedup over a similar multithreaded
implementation. Further, we show that the combination of automated multiclass
classification and feature selection speeds up the execution performance of the
RandomForest machine learning algorithm by an average of 54% with less than a
2% average reduction in the algorithm's ability to correctly classify pulsars.
The generalizability of these results is demonstrated by using two real-world
radio astronomy data sets.
Comment: In Proceedings of the 47th International Conference on Parallel
Processing (ICPP 2018). ACM, New York, NY, USA, Article 11, 11 pages.
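The parallel candidate-identification step described in this abstract can be sketched in miniature. The paper's implementation is in Scala on Apache Spark over YARN; the stdlib-only Python analogue below is an illustrative assumption, not the authors' code. It shows only the shape of the idea: each dedispersed time window is scored independently, so the scoring map can be distributed, and windows above a signal-to-noise cutoff become pulse candidates. The scoring function and the 5.0 threshold are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev

def snr(window):
    """Crude single-pulse score: peak height in units of the
    window's standard deviation (illustrative, not the paper's metric)."""
    mu, sigma = mean(window), pstdev(window)
    return (max(window) - mu) / sigma if sigma else 0.0

def find_candidates(windows, threshold=5.0):
    """Score every time window independently and keep the high-SNR ones.
    The embarrassingly parallel map is what Spark would distribute across
    cluster executors; threads are used here only for portability."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(snr, windows))
    return [i for i, s in enumerate(scores) if s >= threshold]
```

Because each window is scored with no shared state, the same map scales from a thread pool to Spark partitions on a YARN cluster, which is the property the speedup experiments exploit.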
Automated legal sensemaking: the centrality of relevance and intentionality
Introduction: In a perfect world, discovery would be conducted by the senior litigator who is
responsible for developing and fully understanding every nuance of the client's legal strategy. Of
course, today we must deal with the explosion of electronically stored information (ESI), which
now amounts to tens of thousands of documents even in small cases and increasingly involves
multi-million-document populations for internal corporate investigations and litigations.
Therefore scalable processes and technologies are required as a substitute for the authority’s
judgment. The approaches taken have typically either substituted large teams of surrogate
human reviewers using vastly simplified issue coding reference materials or employed
increasingly sophisticated computational resources with little focus on quality metrics to ensure
retrieval consistent with the legal goal. What is required is a system (people, process, and
technology) that replicates and automates the senior litigator’s human judgment.
In this paper we utilize 15 years of sensemaking research to establish the minimum acceptable
basis for conducting a document review that meets the needs of a legal proceeding. There is
no substitute for a rigorous characterization of the explicit and tacit goals of the senior litigator.
Once a process has been established for capturing the authority’s relevance criteria, we argue
that literal translation of requirements into technical specifications does not properly account for
the activities or states-of-affairs of interest. Having only a data warehouse of written records, it
is also necessary to discover the intentions of actors involved in textual communications. We
present quantitative results for a process and technology approach that automates effective
legal sensemaking.
PhenDisco: phenotype discovery system for the database of genotypes and phenotypes.
The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net