A visual survey of the inshore fish communities of Gran Canaria (Canary Islands).
An in situ visual survey technique (5 minutes, 100 m2 area) was used to assess the inshore fishes off Gran Canaria. In 1996, 211 visual surveys were conducted at 7 localities. Localities differed significantly in the number of species per survey (ANOVA: p < 0.01). The five most abundant species were Chromis limbatus, Boops boops, Pomadasys incisus, Abudefduf luridus, and Thalassoma pavo, with respective mean abundances of 65.6, 37.4, 16.7, 8.7, and 4.5 per 100 m2. Detrended Correspondence Analysis, a multivariate ordination technique, showed that the major determinant of community structure is substrate type: the majority of the surveyed species had low axis 1 ordination scores, indicating a strong association with hard substrate. Step-wise linear regression models explained 45.3% and 11.4% of the variation in the first and second axis survey ordination scores, respectively.
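A minimal sketch of the kind of one-way ANOVA reported above, assuming hypothetical per-survey species counts grouped by locality (the study's actual data are not reproduced here):

```python
# One-way ANOVA across localities on species counts per survey.
# The counts below are made-up placeholders, not the study's data.
from scipy import stats

species_counts_by_locality = {
    "locality_A": [12, 15, 9, 14, 11],
    "locality_B": [7, 8, 10, 6, 9],
    "locality_C": [18, 20, 16, 22, 19],
}

f_stat, p_value = stats.f_oneway(*species_counts_by_locality.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # p < 0.01 would mirror the reported result
```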
Serverless OpenHealth at data commons scale—traversing the 20 million patient records of New York’s SPARCS dataset in real-time
In a previous report, we explored the serverless OpenHealth approach to the Web as a Global Compute space. That approach relies on the full stack of the modern browser and, in particular, on its configuration for application assembly by code injection. The opportunity, and the need, to expand this approach has since increased markedly, reflecting wider adoption of Open Data policies by public health agencies. Here, we describe how the serverless scaling challenge can be met by an isomorphic mapping between the remote data-layer API and a local (client-side, in-browser) operator. This solution is validated with an accompanying interactive web application (bit.ly/loadsparcs) capable of real-time traversal of the 20 million patient records of New York's Statewide Planning and Research Cooperative System (SPARCS), and is compared with alternative approaches. The results strengthen the argument that the FAIR reproducibility needed for Population Science applications in the age of P4 Medicine is particularly well served by the Web platform.
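The paper's operator runs client-side as in-browser JavaScript; as an illustration only, here is a Python sketch of the same paged-traversal pattern against a Socrata-style open data endpoint (the URL is a placeholder, not the actual SPARCS resource):

```python
# Paged traversal of a Socrata-style open data API using $limit/$offset.
# ENDPOINT is a hypothetical placeholder; substitute the actual resource URL.
import requests

ENDPOINT = "https://example.data.ny.gov/resource/XXXX-XXXX.json"  # hypothetical
PAGE_SIZE = 50_000  # SODA-style APIs accept $limit/$offset paging parameters

def traverse(endpoint: str, page_size: int = PAGE_SIZE):
    """Yield records one page at a time instead of loading all rows at once."""
    offset = 0
    while True:
        page = requests.get(
            endpoint,
            params={"$limit": page_size, "$offset": offset},
            timeout=60,
        ).json()
        if not page:
            break
        yield from page
        offset += page_size

for i, record in enumerate(traverse(ENDPOINT)):
    if i >= 3:  # just peek at the first few records
        break
    print(record)
```

Streaming pages this way keeps memory flat regardless of dataset size, which is the property that makes real-time traversal of 20 million records feasible in a client-side setting.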
Linked open drug data for pharmaceutical research and development
There is an abundance of information about drugs available on the Web. Data sources range from medicinal chemistry results, through the impact of drugs on gene expression, to the outcomes of drugs in clinical trials. These data are typically not connected to one another, which makes it harder to gain insights that span them. Linking Open Drug Data (LODD) is a task force within the World Wide Web Consortium's (W3C) Health Care and Life Sciences Interest Group (HCLS IG). LODD has surveyed publicly available data about drugs, created Linked Data representations of the data sets, and identified interesting scientific and business questions that can be answered once the data sets are connected. The task force provides recommendations on best practices for exposing data as Linked Data. In this paper, we present past and ongoing work of LODD and discuss the growing importance of Linked Data as a foundation for pharmaceutical R&D data sharing.
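A minimal sketch of the Linked Data idea, using rdflib to connect two hypothetical drug facts and query across them with SPARQL (the URIs and triples are illustrative placeholders, not drawn from the actual LODD data sets):

```python
# Build a tiny RDF graph linking a drug to a gene target and a trial,
# then answer a question that spans both facts with one SPARQL query.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # hypothetical vocabulary
g = Graph()
g.add((EX.aspirin, EX.targetsGene, EX.PTGS2))
g.add((EX.aspirin, EX.evaluatedInTrial, EX.trial42))
g.add((EX.trial42, EX.outcome, Literal("completed")))

# Because the facts share one identifier (ex:aspirin), a single query can
# span what would otherwise be separate chemistry and clinical sources.
query = """
PREFIX ex: <http://example.org/>
SELECT ?gene ?outcome WHERE {
    ex:aspirin ex:targetsGene ?gene .
    ex:aspirin ex:evaluatedInTrial ?trial .
    ?trial ex:outcome ?outcome .
}
"""
for gene, outcome in g.query(query):
    print(gene, outcome)
```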
Disease phenotyping using deep learning: A diabetes case study
Characterization of a patient's clinical phenotype is central to biomedical informatics. ICD codes, assigned to inpatient encounters by coders, are important for population health and cohort discovery when clinical information is limited. Although ICD codes are assigned by professionals trained and certified in coding, there is substantial variability in coding. We present a methodology that uses deep learning to model coder decision making and predict ICD codes. Our approach predicts codes based on demographics, lab results, and medications, as well as codes from previous encounters. We are able to predict existing codes with high accuracy for all three of the test cases we investigated: diabetes, acute renal failure, and chronic kidney disease. We employed a panel of clinicians, in a blinded manner, to assess ground truth, and compared the predictions of coders, the model, and clinicians. When disparities between the model's predictions and coder-assigned codes were reviewed, our model outperformed coder-assigned ICD codes.
Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
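A minimal sketch of a multi-label ICD-prediction model in this spirit, assuming hypothetical feature and label dimensions and random stand-in data (the paper's actual architecture and features are not reproduced here):

```python
# Feed-forward multi-label classifier: one sigmoid output per ICD code.
# Dimensions and data are placeholders, not the paper's configuration.
import torch
import torch.nn as nn

N_FEATURES = 64   # demographics + labs + medications + prior codes, encoded
N_CODES = 10      # number of candidate ICD codes

model = nn.Sequential(
    nn.Linear(N_FEATURES, 128),
    nn.ReLU(),
    nn.Linear(128, N_CODES),  # logits; BCEWithLogitsLoss applies the sigmoid
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-in batch: 32 encounters with binary code labels.
x = torch.randn(32, N_FEATURES)
y = torch.randint(0, 2, (32, N_CODES)).float()

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

predicted = (torch.sigmoid(model(x)) > 0.5).int()  # per-encounter code set
```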
Experimental uncertainty estimation and statistics for data having interval uncertainty.
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets that contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics, and summarizes the computability of these statistics as a function of sample size and the characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with methods of inferential statistics such as outlier detection and regression. The report explores the trade-off between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing, and propagating measurement uncertainties.
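A minimal sketch of two of the interval statistics the report describes, assuming a small made-up data set of [low, high] measurement intervals; because the mean and any percentile are monotone in each observation, evaluating them at the interval endpoints gives exact bounds:

```python
# Interval-valued mean and median for data with interval uncertainty.
# Each measurement is a (low, high) pair; the data below are made up.
from statistics import mean, median

data = [(1.0, 1.4), (2.1, 2.3), (0.8, 1.9), (1.5, 1.5), (2.0, 2.6)]

lows = [lo for lo, hi in data]
highs = [hi for lo, hi in data]

# Endpoint-wise evaluation bounds any statistic that is monotone in
# each observation, which includes the mean, median, and percentiles.
interval_mean = (mean(lows), mean(highs))
interval_median = (median(lows), median(highs))

print("mean   in", interval_mean)
print("median in", interval_median)
# Bounds on the variance are harder: the exact upper bound is NP-hard
# to compute in general, which is part of what the report surveys.
```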
Dependence in probabilistic modeling, Dempster-Shafer theory, and probability bounds analysis.
This report summarizes methods for incorporating information (or lack of information) about inter-variable dependence into risk assessments that use Dempster-Shafer theory or probability bounds analysis to address epistemic and aleatory uncertainty. The report reviews techniques for simulating correlated variates for a given correlation measure and dependence model, computing bounds on distribution functions under a specified dependence model, formulating parametric and empirical dependence models, and bounding approaches that can be used when information about inter-variable dependence is incomplete. The report also reviews several of the most pervasive and dangerous myths among risk analysts about dependence in probabilistic models.
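One of the best-known bounding results in this area is the Fréchet inequality: with no dependence information at all, P(A and B) is bounded by max(0, a + b − 1) and min(a, b). A minimal sketch (the marginal probabilities below are made up):

```python
# Frechet bounds on conjunction and disjunction probabilities when the
# dependence between events A and B is completely unknown.
def frechet_and(a: float, b: float) -> tuple[float, float]:
    """Bounds on P(A and B) given only P(A)=a and P(B)=b."""
    return max(0.0, a + b - 1.0), min(a, b)

def frechet_or(a: float, b: float) -> tuple[float, float]:
    """Bounds on P(A or B) given only P(A)=a and P(B)=b."""
    return max(a, b), min(1.0, a + b)

# Made-up marginals: under independence P(A and B) would be 0.42,
# but without a dependence assumption we can only bound it.
print(frechet_and(0.7, 0.6))  # (0.3, 0.6)
print(frechet_or(0.7, 0.6))   # (0.7, 1.0)
```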
DocGraph subset Jamestown, NY core provider in .gephi file
See fileset for details.
Linking clinicians to biomedical researchers: An application of the ISF ontology at Stony Brook Medicine
Experience computation for clinical faculty based on administrative data using the ISF ontology.
DocGraph subset Bronx, NY core provider in .graphml file
See fileset for more details.
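These subsets ship as standard graph files; a minimal sketch of loading the GraphML variant with networkx (the filename is a placeholder for the actual fileset item):

```python
# Load a DocGraph provider subset stored as GraphML and inspect it.
# The filename is a hypothetical placeholder; use the file from the fileset.
import networkx as nx

g = nx.read_graphml("docgraph_bronx_core.graphml")  # hypothetical filename
print(g.number_of_nodes(), "providers,", g.number_of_edges(), "edges")

# Providers with the most connections in the subset.
top = sorted(g.degree, key=lambda kv: kv[1], reverse=True)[:5]
for node, degree in top:
    print(node, degree)
```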