77 research outputs found
Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline
From medical charts to national census, healthcare has traditionally operated
under a paper-based paradigm. However, the past decade has marked a long and
arduous transformation bringing healthcare into the digital age. Ranging from
electronic health records, to digitized imaging and laboratory reports, to
public health datasets, healthcare now generates an incredible amount of
digital information. Such a wealth of data presents an exciting opportunity for
integrated machine learning solutions to address problems across multiple
facets of healthcare practice and administration. Unfortunately, the ability to
derive accurate and informative insights requires more than the ability to
execute machine learning models. Rather, a deeper understanding of the data on
which the models are run is imperative for their success. While a significant
effort has been undertaken to develop models able to process the volume of data
obtained during the analysis of millions of digitalized patient records, it is
important to remember that volume represents only one aspect of the data. In
fact, drawing on data from an increasingly diverse set of sources, healthcare
data presents an incredibly complex set of attributes that must be accounted
for throughout the machine learning pipeline. This chapter focuses on
highlighting such challenges, and is broken down into three distinct
components, each representing a phase of the pipeline. We begin with attributes
of the data accounted for during preprocessing, then move to considerations
during model building, and end with challenges to the interpretation of model
output. For each component, we present a discussion around data as it relates
to the healthcare domain and offer insight into the challenges each may impose
on the efficiency of machine learning techniques.
Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figure
Minute ventilation of cyclists, car and bus passengers: an experimental study
Background: Differences in minute ventilation between cyclists, pedestrians and other commuters influence inhaled doses of air pollution. This study estimates minute ventilation of cyclists, car and bus passengers, as part of a study on health effects of commuters' exposure to air pollutants. Methods: Thirty-four participants performed a submaximal test on a bicycle ergometer, during which heart rate and minute ventilation were measured simultaneously at increasing cycling intensity. Individual regression equations were fitted between heart rate and the natural log of minute ventilation. Heart rates were recorded during 280 two-hour trips by bicycle, bus and car and were converted into minute ventilation levels using the individual regression coefficients. Results: Minute ventilation during bicycle rides was on average 2.1 times higher than in the car (individual range from 1.3 to 5.3) and 2.0 times higher than in the bus (individual range from 1.3 to 5.1). The ratio of minute ventilation of cycling compared to travelling by bus or car was higher in women than in men. Substantial differences in regression equations were found between individuals. The use of individual regression equations instead of average regression equations resulted in substantially better predictions of individual minute ventilations. Conclusion: The comparability of the gender-specific overall regression equations linking heart rate and minute ventilation with those of one previous American study supports the use of overall equations for studies on the group level. For estimating individual doses, the use of individual regression coefficients provides more precise data. Minute ventilation levels of cyclists are on average two times higher than those of bus and car passengers, consistent with the ratio found in one small previous study of young adults. The study illustrates the importance of including minute ventilation data when comparing air pollution doses between different modes of transport.
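The Methods of the abstract above describe fitting, per participant, a regression between heart rate and the natural log of minute ventilation, then converting field heart rates into ventilation estimates. A minimal sketch of that idea follows, assuming an ordinary least-squares fit of ln(VE) = intercept + slope × HR. The function names and the example numbers are hypothetical illustrations, not the study's code or data.

```python
import math


def fit_log_linear(heart_rates, ventilations):
    """Least-squares fit of ln(VE) = intercept + slope * HR for one participant.

    heart_rates: heart rates (beats/min) from the ergometer test.
    ventilations: minute ventilation (L/min) measured at the same moments.
    """
    n = len(heart_rates)
    log_ve = [math.log(v) for v in ventilations]
    mean_hr = sum(heart_rates) / n
    mean_log = sum(log_ve) / n
    sxx = sum((h - mean_hr) ** 2 for h in heart_rates)
    sxy = sum((h - mean_hr) * (y - mean_log)
              for h, y in zip(heart_rates, log_ve))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_hr
    return intercept, slope


def predict_ventilation(heart_rate, intercept, slope):
    """Convert a recorded heart rate into an estimated minute ventilation."""
    return math.exp(intercept + slope * heart_rate)


# Made-up ergometer readings for one hypothetical participant.
ergometer_hr = [80, 100, 120, 140, 160]
ergometer_ve = [15.0, 22.0, 33.0, 50.0, 74.0]
intercept, slope = fit_log_linear(ergometer_hr, ergometer_ve)

# Ratio of estimated ventilation at a cycling versus a car-trip heart rate
# (both heart rates are assumed values for illustration).
ratio = predict_ventilation(130, intercept, slope) / predict_ventilation(85, intercept, slope)
print(f"cycling/car ventilation ratio: {ratio:.2f}")
```

Because the model is linear in ln(VE), the ratio between two predicted ventilations depends only on the slope and the difference in heart rate, which is one reason individual slopes matter for individual dose estimates.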
The U.S. Environmental Protection Agency Particulate Matter Health Effects Research Centers Program: a midcourse report of status, progress, and plans.
In 1998 Congress mandated expanded U.S. Environmental Protection Agency (U.S. EPA) health effects research on ambient air particulate matter (PM) and a National Research Council (NRC) committee to provide research oversight. The U.S. EPA currently supports intramural and extramural PM research, including five academically based PM centers. The PM centers in their first 2.5 years have initiated research directed at critical issues identified by the NRC committee, including collaborative activities, and sponsored scientific workshops in key research areas. Through these activities, there is a better understanding of PM health effects and scientific uncertainties. Future PM centers research will focus on long-term effects associated with chronic PM exposures. This report provides a synopsis of accomplishments to date, short-term goals (during the next 2.5 years) and longer-term goals. It consists of six sections: biological mechanisms, acute effects, chronic effects, dosimetry, exposure assessment, and the specific attributes of a coordinated PM centers program.
Intracellular Trafficking and Synaptic Function of APL-1 in Caenorhabditis elegans
Background: Alzheimer’s disease (AD) is a neurodegenerative disorder primarily characterized by the deposition of β-amyloid plaques in the brain. Plaques are composed of the amyloid-β peptide derived from cleavage of the amyloid precursor protein (APP). Mutations in APP lead to the development of Familial Alzheimer’s Disease (FAD); however, the normal function of this protein has proven elusive. The organism Caenorhabditis elegans is an attractive model as the amyloid precursor-like protein (APL-1) is the single ortholog of APP, and loss of apl-1 leads to a severe molting defect and early larval lethality. Methodology/Principal Findings: We report here that lethality and molting can be rescued by full length APL-1, C-terminal mutations as well as a C-terminal truncation, suggesting that the extracellular region of the protein is essential for viability. RNAi knock-down of apl-1 followed by drug testing on the acetylcholinesterase inhibitor aldicarb showed that loss of apl-1 leads to aldicarb hypersensitivity, indicating a defect in synaptic function. The aldicarb hypersensitivity can be rescued by full length APL-1 in a dose-dependent fashion. At the cellular level, kinesins UNC-104/KIF-1A and UNC-116/kinesin-1 are positive regulators of APL-1 expression in the neurons. Knock-down of the small GTPase rab-5 also leads to a dramatic decrease in the amount of apl-1 expression in neurons, suggesting that trafficking from the plasma membrane to the early endosome is important for apl-1 function. Loss of function of a different small GTPase, UNC-108, on the contrary, leads t
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
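The abstract above names Term Frequency–Inverse Document Frequency as one of its baseline methods for recommending articles related to a seed article. The following is a minimal sketch of that general technique, assuming whitespace tokenization, a smoothed IDF variant (one of several in common use), and cosine similarity for ranking. It is not the consortium's implementation, and the function names and example titles are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(seed, candidates):
    """Return candidate indices ordered from most to least similar to the seed."""
    vectors = tfidf_vectors([seed] + candidates)
    seed_vector = vectors[0]
    scored = [(cosine_similarity(seed_vector, vec), index)
              for index, vec in enumerate(vectors[1:])]
    return [index for _, index in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical titles used only to exercise the functions.
seed_title = "amyloid precursor protein trafficking in neurons"
candidate_titles = [
    "air pollution exposure of cyclists",
    "amyloid precursor protein function in neurons",
    "machine learning for healthcare data",
]
ranking = rank_candidates(seed_title, candidate_titles)
print(ranking)
```

A benchmark of annotated seed/candidate pairs, such as the one the abstract describes, is what would allow a ranking like this to be scored against expert relevance judgments.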