
    Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

    From medical charts to the national census, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Through electronic health records, digitized imaging and laboratory reports, and public health datasets, healthcare today generates an enormous amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across many facets of healthcare practice and administration. Unfortunately, deriving accurate and informative insights requires more than the ability to execute machine learning models; a deeper understanding of the data on which the models run is imperative for their success. While significant effort has gone into developing models able to process the volume of data generated by analyzing millions of digitized patient records, volume represents only one aspect of the data. Drawing on an increasingly diverse set of sources, healthcare data presents a complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter highlights these challenges and is divided into three components, each representing a phase of the pipeline. We begin with attributes of the data that must be addressed during preprocessing, then move to considerations during model building, and end with challenges in interpreting model output. For each component, we discuss the data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques.
    Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figure

    Minute ventilation of cyclists, car and bus passengers: an experimental study

    Background: Differences in minute ventilation between cyclists, pedestrians and other commuters influence inhaled doses of air pollution. This study estimates the minute ventilation of cyclists and of car and bus passengers, as part of a study on the health effects of commuters' exposure to air pollutants. Methods: Thirty-four participants performed a submaximal test on a bicycle ergometer, during which heart rate and minute ventilation were measured simultaneously at increasing cycling intensity. Individual regression equations were calculated between heart rate and the natural log of minute ventilation. Heart rates were recorded during 280 two-hour trips by bicycle, bus and car and were converted into minute ventilation levels using the individual regression coefficients. Results: Minute ventilation during bicycle rides was on average 2.1 times higher than in the car (individual range 1.3 to 5.3) and 2.0 times higher than on the bus (individual range 1.3 to 5.1). The ratio of minute ventilation while cycling to that while travelling by bus or car was higher in women than in men. Regression equations differed substantially between individuals, and using individual rather than average regression equations gave substantially better predictions of individual minute ventilation. Conclusion: The gender-specific overall regression equations linking heart rate and minute ventilation were comparable with those of one previous American study, which supports using overall equations in group-level studies. For estimating individual doses, individual regression coefficients provide more precise data. The minute ventilation of cyclists is on average two times higher than that of bus and car passengers, consistent with the ratio found in one small previous study of young adults. The study illustrates the importance of including minute ventilation data when comparing air pollution doses between modes of transport.
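    The Methods describe a two-step procedure: fit, for each participant, a regression of the natural log of minute ventilation (VE) on heart rate (HR) from the ergometer test, then convert heart rates recorded during trips into ventilation with that participant's coefficients. The minimal Python sketch below assumes ordinary least squares on ln(VE) = a + b·HR (the abstract does not name the estimator); the function names and all numbers are hypothetical illustrations, not study data.

```python
import math


def fit_log_ventilation(heart_rates, ventilations):
    """OLS fit of ln(VE) = a + b * HR for one participant's ergometer test (assumed model)."""
    n = len(heart_rates)
    if n != len(ventilations) or n < 2:
        raise ValueError("need at least two paired measurements")
    log_ve = [math.log(v) for v in ventilations]
    mean_hr = sum(heart_rates) / n
    mean_y = sum(log_ve) / n
    sxx = sum((h - mean_hr) ** 2 for h in heart_rates)
    if sxx == 0:
        raise ValueError("heart rates must vary to estimate a slope")
    sxy = sum((h - mean_hr) * (y - mean_y) for h, y in zip(heart_rates, log_ve))
    b = sxy / sxx
    a = mean_y - b * mean_hr
    return a, b


def predict_ventilation(a, b, heart_rate):
    """Back-transform the log-scale prediction to minute ventilation (L/min)."""
    return math.exp(a + b * heart_rate)


def mean_trip_ventilation(a, b, trip_heart_rates):
    """Average predicted minute ventilation over the heart rates recorded on one trip."""
    return sum(predict_ventilation(a, b, h) for h in trip_heart_rates) / len(trip_heart_rates)


if __name__ == "__main__":
    # Hypothetical ergometer data for one participant (not from the study).
    test_hr = [80, 100, 120, 140]
    test_ve = [12.0, 18.0, 27.5, 41.0]
    a, b = fit_log_ventilation(test_hr, test_ve)
    # Hypothetical heart rates sampled during a bicycle trip and a car trip.
    bike = mean_trip_ventilation(a, b, [115, 120, 125])
    car = mean_trip_ventilation(a, b, [75, 78, 80])
    print(f"a={a:.3f}, b={b:.4f}, cycling/car ventilation ratio: {bike / car:.2f}")
```

    Because each participant gets their own (a, b), the same heart rate maps to different ventilation levels for different people, which is the property that makes individual equations predict individual ventilation better than an average equation. Exponentiating a log-scale prediction estimates something closer to the conditional median than the mean; this sketch applies no retransformation correction.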

    The U.S. Environmental Protection Agency Particulate Matter Health Effects Research Centers Program: a midcourse report of status, progress, and plans.

    In 1998 Congress mandated expanded U.S. Environmental Protection Agency (U.S. EPA) health effects research on ambient air particulate matter (PM), together with a National Research Council (NRC) committee to oversee that research. The U.S. EPA currently supports intramural and extramural PM research, including five academically based PM centers. In their first 2.5 years, the PM centers have initiated research directed at critical issues identified by the NRC committee, including collaborative activities, and have sponsored scientific workshops in key research areas. These activities have improved understanding of PM health effects and of the remaining scientific uncertainties. Future PM centers research will focus on long-term effects associated with chronic PM exposure. This report provides a synopsis of accomplishments to date, short-term goals (for the next 2.5 years) and longer-term goals. It consists of six sections: biological mechanisms, acute effects, chronic effects, dosimetry, exposure assessment, and the specific attributes of a coordinated PM centers program.

    Intracellular Trafficking and Synaptic Function of APL-1 in Caenorhabditis elegans

    Background: Alzheimer’s disease (AD) is a neurodegenerative disorder primarily characterized by the deposition of β-amyloid plaques in the brain. Plaques are composed of the amyloid-β peptide, derived from cleavage of the amyloid precursor protein (APP). Mutations in APP lead to the development of familial Alzheimer’s disease (FAD); however, the normal function of this protein has proven elusive. The nematode Caenorhabditis elegans is an attractive model because the amyloid precursor-like protein APL-1 is the single ortholog of APP, and loss of apl-1 leads to a severe molting defect and early larval lethality. Methodology/Principal Findings: We report here that the lethality and molting defects can be rescued by full-length APL-1, by C-terminal mutations and by a C-terminal truncation, suggesting that the extracellular region of the protein is essential for viability. RNAi knock-down of apl-1 followed by testing with the acetylcholinesterase inhibitor aldicarb showed that loss of apl-1 leads to aldicarb hypersensitivity, indicating a defect in synaptic function. The aldicarb hypersensitivity can be rescued by full-length APL-1 in a dose-dependent fashion. At the cellular level, the kinesins UNC-104/KIF-1A and UNC-116/kinesin-1 are positive regulators of APL-1 expression in neurons. Knock-down of the small GTPase rab-5 also leads to a dramatic decrease in apl-1 expression in neurons, suggesting that trafficking from the plasma membrane to the early endosome is important for apl-1 function. Loss of function of a different small GTPase, UNC-108, on the contrary, leads t

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents, covering a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings (MeSH) descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The database server at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will stimulate the development of powerful new techniques for title- and title/abstract-based search engines for relevant articles in biomedical research.
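    Two of the three baselines named above, Okapi BM25 and TF-IDF, rank candidate documents by their term overlap with a seed article. The minimal Python sketch below shows both scorers on a tokenized corpus. It is an illustration under stated assumptions, not the consortium's pipeline: the whitespace tokenizer, the smoothed IDF used for TF-IDF, the non-negative BM25 IDF variant, the common defaults k1 = 1.2 and b = 0.75, and the mini-corpus are all choices made here. PubMed Related Articles is not sketched.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; real systems also strip punctuation and stopwords."""
    return text.lower().split()


def document_frequency(term, docs):
    """Number of tokenized documents that contain the term at least once."""
    return sum(1 for d in docs if term in d)


def tfidf_vector(tokens, docs):
    """Sparse TF-IDF vector (dict) with a smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    return {
        term: count * (math.log((1 + n) / (1 + document_frequency(term, docs))) + 1.0)
        for term, count in Counter(tokens).items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in set(query):  # each distinct query term is counted once
        f = tf[term]
        if f == 0:
            continue
        df = document_frequency(term, docs)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)  # variant that stays non-negative
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


if __name__ == "__main__":
    # Hypothetical mini-corpus of titles; not RELISH data.
    titles = [
        "amyloid precursor protein function in neurons",
        "minute ventilation of cyclists and commuters",
        "air pollution exposure of cyclists",
    ]
    docs = [tokenize(t) for t in titles]
    seed = tokenize("cyclists exposure to air pollution")
    seed_vec = tfidf_vector(seed, docs)
    for title, doc in zip(titles, docs):
        print(f"BM25 {bm25_score(seed, doc, docs):6.3f}  "
              f"TF-IDF cosine {cosine(seed_vec, tfidf_vector(doc, docs)):6.3f}  {title}")
```

    Both scorers reward shared terms weighted by rarity, but BM25 saturates repeated term counts (via k1) and normalizes for document length (via b), while TF-IDF cosine normalizes by vector length. Such differences in weighting are one plausible reason different baselines can surface different sets of articles.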