Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline
From medical charts to national census, healthcare has traditionally operated
under a paper-based paradigm. However, the past decade has marked a long and
arduous transformation bringing healthcare into the digital age. Ranging from
electronic health records, to digitized imaging and laboratory reports, to
public health datasets, healthcare now generates an incredible amount of
digital information. Such a wealth of data presents an exciting opportunity for
integrated machine learning solutions to address problems across multiple
facets of healthcare practice and administration. Unfortunately, the ability to
derive accurate and informative insights requires more than the ability to
execute machine learning models. Rather, a deeper understanding of the data on
which the models are run is imperative for their success. While a significant
effort has been undertaken to develop models able to process the volume of data
obtained during the analysis of millions of digitalized patient records, it is
important to remember that volume represents only one aspect of the data. In
fact, drawing on data from an increasingly diverse set of sources, healthcare
data presents an incredibly complex set of attributes that must be accounted
for throughout the machine learning pipeline. This chapter focuses on
highlighting such challenges, and is broken down into three distinct
components, each representing a phase of the pipeline. We begin with attributes
of the data accounted for during preprocessing, then move to considerations
during model building, and end with challenges to the interpretation of model
output. For each component, we present a discussion around data as it relates
to the healthcare domain and offer insight into the challenges each may impose
on the efficiency of machine learning techniques.
Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figure
Genomic-Bioinformatic Analysis of Transcripts Enriched in the Third-Stage Larva of the Parasitic Nematode Ascaris suum
Differential transcription in Ascaris suum was investigated using a genomic-bioinformatic approach. A cDNA archive enriched for molecules in the infective third-stage larva (L3) of A. suum was constructed by suppressive-subtractive hybridization (SSH), and a subset of cDNAs from 3075 clones subjected to microarray analysis using cDNA probes derived from RNA from different developmental stages of A. suum. The cDNAs (n = 498) shown by microarray analysis to be enriched in the L3 were sequenced and subjected to bioinformatic analyses using a semi-automated pipeline (ESTExplorer). Using gene ontology (GO), 235 of these molecules were assigned to ‘biological process’ (n = 68), ‘cellular component’ (n = 50), or ‘molecular function’ (n = 117). Of the 91 clusters assembled, 56 molecules (61.5%) had homologues/orthologues in the free-living nematodes Caenorhabditis elegans and C. briggsae and/or other organisms, whereas 35 (38.5%) had no significant similarity to any sequences available in current gene databases. Transcripts encoding protein kinases, protein phosphatases (and their precursors), and enolases were abundantly represented in the L3 of A. suum, as were molecules involved in cellular processes, such as ubiquitination and proteasome function, gene transcription, protein–protein interactions, and function. In silico analyses inferred the C. elegans orthologues/homologues (n = 50) to be involved in apoptosis and insulin signaling (2%), ATP synthesis (2%), carbon metabolism (6%), fatty acid biosynthesis (2%), gap junction (2%), glucose metabolism (6%), or porphyrin metabolism (2%), although 34 (68%) of them could not be mapped to a specific metabolic pathway. Small numbers of these 50 molecules were predicted to be secreted (10%), anchored (2%), and/or transmembrane (12%) proteins. Functionally, 17 (34%) of them were predicted to be associated with (non-wild-type) RNAi phenotypes in C. elegans, the majority being embryonic lethality (Emb) (13 types; 58.8%), larval arrest (Lva) (23.5%) and larval lethality (Lvl) (47%). A genetic interaction network was predicted for these 17 C. elegans orthologues, revealing highly significant interactions for nine molecules associated with embryonic and larval development (66.9%), information storage and processing (5.1%), cellular processing and signaling (15.2%), metabolism (6.1%), and unknown function (6.7%). The potential roles of these molecules in development are discussed in relation to the known roles of their homologues/orthologues in C. elegans and some other nematodes. The results of the present study provide a basis for future functional genomic studies to elucidate molecular aspects governing larval developmental processes in A. suum and/or the transition to parasitism
Different atmospheric moisture divergence responses to extreme and moderate El Niños
On seasonal and inter-annual time scales, vertically integrated moisture divergence provides a useful measure of the tropical atmospheric hydrological cycle. It reflects the combined dynamical and thermodynamical effects, and is not subject to the limitations that afflict observations of evaporation minus precipitation. An empirical orthogonal function (EOF) analysis of the tropical Pacific moisture divergence fields calculated from the ERA-Interim reanalysis reveals the dominant effects of the El Niño-Southern Oscillation (ENSO) on inter-annual time scales. Two EOFs are necessary to capture the ENSO signature, and regression relationships between their Principal Components and indices of equatorial Pacific sea surface temperature (SST) demonstrate that the transition from strong La Niña through to extreme El Niño events is not a linear one. The largest deviation from linearity is for the strongest El Niños, and we interpret that this arises at least partly because the EOF analysis cannot easily separate different patterns of responses that are not orthogonal to each other. To overcome the orthogonality constraints, a self-organizing map (SOM) analysis of the same moisture divergence fields was performed. The SOM analysis captures the range of responses to ENSO, including the distinction between the moderate and strong El Niños identified by the EOF analysis. The work demonstrates the potential for the application of SOM to large scale climatic analysis, by virtue of its easier interpretation, relaxation of orthogonality constraints and its versatility for serving as an alternative classification method. Both the EOF and SOM analyses suggest a classification of “moderate” and “extreme” El Niños by their differences in the magnitudes of the hydrological cycle responses, spatial patterns and evolutionary paths. Classification from the moisture divergence point of view shows consistency with results based on other physical variables such as SST
Performance of CMS muon reconstruction in pp collision events at sqrt(s) = 7 TeV
The performance of muon reconstruction, identification, and triggering in CMS
has been studied using 40 inverse picobarns of data collected in pp collisions
at sqrt(s) = 7 TeV at the LHC in 2010. A few benchmark sets of selection
criteria covering a wide range of physics analysis needs have been examined.
For all considered selections, the efficiency to reconstruct and identify a
muon with a transverse momentum pT larger than a few GeV is above 95% over the
whole region of pseudorapidity covered by the CMS muon system, abs(eta) < 2.4,
while the probability to misidentify a hadron as a muon is well below 1%. The
efficiency to trigger on single muons with pT above a few GeV is higher than
90% over the full eta range, and typically substantially better. The overall
momentum scale is measured to a precision of 0.2% with muons from Z decays. The
transverse momentum resolution varies from 1% to 6% depending on pseudorapidity
for muons with pT below 100 GeV and, using cosmic rays, it is shown to be
better than 10% in the central region up to pT = 1 TeV. Observed distributions
of all quantities are well reproduced by the Monte Carlo simulation.
Comment: Replaced with published version. Added journal reference and DOI
Skp is a multivalent chaperone of outer membrane proteins
The trimeric chaperone Skp sequesters outer-membrane proteins (OMPs) within a hydrophobic cage, thereby preventing their aggregation during transport across the periplasm in Gram-negative bacteria. Here, we studied the interaction between Escherichia coli Skp and five OMPs of varying size. Investigations of the kinetics of OMP folding revealed that higher Skp/OMP ratios are required to prevent the folding of 16-stranded OMPs compared with their 8-stranded counterparts. Ion mobility spectrometry–mass spectrometry (IMS–MS) data, computer modeling and molecular dynamics simulations provided evidence that 10- to 16-stranded OMPs are encapsulated within an expanded Skp substrate cage. For OMPs that cannot be fully accommodated in the expanded cavity, sequestration is achieved by binding of an additional Skp trimer. The results suggest a new mechanism for Skp chaperone activity involving the coordination of multiple copies of Skp in protecting a single substrate from aggregation
Do coder characteristics influence validity of ICD-10 hospital discharge data?
Background
Administrative data are widely used to study health systems and make important health policy decisions. Yet little is known about the influence of coder characteristics on administrative data validity in these studies. Our goal was to describe the relationship between several measures of validity in coded hospital discharge data and 1) coders' volume of coding (≥13,000 vs. <13,000 records), 2) coders' employment status (full- vs. part-time), and 3) hospital type.
Methods
This descriptive study examined 6 indicators of face validity in ICD-10 coded discharge records from 4 hospitals in Calgary, Canada between April 2002 and March 2007. Specifically, mean number of coded diagnoses, procedures, complications, Z-codes, and codes ending in 8 or 9 were compared by coding volume and employment status, as well as hospital type. The mean number of diagnoses was also compared across coder characteristics for 6 major conditions of varying complexity. Next, kappa statistics were computed to assess agreement between discharge data and linked chart data reabstracted by nursing chart reviewers. Kappas were compared across coder characteristics.
Results
422,618 discharge records were coded by 59 coders during the study period. The mean number of diagnoses per record decreased from 5.2 in 2002/2003 to 3.9 in 2006/2007, while the number of records coded annually increased from 69,613 to 102,842. Coders at the tertiary hospital coded the most diagnoses (5.0 compared with 3.9 and 3.8 at other sites). There was no variation by coder or site characteristics for any other face validity indicator. The mean number of diagnoses increased from 1.5 to 7.9 with increasing complexity of the major diagnosis, but did not vary with coder characteristics. Agreement (kappa) between coded data and chart review did not show any consistent pattern with respect to coder characteristics.
Conclusions
This large study suggests that coder characteristics do not influence the validity of hospital discharge data. Other jurisdictions might benefit from implementing similar employment programs to ours, e.g.: a requirement for a 2-year college training program, a single management structure across sites, and rotation of coders between sites. Limitations include few coder characteristics available for study due to privacy concerns.
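The agreement measure named in the abstract above can be illustrated with a minimal sketch of Cohen's kappa for two raters. This is an assumption about the general form of the statistic, not the study's own code: the function name `cohen_kappa` and the toy labels are hypothetical, and the study may have used a different kappa variant or software.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of records where both sources give the same code.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two marginal frequencies, summed over categories.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both sources used a single identical category; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: 1 = condition coded, 0 = not coded, for four records.
discharge_codes = [1, 1, 0, 0]
chart_review = [1, 0, 0, 0]
kappa = cohen_kappa(discharge_codes, chart_review)
```

In this toy case the two sources agree on three of four records, chance agreement is 0.5, and kappa comes out at 0.5. Values near 1 indicate agreement well beyond chance, and values near 0 indicate agreement no better than chance.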
A recombinant Fasciola gigantica 14-3-3 epsilon protein (rFg14-3-3e) modulates various functions of goat peripheral blood mononuclear cells
Background
The molecular structure of Fasciola gigantica 14-3-3 protein has been characterized. However, the involvement of this protein in parasite pathogenesis remains elusive and its effect on the functions of innate immune cells is unknown. We report on the cloning and expression of a recombinant F. gigantica 14-3-3 epsilon protein (rFg14-3-3e), and testing its effects on specific functions of goat peripheral blood mononuclear cells (PBMCs).
Methods
rFg14-3-3e protein was expressed in Pichia pastoris. Western blot and immunofluorescence assay (IFA) were used to examine the reactivity of rFg14-3-3e protein to anti-F. gigantica and anti-rFg14-3-3e antibodies, respectively. Various assays were used to investigate the stimulatory effects of the purified rFg14-3-3e protein on specific functions of goat PBMCs, including cytokine secretion, proliferation, migration, nitric oxide (NO) production, phagocytosis, and apoptotic capabilities. Potential protein interactors of rFg14-3-3e were identified by querying the databases Intact, String, BioPlex and BioGrid. A Total Energy analysis of each of the identified interaction was performed. Gene Ontology (GO) enrichment analysis was conducted using Funcassociate 3.0.
Results
Sequence analysis revealed that rFg14-3-3e protein had 100% identity to 14-3-3 protein from Fasciola hepatica. Western blot analysis showed that rFg14-3-3e protein is recognized by sera from goats experimentally infected with F. gigantica and immunofluorescence staining using rat anti-rFg14-3-3e antibodies demonstrated the specific binding of rFg14-3-3e protein to the surface of goat PBMCs. rFg14-3-3e protein stimulated goat PBMCs to produce interleukin-10 (IL-10) and transforming growth factor beta (TGF-β), corresponding with low levels of IL-4 and interferon gamma (IFN-γ). Also, this recombinant protein promoted the release of NO and cell apoptosis, and inhibited the proliferation and migration of goat PBMCs and suppressed monocyte phagocytosis. Homology modelling revealed 65% identity between rFg14-3-3e and human 14-3-3 protein YWHAE. GO enrichment analysis of the interacting proteins identified terms related to apoptosis, protein binding, locomotion, hippo signalling and leukocyte and lymphocyte differentiation, supporting the experimental findings.
Conclusions
Our data suggest that rFg14-3-3e protein can influence various cellular and immunological functions of goat PBMCs in vitro and may be involved in mediating F. gigantica pathogenesis. Because of its involvement in F. gigantica recognition by innate immune cells, rFg14-3-3e protein may have applications for development of diagnostics and therapeutic interventions
Flavins and Flavoproteins in the Neuroimmune Landscape of Stress Sensitization and Major Depressive Disorder
Matt Scott Schrier,1 Maria Igorevna Smirnova,2–4 Daniel Paul Nemeth,1 Richard Carlton Deth,5 Ning Quan1,3
1Department of Biomedical Science, Charles E. Schmidt College of Medicine, Florida Atlantic University, Jupiter, FL, USA; 2The International Max Planck Research School (IMPRS) for Synapses and Circuits, Jupiter, FL, USA; 3Stiles-Nicholson Brain Institute, Florida Atlantic University, Jupiter, FL, USA; 4Department of Biological Sciences, Charles E. Schmidt College of Science, Florida Atlantic University, Jupiter, FL, USA; 5Department of Pharmaceutical Sciences, Barry and Judy Silverman College of Pharmacy, Nova Southeastern University, Ft. Lauderdale, FL, USA
Correspondence: Matt Scott Schrier, Florida Atlantic University, Building: MC-17, Room: 229E, 5353 Parkside Drive, Jupiter, FL, 33458, USA, Email [email protected]
Abstract: Major Depressive Disorder (MDD) is a common and severe neuropsychiatric condition resulting in irregular alterations in affect, mood, and cognition. Besides the well-studied neurotransmission-related etiologies of MDD, several biological systems and phenomena, such as the hypothalamic-pituitary-adrenal (HPA) axis, reactive oxygen species (ROS) production, and cytokine signaling, have been implicated as being altered and contributing to depressive symptoms. However, the manner in which these factors interact with each other to induce their effects on MDD development has been less clear, but is beginning to be understood. Flavins are potent biomolecules that regulate many redox activities, including ROS generation and energy production. Studies have found that circulating flavin levels are modulated during stress and MDD. Flavins are also known for their importance in immune responses. This review offers a unique perspective that considers the redox-active cofactors, flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD), as vital substrates for linking MDD-related maladaptive processes together, by permitting stress-induced enhancement of microglial interleukin-1 beta (IL-1β) signaling.
Keywords: cofactor, cytokines, IL-1β, microglia, neuroinflammation, redo
Study of hadronic event-shape variables in multijet final states in pp collisions at √s=7 TeV
Peer reviewed
