62 research outputs found
Leveraging Logical Definitions and Lexical Features to Detect Missing Is-a Relations in Biomedical Terminologies
Biomedical terminologies play a vital role in managing biomedical data. Missing IS-A relations in a biomedical terminology could be detrimental to its downstream usages. In this paper, we investigate an approach combining logical definitions and lexical features to discover missing IS-A relations in two biomedical terminologies: SNOMED CT and the National Cancer Institute (NCI) thesaurus. The method is applied to unrelated concept-pairs within non-lattice subgraphs: graph fragments within a terminology likely to contain various inconsistencies. Our approach first compares whether the logical definition of a concept is more general than that of the other concept. Then, we check whether the lexical features of the concept are contained in those of the other concept. If both constraints are satisfied, we suggest a potentially missing IS-A relation between the two concepts. The method identified 982 potential missing IS-A relations for SNOMED CT and 100 for NCI thesaurus. In order to assess the efficacy of our approach, a random sample of results belonging to the Clinical Findings and Procedure subhierarchies of SNOMED CT and results belonging to the Drug, Food, Chemical or Biomedical Material subhierarchy of the NCI thesaurus were evaluated by domain experts. The evaluation results revealed that 118 out of 150 suggestions are valid for SNOMED CT and 17 out of 20 are valid for NCI thesaurus
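The two checks described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a logical definition is a set of (attribute, value) pairs and that lexical features are the lowercase words of a concept name; the concept names, attributes, and the `suggest_is_a` helper are hypothetical and not taken from SNOMED CT, the NCI thesaurus, or the authors' implementation.

```python
from typing import FrozenSet, Optional, Tuple

# Assumed representation: a logical definition is a set of (attribute, value) pairs.
Definition = FrozenSet[Tuple[str, str]]


def lexical_features(name: str) -> FrozenSet[str]:
    """Assumed lexical features: the set of lowercase words in the concept name."""
    return frozenset(name.lower().split())


def suggest_is_a(name_a: str, def_a: Definition,
                 name_b: str, def_b: Definition) -> Optional[Tuple[str, str]]:
    """Return (child, parent) if both constraints hold in one direction, else None."""
    candidates = (
        ((name_a, def_a), (name_b, def_b)),
        ((name_b, def_b), (name_a, def_a)),
    )
    for (child, child_def), (parent, parent_def) in candidates:
        # Constraint 1: the parent's definition is more general, modelled here
        # as its pairs being a subset of the child's pairs.
        more_general = parent_def <= child_def
        # Constraint 2: the parent's lexical features are contained in the child's.
        contained = lexical_features(parent) <= lexical_features(child)
        if more_general and contained:
            return (child, parent)
    return None


# Hypothetical, unrelated concept pair used only to exercise the checks.
parent_def = frozenset({("finding site", "femur"), ("morphology", "fracture")})
child_def = parent_def | {("morphology", "open fracture")}
suggestion = suggest_is_a("Open fracture of femur", child_def,
                          "Fracture of femur", parent_def)
print(suggestion)
```

A real terminology would also need subsumption between attribute values and a restriction to unrelated pairs inside non-lattice subgraphs, both of which this sketch leaves out.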
A Hybrid Unsupervised and Supervised Learning Approach for Postictal Generalized EEG Suppression Detection
Sudden unexpected death in epilepsy (SUDEP) is a catastrophic and fatal complication of epilepsy and is the primary cause of mortality in those who have uncontrolled seizures. While several multifactorial processes have been implicated, including cardiac, respiratory, and autonomic dysfunction leading to arrhythmia, hypoxia, and cessation of cerebral and brainstem function, the mechanisms underlying SUDEP are not completely understood. Postictal generalized electroencephalogram (EEG) suppression (PGES) is a potential risk marker for SUDEP, as studies have shown that prolonged PGES is significantly associated with a higher risk of SUDEP. Automated PGES detection techniques have been developed to efficiently obtain PGES durations for SUDEP risk assessment. However, real-world data recorded in epilepsy monitoring units (EMUs) may contain high-amplitude signals due to physiological artifacts, such as breathing, muscle, and movement artifacts, making it difficult to determine the end of PGES. In this paper, we present a hybrid approach that combines the benefits of unsupervised and supervised learning for PGES detection using multi-channel EEG recordings. A K-means clustering model is leveraged to group EEG recordings with similar artifact features. We introduce a new learning strategy for training a set of random forest (RF) models based on clustering results to improve PGES detection performance. Our approach achieved a 5-second tolerance-based detection accuracy of 64.92%, a 10-second tolerance-based detection accuracy of 79.85%, and an average predicted time distance of 8.26 seconds with 286 EEG recordings using leave-one-out (LOO) cross-validation. The results demonstrated that our hybrid approach provided better performance than other existing approaches
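The tolerance-based accuracy and average time distance reported above could plausibly be computed as in the following sketch. It assumes a prediction counts as correct when the predicted PGES end time lies within the tolerance of the annotated end time; the abstract does not spell out the exact definition, and the function names and sample values are illustrative only.

```python
from typing import Sequence


def tolerance_accuracy(predicted_s: Sequence[float],
                       annotated_s: Sequence[float],
                       tolerance_s: float) -> float:
    """Fraction of recordings whose predicted PGES end is within the tolerance."""
    if len(predicted_s) != len(annotated_s) or not predicted_s:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance_s for p, a in zip(predicted_s, annotated_s))
    return hits / len(predicted_s)


def mean_time_distance(predicted_s: Sequence[float],
                       annotated_s: Sequence[float]) -> float:
    """Average absolute difference, in seconds, between predicted and annotated ends."""
    if len(predicted_s) != len(annotated_s) or not predicted_s:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted_s, annotated_s)) / len(predicted_s)


# Made-up PGES end times in seconds, not data from the study.
predicted = [30.0, 42.0, 55.0, 20.0]
annotated = [33.0, 50.0, 54.0, 32.0]
acc_5s = tolerance_accuracy(predicted, annotated, 5.0)
acc_10s = tolerance_accuracy(predicted, annotated, 10.0)
avg_dist = mean_time_distance(predicted, annotated)
print(acc_5s, acc_10s, avg_dist)
```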
A Multimodal Clinical Data Resource for Personalized Risk Assessment of Sudden Unexpected Death in Epilepsy
Epilepsy affects ~2–3 million individuals in the United States, a third of whom have uncontrolled seizures. Sudden unexpected death in epilepsy (SUDEP) is a catastrophic and fatal complication of poorly controlled epilepsy and is the primary cause of mortality in such patients. Despite its huge public health impact, with a ~1/1,000 incidence rate in persons with epilepsy, it is an uncommon enough phenomenon to require multi-center efforts for well-powered studies. We developed the Multimodal SUDEP Data Resource (MSDR), a comprehensive system for sharing multimodal epilepsy data in the NIH-funded Center for SUDEP Research. The MSDR aims at accelerating research to address critical questions about personalized risk assessment of SUDEP. We used a metadata-guided approach, with a set of common epilepsy-specific terms enforcing uniform semantic interpretation of data elements across three main components: (1) multi-site annotated datasets; (2) user interfaces for capturing, managing, and accessing data; and (3) computational approaches for the analysis of multimodal clinical data. We incorporated the process for managing dataset-specific data use agreements, evidence of Institutional Review Board review, and the corresponding access control in the MSDR web portal. The metadata-guided approach facilitates structural and semantic interoperability, ultimately leading to enhanced data reusability and scientific rigor. MSDR prospectively integrated and curated epilepsy patient data from seven institutions, and it currently contains data on 2,739 subjects and 10,685 multimodal clinical data files with different data formats. In total, 55 users have registered in the current MSDR data repository, and 6 projects have been funded to apply MSDR in epilepsy research, including three R01 projects and three R21 projects
NHash: Randomized N-Gram Hashing for Distributed Generation of Validatable Unique Study Identifiers in Multicenter Research
BACKGROUND: A unique study identifier serves as a key for linking research data about a study subject without revealing protected health information in the identifier. While sufficient for single-site and limited-scale studies, the use of common unique study identifiers has several drawbacks for large multicenter studies, where thousands of research participants may be recruited from multiple sites. An important property of study identifiers is error tolerance (or validatability), in that inadvertent editing mistakes during their transmission and use will most likely result in invalid study identifiers.
OBJECTIVE: This paper introduces a novel method called Randomized N-gram Hashing (NHash) for generating unique study identifiers in a distributed and validatable fashion in multicenter research. NHash has a unique set of properties: (1) it is a pseudonym serving the purpose of linking research data about a study participant for research purposes; (2) it can be generated automatically in a completely distributed fashion with virtually no risk of identifier collision; (3) it incorporates a set of cryptographic hash functions based on N-grams, with a combination of additional encryption techniques such as a shift cipher; (4) it is validatable (error tolerant) in the sense that inadvertent edit errors will mostly result in invalid identifiers.
METHODS: NHash consists of 2 phases. In the first phase, an intermediate string is generated using randomized N-gram hashing. This string consists of the outputs of a collection of N-gram hash functions f1, f2, ..., fk. The input for each function fi has 3 components: a random number r, an integer n, and input data m. The result, fi(r, n, m), is an n-gram of m with a starting position s, which is computed as (r mod |m|), where |m| represents the length of m. The output of Phase 1 is the concatenation of the sequence f1(r1, n1, m1), f2(r2, n2, m2), ..., fk(rk, nk, mk). In the second phase, the intermediate string generated in Phase 1 is encrypted using techniques such as a shift cipher. The result of the encryption, concatenated with the random number r, is the final NHash study identifier.
RESULTS: We performed experiments using a large synthesized dataset comparing NHash with random strings, and demonstrated a negligible probability of collision. We implemented NHash for the Center for SUDEP Research (CSR), a National Institute for Neurological Disorders and Stroke-funded Center Without Walls for Collaborative Research in the Epilepsies. This multicenter collaboration involves 14 institutions across the United States and Europe, bringing together extensive and diverse expertise to understand sudden unexpected death in epilepsy patients (SUDEP).
CONCLUSIONS: The CSR Data Repository has successfully used NHash to link deidentified multimodal clinical data collected in participating CSR institutions, meeting all desired objectives of NHash
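The two phases described in METHODS can be sketched as follows. This is an illustration under several assumptions that the abstract does not confirm: one random number r is shared by all N-gram functions, an n-gram that runs past the end of m wraps around to the start, the shift cipher works over uppercase letters and digits with a fixed shift of 3, and r is appended after a hyphen. The field values and the `nhash` name are hypothetical, the validation step is not shown, and no security claim is made for this toy version.

```python
import string
from typing import List, Tuple

# Assumed cipher alphabet: A-Z followed by 0-9 (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits


def ngram(r: int, n: int, m: str) -> str:
    """f(r, n, m): the n-gram of m starting at s = r mod |m| (wrap-around assumed)."""
    if not m:
        raise ValueError("input data m must be non-empty")
    s = r % len(m)
    return "".join(m[(s + i) % len(m)] for i in range(n))


def shift_cipher(text: str, shift: int) -> str:
    """Shift each character within ALPHABET; other characters pass through."""
    out = []
    for ch in text:
        idx = ALPHABET.find(ch)
        out.append(ALPHABET[(idx + shift) % len(ALPHABET)] if idx >= 0 else ch)
    return "".join(out)


def nhash(fields: List[Tuple[int, str]], r: int, shift: int = 3) -> str:
    """Phase 1: concatenate the n-grams. Phase 2: encrypt, then append r."""
    intermediate = "".join(ngram(r, n, m.upper()) for n, m in fields)
    return f"{shift_cipher(intermediate, shift)}-{r}"


# Hypothetical inputs: a site code and a date string, with r fixed for repeatability.
identifier = nhash([(3, "SITE01"), (4, "20240115")], r=7)
print(identifier)  # LWH8535-7
```

In practice r would be drawn from a random source at each site, which is what allows identifiers to be generated in a distributed way.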
Post-ictal Modulation of Baroreflex Sensitivity in Patients With Intractable Epilepsy
Objective: Seizure-related autonomic dysregulation occurs in epilepsy patients and may contribute to Sudden Unexpected Death in Epilepsy (SUDEP). We tested how different types of seizures affect baroreflex sensitivity (BRS) and heart rate variability (HRV). We hypothesized that BRS and HRV would be reduced after bilateral convulsive seizures (BCS). Methods: We recorded blood pressure (BP), electrocardiogram (ECG) and oxygen saturation continuously in patients (n = 18) with intractable epilepsy undergoing video-EEG monitoring. A total of 23 seizures, either focal seizures (FS, n = 14) or BCS (n = 9), were analyzed from these patients. We used 5 different HRV measurements in both the time and frequency domains to study HRV in pre- and post-ictal states. We used the average frequency domain gain, computed as the average of the magnitude ratio between the systolic BP (BPsys) and the RR-interval time series, in the low-frequency (LF) band as a frequency domain index of BRS, in addition to the instantaneous slope between systolic BP and RR-interval satisfying spontaneous BRS criteria as a time domain index of BRS. Results: Overall, the post-ictal modulation of HRV varied across the subjects but not specifically by the type of seizures. Comparing pre- to post-ictal epochs, the LF power of BRS decreased in 8 of 9 seizures for patients with BCS, whereas BRS increased following 12 of 14 FS. Similarly, spontaneous BRS decreased following 7 of 9 BCS. The presence or absence of oxygen desaturation was not consistent with the changes in BRS following seizures, and the HRV does not appear to be correlated with the BRS changes. These data suggest that a transient decrease in BRS and temporary loss of cardiovascular homeostatic control can follow BCS but is unlikely following FS. Significance: These findings indicate significant post-ictal autonomic dysregulation in patients with epilepsy following BCS. Further, reduced BRS following BCS, if confirmed in future studies on SUDEP cases, may indicate one quantifiable risk marker of SUDEP
Ontology-Based Feature Engineering in Machine Learning Workflows for Heterogeneous Epilepsy Patient Records
Biomedical ontologies are widely used to harmonize heterogeneous data and integrate large volumes of clinical data from multiple sources. This study analyzed the utility of ontologies beyond their traditional roles, that is, in addressing a challenging and currently underserved field of feature engineering in machine learning workflows. Machine learning workflows are being increasingly used to analyze medical records with heterogeneous phenotypic, genotypic, and related medical terms to improve patient care. We performed a retrospective study using neuropathology reports from the German Neuropathology Reference Center for Epilepsy Surgery at Erlangen, Germany. This cohort included 312 patients who underwent epilepsy surgery and were labeled with one or more diagnoses, including dual pathology, hippocampal sclerosis, malformation of cortical dysplasia, tumor, encephalitis, and gliosis. We modeled the diagnosis terms together with their microscopy, immunohistochemistry, anatomy, etiologies, and imaging findings using the description logic-based Web Ontology Language (OWL) in the Epilepsy and Seizure Ontology (EpSO). Three tree-based machine learning models were used to classify the neuropathology reports into one or more diagnosis classes with and without ontology-based feature engineering. We used five-fold cross validation to avoid overfitting with a fixed number of repetitions while leaving out one subset of data for testing, and we used recall, balanced accuracy, and hamming loss as performance metrics for the multi-label classification task. The epilepsy ontology-based feature engineering approach improved the performance of all three learning models, with an improvement of 35.7%, 54.5%, and 33.3% in the logistic regression, random forest, and gradient tree boosting models, respectively. The run time performance of all three models improved significantly with ontology-based feature engineering, with the gradient tree boosting model showing a 93.8% reduction in the time required for training and testing of the model. Although all three models showed an overall improved performance across the three performance metrics using ontology-based feature engineering, the rate of improvement was not consistent across all input features. To analyze this variation in performance, we computed feature importance scores and found that microscopy had the highest importance score across the three models, followed by imaging, immunohistochemistry, and anatomy in decreasing order of importance scores. This study showed that ontologies have an important role in feature engineering to make heterogeneous clinical data accessible to machine learning models and also improve the performance of machine learning models in multilabel multiclass classification tasks
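Hamming loss, one of the metrics named above for the multi-label task, is the fraction of individual label assignments that are wrong. The sketch below uses the standard definition with made-up label vectors; the assumed label order (hippocampal sclerosis, tumor, gliosis) is illustrative and the values are not study data.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (report, label) positions where prediction and truth differ."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = 0
    total = 0
    for truth, pred in zip(y_true, y_pred):
        if len(truth) != len(pred):
            raise ValueError("each report needs the same number of labels")
        wrong += sum(t != p for t, p in zip(truth, pred))
        total += len(truth)
    return wrong / total


# Two hypothetical reports, three binary diagnosis labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(y_true, y_pred)
print(loss)  # 2 wrong labels out of 6
```

Lower values are better, and a loss of 0 means every label of every report was predicted correctly.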
Postictal serotonin levels are associated with peri-ictal apnea.
Objective: To determine the relationship between serum serotonin (5-HT) levels, ictal central apnea (ICA), and postconvulsive central apnea (PCCA) in epileptic seizures. Methods: We prospectively evaluated video EEG, plethysmography, capillary oxygen saturation (SpO2), and ECG for 49 patients (49 seizures) enrolled in a multicenter study of sudden unexpected death in epilepsy (SUDEP). Postictal and interictal venous blood samples were collected after a clinical seizure for measurement of serum 5-HT levels. Seizures were classified according to the International League Against Epilepsy 2017 seizure classification. We analyzed seizures with and without ICA (n = 49) and generalized convulsive seizures (GCS) with and without PCCA (n = 27). Results: Postictal serum 5-HT levels were increased over interictal levels for seizures without ICA (p = 0.01), compared to seizures with ICA (p = 0.21). In patients with GCS without PCCA, serum 5-HT levels were increased postictally compared to interictal levels (p < 0.001), but not in patients with seizures with PCCA (p = 0.22). Postictal minus interictal 5-HT levels also differed between the 2 groups with and without PCCA (p = 0.03). Increased heart rate was accompanied by increased serum 5-HT levels (postictal minus interictal) after seizures without PCCA (p = 0.03) compared to those with PCCA (p = 0.42). Conclusions: The data suggest that significant seizure-related increases in serum 5-HT levels are associated with a lower incidence of seizure-related breathing dysfunction, and may reflect physiologic changes that confer a protective effect against deleterious phenomena leading to SUDEP. These results need to be confirmed with a larger sample size study
Age-specific periictal electroclinical features of generalized tonic-clonic seizures and potential risk of sudden unexpected death in epilepsy (SUDEP)
Generalized tonic–clonic seizure (GTCS) is the commonest seizure type associated with sudden unexpected death in epilepsy (SUDEP). This study examined the semiological and electroencephalographic (EEG) differences in the GTCSs of adults as compared with those of children. The rationale lies in epidemiological observations that have noted a tenfold higher incidence of SUDEP in adults. We analyzed the video-EEG data of 105 GTCS events in 61 consecutive patients (12 children, 23 seizure events and 49 adults, 82 seizure events) recruited from the Epilepsy Monitoring Unit. Semiological, EEG, and 3-channel EKG features were studied. Periictal seizure phase durations were analyzed including tonic, clonic, total seizure, postictal EEG suppression (PGES), and recovery phases. Heart rate variability (HRV) measures including RMSSD (root mean square successive difference of RR intervals), SDNN (standard deviation of NN intervals), and SDSD (standard deviation of differences) were analyzed (including low frequency/high frequency power ratios) during preictal baseline and ictal and postictal phases. Generalized estimating equations (GEEs) were used to find associations between electroclinical features. Separate subgroup analyses were carried out on adult and pediatric age groups as well as medication groups (no antiepileptic medication cessation versus unchanged or reduced medication) during admission. Major differences were seen in adult and pediatric seizures with total seizure duration, tonic phase, PGES, and recovery phases being significantly shorter in children (p < 0.01). Generalized estimating equation analysis, using tonic phase duration as the dependent variable, found age to correlate significantly (p < 0.001), and this remained significant during subgroup analysis (adults and children) such that each 0.12-second increase in tonic phase duration correlated with a 1-second increase in PGES duration. Postictal EEG suppression durations were on average 28 s shorter in children. With cessation of medication, total seizure duration was significantly increased by a mean value of 8 s in children and 11 s in adults (p < 0.05). Tonic phase duration also significantly increased with medication cessation, and although PGES durations increased, this was not significant. Root mean square successive difference was negatively correlated with PGES duration (longer PGES durations were associated with decreased vagally mediated heart rate variability; p < 0.05) but not with tonic phase duration. This study clearly points out identifiable electroclinical differences between adult and pediatric GTCSs that may be relevant in explaining lower SUDEP risk in children. The findings suggest that some prolonged seizure phases and prolonged PGES duration may be electroclinical markers of SUDEP risk and merit further study
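The three time-domain HRV measures named in this abstract have standard definitions that can be sketched in a few lines. The sketch assumes RR (NN) intervals in milliseconds and uses the sample standard deviation; the abstract does not state which variant the authors used, and the interval values are made up.

```python
import math
import statistics
from typing import Dict, Sequence


def hrv_time_domain(rr_ms: Sequence[float]) -> Dict[str, float]:
    """Compute RMSSD, SDNN, and SDSD from a series of RR intervals in milliseconds."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three intervals")
    # Successive differences between adjacent intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        # Root mean square of successive differences.
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        # Standard deviation of the intervals themselves (sample SD assumed).
        "sdnn": statistics.stdev(rr_ms),
        # Standard deviation of the successive differences (sample SD assumed).
        "sdsd": statistics.stdev(diffs),
    }


# Illustrative RR intervals in milliseconds, not patient data.
measures = hrv_time_domain([800.0, 810.0, 790.0, 805.0])
print(measures)
```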
Detection of human papillomavirus in laryngeal squamous cell carcinoma: systematic review and meta-analysis
Background: Recent studies have reported a human papillomavirus (HPV) prevalence of 20% to 30% in laryngeal squamous cell carcinoma (LSCC), although clinical data on HPV involvement remain largely inconsistent, ascribed by some to differences in HPV detection methods or in geographic origin of the studies.
Objective
To perform a systematic review and formal meta-analysis of the literature reporting on HPV detection in LSCC.
Methods
Literature was searched from January 1964 until March 2015. The effect size was calculated as event rates (95% confidence interval [CI]), with homogeneity testing using Cochran's Q and I² statistics. Meta-regression was used to test the impact of study-level covariates (HPV detection method, geographic origin) on effect size. Potential publication bias was estimated using funnel plot symmetry.
Results
One hundred seventy-nine studies were eligible, comprising a sample size of 7,347 LSCCs from different geographic regions. Altogether, 1,830 (25%) cases tested HPV-positive considering all methods, with an effect size of 0.269 (95% CI: 0.242 to 0.297; random-effects model). In meta-analysis stratified by 1) HPV detection technique and 2) geographic study origin, the between-study heterogeneity was significant only for geographic origin (P = .0001). In meta-regression, neither the HPV detection method (P = .876) nor geographic origin (P = .234) was a significant study-level covariate. Some evidence for publication bias was found only for studies from North America and those using non–polymerase chain reaction methods, with a marginal effect on adjusted point estimates for both.
Conclusions
Variability in HPV detection rates in LSCC is explained by geographic origin of study but not by HPV detection method. However, neither geographic origin nor detection method was a significant study-level covariate in formal meta-regression
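The crude prevalence and the heterogeneity statistics mentioned in this abstract can be illustrated with a short sketch. It computes the crude event rate from the reported totals and shows the textbook Cochran's Q and I² on raw proportions with inverse-variance weights; this is not the authors' random-effects computation, and the per-study counts in the example are invented.

```python
from typing import Sequence, Tuple


def crude_rate(events: int, total: int) -> float:
    """Crude event rate: positives divided by all cases."""
    return events / total


def cochran_q_i2(events: Sequence[int],
                 totals: Sequence[int]) -> Tuple[float, float, float]:
    """Return (fixed-effect pooled proportion, Cochran's Q, I²) for study proportions."""
    props = [e / n for e, n in zip(events, totals)]
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    if any(v == 0 for v in variances):
        raise ValueError("proportions of exactly 0 or 1 need a continuity correction")
    weights = [1 / v for v in variances]
    pooled = sum(w * p for w, p in zip(weights, props)) / sum(weights)
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, props))
    df = len(props) - 1
    # I² is the share of variability beyond chance, floored at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


# Totals reported in the abstract: 1,830 HPV-positive cases out of 7,347 LSCCs.
overall = crude_rate(1830, 7347)
# Two invented studies with clearly different rates, to show high heterogeneity.
pooled, q, i2 = cochran_q_i2([10, 40], [100, 100])
print(round(overall, 3), round(pooled, 3), round(q, 2), round(i2, 3))
```

The crude rate rounds to the 25% quoted in the abstract, while the reported 0.269 comes from a random-effects model that weights studies differently.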