17 research outputs found

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead

    Guest Editorial IEEE BIBM 2017 Special Issue


    Using Attribute Value Lattice to Find Closed Frequent Itemsets

    Finding all closed frequent itemsets is a key step of association rule mining, since the non-redundant association rules can be inferred from the closed frequent itemsets. In this paper we present a new method for finding closed frequent itemsets based on an attribute value lattice. We argue that a vertical data representation combined with an attribute value lattice can find all closed frequent itemsets efficiently, thus greatly improving the efficiency of association rule mining algorithms. We discuss how these techniques and methods are applied to find closed frequent itemsets. In our method, the data are represented vertically; each frequent attribute value is associated with its granule, which is represented as a hybrid bitmap. Based on the partial order defined between the attribute values among the databases, an attribute value lattice is constructed, which is much smaller than the original databases. Instead of searching all the items in the databases, as almost all association rule algorithms do to find frequent itemsets, our method searches only the attribute value lattice. A bottom-up breadth-first approach is employed to search the attribute value lattice to find the closed frequent itemsets
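    To make the terms in this abstract concrete, the sketch below shows a vertical bitmap representation and a brute-force search for closed frequent itemsets. It is not the paper's attribute value lattice algorithm: the function names, the use of Python integers as bitmaps, and the exhaustive enumeration are all assumptions made for illustration on small inputs.

```python
from itertools import combinations


def vertical_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def closed_frequent_itemsets(transactions, min_support):
    """Return {closed itemset: support} by exhaustive search (illustrative only)."""
    bitmaps = vertical_bitmaps(transactions)
    all_tids = (1 << len(transactions)) - 1
    # Keep only items that are frequent on their own.
    frequent = [i for i in sorted(bitmaps)
                if bin(bitmaps[i]).count("1") >= min_support]
    closed = {}
    for size in range(1, len(frequent) + 1):
        for combo in combinations(frequent, size):
            # Intersect bitmaps to get the transactions containing every item.
            tids = all_tids
            for item in combo:
                tids &= bitmaps[item]
            support = bin(tids).count("1")
            if support < min_support:
                continue
            # Closure: every item that occurs in all of those transactions.
            closure = frozenset(i for i in frequent if bitmaps[i] & tids == tids)
            closed[closure] = support
    return closed


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
    for itemset, support in closed_frequent_itemsets(data, 2).items():
        print(sorted(itemset), support)
```

    An itemset is closed when no proper superset has the same support, which is why each candidate is replaced by its closure before being recorded. The exhaustive enumeration here grows exponentially with the number of frequent items; the abstract's point is that searching a lattice of attribute values avoids that cost.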

    Relationship between glycated hemoglobin levels and three-month outcomes in acute ischemic stroke patients with or without diabetes: a prospective Korean cohort study

    Objective: In patients experiencing acute ischemic stroke, there is ongoing debate surrounding the connection between chronic hyperglycemic status and their initial clinical outcomes. Our objective was to examine the connection between glycated hemoglobin (HbA1c) levels and adverse clinical outcomes at 3 months in individuals with acute ischemic stroke (AIS) with and without diabetes. Methods: The present prospective cohort study involved 896 AIS patients without diabetes and 628 with diabetes treated at a South Korean hospital from January 2010 to December 2016. The target independent variable is HbA1c. The outcome variable is a modified Rankin scale score ≥ 3. A binary logistic regression model was applied to assess the connection between HbA1c levels and 3-month poor clinical outcomes in AIS patients with and without diabetes. Additionally, a generalized additive model and smoothed curve fitting were utilized to explore potential nonlinear associations between HbA1c levels and 3-month adverse clinical outcomes in AIS patients with and without diabetes. Results: The binary logistic regression model could not identify any statistically significant connection between HbA1c and 3-month adverse clinical outcomes in AIS patients, both those with and without diabetes, after correcting for various factors. However, a nonlinear relationship emerged between HbA1c and 3-month adverse clinical outcomes in AIS patients with diabetes. The inflection point for HbA1c was determined to be 6.1%. For HbA1c values ≤ 6.1%, an inverse association was observed between HbA1c and 3-month adverse clinical outcomes in diabetic AIS patients, and each 1% increase in HbA1c in AIS patients with DM was associated with an 87% reduction in the odds of 3-month adverse clinical outcomes (OR = 0.13, 95% CI: 0.02–0.81). Conversely, when HbA1c exceeded 6.1%, a positive association between HbA1c and 3-month adverse clinical outcomes became apparent in diabetic AIS patients, and each 1% increase in HbA1c in AIS patients with DM was associated with a 23% increase in the odds of 3-month adverse clinical outcomes (OR = 1.23, 95% CI: 1.03–1.47). However, no significant linear or nonlinear relationships were observed between HbA1c levels and 3-month adverse clinical outcomes in AIS patients without diabetes. Conclusion: Our findings suggest a nonlinear connection and threshold effect between HbA1c and 3-month adverse clinical outcomes in AIS patients with diabetes. AIS patients with diabetes had a lower risk of 3-month adverse clinical outcomes when their HbA1c control was close to 6.1%. Our findings may aid treatment decision-making and potentially guide interventions to optimize glycemic control in AIS patients
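    The percentages quoted in this abstract follow directly from the reported odds ratios, and the threshold effect can be pictured as a two-segment curve around the 6.1% inflection point. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and using the point estimates as a piecewise multiplicative model ignores the confidence intervals and the covariate adjustment in the study.

```python
INFLECTION = 6.1   # HbA1c inflection point reported in the abstract (%)
OR_BELOW = 0.13    # odds ratio per 1% HbA1c increase at or below the inflection
OR_ABOVE = 1.23    # odds ratio per 1% HbA1c increase above the inflection


def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in odds."""
    return (odds_ratio - 1.0) * 100.0


def relative_odds(hba1c):
    """Odds of a poor outcome relative to a patient at the inflection point.

    Assumption: the per-1% odds ratios apply multiplicatively on each side.
    """
    delta = hba1c - INFLECTION
    base = OR_BELOW if delta <= 0 else OR_ABOVE
    return base ** delta


if __name__ == "__main__":
    print(round(percent_change_in_odds(OR_BELOW)))  # change per 1% below 6.1%
    print(round(percent_change_in_odds(OR_ABOVE)))  # change per 1% above 6.1%
    for level in (5.1, 6.1, 7.1, 8.1):
        print(level, round(relative_odds(level), 2))
```

    Under these assumptions the relative odds are lowest at 6.1% and rise on both sides, which is the pattern the conclusion describes.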

    Active learning with deep pre-trained models for sequence tagging of clinical and biomedical texts

    Active learning is a technique that helps to minimize the annotation budget required for the creation of a labeled dataset while maximizing the performance of a model trained on this dataset. It has been shown that active learning can be successfully applied to sequence tagging tasks of text processing in conjunction with deep learning models even when a limited amount of labeled data is available. Recent advances in transfer learning methods for natural language processing based on deep pre-trained models such as ELMo and BERT offer a much better ability to generalize on small annotated datasets compared to their shallow counterparts. The combination of deep pre-trained models and active learning leads to a powerful approach to dealing with annotation scarcity. In this work, we investigate the potential of this approach on clinical and biomedical data. The experimental evaluation shows that the combination of active learning and deep pre-trained models outperforms the standard methods of active learning. We also suggest a modification to a standard uncertainty sampling strategy and empirically show that it could be beneficial for annotation of very skewed datasets. Finally, we propose an annotation tool empowered with active learning and deep pre-trained models that could be used for entity annotation directly from Jupyter IDE
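    For readers unfamiliar with uncertainty sampling, the sketch below shows one standard variant (least confidence) applied to sequence tagging. It is not the modified strategy the authors propose, which the abstract does not specify. The function names and the input layout (per-token probability distributions from some tagger) are assumptions for illustration.

```python
def least_confidence(token_probs):
    """Sequence uncertainty: 1 minus the mean of each token's top probability."""
    return 1.0 - sum(max(p) for p in token_probs) / len(token_probs)


def select_for_annotation(pool, budget):
    """Pick the `budget` most uncertain sentences from an unlabeled pool.

    pool maps a sentence id to a list of per-token probability distributions
    produced by the current model (hypothetical input format).
    """
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]),
                    reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    pool = {
        "s1": [[0.9, 0.1], [0.8, 0.2]],
        "s2": [[0.5, 0.5], [0.6, 0.4]],
        "s3": [[0.99, 0.01]],
    }
    print(select_for_annotation(pool, 2))
```

    In an active learning loop, the selected sentences would be sent to an annotator, added to the labeled set, and the model retrained before the next round of selection.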

    Developing a Visual Analysis Platform of Human Rabies for Hubei Province of China (VAP-HRHB)

    As an acute zoonosis, rabies has a fatality rate of nearly 100%. Since rabies surveillance data over the years show that the rabies epidemic areas in China expand from southern provinces to central and northern provinces, we selected Hubei Province, an area with a high incidence of rabies in Central China, to investigate national rabies prevention and control work by developing a visual analysis platform (VAP-HRHB: http://www.combio-lezhang.online/rabies/index.html). VAP-HRHB employs bioinformatics methods to predict the future developmental trend for rabies incidence and exposure numbers, thereby comprehensively improving the prevention and control of infectious diseases in China

    An innovative online process mining framework for supporting incremental GDPR compliance of business processes

    GDPR (General Data Protection Regulation) is a new regulation of the European Union that imposes strict privacy constraints on storing, accessing and processing user data, as a way to ensure that personal user data are neither violated nor disclosed without explicit consent. As a consequence, business processes that interact with large amounts of such data may easily cause GDPR violations, due to the typical complexity of such processes. Inspired by these considerations, this paper highlights the challenges and critical aspects associated with the GDPR compliance journey when opting for naïve, straightforward solutions. We propose a business-aware GDPR compliance journey using online process mining. Using several large log files generated based on a real scenario, we show that the proposed tool is both effective and efficient. As such, it proves to be a powerful concept for usage in incremental GDPR compliance environments