1,528 research outputs found

    Data semantic enrichment for complex event processing over IoT Data Streams

    This thesis generalizes techniques for processing IoT data streams, semantically enriching data with contextual information, and complex event processing in IoT applications. A case study on ECG anomaly detection and signal classification was conducted to validate the knowledge foundation.

    A Deep Understanding of Structural and Functional Behavior of Tabular and Graphical Modules in Technical Documents

    The rapid increase in published research papers in recent years has escalated the need for automated ways to process and understand them. The successful recognition of the information contained in technical documents depends on understanding each document’s individual modalities. These modalities include tables, graphics, and diagrams, as defined in Bourbakis’ pioneering work. However, the depth of understanding is correlated with the efficiency of detection and recognition. In this work, a novel methodology is proposed for the automatic processing and understanding of table and graphics images in technical documents. Previous attempts at table and graphics understanding retrieve only superficial knowledge, such as table contents and axis values. Here, the focus is on capturing the internal associations and relations between the data extracted from each figure. The proposed methodology is divided into the following steps: 1) figure detection, 2) figure recognition, and 3) figure understanding, where figures mean tables, graphics, and diagrams. More specifically, we evaluate different heuristic and learning methods for classifying table and graphics images as part of the detection module. Table recognition and deep understanding include the extraction of the knowledge illustrated in a table image along with the deeper associations between the table variables. The graphics recognition module follows a clustering-based approach to recognize middle points. Middle points are 2D points where the direction of the curves changes. They delimit the straight line segments that make up the graphics curves. We use these detected middle points to understand various features of each line segment and the associations between them. 
Additionally, we convert the extracted internal tabular associations and the captured curves’ structural and functional behavior into a common and, at the same time, unique form of representation: Stochastic Petri-net (SPN) graphs. The use of SPN graphs allows the merging of different document modalities through the functions that describe them, without any prior knowledge of what these functions are. Finally, we achieve a higher level of document understanding through the synergistic merging of the aforementioned SPN graphs that we extract from the table and graphics modalities. We provide results from every step of the document modalities understanding methodologies and the synergistic merging as proof of concept for this research.
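
The notion of a middle point described in this abstract can be illustrated with a minimal sketch. It assumes the curve has already been extracted as an ordered list of 2D points, and it uses a simple angle threshold rather than the clustering-based approach the abstract mentions; the function name `find_middle_points` and the parameter `angle_threshold_deg` are hypothetical and do not come from the work itself.

```python
import math

def find_middle_points(points, angle_threshold_deg=10.0):
    """Return interior points where the polyline direction changes.

    Illustrative stand-in only: a plain angle threshold, not the
    clustering-based method described in the abstract.
    """
    middle = []
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        # Heading of the incoming and outgoing segments, in radians
        a_in = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        a_out = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        # Absolute heading change in degrees, wrapped into [0, 180]
        diff = abs(math.degrees(a_out - a_in))
        diff = min(diff, 360.0 - diff)
        if diff > angle_threshold_deg:
            middle.append(cur)
    return middle

# A curve that rises, falls, then flattens out
curve = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0), (5, 0)]
print(find_middle_points(curve))  # [(2, 2), (4, 0)]
```

The two reported points delimit three straight segments (rising, falling, flat), which is the role the abstract assigns to middle points.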

    Shale lithofacies modeling of the Bakken Formation in the Williston basin, North Dakota

    The Bakken petroleum system (Devonian-Mississippian) in the Williston basin of North Dakota and Montana in the United States, and Saskatchewan and Manitoba in Canada, is one of the largest unconventional oil plays in North America. The Bakken Formation consists of three members: upper, middle, and lower. Both upper and lower members are shale (source rocks), whereas the middle member (reservoir rock) is composed of mixed lithologies, including sandstone, dolostone, and limestone. Underlying the lower Bakken shale member, the Three Forks Formation is another target for hydrocarbon exploration. Although the middle Bakken member and the Three Forks Formation have been the targets for horizontal drilling and hydraulic stimulation throughout the basin, several uncertainties remain, including facies variation due to depositional and diagenetic controls on mineral composition and organic matter content in the Bakken shale members, which could play a significant role in hydrocarbon generation and production. Although the Bakken shale members may look homogeneous in appearance, they are a significantly heterogeneous and complex mixture of quartz, smectite, illite, carbonate, pyrite, and kerogen in varying proportions. Improved characterization of the Bakken shale lithofacies is important to better understand depositional environment, lithofacies distribution, and their potential influence on hydrocarbon production. The main objective of this work is to investigate vertical and lateral heterogeneities of the Bakken shale lithofacies, based on mineralogy and organic matter richness. 
The second objective is to determine whether, if the Bakken shale members are composed of different lithofacies, these can be associated with different depositional and/or diagenetic conditions that could influence the source, transportation, and preservation of organic matter and sediment in the Williston basin. Core data (such as X-ray diffraction, X-ray fluorescence, and Total Organic Carbon content), conventional borehole geophysical logs (such as gamma, resistivity, bulk density, neutron porosity, and photo-electric factor), and advanced petrophysical logs (such as Spectral Gamma and Pulsed Neutron Spectroscopy) are integrated to classify the Bakken shale lithofacies and build models of lithofacies distribution at multiple scales. Usually there are minimal core data, scattered advanced well logs, and ubiquitous conventional well log suites in a petroliferous basin, which hinders lithofacies analysis and petrophysical modeling. Therefore, a significant effort of this work is geared towards developing and applying cost-effective mathematical algorithms (such as Support Vector Machine and Artificial Neural Network) and geostatistical techniques (such as Sequential Indicator Simulation) to classify, predict, and interpolate shale lithofacies with high accuracy, using conventional well log-derived petrophysical parameters from several wells. The results show that both upper and lower Bakken shale members are vertically and laterally heterogeneous at core, well, and regional scales. Bakken shale members can be classified into five different lithofacies in terms of mineralogy and organic matter content. Organic-rich shale lithofacies are more dominant than organic-poor shale lithofacies. It appears that several factors (such as source of minerals, paleo-redox conditions, organic matter productivity, and preservation) controlled the Bakken shale lithofacies distribution pattern. 
Silica in the Organic Siliceous Shale (OSS) lithofacies near the basin center is hypothesized to be related to the presence of biogenic silica (e.g. radiolaria), whereas the portion of OSS lithofacies near the basin margin is believed to be associated with eolian action. High organic matter content in the Organic Mudstone (OMD) lithofacies near the basin margin could be attributed to the presence of algal matter. The borehole geophysical and petrophysical approaches and the 3D lithofacies modeling techniques developed in this study can be applied to detailed studies of complex shale formations and exploration of hydrocarbon resources worldwide.

    Development of a fixed module repertoire for the analysis and interpretation of blood transcriptome data.

    As the capacity for generating large-scale molecular profiling data continues to grow, the ability to extract meaningful biological knowledge from it remains a limitation. Here, we describe the development of a new fixed repertoire of transcriptional modules, BloodGen3, that is designed to serve as a stable reusable framework for the analysis and interpretation of blood transcriptome data. The construction of this repertoire is based on co-clustering patterns observed across sixteen immunological and physiological states encompassing 985 blood transcriptome profiles. Interpretation is supported by customized resources, including module-level analysis workflows, fingerprint grid plot visualizations, interactive web applications and an extensive annotation framework comprising functional profiling reports and reference transcriptional profiles. Taken together, this well-characterized and well-supported transcriptional module repertoire can be employed for the interpretation and benchmarking of blood transcriptome profiles within and across patient cohorts. Blood transcriptome fingerprints for the 16 reference cohorts can be accessed interactively via: https://drinchai.shinyapps.io/BloodGen3Module/

    Emerging landscape of oncogenic signatures across human cancers.

    Cancer therapy is challenged by the diversity of molecular implementations of oncogenic processes and by the resulting variation in therapeutic responses. Projects such as The Cancer Genome Atlas (TCGA) provide molecular tumor maps in unprecedented detail. The interpretation of these maps remains a major challenge. Here we distilled thousands of genetic and epigenetic features altered in cancers to ∼500 selected functional events (SFEs). Using this simplified description, we derived a hierarchical classification of 3,299 TCGA tumors from 12 cancer types. The top classes are dominated by either mutations (M class) or copy number changes (C class). This distinction is clearest at the extremes of genomic instability, indicating the presence of different oncogenic processes. The full hierarchy shows functional event patterns characteristic of multiple cross-tissue groups of tumors, termed oncogenic signature classes. Targetable functional events in a tumor class are suggestive of class-specific combination therapy. These results may assist in the definition of clinical trials to match actionable oncogenic signatures with personalized therapies.

    The N/O Plateau of Blue Compact Galaxies: Monte Carlo Simulations of the Observed Scatter

    Chemical evolution models and Monte Carlo simulation techniques have been combined for the first time to study the distribution of blue compact galaxies on the N/O plateau. Each simulation comprises 70 individual chemical evolution models. For each model, input parameters relating to a galaxy's star formation history (bursting or continuous star formation, star formation efficiency), galaxy age, and outflow rate are chosen randomly from ranges predetermined to be relevant. Predicted abundance ratios from each simulation are collectively overplotted onto the data to test its viability. We present our results both with and without observational scatter applied to the model points. Our study shows that most trial combinations of input parameters, including a simulation comprising only simple models with instantaneous recycling, are successful in reproducing the observed morphology of the N/O plateau once observational scatter is added. Therefore, simulations which include delay of nitrogen injection are no longer favored over those which propose that most nitrogen is produced by massive stars, if only the plateau morphology is used as the principal constraint. The one scenario which clearly cannot explain plateau morphology is one in which galaxy ages are allowed to range below 250 Myr. We conclude that the present data for the N/O plateau are insufficient by themselves for identifying the portion of the stellar mass spectrum most responsible for cosmic nitrogen production. Comment: 41 pages, 15 figures; accepted by ApJ, to appear Aug. 20, 200
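
The random selection of model inputs described in this abstract can be sketched roughly as follows. This is only an illustration of drawing 70 parameter sets from fixed ranges: the dictionary `PARAM_RANGES`, its numeric bounds, the 50/50 choice between bursting and continuous star formation, and the function names are all assumptions made for the example, and no chemical evolution model is actually run.

```python
import random

# Hypothetical parameter ranges, for illustration only; the abstract
# does not state the ranges the authors used.
PARAM_RANGES = {
    "star_formation_efficiency": (0.01, 1.0),
    "galaxy_age_myr": (250.0, 13000.0),
    "outflow_rate": (0.0, 5.0),
}

def sample_model_inputs(rng):
    """Draw one set of input parameters uniformly from the ranges."""
    params = {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
    # Assumed equal chance of bursting versus continuous star formation
    params["bursting"] = rng.random() < 0.5
    return params

def build_simulation(n_models=70, seed=0):
    """Assemble the inputs for one simulation of n_models models."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return [sample_model_inputs(rng) for _ in range(n_models)]

simulation = build_simulation()
print(len(simulation))  # 70
```

Each dictionary in `simulation` would then be passed to a chemical evolution model, and the predicted abundance ratios compared against the observed plateau.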

    XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data

    This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract. Background: The primary goal of pathway analysis using transcriptome data is to find significantly perturbed pathways. However, pathway analysis is not always successful in identifying pathways that are truly relevant to the context under study. A major reason for this difficulty is that a single gene is involved in multiple pathways. In the KEGG pathway database, there are 146 genes, each of which is involved in more than 20 pathways. Thus activation of even a single gene will result in activation of many pathways. This complex relationship often makes pathway analysis very difficult. While much more powerful pathway analysis methods are needed, a readily available alternative is to incorporate literature information. Results: In this study, we propose a novel approach for prioritizing pathways by combining results from both pathway analysis tools and literature information. The basic idea is as follows. Whenever there are enough articles that provide evidence on which pathways are relevant to the context, we can be assured that the pathways are indeed related to the context; this is termed relevance in this paper. However, if few or no articles have been reported, we should rely on the results from the pathway analysis tools; this is termed significance in this paper. We realized this concept as an algorithm by introducing a Context Score and an Impact Score and then combining the two into a single score. Our method ranked truly relevant pathways significantly higher than existing pathway analysis tools in experiments with two data sets. Conclusions: Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. 
ContextTRAP will be a useful tool for the pathway-based analysis of gene expression data since the user can specify the context of the biological experiment in a set of keywords. The web version of ContextTRAP is available at http://biohealth.snu.ac.kr/software/contextTRA

    Pacific Symposium on Biocomputing 2023

    The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference. PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.

    Allometry and Ecology of the Bilaterian Gut Microbiome.

    Classical ecology provides principles for construction and function of biological communities, but to what extent these apply to the animal-associated microbiota is just beginning to be assessed. Here, we investigated the influence of several well-known ecological principles on animal-associated microbiota by characterizing gut microbial specimens from bilaterally symmetrical animals (Bilateria) ranging from flies to whales. A rigorously vetted sample set containing 265 specimens from 64 species was assembled. Bacterial lineages were characterized by 16S rRNA gene sequencing. Previously published samples were also compared, allowing analysis of over 1,098 samples in total. A restricted number of bacterial phyla was found to account for the great majority of gut colonists. Gut microbial composition was associated with host phylogeny and diet. We identified numerous gut bacterial 16S rRNA gene sequences that diverged deeply from previously studied taxa, identifying opportunities to discover new bacterial types. The number of bacterial lineages per gut sample was positively associated with animal mass, paralleling known species-area relationships from island biogeography and implicating body size as a determinant of community stability and niche complexity. Samples from larger animals harbored greater numbers of anaerobic communities, specifying a mechanism for generating more-complex microbial environments. Predictions for species/abundance relationships from models of neutral colonization did not match the data set, pointing to alternative mechanisms such as selection of specific colonists by environmental niche. 
Taken together, the data suggest that niche complexity increases with gut size and that niche selection forces dominate gut community construction. IMPORTANCE: The intestinal microbiome of animals is essential for health, contributing to digestion of foods, proper immune development, inhibition of pathogen colonization, and catabolism of xenobiotic compounds. How these communities assemble and persist is just beginning to be investigated. Here we interrogated a set of gut samples from a wide range of animals to investigate the roles of selection and random processes in microbial community construction. We show that the numbers of bacterial species increased with the weight of host organisms, paralleling findings from studies of island biogeography. Communities in larger organisms tended to be more anaerobic, suggesting one mechanism for niche diversification. Nonselective processes enable specific predictions for community structure, but our samples did not match the predictions of the neutral model. Thus, these findings highlight the importance of niche selection in community construction and suggest mechanisms of niche diversification.

    Statistical Methods in Integrative Genomics

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.