    Justification for inclusion dependency normal form

    Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental integrity constraints that arise in practice in relational databases. In this paper, we address the issue of normalization in the presence of FDs and INDs and, in particular, the semantic justification for Inclusion Dependency Normal Form (IDNF), a normal form which combines Boyce-Codd normal form with the restriction that the INDs be noncircular and key-based. We motivate and formalize three goals of database design in the presence of FDs and INDs: noninteraction between FDs and INDs, elimination of redundancy and update anomalies, and preservation of entity integrity. We show that, as for FDs, in the presence of INDs being free of redundancy is equivalent to being free of update anomalies. Then, for each of these properties, we derive equivalent syntactic conditions on the database design. Individually, each of these syntactic conditions is weaker than IDNF, and the restriction that an FD not be embedded in the right-hand side of an IND is common to three of the conditions. However, we also show that, for these three goals of database design to be satisfied simultaneously, IDNF is both a necessary and sufficient condition.
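    The two structural restrictions IDNF places on INDs can be illustrated with a minimal sketch. The relation names, keys, and dependencies below are invented for the example, not taken from the paper; an IND R[X] ⊆ S[Y] is modelled as a pair of (relation, attribute-tuple) pairs.

```python
def is_key_based(inds, keys):
    # An IND R[X] <= S[Y] is key-based when Y is exactly the primary key of S.
    return all(tuple(y) == keys[s] for (_r, _x), (s, y) in inds)

def is_noncircular(inds):
    # Build the "R references S" graph and reject any directed cycle via DFS.
    graph = {}
    for (r, _x), (s, _y) in inds:
        graph.setdefault(r, set()).add(s)
    visiting, done = set(), set()
    def dfs(node):
        if node in done:
            return True
        if node in visiting:
            return False          # back edge found: the INDs are circular
        visiting.add(node)
        ok = all(dfs(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return ok
    return all(dfs(n) for n in graph)

# Illustrative schema: Employee[dept] <= Dept[dno] and Dept[mgr] <= Employee[eno].
keys = {"Employee": ("eno",), "Dept": ("dno",)}
inds = [(("Employee", ("dept",)), ("Dept", ("dno",))),
        (("Dept", ("mgr",)), ("Employee", ("eno",)))]

print(is_key_based(inds, keys))   # True: both INDs target primary keys
print(is_noncircular(inds))       # False: Employee -> Dept -> Employee is a cycle
```

    The second check failing is exactly the situation IDNF rules out: the two relations reference each other, so neither can be populated first without violating referential integrity.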

    Phylogenomics databases for facilitating functional genomics in rice

    The completion of the whole-genome sequence of rice (Oryza sativa) has significantly accelerated functional genomics studies. Prior to the release of the sequence, only a few genes were assigned a function each year. Since sequencing was completed in 2005, the rate has increased exponentially. As of 2014, 1,021 genes have been described and added to the collection at The Overview of functionally characterized Genes in Rice online database (OGRO). Despite this progress, that number is still very low compared with the total number of genes estimated in the rice genome. One limitation to progress is the presence of functional redundancy among members of the same rice gene family, which covers 51.6% of all non-transposable element-encoding genes. There remains a significant portion of rice genes that are not functionally redundant, as reflected in the recovery of loss-of-function mutants. To more accurately analyze functional redundancy in the rice genome, we have developed phylogenomics databases for six large gene families in rice, including those for glycosyltransferases, glycoside hydrolases, kinases, transcription factors, transporters, and cytochrome P450 monooxygenases. In this review, we introduce key features and applications of these databases. We expect that they will serve as a very useful guide in the post-genomics era of research.

    Statistical data mining for symbol associations in genomic databases

    A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test is proposed to assess the significance of a group of symbols when found in several genesets of a given database. Applied to symbol pairs, the thresholded p-values of the test define a graph structure on the set of symbols. The cliques of that graph are significant symbol associations, linked to the set of genesets where they can be found. The method can be applied to any database and is illustrated here on the MSigDB C2 database. Many of the symbol associations detected in C2 or in non-specific selections corresponded to already known interactions. On more specific selections of C2, many previously unknown symbol associations were detected. These associations unveil new candidates for gene or protein interactions, which need further investigation for biological evidence.

    XML document design via GN-DTD

    Designing a well-structured XML document is important for the sake of readability and maintainability. More importantly, this will avoid data redundancies and update anomalies when maintaining a large quantity of XML-based documents. In this paper, we propose a method to improve XML structural design by adopting graphical notations for Document Type Definitions (GN-DTD), which is used to describe the structure of an XML document at the schema level. Multiple levels of normal forms for GN-DTD are proposed on the basis of conceptual model approaches and theories of normalization. The normalization rules are applied to transform a poorly designed XML document into a well-designed one based on a normalized GN-DTD, which is illustrated through examples.

    Why is the snowflake schema a good data warehouse design?

    Database design for data warehouses is based on the notion of the snowflake schema and its important special case, the star schema. The snowflake schema represents a dimensional model which is composed of a central fact table and a set of constituent dimension tables which can be further broken up into subdimension tables. We formalise the concept of a snowflake schema in terms of an acyclic database schema whose join tree satisfies certain structural properties. We then define a normal form for snowflake schemas which captures its intuitive meaning with respect to a set of functional and inclusion dependencies. We show that snowflake schemas in this normal form are independent as well as separable when the relation schemas are pairwise incomparable. This implies that relations in the data warehouse can be updated independently of each other as long as referential integrity is maintained. In addition, we show that a data warehouse in snowflake normal form can be queried by joining the relation over the fact table with the relations over its dimension and subdimension tables. We also examine an information-theoretic interpretation of the snowflake schema and show that the redundancy of the primary key of the fact table is zero.
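    The querying property (join the fact table outward along the join tree through dimension and subdimension tables) can be shown with a minimal in-memory sketch. The table and column names are invented for illustration, not taken from the paper.

```python
import sqlite3

# Snowflake shape: fact Sales -> dimension Product -> subdimension Category.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Category (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
CREATE TABLE Product  (prod_id INTEGER PRIMARY KEY, prod_name TEXT,
                       cat_id INTEGER REFERENCES Category(cat_id));
CREATE TABLE Sales    (prod_id INTEGER REFERENCES Product(prod_id),
                       day TEXT, amount REAL,
                       PRIMARY KEY (prod_id, day));
INSERT INTO Category VALUES (1, 'Beverages');
INSERT INTO Product  VALUES (10, 'Coffee', 1), (11, 'Tea', 1);
INSERT INTO Sales    VALUES (10, '2024-01-01', 5.0), (11, '2024-01-01', 3.0);
""")

# Querying the warehouse = joining the fact table with its dimension and
# subdimension tables along the join tree, then aggregating.
rows = con.execute("""
    SELECT c.cat_name, SUM(s.amount)
    FROM Sales s JOIN Product p  ON s.prod_id = p.prod_id
                 JOIN Category c ON p.cat_id = c.cat_id
    GROUP BY c.cat_name
""").fetchall()
print(rows)  # [('Beverages', 8.0)]
```

    Because each dimension key in the fact table references a key of the dimension it points to, each of the three tables can be updated on its own as long as those references remain satisfied, which is the independence the abstract refers to.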

    An analysis of the Sargasso Sea resource and the consequences for database composition

    Background: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource. Results: The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular, the high proportion of sequence fragments in the resource seems to result in poor-quality multiple alignments. Conclusion: These observations suggest that the new sequences should be used with care, especially if the information is to be used in large-scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques.

    Use of a 3-item short-form version of the Barthel Index for use in stroke: systematic review and external validation

    Background and Purpose—There may be potential to reduce the number of items assessed in the Barthel Index (BI), and shortened versions of the BI have been described. We sought to collate all existing short-form BI (SF-BI) and perform a comparative validation using clinical trial data. Methods—We performed a systematic review across multidisciplinary electronic databases to find all published SF-BI. Our validation used the VISTA (Virtual International Stroke Trials Archive) resource. We describe concurrent validity (agreement of each SF-BI with the BI), convergent and divergent validity (agreement of each SF-BI with other outcome measures available in the data set), predictive validity (association of prognostic factors with SF-BI outcomes), and content validity (item correlation and exploratory factor analyses). Results—From 3546 titles, we found 8 articles describing 6 differing SF-BI. Using acute trial data (n=8852), internal reliability suggested redundancy in the BI (Cronbach α, 0.96). Each SF-BI demonstrated a strong correlation with the BI, the modified Rankin Scale, and the National Institutes of Health Stroke Scale (all ρ≄0.83; P<0.001). Using rehabilitation trial data (n=332), SF-BI demonstrated modest correlation with the quality-of-life measures Stroke Impact Scale and 5-domain EuroQOL (ρ≄0.50; P<0.001). Prespecified prognostic factors were associated with SF-BI outcomes (all P<0.001). Our factor analysis described a 3-factor structure, and item reduction suggested an optimal 3-item SF-BI comprising bladder control, transfer, and mobility items, in keeping with one of the 3-item SF-BI previously described in the literature. Conclusions—There is redundancy in the original BI; we have demonstrated internal and external validity of a 3-item SF-BI that should be simple to use.
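    Cronbach's α, the internal-reliability statistic used above to flag redundancy among BI items, is straightforward to compute from item-level scores. The sketch below uses invented scores for five respondents, not the trial data; because the three toy items move together, α comes out high, illustrating the kind of redundancy the review reports.

```python
from statistics import variance

def cronbach_alpha(items):
    # items: one list of scores per item, aligned across the same respondents.
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

items = [[0, 1, 2, 3, 3],   # toy item scores (invented)
         [0, 1, 2, 2, 3],
         [1, 1, 2, 3, 3]]
print(round(cronbach_alpha(items), 2))  # 0.97
```

    An α near 1 means the items are largely measuring the same thing, which is why a 3-item subset can stand in for the full index.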