Curated Databases

By Peter Buneman, James Cheney, Wang-Chiew Tan and Stijn Vansummeren


Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries (dictionaries, encyclopedias, gazetteers, etc.) are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area.

Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.

Year: 2008

