Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation
Background: Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts.
Results: We employed the Textpresso category-based information retrieval and extraction system (http://www.textpresso.org), developed by WormBase, to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation to those of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From the returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). Measuring the relative efficiencies of Textpresso-based versus manual curation, we find that Textpresso has the potential to increase curation efficiency at least 8-fold, and perhaps as much as 15-fold, given differences in individual curatorial speed.
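The F-scores quoted above are the balanced harmonic mean of the corresponding precision and recall, F = 2PR/(P + R). A minimal Python check against the reported figures (the small deviation at the paper-retrieval stage is explained by rounding in the reported percentages):

```python
# Balanced F-score (F1): harmonic mean of precision and recall.
def f_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported for each stage of the pipeline.
stages = {
    "paper retrieval":    (0.618, 0.791),  # reported F-score: 69.5%
    "sentence retrieval": (0.801, 0.303),  # reported F-score: 44.0%
    "GO annotation":      (0.973, 0.662),  # reported F-score: 78.8%
}
for name, (p, r) in stages.items():
    print(f"{name}: F = {f_score(p, r):.1%}")
```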
Conclusion: Textpresso is an effective tool for improving the efficiency of manual, experimentally based curation. Incorporating a Textpresso-based Cellular Component curation pipeline at WormBase has allowed us to transition from strictly manual curation of this data type to a more efficient pipeline of computer-assisted validation. Continued development of curation task-specific Textpresso categories will provide an invaluable resource for genomics databases that rely heavily on manual curation.
A Puff of Steem: Security Analysis of Decentralized Content Curation
Decentralized content curation is the process through which uploaded posts are ranked and filtered based exclusively on users' feedback. Platforms such as the blockchain-based Steemit employ this type of curation while providing monetary incentives to promote the visibility of high-quality posts according to the perception of the participants. Despite the wide adoption of the platform, very little is known regarding its performance and resilience characteristics. In this work, we provide a formal model for decentralized content curation that identifies salient complexity and game-theoretic measures of performance and resilience to selfish participants. Armed with our model, we provide a first analysis of Steemit, identifying the conditions under which the system can be expected to converge to a correct curation, and we demonstrate its susceptibility to selfish participant behaviour. We validate our theoretical results with system simulations in various scenarios.
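The abstract does not reproduce the paper's formal model, but the core idea of curation-by-voting is easy to illustrate. The following toy simulation is a purely illustrative sketch (not the paper's model, nor Steemit's actual mechanism): honest voters upvote the best post from a small random sample, selfish voters only promote their "own" post, and we measure how often the final vote ranking still puts the highest-quality post on top.

```python
import random

def simulate(n_posts=10, n_voters=200, selfish_fraction=0.3, seed=0):
    # Toy model only: each post has a hidden quality score; honest voters
    # upvote the best post among a random sample of three, selfish voters
    # always upvote the post they "own" regardless of quality.
    rng = random.Random(seed)
    quality = [rng.random() for _ in range(n_posts)]
    votes = [0] * n_posts
    for voter in range(n_voters):
        if rng.random() < selfish_fraction:
            votes[voter % n_posts] += 1                      # selfish vote
        else:
            sample = rng.sample(range(n_posts), 3)           # honest vote
            votes[max(sample, key=lambda i: quality[i])] += 1
    best_by_votes = max(range(n_posts), key=votes.__getitem__)
    best_by_quality = max(range(n_posts), key=quality.__getitem__)
    return best_by_votes == best_by_quality

for frac in (0.0, 0.3, 0.6):
    hits = sum(simulate(selfish_fraction=frac, seed=s) for s in range(100))
    print(f"selfish fraction {frac:.1f}: correct top post in {hits}/100 runs")
```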
Exploring a curatorial turn in journalism
Curation has moved from the ‘rarefied’ atmosphere of museums and exhibitions into journalism, where new discourses and practices are proliferating. The changes have attracted academic attention such that journalism is now facing its own curatorial turn, akin to what Paul O’Neill identified in museum studies. This article draws on a meta-analysis of journal articles in the field to argue that the prevalent instrumentalist definitions of curation are necessary but not sufficient to capture what the shifts in discourse and practice mean for journalism. To derive a more nuanced conceptualization of curation that includes the instrumental and the metaphorical, the article draws on literature beyond the field of journalism studies to trace the changing meanings of the term curation from antiquity to the digital age. The conditions are propitious for the movement of new practices into newsrooms, but where curation fits in relation to existing professions remains intellectually unclear because of a lack of conceptual clarity as to how it overlaps with and differs from other roles. The article offers a preliminary attempt to address this.
International Data Curation Education Action (IDEA) Working Group: a report from the second workshop of IDEA
The second workshop of the International Data Curation Education Action (IDEA) Working Group was held December 5, 2008, in Edinburgh, Scotland, following the 4th International Digital Curation Conference. The workshop was jointly organized by the UK's Digital Curation Centre (DCC), the US's Institute of Museum and Library Services (IMLS), and the School of Information and Library Science at the University of North Carolina at Chapel Hill (SILS). Nearly forty educators and researchers accepted invitations to attend, with representation from universities, research centers, and funding agencies in Canada, the US, the UK, and Germany.
Curation, curation, curation
The media curation craze has spawned a multitude of new sites that help users to collect and share web content. Some market themselves as spaces to explore a common interest through different types of related media. Others are promoted as a means of creating and sharing stories, or producing personalized newspapers. Still others target the education market, claiming that curation can be a powerful learning tool for web-based content. But who really benefits from the curation task: the content curator or the content consumer? This paper argues that if curation is to fully support learning on either side, the curation site has to allow the content curator to research and tell stories through their selected content, and the consumer to rewrite the story for themselves. This brings the curation task in line with museum practice, where museum professionals tell stories through the careful selection, organization, and presentation of objects in an exhibition, backed up by research. This paper introduces the notion of ‘recuration’ to describe a process in which shared content can be used as part of learning.
WormBase Curation Interfaces and Tools
Curating biological information from the published literature can be time- and labor-intensive, especially without automated tools. WormBase has adopted several curation interfaces and tools, most of which were built in-house, to help curators recognize and extract data more efficiently from the literature. These tools range from simple computer interfaces for data entry to scripts that take advantage of complex text extraction algorithms, which automatically identify specific objects in a paper and present them to the curator for curation. By building these tools in-house, we are also able to tailor each tool to the individual needs and preferences of the curator. For example, Gene Ontology Cellular Component and gene-gene interaction curators employ the text mining software Textpresso to identify, retrieve, and extract relevant sentences from the full text of an article. The curators then use a web-based curation form to enter the data into our local database. For transgene and antibody curation, curators use the publicly available Phenote ontology annotation curation interface (developed by the Berkeley Bioinformatics Open-Source Projects (BBOP)), which we have adapted with datatype-specific configurations. This tool has been used as a basis for developing our own Ontology Annotator tool, which is being used by our phenotype and gene ontology curators. For RNAi curation, we created web-based submission forms that allow the curator to efficiently capture all relevant information. In all cases, the data undergoes a final scripted data dump step to ensure that all the information conforms to a file format readable by our object-oriented database.
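As an illustration of the Textpresso-style sentence retrieval described above and in the earlier abstract, here is a hedged sketch (not Textpresso's actual code; the category word lists are made-up placeholders): the query can be thought of as keeping only sentences that mention the target protein together with at least one term from each curation category.

```python
# Illustrative placeholder term lists for the three curation categories.
CATEGORIES = {
    "cellular_component": {"nucleus", "mitochondria", "plasma membrane", "cytoplasm"},
    "assay": {"gfp", "immunostaining", "antibody", "fusion protein"},
    "verb": {"localizes", "localized", "expressed", "detected"},
}

def candidate_sentences(sentences, protein):
    """Yield sentences mentioning the protein plus a term from every category."""
    for s in sentences:
        low = s.lower()
        if protein.lower() in low and all(
            any(term in low for term in terms) for terms in CATEGORIES.values()
        ):
            yield s

text = ["UNC-104::GFP localizes to the plasma membrane in neurons.",
        "unc-104 mutants are uncoordinated."]
print(list(candidate_sentences(text, "UNC-104")))  # keeps only the first sentence
```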
Permanent Objects, Disposable Systems
Presentation at the 4th International Conference on Open Repositories (Conference Presentations session), 2009-05-19, 01:00 PM–02:30 PM.
The California Digital Library (CDL) preservation program is re-envisioning its curation infrastructure as a set of loosely-coupled, distributed micro-services. There are many monolithic systems that support a range of preservation activities but also require the user and the hosting institution to buy in to a particular system culture. The result is an institution that becomes, say, a DSpace, Fedora, or LOCKSS "shop", with a specific worldview and set of object flows and structures that will eventually need to be abandoned when it comes time to transition to the next system. Experience shows that these transitions are unavoidable, despite claims that once an object is in the system, it will be safe forever. In view of this, it is safer and more cost-effective to acknowledge from the outset the inevitably transient nature of systems and to plan on managing, rather than resisting, change. The disruption caused by change can be mitigated by basing curation services on simple universal structures and protocols (e.g., filesystems, HTTP) and micro-services that operate on them. We promote a "mix and match" approach in which appropriate content- and context-specific curation workflows can be nimbly constructed by combining necessary functions drawn from a granular set of independent micro-services. Micro-services, whether deployed in isolation or in combination, are especially suited to exploitation upstream towards content creators, who normally don't want to think about preservation, especially if it is costly; compared to buying into an entire curation culture, it is easy to adopt a small, inexpensive tool that requires very little commitment. We see digital curation as an ongoing process of enrichment at all stages in the lifecycle of a digital object. Because the early developmental stages are so critical to an object's health and longevity, it is desirable to push curation "best practices" as far upstream towards the object creators as possible. If preservation is considered only when objects are close to retirement, it is often too late to correct the structural and semantic deficiencies that can impair object usability. The later the intervention, the more expensive the correction process, and it is always difficult to fund interventions for "has been" objects. In contrast, early-stage curation challenges traditional practices. Traditionally, preservation actions are based on end-stage processing, where objects are deposited "as is" and kept out of harm's way by limiting access (i.e., dark archives). While some systems are designed to be dark or "dim", with limited access and little regard for versioning or object enrichment, enrichment and access are now seen as necessary curation actions, that is, interventions for the sake of preservation. In particular, the darkness of an entire collection can change in the blink of an eye, for example, as the result of a court ruling or an access rights purchase; turning the lights on for a collection should be as simple as throwing a switch, and should not require transferring the collection from a "preservation repository" to an "access repository". Effective curation services must be flexible and easily configurable in order to respond appropriately to the wide diversity of content and content uses.
To be most effective, curation practices should be pushed not only upstream but also out to many different contexts. The micro-services approach promotes the idea that curation is an outcome, not a place. Curation actions should be applied to content where it most usefully exists for the convenience of its creators or users. For example, high-value digital assets in access repositories, or even on scholars' desktops, would certainly benefit from such things as persistent identification or regular audits to discover and repair bit-level damage, functions usually available only in the context of a "preservation system" but now easily applied to content where it most usefully resides, without requiring transfer to a central location.
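To make the granularity concrete, here is a minimal sketch of one such micro-service (an assumed design, not CDL's actual implementation; the JSON manifest format and paths are hypothetical): a bit-level fixity audit that recomputes digests over a filesystem tree and reports files whose content no longer matches a stored manifest.

```python
import hashlib
import json
import os

def sha256(path, bufsize=1 << 20):
    """Recompute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def audit(root, manifest_path):
    """Return the relative paths of missing or bit-damaged files."""
    with open(manifest_path) as f:
        manifest = json.load(f)  # hypothetical: {relative_path: expected_digest}
    damaged = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.exists(full) or sha256(full) != expected:
            damaged.append(rel)
    return damaged

# Example (hypothetical paths):
# print(audit("/data/collection", "/data/collection.manifest.json"))
```

Because the script depends only on the filesystem and a plain manifest, it can run wherever the content lives, which is the point of the "curation is an outcome, not a place" argument above.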
LIBER's involvement in supporting digital preservation in member libraries
Digital curation and preservation represent new challenges for universities. LIBER has invested considerable effort to engage with the new agendas of digital preservation and digital curation. Through two successful phases of the LIFE project, LIBER is breaking new ground in identifying innovative models for costing digital curation and preservation. Through LIFE's input into the US-UK Blue Ribbon Task Force on Sustainable Digital Preservation and Access, LIBER is aligned with major international work in the economics of digital preservation. In its emerging new strategy and structures, LIBER will continue to make substantial contributions in this area, mindful of the needs of European research libraries.