Textpresso for Neuroscience: Searching the Full Text of Thousands of Neuroscience Research Papers
Textpresso is a text-mining system for scientific literature. Its two major features are access to the full text of research papers and the development and use of categories of biological concepts as well as categories that describe or relate objects. A search engine enables the user to search for one or a combination of these categories and/or keywords within an entire literature. Here we describe Textpresso for
Neuroscience, part of the core Neuroscience Information Framework
(NIF). The Textpresso site currently consists of 67,500 full text
papers and 131,300 abstracts. We show that using categories in
literature can make a pure keyword query more refined and meaningful.
We also show how semantic queries can be formulated with categories
only. We explain the build and content of the database and describe the
main features of the web pages and the advanced search options. We also
give detailed illustrations of the web service developed to provide
programmatic access to Textpresso. This web service is used by the NIF
interface to access Textpresso. The standalone website of Textpresso
for Neuroscience can be accessed at
http://www.textpresso.org/neuroscience
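The combination of keywords and categories described in this abstract can be illustrated with a small sketch. It is not Textpresso's implementation or its web service API: the toy corpus, the category names, the sentence-level matching and the `search` function are all assumptions made for illustration.

```python
import re

# Hypothetical toy corpus: each sentence carries the set of categories
# that a (not shown) annotation step is assumed to have assigned to it.
SENTENCES = [
    {"text": "Dopamine regulates reward signaling in the striatum.",
     "categories": {"neurotransmitter", "brain area", "regulation"}},
    {"text": "We purchased dopamine from a commercial supplier.",
     "categories": {"neurotransmitter"}},
    {"text": "Serotonin modulates activity in the hippocampus.",
     "categories": {"neurotransmitter", "brain area", "regulation"}},
]


def search(sentences, keywords=(), categories=()):
    """Return the sentences that contain every keyword and every category."""
    wanted_categories = set(categories)
    hits = []
    for sentence in sentences:
        # Lowercased word set of the sentence, for exact keyword matching.
        words = set(re.findall(r"[a-z0-9]+", sentence["text"].lower()))
        has_keywords = all(k.lower() in words for k in keywords)
        has_categories = wanted_categories <= sentence["categories"]
        if has_keywords and has_categories:
            hits.append(sentence["text"])
    return hits


# A pure keyword query, the same query refined by a category,
# and a query built from categories only.
keyword_only = search(SENTENCES, keywords=["dopamine"])
refined = search(SENTENCES, keywords=["dopamine"], categories=["regulation"])
semantic = search(SENTENCES, categories=["neurotransmitter", "brain area"])
```

In this toy setting the keyword query matches both sentences that mention dopamine, while adding the hypothetical `regulation` category keeps only the sentence that makes a regulatory statement.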
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 25 pages, 8 figures, 3 tables
Encoding models for scholarly literature
We examine the issue of digital formats for document encoding, archiving and
publishing, through the specific example of "born-digital" scholarly journal
articles. We will begin by looking at the traditional workflow of journal
editing and publication, and how these practices have made the transition into
the online domain. We will examine the range of different file formats in which
electronic articles are currently stored and published. We will argue strongly
that, despite the prevalence of binary and proprietary formats such as PDF and
MS Word, XML is a far superior encoding choice for journal articles. Next, we
look at the range of XML document structures (DTDs, Schemas) which are in
common use for encoding journal articles, and consider some of their strengths
and weaknesses. We will suggest that, despite the existence of specialized
schemas intended specifically for journal articles (such as NLM), and more
broadly-used publication-oriented schemas such as DocBook, there are strong
arguments in favour of developing a subset or customization of the Text
Encoding Initiative (TEI) schema for the purpose of journal-article encoding;
TEI is already in use in a number of journal publication projects, and the
scale and precision of the TEI tagset makes it particularly appropriate for
encoding scholarly articles. We will outline the document structure of a
TEI-encoded journal article, and look in detail at suggested markup patterns
for specific features of journal articles.
ScriptLattes: an open-source knowledge extraction system from the Lattes platform
The Lattes platform is the major scientific information system maintained by the National Council for Scientific and Technological Development (CNPq). This platform makes it possible to manage the curricular information of researchers and institutions working in Brazil, based on the so-called Lattes Curriculum. However, the public information is available only individually for each researcher, and the platform does not provide automatic creation of reports on the scientific production of research groups. It is thus difficult to extract and summarize useful knowledge for medium to large groups of researchers. This paper describes the design, implementation and experiences with scriptLattes: an open-source system to create academic reports of groups based on curricula of the Lattes Database. The scriptLattes system is composed of the following modules: (a) data selection, (b) data preprocessing, (c) redundancy treatment, (d) collaboration graph generation among group members, (e) research map generation based on geographical information, and (f) automatic report creation of bibliographical, technical and artistic production, and academic supervisions. The system has been extensively tested on a large variety of research groups of Brazilian institutions, and the generated reports have proven to be an alternative way to easily extract knowledge from data in the context of the Lattes platform. The source code, usage instructions and examples are available at http://scriptlattes.sourceforge.net/.
Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior (CAPES); CNPq; FAPES
Wild bee toxicity data for pesticide risk assessments
Pollination services are vital for agriculture, food security and biodiversity. Although many insect species provide pollination services, honeybees are thought to be the major provider of this service to agriculture. However, the importance of wild bees in this respect should not be overlooked. Whilst regulatory risk assessment processes have long included an assessment for pollinators, using honeybees (Apis mellifera) as a protective surrogate, there are concerns that this approach may not be adequate, particularly because of global declines in pollinating insects. Consequently, risk assessments are now being expanded to include wild bee species such as bumblebees (Bombus spp.) and solitary bees (Osmia spp.). However, toxicity data for these species are scarce and absent from the main pesticide reference resources. The aim of the study described here was to collate data relating to the acute toxicity of pesticides to wild bee species (both topical and dietary exposure) from published regulatory documents and peer reviewed literature, and to incorporate this into one of the main online resources for pesticide risk assessment data: The Pesticide Properties Database, thus ensuring that the data is maintained and continuously kept up to date. The outcome of this study is a dataset collated from 316 regulatory and peer reviewed articles that contains 178 records covering 120 different pesticides and their variants which includes 142 records for bumblebees and a further 115 records for other wild bee species.
Peer reviewed
From Data Topology to a Modular Classifier
This article describes an approach to designing a distributed and modular
neural classifier. This approach introduces a new hierarchical clustering that
enables one to determine reliable regions in the representation space by
exploiting supervised information. A multilayer perceptron is then associated
with each of these detected clusters and charged with recognizing elements of
the associated cluster while rejecting all others. The obtained global
classifier is comprised of a set of cooperating neural networks and completed
by a K-nearest neighbor classifier charged with treating elements rejected by
all the neural networks. Experimental results for the handwritten digit
recognition problem and comparison with neural and statistical nonmodular
classifiers are given.
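The decision flow described in this abstract (one expert per cluster that either recognizes an element or rejects it, with a K-nearest neighbor classifier handling whatever every expert rejects) can be sketched as follows. This is a minimal illustration, not the authors' method: the multilayer perceptrons are replaced by a hypothetical distance-threshold rule (`make_module`), the clusters are given by hand rather than found by the hierarchical clustering, and taking the first accepting module when several accept is an assumption.

```python
from collections import Counter
from math import dist


def make_module(center, radius, label):
    """Hypothetical stand-in for one per-cluster network: accept points
    within `radius` of `center`, reject everything else by returning None."""
    def module(x):
        return label if dist(x, center) <= radius else None
    return module


def knn_predict(x, training, k=3):
    """Plain K-nearest neighbor vote over (point, label) training pairs."""
    nearest = sorted(training, key=lambda item: dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(x, modules, training, k=3):
    """Ask each module in turn; fall back to K-NN if all of them reject x."""
    for module in modules:
        answer = module(x)
        if answer is not None:
            return answer  # assumption: the first accepting module wins
    return knn_predict(x, training, k)


# Toy two-class data with one hand-placed module per cluster.
training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
            ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
modules = [make_module((0, 0), 1.5, "a"), make_module((5, 5), 1.5, "b")]

inside_a = classify((0.5, 0.5), modules, training)
inside_b = classify((5.2, 5.1), modules, training)
rejected_by_all = classify((3.5, 3.5), modules, training)  # handled by K-NN
```

The point `(3.5, 3.5)` lies outside both acceptance regions, so it is the only one of the three that reaches the K-nearest neighbor fallback.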