6,960 research outputs found

    Semiautomatic generation of CORBA interfaces for databases in molecular biology

    The amount and complexity of genome-related data is growing quickly. This highly interrelated data is distributed at many different sites, stored in numerous different formats, and maintained by independent data providers. CORBA, the industry standard for distributed computing, offers the opportunity to make implementation differences and distribution transparent and thereby helps to combine disparate data sources and application programs. In this thesis, the different aspects of CORBA access to molecular biology data are examined in detail. The work is motivated by a concrete application for distributed genome maps. Then, the different design issues relevant to the implementation of CORBA access layers are surveyed and evaluated. The most important of these issues is the question of how to represent data in a CORBA environment using the interface definition language IDL. Different representations have different advantages and disadvantages, and the best representation is highly application-specific. It is therefore in general impossible to generate a CORBA wrapper automatically for a given database. On the other hand, coding a server for each application manually is tedious and error-prone. Therefore, a method is presented for the semiautomatic generation of CORBA wrappers for relational databases. A declarative language is described, which is used to specify the mapping between relations and IDL constructs. Using a set of such mapping rules, a CORBA server can be generated automatically. Additionally, the declarative mapping language allows for the support of ad-hoc queries, which are based on the IDL definitions.

    An Introduction to Programming for Bioscientists: A Python-based Primer

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. 
Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biology.

    Bioconductor: open software development for computational biology and bioinformatics.

    The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

    A Data Transformation System for Biological Data Sources

    Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but also in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.

    The business machine in biology: the commercialization of AI in the life sciences

    This paper traces one important trajectory in the history of expert systems. Through a collaboration between Edward Feigenbaum and the geneticist Joshua Lederberg, Nobel Laureate in Medicine, AI became deeply connected to the life sciences. Biology was a crucial test-bed for some of Feigenbaum's systems and, in the long term, these systems had a transformative effect on biology. In particular, the work of Feigenbaum and his collaborators and students brought biology and computing together in especially powerful ways. We now take for granted that biology can be computerized: we have whole sub-disciplines such as bioinformatics, biocomputing, and computational biology devoted to the task of studying life as information. The computer systems and software that Feigenbaum's lab helped to develop played an important role in establishing the possibility of these kinds of work.

    A Semantic Web Management Model for Integrative Biomedical Informatics

    Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defies automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high throughput molecular data. The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model where multiple intertwined data structures can be hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas MD Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available with open source at www.s3db.org, was developed and its proposed design has been made publicly available as an open source instrument for shared, distributed data management. The Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems Biology and translational biomedical research. As this technology is incorporated into application development we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.

    1st INCF Workshop on Sustainability of Neuroscience Databases

    The goal of the workshop was to discuss issues related to the sustainability of neuroscience databases, identify problems and propose solutions, and formulate recommendations to the INCF. The report summarizes the discussions of invited participants from the neuroinformatics community as well as from other disciplines where sustainability issues have already been approached. The recommendations for the INCF involve rating, ranking, and supporting database sustainability.