Big data is 'buzzword du jour;' CS academics 'have the best job'
The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts.
Follow us on Twitter at http://twitter.com/blogCACM
http://cacm.acm.org/blogs/blog-cacm
Michael Stonebraker analyzes the different varieties of Big Data, while Judy Robertson considers the rewards of teaching computer science.
Cloud-based solutions supporting data and knowledge integration in bioinformatics
In recent years, advances in computing have changed the way science progresses and have boosted in silico studies. As a result, the concept of “scientific research” in bioinformatics has quickly shifted from the idea of a local laboratory activity towards Web applications and databases provided as services over the network. Biologists have thus become among the largest beneficiaries of information technologies, reaching and surpassing the traditional ICT users who work in the so-called “hard sciences” (i.e., physics, chemistry, and mathematics). Nevertheless, this evolution has to deal with several issues (including data deluge, data integration, and scientific collaboration, to name a few) and poses new challenges that call for innovative approaches within the wide landscape of emerging ICT solutions.
This thesis aims to address these challenges through three case studies, each devoted to a specific open issue and proposing solutions in line with recent advances in computer science.
The first case study focuses on the task of unearthing and integrating information from different Web resources, each with its own organization, terminology, and data formats, in order to provide users with a flexible environment for accessing these resources and smartly exploring their content. The study explores the potential of the cloud paradigm as an enabling technology to substantially reduce the scalability and performance issues of applications that support this task. Specifically, it presents Biocloud Search EnGene (BSE), a cloud-based application for searching and integrating biological information made available by large-scale public genomic repositories. BSE is publicly available at: http://biocloud-unica.appspot.com/.
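The abstract does not describe how BSE reconciles the resources it queries. As a hypothetical sketch of the integration step only (it says nothing about the cloud deployment), the following code maps each source's native field names onto a shared schema and then merges records that refer to the same gene symbol. The source names `repo_a` and `repo_b`, their field names, and the first-source-wins conflict rule are illustrative assumptions, not BSE's actual sources or logic.

```python
from typing import Any, Dict, List

# Hypothetical field-name mappings from each source's native schema to a shared one.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "repo_a": {"gene_symbol": "symbol", "chrom": "chromosome", "desc": "description"},
    "repo_b": {"Symbol": "symbol", "Chromosome": "chromosome", "Summary": "description"},
}


def normalize(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a record's fields to the shared schema, dropping fields with no mapping."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def integrate(results: Dict[str, List[Dict[str, str]]]) -> Dict[str, Dict[str, Any]]:
    """Merge normalized records from all sources, keyed by upper-cased gene symbol."""
    merged: Dict[str, Dict[str, Any]] = {}
    for source, records in results.items():
        for rec in records:
            norm = normalize(source, rec)
            key = norm["symbol"].upper()  # assumes every record carries a symbol
            entry = merged.setdefault(key, {"sources": []})
            for fld, value in norm.items():
                entry.setdefault(fld, value)  # first source to supply a field wins
            entry["sources"].append(source)
    return merged


# Toy query results from two hypothetical repositories.
results = {
    "repo_a": [{"gene_symbol": "BRCA1", "chrom": "17"}],
    "repo_b": [{"Symbol": "brca1", "Summary": "DNA repair associated"}],
}
merged = integrate(results)
print(merged["BRCA1"])
```

Keying on an upper-cased symbol lets `BRCA1` and `brca1` from different sources merge into one entry; a real integration layer would also need identifier cross-references, since gene symbols have aliases and change over time.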
The second case study addresses scientific collaboration on the Web, with a special focus on building a semantic network in which team members, supported by easy access to biomedical ontologies, define network nodes and enrich them with annotations derived from the available ontologies. The study presents a cloud-based application called Collaborative Workspaces in Biomedicine (COWB), which supports users in constructing the semantic network by organizing and retrieving contents of different types and creating connections between them. Public and private workspaces provide an accessible representation of the collective knowledge, which is expanded incrementally. COWB is publicly available at: http://cowb-unica.appspot.com/.
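The abstract describes COWB only at the level of its features. As a rough illustration of the kind of structure involved, the following hypothetical sketch models a workspace as a small in-memory graph: nodes of different content types carry ontology term identifiers as annotations, edges are typed relations between nodes, and nodes can be retrieved by term. The `Workspace` and `Node` classes, the relation names, and the content types are assumptions, not COWB's actual data model or API; `GO:0006281` is the Gene Ontology term for DNA repair, used here only as an example annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    node_id: str
    kind: str  # hypothetical content type, e.g. "paper" or "note"
    author: str
    annotations: Set[str] = field(default_factory=set)  # ontology term IDs


@dataclass
class Workspace:
    name: str
    public: bool  # public vs. private workspace
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (src, relation, dst)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def annotate(self, node_id: str, term_id: str) -> None:
        """Attach an ontology term to an existing node."""
        self.nodes[node_id].annotations.add(term_id)

    def connect(self, src: str, relation: str, dst: str) -> None:
        """Add a typed edge; both endpoints must already be in the workspace."""
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist in the workspace")
        self.edges.append((src, relation, dst))

    def nodes_with_term(self, term_id: str) -> List[str]:
        """Retrieve the IDs of nodes annotated with a given ontology term."""
        return [n.node_id for n in self.nodes.values() if term_id in n.annotations]


ws = Workspace("dna-repair", public=True)
ws.add_node(Node("n1", "paper", "alice"))
ws.add_node(Node("n2", "note", "bob"))
ws.annotate("n1", "GO:0006281")
ws.annotate("n2", "GO:0006281")
ws.connect("n2", "comments_on", "n1")
print(ws.nodes_with_term("GO:0006281"))
```

Storing annotations as term identifiers rather than free text is what makes retrieval by concept possible across contributions from different team members.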
Finally, the third case study concerns knowledge extraction from very large datasets. The study investigates the performance of random forests in classifying microarray data. In particular, it tackles the problem of reducing the contribution of trees whose nodes are populated by non-informative features. Experiments are presented and their results analyzed in order to draw guidelines on how to reduce this contribution.
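The abstract does not say how the contribution of such trees is reduced. One possible scheme, shown purely as a hypothetical sketch and not as the method evaluated in the thesis, weights each tree's vote by the fraction of its split nodes that test features assumed to be informative; how that informative set is obtained (for example, by a feature-ranking step) is left open. The `Tree` type, `tree_weight`, `weighted_vote`, and the toy labels are illustrative names, and a trained tree is stood in for by a plain prediction function.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set


@dataclass
class Tree:
    split_features: List[int]  # feature indices tested at the tree's split nodes
    predict: Callable[[Sequence[float]], str]  # stand-in for a trained tree


def tree_weight(tree: Tree, informative: Set[int]) -> float:
    """Fraction of split nodes that test an informative feature (0.0 if no splits)."""
    if not tree.split_features:
        return 0.0
    hits = sum(1 for f in tree.split_features if f in informative)
    return hits / len(tree.split_features)


def weighted_vote(forest: List[Tree], informative: Set[int], x: Sequence[float]) -> str:
    """Weighted majority vote: trees dominated by non-informative features count less."""
    if not forest:
        raise ValueError("empty forest")
    scores: Dict[str, float] = defaultdict(float)
    for tree in forest:
        scores[tree.predict(x)] += tree_weight(tree, informative)
    return max(scores, key=scores.get)


# Toy forest: feature 0 is assumed informative, features 1 and 2 are noise.
forest = [
    Tree([0], lambda x: "tumor" if x[0] > 0.5 else "normal"),
    Tree([1, 2], lambda x: "normal"),
    Tree([2], lambda x: "normal"),
]
print(weighted_vote(forest, {0}, [0.9, 0.1, 0.3]))
```

With plain majority voting the two noise-driven trees would outvote the first one; under this weighting they contribute nothing, so the forest's answer follows the tree built on the informative feature. A softer scheme, such as a small floor weight for every tree, may be preferable when the informative set is itself uncertain.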
With respect to the challenges mentioned above, this thesis makes two contributions. First, the potential of cloud technologies has been evaluated for developing applications that support access to bioinformatics resources and collaboration, by improving awareness of users' contributions and fostering user interaction. Second, the positive impact of the decision support offered by random forests in effectively tackling the curse of dimensionality has been demonstrated.