CS 475/675: Web Information Systems
This course covers advanced topics in managing Web-based resources, with a focus on building applications involving heterogeneous data. It will expose students to the following concepts, topics, architectures, techniques, and technologies:
• data, metadata, information, knowledge, and ontologies
• unstructured, semi-structured, structured, multimodal, multimedia, and sensor data
• syntax, structural/representational, and semantic aspects of data
• architectures: federated databases, mediator, information brokering
• integration and analysis of Web-based information
• automatic information/metadata extraction (entity identification/recognition, disambiguation)
• Web search engines, social networks, Web 2.0
• Semantic Web and Web 3.0
• relevant Web standards and technologies
• real-world examples drawn from major research projects and commercial products
A hyperconnected manufacturing collaboration system using the semantic web and Hadoop ecosystem system
With the explosive growth of digital data communications in synergistic operating networks and cloud computing service, hyperconnected
manufacturing collaboration systems face the challenges of extracting, processing, and analyzing data from multiple distributed web sources.
Although semantic web technologies provide the solution to web data interoperability by storing the semantic web standard in relational
databases for processing and analysis of web-accessible heterogeneous digital data, web data storage and retrieval via the predefined schema
of relational / SQL databases has become increasingly inefficient with the advent of big data. In response to this problem, the Hadoop
Ecosystem System is being adopted to reduce the complexity of moving data to and from the big data cloud platform. This paper proposes a
novel approach in a set of the Hadoop tools for information integration and interoperability across hyperconnected manufacturing collaboration
systems. In the Hadoop approach, data is “Extracted” from the web sources, “Loaded” into a set of the NoSQL Hadoop Database (HBase)
tables, and then “Transformed” and integrated into the desired format model with Hive's schema-on-read. A case study was conducted to
illustrate that the Hadoop Extract-Load-Transform (ELT) approach for the syntax and semantics web data integration could be adopted across
the global smartphone value chain.
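The Extract-Load-Transform flow described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and uses no Hadoop components: a plain dictionary stands in for the HBase tables, and a per-source field mapping applied at read time stands in for Hive's schema-on-read. The supplier names, field names, and the `READ_SCHEMA` mapping are all hypothetical.

```python
import json

# Hypothetical raw records standing in for data pulled from web sources.
RAW_SOURCES = {
    "supplier_a": ['{"part": "display", "price": "42.5", "currency": "USD"}'],
    "supplier_b": ['{"item": "battery", "cost": 11, "unit": "USD"}'],
}

# Schema applied only at read time: per-source mapping from raw field
# names to the field names of the integrated model (an assumption).
READ_SCHEMA = {
    "supplier_a": {"part": "component", "price": "price"},
    "supplier_b": {"item": "component", "cost": "price"},
}


def extract(sources):
    """Extract: yield (source, raw_text) pairs without interpreting them."""
    for source, rows in sources.items():
        for raw in rows:
            yield source, raw


def load(pairs):
    """Load: store raw payloads under a row key, with no schema imposed."""
    store = {}
    for i, (source, raw) in enumerate(pairs):
        store[f"{source}:{i}"] = raw
    return store


def transform(store, schema):
    """Transform: parse raw payloads and map them to one model on read."""
    rows = []
    for key in sorted(store):
        source = key.split(":")[0]
        record = json.loads(store[key])
        row = {target: record[field] for field, target in schema[source].items()}
        row["price"] = float(row["price"])  # normalize the value type
        row["source"] = source
        rows.append(row)
    return rows


integrated = transform(load(extract(RAW_SOURCES)), READ_SCHEMA)
for row in integrated:
    print(row)
```

Because the raw payloads are stored untouched, a new source or a changed target model only requires a new entry in the read-time mapping, not a reload of the data. That is the property the abstract attributes to the ELT ordering.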
XML in Motion from Genome to Drug
Information technology (IT) has emerged as central to the solution of contemporary genomics and drug discovery problems. Researchers involved in genomics, proteomics, transcriptional profiling, high throughput structure determination, and in other sub-disciplines of bioinformatics have a direct impact on this IT revolution. As the full genome sequences of many species, data from structural genomics, micro-arrays, and proteomics became available, integration of these data into a common platform requires sophisticated bioinformatics tools. Organizing these data into knowledgeable databases and developing appropriate software tools for analyzing them are going to be major challenges. XML (eXtensible Markup Language) forms the backbone of biological data representation and exchange over the internet, enabling researchers to aggregate data from various heterogeneous data resources. The present article gives a comprehensive overview of the integration of XML in particular types of biological databases, mainly those dealing with sequence-structure-function relationships, and its application to drug discovery. This e-medical science approach should be applied to other scientific domains, and the latest trend in semantic web applications is also highlighted.
A Reference Architecture for Building Semantic-Web Mediators
The Semantic Web comprises a large amount of distributed
and heterogeneous ontologies, which have been developed by different
communities, and there exists a need to integrate them. Mediators are
pieces of software that help to perform this integration, which have been
widely studied in the context of nested relational models. Unfortunately,
mediators for databases that are modelled using ontologies have not been
so widely studied. In this paper, we present a reference architecture for
building semantic-web mediators. To the best of our knowledge, this is
the first reference architecture in the bibliography that solves the integration
problem as a whole, contrarily to existing approaches that focus on
specific problems. Furthermore, we describe a case study that is contextualised
in the digital libraries domain in which we realise the benefits of
our reference architecture. Finally, we identify a number of best practices
to build semantic-web mediators.
Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Industria, Turismo y Comercio TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Economía, Industria y Competitividad TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-
A Survey of Semantic Integration Approaches in Bioinformatics
Technological advances of computer science and data
analysis are helping to provide continuously huge volumes of
biological data, which are available on the web. Such advances
involve and require powerful techniques for data integration to
extract pertinent knowledge and information for a specific question.
Biomedical exploration of these big data often requires the use
of complex queries across multiple autonomous, heterogeneous
and distributed data sources. Semantic integration is an active
area of research in several disciplines, such as databases,
information-integration, and ontology. We provide a survey of some
approaches and techniques for integrating biological data, focusing
on those developed in the ontology community.
Semantic-JSON: a lightweight web service interface for Semantic Web contents integrating multiple life science databases
Global cloud frameworks for bioinformatics research databases become huge and heterogeneous; solutions face various diametric challenges comprising cross-integration, retrieval, security and openness. To address this, as of March 2011 organizations including RIKEN published 192 mammalian, plant and protein life sciences databases having 8.2 million data records, integrated as Linked Open or Private Data (LOD/LPD) using SciNetS.org, the Scientists' Networking System. The huge quantity of linked data this database integration framework covers is based on the Semantic Web, where researchers collaborate by managing metadata across public and private databases in a secured data space. This outstripped the data query capacity of existing interface tools like SPARQL. Actual research also requires specialized tools for data analysis using raw original data. To solve these challenges, in December 2009 we developed the lightweight Semantic-JSON interface to access each fragment of linked and raw life sciences data securely under the control of programming languages popularly used by bioinformaticians such as Perl and Ruby. Researchers successfully used the interface across 28 million semantic relationships for biological applications including genome design, sequence processing, inference over phenotype databases, full-text search indexing and human-readable contents like ontology and LOD tree viewers. Semantic-JSON services of SciNetS.org are provided at http://semanticjson.org