Data access and integration in the ISPIDER proteomics grid
Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources.
Processing Rank-Aware Queries in Schema-Based P2P Systems
Efficient query processing in data integration systems and in P2P systems
has been an active research topic for several years. Conventional data
integration systems consist of multiple data sources with possibly
different schemas, are organized hierarchically, and have a central
component: the mediator, which manages a global schema. Queries to the
system are formulated against this global schema and processed by the
mediator, which retrieves relevant data from the data sources
transparently to the user. Building on these systems, Peer Data Management
Systems (PDMSs), or schema-based P2P systems, eventually emerged. Nodes
(peers) participating in a PDMS can act as mediators but equally as data
sources. Moreover, these peers are autonomous and can leave or join the
network at any time. The potentially huge amount of data available in such
a network also usually leads to very large query results that are hard to
manage. Determining a complete result set is therefore extremely expensive
or even impossible in many cases. In these cases, applying top-N and
skyline operators, possibly in combination with approximation techniques,
is a good option, since these operators output only those records that are
most relevant to the user according to user-defined ranking functions.
Because applying these operators usually means that only a small part of
the result is actually output to the user, the complete result set need
not be computed, only the part that is actually relevant to the final
result.
The question now is how such queries can be processed efficiently in PDMSs
by exploiting this insight. Answering this question is the main concern of
this dissertation. To solve this problem, we present efficient query
processing strategies for PDMSs that exploit the characteristic properties
of rank-based operators as well as approximation techniques. Peers are
checked at both the schema level and the data level for the relevance of
their data and are accordingly included in or excluded from query
processing. The heterogeneity of the peers makes techniques for rewriting a
query from one schema into another necessary. Since existing query
rewriting techniques mostly consider only conjunctive queries, we present
an extension of these techniques that takes queries with rank-based query
operators into account. Since PDMSs are dynamic systems and participating
peers can change their data at any time, this dissertation considers not
only how routing indexes are used to determine a peer's relevance at the
data level, but also how they can be maintained. Finally, we present
SmurfPDMS (SiMUlating enviRonment For Peer Data Management Systems), a
system that was developed as part of this dissertation and implements all
the presented techniques.
In recent years, there has been considerable research with respect to query
processing in data integration and P2P systems. Conventional data
integration systems consist of multiple sources with possibly different
schemas, adhere to a hierarchical structure, and have a central component
(mediator) that manages a global schema. Queries are formulated against
this global schema and the mediator processes them by retrieving relevant
data from the sources transparently to the user. Building on these
systems, Peer Data Management Systems (PDMSs), also called schema-based
P2P systems, have attracted attention. Peers participating in
a PDMS can act both as a mediator and as a data source, are autonomous, and
might leave or join the network at will. For these reasons, peers often
hold incomplete or erroneous data sets and mappings. The possibly huge
amount of data available in such a network often results in large query
result sets that are hard to manage. Consequently, retrieving the
complete result set is in most cases difficult or even impossible. Applying
rank-aware query operators such as top-N and skyline, possibly in
conjunction with approximation techniques, remedies these problems because
these operators select only those result records that are most relevant to
the user. Since in most cases only a small fraction of the
complete result set is actually output to the user, retrieving the complete
set before evaluating such operators is inefficient.
Therefore, the questions we want to answer in this dissertation are how to
compute such queries in PDMSs and how to do that efficiently. We propose
strategies for efficient query processing in PDMSs that exploit the
characteristics of rank-aware queries and optionally apply approximation
techniques. A peer's relevance is determined on two levels: the schema level
and the data level. Depending on its relevance, a peer is either included in
query processing or excluded from it. Because peers are heterogeneous,
queries need to be rewritten, enabling cooperation between peers that use
different schemas.
As existing query rewriting techniques mostly consider conjunctive queries
only, we present an extension that allows for rewriting queries involving
rank-aware query operators. As PDMSs are dynamic systems and peers might
update their local data, this dissertation addresses not only the problem
of using routing indexes within a query processing strategy but also
the problem of keeping them up to date. Finally, we provide a system-level
evaluation by presenting SmurfPDMS (SiMUlating enviRonment For Peer Data
Management Systems), a system created in the context of this dissertation
that implements all presented techniques.
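
The rank-aware operators named in the abstract above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The `max_score` field is a hypothetical stand-in for what a data-level summary of a peer might provide, and the peer layout, function names, and sample data are assumptions made for illustration only. The sketch shows a skyline over points where smaller values are better, and a top-N evaluation that stops contacting peers once no remaining peer can improve the result.

```python
import heapq


def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly
    better in at least one (smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_n_over_peers(peers, n):
    """Top-N over several peers, skipping peers that cannot contribute.

    Each peer is a dict with a hypothetical 'max_score' summary (an upper
    bound on its scores) and its local 'scores'. Peers are visited in
    descending order of that bound; once n results are held and the
    weakest of them is at least the next bound, the remaining peers
    are not contacted.
    """
    best = []        # min-heap holding the n highest scores seen so far
    contacted = 0
    for peer in sorted(peers, key=lambda p: p["max_score"], reverse=True):
        if len(best) == n and best[0] >= peer["max_score"]:
            break
        contacted += 1
        for score in peer["scores"]:
            if len(best) < n:
                heapq.heappush(best, score)
            elif score > best[0]:
                heapq.heapreplace(best, score)
    return sorted(best, reverse=True), contacted


# Illustrative data: (price, distance) pairs and three peers.
hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (55, 9)]
peers = [
    {"max_score": 9, "scores": [9, 7, 3]},
    {"max_score": 8, "scores": [8, 1]},
    {"max_score": 5, "scores": [5, 4]},
]

sky = skyline(hotels)
result, contacted = top_n_over_peers(peers, 2)
```

In this example the third peer is never contacted, because its upper bound of 5 cannot beat the two scores already held. This is the kind of saving the abstract describes when it says that retrieving the complete result set first is inefficient.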
Institutional change and plant variety provision in Australia
Crop Production/Industries
LinkedScales : bases de dados em multiescala
Advisor: André Santanchè. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação.
Summary: The biological and medical sciences increasingly need unified approaches to data analysis that allow the network of relationships and interactions between elements to be explored. However, essential data is often scattered across an ever-growing set of sources with multiple levels of heterogeneity among them, making integration increasingly complex. Existing integration approaches usually adopt specialized and costly strategies, requiring the production of monolithic solutions to handle specific formats and schemas. To deal with complexity, these approaches adopt one-off solutions that combine tools and algorithms and require manual adaptation. Unsystematic approaches make it hard to reuse common tasks and intermediate results, even though these could be useful in future analyses. In addition, transformations and other provenance information are difficult to trace and are usually neglected. This work proposes LinkedScales, a dataspace based on multiple levels, designed to support the progressive construction of unified views of heterogeneous sources. LinkedScales systematizes the multiple integration steps into scales, starting from raw representations (lower scales) and moving gradually toward ontology-like structures (higher scales). LinkedScales defines a data model and a systematic, on-demand integration process through transformations in a graph database. Intermediate results are encapsulated in reusable scales, and transformations between scales are tracked in an orthogonal provenance graph that connects objects across scales. Subsequently, queries to the dataspace can take into account objects in the scales and the orthogonal provenance graph. Practical applications of LinkedScales are addressed through two case studies, one in the biology domain -- covering an organism-centric analysis scenario -- and the other in the medical domain -- focusing on evidence-based medicine data.
Abstract: Biological and medical sciences increasingly need a unified, network-driven approach for exploring relationships and interactions among data elements. Nevertheless, essential data is frequently scattered across sources with multiple levels of heterogeneity. Existing data integration approaches usually adopt specialized, heavyweight strategies, requiring a costly upfront effort to produce monolithic solutions for handling specific formats and schemas. Furthermore, such ad-hoc strategies hamper the reuse of intermediary integration tasks and outcomes. This work proposes LinkedScales, a multiscale-based dataspace designed to support the progressive construction of a unified view of heterogeneous sources. It starts from raw representations (lower scales) and moves towards ontology-like structures (higher scales). LinkedScales defines a data model and a systematic, gradual integration process via operations over a graph database. Intermediary outcomes are encapsulated as reusable scales, tracking the provenance of inter-scale operations. Later, queries can combine both scale data and orthogonal provenance information. Practical applications of LinkedScales are discussed through two case studies in the biology domain -- addressing an organism-centric analysis scenario -- and the medical domain -- focusing on evidence-based medicine data.
Doctorate. Computer Science. Doctor of Computer Science. 141353/2015-5. CAPES. CNP
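
The idea of scales connected by a provenance graph, as described in the LinkedScales abstract, can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the LinkedScales data model. The `Dataspace` class, the integer scale numbers, and the object identifiers are all hypothetical. The sketch records which lower-scale objects each object was derived from and lets a query walk that provenance back to the raw representations.

```python
from collections import defaultdict


class Dataspace:
    """Toy dataspace: objects live on numbered scales (0 = raw, higher =
    more ontology-like) and provenance edges point to their sources."""

    def __init__(self):
        self.scale_of = {}                    # object id -> scale number
        self.derived_from = defaultdict(set)  # object id -> source ids

    def add(self, obj, scale, sources=()):
        """Register obj on a scale and record what it was derived from."""
        for src in sources:
            if src not in self.scale_of:
                raise KeyError(f"unknown source object: {src}")
        self.scale_of[obj] = scale
        for src in sources:
            self.derived_from[obj].add(src)

    def objects_at(self, scale):
        """All objects stored on the given scale."""
        return {o for o, s in self.scale_of.items() if s == scale}

    def lineage(self, obj):
        """Every object that obj was directly or transitively derived from."""
        seen, stack = set(), [obj]
        while stack:
            current = stack.pop()
            for src in self.derived_from.get(current, ()):
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen


# Hypothetical identifiers: a raw row, a cleaned record, a concept.
ds = Dataspace()
ds.add("file.csv:row7", 0)
ds.add("record:7", 1, sources=["file.csv:row7"])
ds.add("concept:Species", 2, sources=["record:7"])

trace = ds.lineage("concept:Species")
raw_sources = {o for o in trace if ds.scale_of[o] == 0}
```

Keeping the provenance edges separate from the per-scale data mirrors the abstract's point that queries can combine scale data with provenance information. Here `raw_sources` answers which raw inputs a higher-scale object ultimately depends on.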
An Edge-Cloud based Reference Architecture to support cognitive solutions in Process Industry
Process Industry is one of the leading sectors of the world economy, but it is characterized by intense environmental impact and very high energy consumption. Despite a traditionally slow pace of innovation in PI, in recent years a strong worldwide push has gained momentum towards the dual objective of improving the efficiency of plants and the quality of products while significantly reducing electricity consumption and CO2 emissions. Digital Technologies (namely Smart Embedded Systems, IoT, Data, AI and Edge-to-Cloud Technologies) are enabling drivers for a Twin Digital-Green Transition, as well as foundations for human-centric, safe, comfortable and inclusive workplaces. Currently, digital sensors in plants produce a large amount of data, which in most cases represents only potential rather than real value for Process Industry, since it is often locked in closed proprietary systems and seldom exploited. Digital technologies, with process modelling and simulation via digital twins, can build a bridge between the physical and the virtual worlds, bringing innovation with great efficiency and drastic reduction of waste. In accordance with the guidelines of Industrie 4.0, this work proposes a modular and scalable Reference Architecture, based on open source software, which can be implemented in both brownfield and greenfield scenarios. The ability to distribute processing between the edge, where the data are created, and the cloud, where the greatest computational resources are available, facilitates the development of integrated digital solutions with cognitive capabilities. The reference architecture is being validated in three pilot plants, paving the way to the development of integrated planning solutions, with scheduling and control of the plants, optimizing the efficiency and reliability of the supply chain, and balancing energy efficiency.
Seshat: The Global History Databank
The vast amount of knowledge about past human societies has not been systematically organized and, therefore, remains inaccessible for empirically testing theories about cultural evolution and historical dynamics. For example, what evolutionary mechanisms were involved in the transition from the small-scale, uncentralized societies, in which humans lived 10,000 years ago, to the large-scale societies with an extensive division of labor, great differentials in wealth and power, and elaborate governance structures of today? Why do modern states sometimes fail to meet the basic needs of their populations? Why do economies decline, or fail to grow? In this article, we describe the structure and uses of a massive databank of historical and archaeological information, Seshat: The Global History Databank. The data that we are currently entering in Seshat will allow us and others to test theories explaining how modern societies evolved from ancestral ones, and why modern societies vary so much in their capacity to satisfy their members’ basic human needs.
Peer reviewed. Final Published version.
The centre cannot (always) hold: Examining pathways towards energy system de-centralisation
This is the final version. Available on open access from Elsevier via the DOI in this record.
'Energy decentralisation' means many things to many people. Among the confusion of definitions and practices that may be characterised as decentralisation, three broad causal narratives are commonly (implicitly or explicitly) invoked. These narratives imply that the process of decentralisation: i) will result in appropriate changes to rules and institutions, ii) will be more democratic and iii) is directly and causally linked to energy system decarbonisation. The principal aim of this paper is to critically examine these narratives. By conceptualising energy decentralisation as a distinct class of sociotechnical transition pathway, we present a comparative analysis of energy decentralisation in Cornwall, South West UK, the French island of Ushant and the National Electricity Market in Australia. We show that, while energy decentralisation is often strongly correlated with institutional change, increasing citizen agency in the energy system, and enhanced environmental performance, these trends cannot be assumed as given. Indeed, some decentralisation pathways may entrench incumbent actors' interests or block rapid decarbonisation. In particular, we show how institutional context is a key determinant of the link between energy decentralisation and normative goals such as democratisation and decarbonisation. While institutional theory suggests that changes in rules and institutions are often incremental and path-dependent, the dense legal and regulatory arrangements that develop around the electricity sector seem particularly resistant to adaptive change. Consequently, policymakers seeking to pursue normative goals such as democratisation or decarbonisation through energy decentralisation need to look beyond technology towards the rules, norms and laws that constitute the energy governance system.
Engineering and Physical Sciences Research Council (EPSRC). European Structural and Investment Fund. INTERREG V FC
Introducing conflict as the microfoundation of organizational ambidexterity
This article contributes to our understanding of organizational ambidexterity by introducing conflict as its microfoundation. Existing research distinguishes between three approaches to how organizations can be ambidextrous, that is, engage in both exploitation and exploration. They may sequentially shift the strategic focus of the organization over time, they may establish structural arrangements enabling the simultaneous pursuit of being both exploitative and explorative, or they may provide a supportive organizational context for ambidextrous behavior. However, we know little about how exactly ambidexterity is accomplished and managed. We argue that ambidexterity is a dynamic and conflict-laden phenomenon, and we locate conflict at the level of individuals, units, and organizations. We develop the argument that conflicts in social interaction serve as the microfoundation to organizing ambidexterity, but that their function and type vary across the different approaches toward ambidexterity. The perspective developed in this article opens up promising research avenues to examine how organizations purposefully manage ambidexterity.
Website Personalization Based on Demographic Data
This study focuses on website personalization based on users' demographic data. The
main demographic data used in this study are age, gender, race and occupation.
These data were obtained through a user profiling technique conducted during the study.
The gathered data were analysed to find the relationship between users'
demographic data and their preferences for website design. The results serve as a
guideline for developing a website that fulfils visitors' needs. The topic chosen
was obesity. HCI issues, namely effectiveness and satisfaction, are considered
among the important factors in this study. The methodologies used are the website
personalization process, the incremental model, a combination of these two methods, and
Cascading Style Sheets (CSS), which are discussed in detail in Chapter 3. We then
discuss the effectiveness and evaluation of the personalized website that was
built. Finally, the conclusion presents the results of the evaluation of
the websites carried out by the respondents.