
    Data access and integration in the ISPIDER proteomics grid

    Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are being developed rapidly with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories, which need to be integrated to enable uniform access to them. A number of technologies exist that enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture that supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture to the integration of several autonomous proteomics data resources.
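The mediator-style integration described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the ISPIDER system itself: the source names, local schemas and records are invented. Each wrapper maps a source's local record layout onto a shared global schema so that one query reaches every repository uniformly.

```python
# Hypothetical sketch of uniform access over autonomous proteomics
# sources; source names, schemas and records are invented for
# illustration and are not part of ISPIDER, OGSA-DAI or AutoMed.

# Each wrapper maps a source's local layout onto a shared global
# schema: (protein_id, description, source).
def wrap_pedro(record):            # source A uses 'accession'/'desc'
    return {"protein_id": record["accession"],
            "description": record["desc"],
            "source": "pedro"}

def wrap_gpmdb(record):            # source B uses 'prot'/'name'
    return {"protein_id": record["prot"],
            "description": record["name"],
            "source": "gpmdb"}

SOURCES = {
    "pedro": ([{"accession": "P12345", "desc": "Serum albumin"}], wrap_pedro),
    "gpmdb": ([{"prot": "P12345", "name": "Albumin"}], wrap_gpmdb),
}

def global_query(protein_id):
    """Query every source and return matches under the global schema."""
    hits = []
    for records, wrap in SOURCES.values():
        for r in records:
            g = wrap(r)
            if g["protein_id"] == protein_id:
                hits.append(g)
    return hits

results = global_query("P12345")
print([r["source"] for r in results])   # both sources report this protein
```

Schema evolution in a real deployment would mean versioning these wrappers; here the point is only that heterogeneity is absorbed at the wrapper layer, leaving queries expressed against one schema.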

    Processing Rank-Aware Queries in Schema-Based P2P Systems

    Efficient query processing in data integration systems and in P2P systems has been an active research topic for several years. Conventional data integration systems consist of multiple data sources with possibly different schemas, adhere to a hierarchical structure, and have a central component, the mediator, which manages a global schema. Queries are formulated against this global schema and processed by the mediator, which retrieves relevant data from the sources transparently to the user. Building on these systems, Peer Data Management Systems (PDMSs), also called schema-based P2P systems, eventually emerged. Peers participating in a PDMS can act both as mediators and as data sources, are autonomous, and may leave or join the network at any time. As a consequence, peers often hold incomplete or erroneous data sets and mappings. The potentially huge amount of data available in such a network typically leads to very large query result sets that are hard to manage, so retrieving the complete result set is in most cases difficult or even impossible. Applying rank-aware query operators such as top-N and skyline, possibly in conjunction with approximation techniques, is a remedy to these problems, as these operators return only those result records that are most relevant to the user according to user-defined ranking functions. Because only a small fraction of the complete result set is usually output to the user, the complete set need not necessarily be computed, only the part that actually contributes to the final result. The questions addressed in this dissertation are therefore how such queries can be processed in PDMSs, and how this can be done efficiently.
    To answer these questions, we propose strategies for efficient query processing in PDMSs that exploit the characteristics of rank-aware operators and optionally apply approximation techniques. A peer's relevance is determined on two levels, the schema level and the data level, and the peer is accordingly either included in or excluded from query processing. Because of peer heterogeneity, queries need to be rewritten from one schema into another, enabling cooperation between peers that use different schemas. As existing query rewriting techniques mostly consider conjunctive queries only, we present an extension of these techniques that handles queries involving rank-aware operators. As PDMSs are dynamic systems in which peers may update their local data at any time, this dissertation addresses not only how routing indexes can be used to determine a peer's relevance on the data level, but also how they can be maintained. Finally, we present SmurfPDMS (SiMUlating enviRonment For Peer Data Management Systems), a system developed in the context of this dissertation that implements all of the presented techniques.
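The rank-aware operators the dissertation builds on can be sketched in a few lines. This is an illustrative minimal implementation, not the dissertation's distributed algorithm: top-N sorts by a user-defined score, and a skyline keeps every record not dominated by another (here both attributes are minimised). The example data are invented.

```python
# Minimal sketch of the rank-aware operators discussed above: top-N
# under a user-defined ranking function, and a skyline over two
# minimised attributes. Data and scoring function are invented.

def top_n(records, score, n):
    """Return the n records with the smallest score (most relevant first)."""
    return sorted(records, key=score)[:n]

def dominates(a, b):
    """a dominates b if a is <= b in every dimension and < in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Example: (price, distance) tuples; lower is better in both dimensions.
offers = [(50, 8), (60, 2), (40, 9), (55, 1), (70, 10)]
print(skyline(offers))                          # -> [(50, 8), (40, 9), (55, 1)]
print(top_n(offers, lambda o: o[0] + o[1], 2))  # -> [(40, 9), (55, 1)]
```

In a PDMS the gain comes from pushing these operators towards the peers: a peer whose local data cannot dominate or outrank the current candidates need not be contacted at all, which is what the routing indexes estimate.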

    Institutional change and plant variety provision in Australia

    Crop Production/Industries

    LinkedScales: multiscale databases

    Advisor: André Santanché. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
    Biological and medical sciences increasingly need a unified, network-driven approach for exploring relationships and interactions among data elements. Nevertheless, essential data is frequently scattered across a growing set of sources with multiple levels of heterogeneity, making integration increasingly complex. Existing data integration approaches usually adopt specialized, heavyweight strategies, requiring a costly upfront effort to produce monolithic solutions for handling specific formats and schemas. Such ad-hoc strategies hamper the reuse of common tasks and intermediary outcomes, even when these could be useful in future analyses, and make it hard to track transformations and other provenance information, which is often neglected. This work proposes LinkedScales, a multiscale-based dataspace designed to support the progressive construction of a unified view of heterogeneous sources. LinkedScales systematizes the integration steps into scales, departing from raw representations (lower scales) and moving gradually towards ontology-like structures (higher scales). It defines a data model and a systematic, on-demand integration process via transformations over a graph database. Intermediary outcomes are encapsulated as reusable scales, and inter-scale transformations are tracked in an orthogonal provenance graph that connects objects across scales; queries over the dataspace can then combine both scale data and this provenance information. Practical applications of LinkedScales are discussed through two case studies, one in the biology domain, addressing an organism-centric analysis scenario, and one in the medical domain, focusing on evidence-based medicine data.
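The multiscale idea can be made concrete with a small sketch. This is an invented illustration of the concept, not LinkedScales itself: objects live in named scales, a transformation derives an object at a higher scale, and each derivation is recorded as a provenance edge that can later be walked back to the raw representation. The dict-based "graph database" and all identifiers are hypothetical.

```python
# Illustrative sketch of a multiscale dataspace with orthogonal
# provenance; the in-memory "graph database" and all names are
# invented for illustration, not taken from LinkedScales.

class Dataspace:
    def __init__(self):
        self.scales = {}       # scale name -> {object id: payload}
        self.provenance = []   # (source id, derived id, operation)

    def add(self, scale, obj_id, payload):
        self.scales.setdefault(scale, {})[obj_id] = payload

    def transform(self, src_scale, dst_scale, obj_id, op, fn):
        """Derive an object at a higher scale, recording provenance."""
        derived_id = f"{obj_id}@{dst_scale}"
        self.add(dst_scale, derived_id, fn(self.scales[src_scale][obj_id]))
        self.provenance.append((obj_id, derived_id, op))
        return derived_id

    def lineage(self, obj_id):
        """Walk provenance edges back towards the raw representation."""
        for src, dst, op in self.provenance:
            if dst == obj_id:
                return [(src, op)] + self.lineage(src)
        return []

ds = Dataspace()
ds.add("raw", "rec1", "id=K00001|name=hexokinase")    # lowest scale: raw text
parsed = ds.transform("raw", "structured", "rec1", "parse",
                      lambda s: dict(kv.split("=") for kv in s.split("|")))
linked = ds.transform("structured", "ontology", parsed, "map-to-term",
                      lambda d: {"term": d["name"], "id": d["id"]})
print(ds.lineage(linked))   # provenance chain back to the raw record
```

A query over this dataspace can combine both dimensions, e.g. "find ontology-level terms whose lineage includes a `parse` step", which is the kind of scale-plus-provenance query the abstract describes.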

    An Edge-Cloud based Reference Architecture to support cognitive solutions in Process Industry

    Process Industry is one of the leading sectors of the world economy, characterized however by intense environmental impact and very high energy consumption. Despite a traditionally low pace of innovation in the sector, in recent years a strong worldwide push has gained momentum towards the dual objective of improving the efficiency of plants and the quality of products while significantly reducing electricity consumption and CO2 emissions. Digital technologies (namely smart embedded systems, IoT, data, AI and edge-to-cloud technologies) are enabling drivers of a twin digital-green transition, as well as foundations for human-centric, safe, comfortable and inclusive workplaces. Currently, digital sensors in plants produce a large amount of data which in most cases constitutes only potential rather than real value for the process industry, as it is often locked in closed proprietary systems and seldom exploited. Digital technologies, with process modelling and simulation via digital twins, can build a bridge between the physical and virtual worlds, bringing innovation with great efficiency and a drastic reduction of waste. In accordance with the guidelines of Industrie 4.0, this work proposes a modular and scalable reference architecture, based on open-source software, which can be implemented in both brownfield and greenfield scenarios. The ability to distribute processing between the edge, where the data is created, and the cloud, where the greatest computational resources are available, facilitates the development of integrated digital solutions with cognitive capabilities. The reference architecture is being validated in three pilot plants, paving the way for integrated planning solutions with scheduling and control of the plants, optimizing the efficiency and reliability of the supply chain and balancing energy efficiency.
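The edge-to-cloud processing split mentioned above can be illustrated with a toy dispatcher. This is not the paper's architecture: the placement rules, thresholds and task names are all invented. The only point it shows is the design choice that latency-critical analytics stay near the sensors while compute-heavy workloads are offloaded.

```python
# Illustrative sketch (not the paper's reference architecture): a toy
# dispatcher placing processing tasks on the edge or in the cloud.
# Thresholds and task names are invented.

def place_task(task):
    """Choose an execution tier for a processing task."""
    if task["max_latency_ms"] < 100:      # tight control loops stay local
        return "edge"
    if task["compute_gflops"] > 50:       # heavy workloads are offloaded
        return "cloud"
    return "edge"                         # default: process near the data

tasks = [
    {"name": "vibration-anomaly-check", "max_latency_ms": 20, "compute_gflops": 1},
    {"name": "digital-twin-retraining", "max_latency_ms": 60000, "compute_gflops": 500},
]
for t in tasks:
    print(t["name"], "->", place_task(t))
```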

    The centre cannot (always) hold: Examining pathways towards energy system de-centralisation

    This is the final version, available on open access from Elsevier via the DOI in this record.
    'Energy decentralisation' means many things to many people. Among the confusion of definitions and practices that may be characterised as decentralisation, three broad causal narratives are commonly (implicitly or explicitly) invoked. These narratives imply that the process of decentralisation: i) will result in appropriate changes to rules and institutions, ii) will be more democratic and iii) is directly and causally linked to energy system decarbonisation. The principal aim of this paper is to critically examine these narratives. By conceptualising energy decentralisation as a distinct class of sociotechnical transition pathway, we present a comparative analysis of energy decentralisation in Cornwall, South West UK, the French island of Ushant and the National Electricity Market in Australia. We show that, while energy decentralisation is often strongly correlated with institutional change, increasing citizen agency in the energy system, and enhanced environmental performance, these trends cannot be assumed as given. Indeed, some decentralisation pathways may entrench incumbent actors' interests or block rapid decarbonisation. In particular, we show how institutional context is a key determinant of the link between energy decentralisation and normative goals such as democratisation and decarbonisation. While institutional theory suggests that changes in rules and institutions are often incremental and path-dependent, the dense legal and regulatory arrangements that develop around the electricity sector seem particularly resistant to adaptive change. Consequently, policymakers seeking to pursue normative goals such as democratisation or decarbonisation through energy decentralisation need to look beyond technology towards the rules, norms and laws that constitute the energy governance system.
    Funding: Engineering and Physical Sciences Research Council (EPSRC); European Structural and Investment Fund; INTERREG V FC.

    Introducing conflict as the microfoundation of organizational ambidexterity

    This article contributes to our understanding of organizational ambidexterity by introducing conflict as its microfoundation. Existing research distinguishes between three approaches to how organizations can be ambidextrous, that is, engage in both exploitation and exploration. They may sequentially shift the strategic focus of the organization over time, they may establish structural arrangements enabling the simultaneous pursuit of being both exploitative and explorative, or they may provide a supportive organizational context for ambidextrous behavior. However, we know little about how exactly ambidexterity is accomplished and managed. We argue that ambidexterity is a dynamic and conflict-laden phenomenon, and we locate conflict at the level of individuals, units, and organizations. We develop the argument that conflicts in social interaction serve as the microfoundation to organizing ambidexterity, but that their function and type vary across the different approaches toward ambidexterity. The perspective developed in this article opens up promising research avenues to examine how organizations purposefully manage ambidexterity.

    Website Personalization Based on Demographic Data

    This study focuses on website personalization based on users' demographic data. The main demographic attributes used in this study are age, gender, race and occupation. These data are obtained through a user-profiling technique conducted during the study. The gathered data are analysed to find the relationship between users' demographic data and their preferences for a website design, and then serve as a guideline for developing a website that fulfils visitors' needs. The topic chosen was obesity. HCI issues, namely effectiveness and satisfaction, are considered among the important factors in this study. The methodologies used are the website personalization process, the incremental model, a combination of these two methods, and Cascading Style Sheets (CSS), all of which are discussed in detail in Chapter 3. After that, we discuss the effectiveness and evaluation of the personalized website that has been built. Finally, we present the results of the respondents' evaluation of the websites.
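Rule-based personalization of the kind the study describes can be sketched briefly. This is a hypothetical illustration, not the study's actual rules: the thresholds, profile keys and CSS file names are invented. It shows the core mechanism of mapping a demographic profile to a stylesheet.

```python
# Hedged sketch of demographic-driven personalization: a user's
# profile selects a CSS theme. Thresholds, profile keys and file
# names are invented, not taken from the study.

def choose_stylesheet(profile):
    """Map a demographic profile to a CSS theme."""
    if profile.get("age", 0) >= 60:
        return "large-text.css"        # bigger fonts, higher contrast
    if profile.get("occupation") == "student":
        return "compact.css"           # denser layout
    return "default.css"

print(choose_stylesheet({"age": 65, "gender": "F"}))
print(choose_stylesheet({"age": 21, "occupation": "student"}))
```

In practice the server would emit the chosen file in the page's `<link rel="stylesheet">` tag, so the same content is restyled per visitor without duplicating pages.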