12 research outputs found

    Mining Heterogeneous Urban Data at Multiple Granularity Layers

    The recent development of urban areas and of new advanced services supported by digital technologies has generated big challenges for people and city administrators, such as air pollution, high energy consumption, traffic congestion, and the management of public events. Moreover, understanding citizens' perception of the provided services and other relevant topics can help devise targeted management actions. With the wide diffusion of sensing technologies and user devices, the capability to generate data of public interest within the urban area has grown rapidly. For instance, the sensor networks deployed in an urban area allow the collection of a variety of data useful to characterize several aspects of the urban environment. The huge amount of data produced by different types of devices and applications carries rich knowledge about the urban context. Mining big urban data can provide decision makers with knowledge useful to tackle the aforementioned challenges for a smart and sustainable administration of urban spaces. However, the high volume and heterogeneity of the data increase the complexity of the analysis. Moreover, different sources provide data with different spatial and temporal references. The extraction of significant information from such diverse kinds of data also depends on how they are integrated; hence alternative data representations and efficient processing technologies are required.
    The PhD research activity presented in this thesis was aimed at tackling these issues. The thesis deals with the analysis of big heterogeneous data in smart city scenarios, by means of new data mining techniques and algorithms, to study the nature of urban-related processes. The problem is addressed at both the infrastructural and the algorithmic layer. At the first layer, the thesis proposes enhancements of the current leading techniques for the storage and elaboration of Big Data. Integration with novel computing platforms is also considered to support the parallelization of tasks and to tackle the issue of automatic scaling of resources. At the algorithmic layer, the research activity aimed at innovating current data mining algorithms by adapting them to novel Big Data architectures and to Cloud computing environments. Such algorithms have been applied to various classes of urban data in order to discover hidden but important information that supports the optimization of the related processes.
    This research activity focused on the development of a distributed framework to automatically aggregate heterogeneous data at multiple temporal and spatial granularities and to apply different data mining techniques. Parallel computations are performed according to the MapReduce paradigm, exploiting in-memory computing to reach near-linear computational scalability. By exploring manifold data resolutions in a relatively short time, several additional patterns can be discovered, further enriching the description of urban processes. The framework is applied to different use cases, where many types of data are used to provide insightful descriptive and predictive analyses. In particular, the PhD activity addressed two main issues in the context of urban data mining: the evaluation of buildings' energy efficiency from different energy-related data, and the characterization of people's perception of, and interest in, different topics from user-generated content on social networks. For each use case within the considered applications, a specific architectural solution was designed to obtain meaningful and actionable results and to optimize the computational performance and scalability of the algorithms, which were extensively validated through experimental tests.
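
    To make the multi-granularity aggregation step concrete, the following minimal sketch (plain Python, single process) bins a few hypothetical sensor readings into several spatial and temporal resolutions at once using a map/reduce pattern. The record layout, grid sizes, time bins and values are assumptions made for this example; the framework described in the thesis performs this kind of computation in parallel on a distributed, in-memory Big Data platform rather than in a single-process loop.

```python
# Illustrative multi-granularity aggregation of urban sensor readings.
# Each reading is (timestamp, latitude, longitude, value); the layout is hypothetical.
from collections import defaultdict
from datetime import datetime

# Spatial grid sizes (degrees) and temporal bins to explore jointly.
SPATIAL_STEPS = [0.01, 0.05]                  # roughly 1 km and 5 km cells
TEMPORAL_FMTS = ["%Y-%m-%d %H", "%Y-%m-%d"]   # hourly and daily bins

def map_reading(reading):
    """Map step: emit one (key, value) pair per granularity combination."""
    ts, lat, lon, value = reading
    t = datetime.fromisoformat(ts)
    for step in SPATIAL_STEPS:
        cell = (round(lat // step * step, 4), round(lon // step * step, 4))
        for fmt in TEMPORAL_FMTS:
            yield ((step, fmt, cell, t.strftime(fmt)), value)

def reduce_by_key(pairs):
    """Reduce step: average the values that share a key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

readings = [
    ("2016-03-01 08:15", 45.0703, 7.6869, 21.3),   # hypothetical sensor values
    ("2016-03-01 08:40", 45.0711, 7.6854, 19.8),
    ("2016-03-01 14:05", 45.0650, 7.6600, 25.1),
]
aggregates = reduce_by_key(p for r in readings for p in map_reading(r))
for key, avg in sorted(aggregates.items()):
    print(key, round(avg, 2))
```

    Each output key identifies one combination of spatial cell and time bin, so a single pass over the data yields summaries at every explored resolution.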

    Le nuage de point intelligent

    Discrete spatial datasets known as point clouds often lay the groundwork for decision-making applications. For example, such data can serve as a reference for autonomous cars and robot navigation, as a layer for floor-plan creation and building construction, and as a digital asset for environment modelling and incident prediction. Applications are numerous, and potentially increasing if we consider point clouds as digital reality assets. Yet this expansion faces technical limitations, mainly from the lack of semantic information within point ensembles. Connecting knowledge sources is still a very manual and time-consuming process that suffers from error-prone human interpretation. This highlights a strong need for domain-related data analysis to create coherent and structured information. The thesis tackles automation problems in point cloud processing to create intelligent environments, i.e. virtual copies that can be used and integrated in fully autonomous reasoning services. We address point cloud questions associated with knowledge extraction – particularly segmentation and classification – as well as structuring, visualisation and interaction with cognitive decision systems. We propose to connect both point cloud properties and formalized knowledge to rapidly extract pertinent information using domain-centred graphs. The dissertation delivers the concept of a Smart Point Cloud (SPC) Infrastructure, which serves as an interoperable and modular architecture for unified processing. It permits easy integration into existing workflows and multi-domain specialization through device knowledge, analytic knowledge or domain knowledge. Concepts, algorithms, code and materials are given to replicate the findings and extend current applications.
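
    The following sketch illustrates, in plain Python, the general idea of a domain-centred graph that links classified point-cloud segments to formalized domain concepts. The class names, attributes and relations are hypothetical and do not reproduce the actual SPC data model; they only show how segment properties and domain knowledge can be queried together.

```python
# Illustrative sketch: linking point-cloud segments to formalized domain
# knowledge through a simple graph. All names and attributes are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Segment:
    """A classified group of points with a few aggregated properties."""
    seg_id: str
    label: str                 # e.g. "wall", "floor", "chair"
    point_count: int
    mean_height: float

@dataclass
class KnowledgeGraph:
    """Domain-centred graph: nodes are segments or concepts, edges are relations."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_segment(self, seg: Segment):
        self.nodes[seg.seg_id] = seg

    def relate(self, source: str, relation: str, target: str):
        self.edges.append((source, relation, target))

    def query(self, relation: str, target: str):
        """Return the segments connected to a concept by a given relation."""
        return [s for s, r, t in self.edges if r == relation and t == target]

kg = KnowledgeGraph()
kg.add_segment(Segment("seg_01", "wall", 120_000, 1.4))
kg.add_segment(Segment("seg_02", "floor", 310_000, 0.0))
kg.relate("seg_01", "is_a", "LoadBearingElement")   # domain knowledge link
kg.relate("seg_02", "is_a", "CirculationSurface")
print(kg.query("is_a", "LoadBearingElement"))       # -> ['seg_01']
```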

    Dagstuhl News January - December 2011

    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic

    Uma rede telemática para a prestação regional de cuidados de saúde

    Doctoral thesis in Informatics Engineering. Modern health information technology is not just a supporting instrument for good information management but a strategic requirement to provide more efficient and safer health care. Health information technology is a cornerstone for building future patient-centric health care systems in which a comprehensive set of patient data will be available to the relevant care teams, regardless of where (system or service point) it was generated. Such secure and efficient use of clinical data is challenged by the existing fragmentation of health information system implementations. Several approaches have been proposed to address the limitations of the so-called "information silos" in healthcare, ranging from full centralization (a single system) to fully decentralized clinical message exchange networks. In this work we advocate the use of a service-based unification layer, federating distributed heterogeneous information sources. This clinical information hub provides the basis to build regional-level applications, which we have demonstrated by implementing a virtual Electronic Health Record system. Unlike the message-driven, point-to-point approaches popular in health care systems integration, we developed a middleware layer, using J2EE architectural patterns, in which the common information is represented as an object model accessible through programming interfaces. The proposed architecture was instantiated in the Rede Telemática da Saúde network, a platform deployed in the region of Aveiro connecting eight partner institutions (two hospitals and six primary care units), covering ~350,000 citizens, indexing information on more than 19,000,000 episodes of care and used by ~350 registered professionals. In addition to the regional health information collaborative platform (RTSys), we introduce a second line of research towards bridging the care networks and the science networks. In the latter scenario, we propose the use of Grid computing to enable the massive use and integration of biomedical information. The proposed architecture (not implemented) enables access to existing e-Science infrastructures to create clinical information repositories for health applications.
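
    As a rough illustration of the service-based unification idea, the sketch below wraps two hypothetical sources behind adapters that map local records onto a common object model and exposes the federated view through a single programming interface. The class and method names (Episode, SourceAdapter, FederationHub, and so on) are invented for this example and are not the RTSys interfaces; the actual middleware follows J2EE architectural patterns rather than Python.

```python
# Illustrative sketch of a service-based unification layer: heterogeneous
# sources are wrapped by adapters that map local records onto a common
# object model. All names are hypothetical.
from dataclasses import dataclass
from typing import List, Protocol

@dataclass
class Episode:
    source: str          # institution that produced the record
    date: str
    description: str

class SourceAdapter(Protocol):
    def find_episodes(self, patient_id: str) -> List[Episode]: ...

class HospitalAdapter:
    """Maps a hospital's local schema onto the common Episode model."""
    def find_episodes(self, patient_id: str) -> List[Episode]:
        # In practice this would query the hospital's information system.
        return [Episode("Hospital A", "2010-05-02", "Emergency admission")]

class PrimaryCareAdapter:
    """Maps a primary care unit's records onto the common Episode model."""
    def find_episodes(self, patient_id: str) -> List[Episode]:
        return [Episode("Health centre B", "2010-04-20", "GP consultation")]

class FederationHub:
    """Aggregates episodes from all registered sources into one virtual record."""
    def __init__(self, adapters: List[SourceAdapter]):
        self.adapters = adapters

    def virtual_ehr(self, patient_id: str) -> List[Episode]:
        episodes = [e for a in self.adapters for e in a.find_episodes(patient_id)]
        return sorted(episodes, key=lambda e: e.date)

hub = FederationHub([HospitalAdapter(), PrimaryCareAdapter()])
for ep in hub.virtual_ehr("patient-123"):
    print(ep.source, ep.date, ep.description)
```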

    New views on the Drosophila transcriptome

    Drosophila is a valuable experimental organism that can be used as a reverse genetics model. Drosophila Malpighian (renal) tubules are an important epithelial tissue in which to study transport mechanisms. RNA-seq was chosen to investigate Drosophila Malpighian (renal) tubules and identify novel genes, following a three-way comparison between three popular transcriptome profiling methods. Two types of novel gene have been found in Drosophila tubules: coding genes and non-coding genes. Reverse genetics has been applied to identify novel coding gene function in Drosophila tubules. Three-way analysis of Drosophila expression microarrays, Drosophila tiling microarrays and Drosophila RNA-seq reveals that most gene expression levels are well correlated across the three technologies. Drosophila expression microarrays correlate better with RNA-seq than Drosophila tiling microarrays do. Both expression arrays and tiling arrays suffer from cross-hybridization, missed target detection and hybridization background noise, and also have a low dynamic range for detecting lowly and highly expressed genes. Drosophila tiling microarrays also have a high false-positive detection rate, which may lead to overestimation of the transcriptional activity of the genome. RNA-seq has overcome the drawbacks of microarrays and become the leading technology for genome sequencing, transcriptome profiling, novel gene discovery, and novel alternative splicing discovery, with a wide dynamic range. However, Drosophila expression microarrays and tiling microarrays remain useful. Three-prime expression microarrays offer a means to measure differential three-prime end processing, and tiling microarrays can be used for novel gene discovery. In this sense, the three technologies complement each other. In this thesis, poly(A)-selected RNA-seq has been used as a discovery tool to search for novel genes in Drosophila Malpighian tubules. A TopHat and Cufflinks pipeline has been used for novel gene discovery and differential gene expression analysis between Drosophila tubules and whole flies, in order to find tubule-enriched genes. Reverse genetics has been applied to Drosophila, using the Gal4/UAS system to knock down or overexpress novel genes in specific tissues and cell types. The novel coding gene CG43968 has been discovered. Its location has been confirmed in the tubule main segment, in the principal cell cytoplasm or apical membrane. The function of this gene has been identified as involvement in tubule secretion, which may relate to calcium transport. Reverse genetics has been confirmed as particularly important for the functional study of novel genes.
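
    A toy example of the kind of downstream analysis described above is sketched below: given made-up expression estimates for tubules and whole flies, it flags tubule-enriched candidates by log2 fold change and computes a simple correlation between two platforms. The gene list, values and thresholds are assumptions for illustration only; the thesis uses a TopHat and Cufflinks pipeline on real RNA-seq data.

```python
# Illustrative sketch: flag tubule-enriched genes from expression estimates
# (e.g. FPKM values from a Cufflinks-style pipeline) and compare two
# platforms with a simple correlation. All numbers are made up.
from math import log2, sqrt

# gene -> (tubule expression, whole-fly expression); hypothetical values
fpkm = {
    "CG43968": (85.0, 3.2),
    "Act5C":   (410.0, 395.0),
    "ninaE":   (0.4, 120.0),
}

def enriched(expr, min_fold=4.0, pseudo=1.0):
    """Genes whose tubule/whole-fly ratio exceeds min_fold, reported on a log2 scale."""
    return {g: log2((t + pseudo) / (w + pseudo))
            for g, (t, w) in expr.items()
            if (t + pseudo) / (w + pseudo) >= min_fold}

def pearson(xs, ys):
    """Pearson correlation between two expression vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(enriched(fpkm))                      # tubule-enriched candidates
microarray = [7.2, 9.0, 5.1]               # same genes measured on an array (log scale)
rnaseq = [6.4, 8.7, 1.3]                   # and by RNA-seq (log scale)
print(round(pearson(microarray, rnaseq), 3))
```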

    Identification of left ventricular mass QTL in the stroke-prone spontaneously hypertensive rat

    Left ventricular hypertrophy (LVH) is accepted as an important independent predictor of adverse cardiovascular outcome; the aetiology includes a number of well-recognized causes, but there is considerable interest in the genetics underlying cardiac hypertrophy. Data from several twin studies indicate that left ventricular mass index (LVMI) has a significant genetic basis that is most likely polygenic. Given the heterogeneity of the human condition, little progress has been made towards identifying the genes involved in this now common disease state. As an adjunct to current human studies, inbred animal models have been developed, which in turn have led to the identification of quantitative trait loci (QTL) via investigation using a genome-wide strategy. This generally involves high-fidelity phenotyping of large segregating F2 populations, derived by crossing inbred strains of sufficiently differing phenotype, and subsequent genotyping using a wide selection of polymorphic microsatellite markers spread across the entire rat genome. The research described in this thesis incorporated an improved analysis of a previous genome-wide scan, to confirm and identify QTL containing determinants of left ventricular hypertrophy in the Glasgow SHRSP x WKY F2 cross. This genome-wide scan was carried out in 134 F2 hybrids (male:female = 65:69). Systolic and diastolic blood pressure was measured by radio-telemetry at baseline and after a 3-week 1% salt challenge, in addition to heart rate, motor activity and pulse pressure. Other phenotype data included body weight, heart and LV weight, and plasma renin activity. QTL affecting a given phenotype were mapped relative to an improved genetic linkage map for rat chromosome 14, with the aid of JoinMap 3.0, MapManager QTXb and Windows QTL Cartographer software. The original method of single-marker analysis was used initially to test previous and newly acquired genotype data and confirm the cited LVMI QTL on rat chromosome 14. More stringent and complex statistical approaches were then integrated into the analysis, resulting in the detection of a second QTL for LVMI at marker D14Got23 and a single QTL for cardiac mass at marker D14Woxl4. The identification of QTL, although a fundamental process, is only the initial step towards the end objective of gene identification. The next logical step is the physical capture and confirmation of QTL through the production of congenic strains and substrains. In this thesis, the process of verifying the chromosome 14 QTL began with the generation of congenic strains, using a marker-assisted 'speed' congenic strategy previously validated in rats by our group. This was achieved by backcross breeding two inbred rat strains (SHRSP x WKY) and introgression of marker-delineated regions of chromosome 14 from one background into the recipient genome, and vice versa. Complete homozygosity of the background genetic markers (n=168) was achieved after 4 backcross generations. In the timeline allowed, it was not possible to achieve a fixed congenic line; however, based on the QTL analysis it was possible to generate and analyse preliminary phenotype data from backcross 4 males on the SHRSP background. The initial readings from this pilot study provide physical evidence that substituting a portion of WKY chromosome 14 with SHRSP results in a reduced LVMI, despite equivalent systolic blood pressure. (Abstract shortened by ProQuest.)
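
    The single-marker analysis mentioned above can be illustrated with a small sketch: for each marker, F2 animals are grouped by genotype class and a one-way ANOVA F statistic is computed for the phenotype (here LVMI). The genotype groupings, phenotype values and the second marker name are invented for the example; only D14Got23 is taken from the abstract, and the study itself used JoinMap, MapManager QTX and QTL Cartographer rather than hand-rolled code.

```python
# Illustrative single-marker QTL scan: group F2 animals by genotype at each
# marker and compute a one-way ANOVA F statistic for the phenotype.
# Marker "D14_markerB" and all values are hypothetical.

def anova_f(groups):
    """One-way ANOVA F statistic for a list of value groups."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# phenotype (LVMI) split by genotype class at each marker: SP/SP, SP/WKY, WKY/WKY
markers = {
    "D14Got23":    ([2.9, 3.1, 3.0], [2.6, 2.7, 2.8], [2.3, 2.4, 2.5]),
    "D14_markerB": ([2.6, 2.8, 2.5], [2.7, 2.6, 2.8], [2.6, 2.7, 2.5]),
}
for marker, groups in markers.items():
    print(marker, round(anova_f(list(groups)), 2))
```

    A large F statistic at a marker suggests that genotype at that locus explains part of the phenotype variance, which is the evidence used to declare a candidate QTL.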

    Anales del XIII Congreso Argentino de Ciencias de la Computación (CACIC)

    Contents: Computer architectures; Embedded systems; Service-oriented architectures (SOA); Communication networks; Heterogeneous networks; Advanced networks; Wireless networks; Mobile networks; Active networks; Network and service administration and monitoring; Quality of Service (QoS, SLAs); Computer security, authentication and privacy; Infrastructure for digital signatures and digital certificates; Vulnerability analysis and detection; Operating systems; P2P systems; Middleware; Grid infrastructure; Integration services (Web Services or .Net). Red de Universidades con Carreras en Informática (RedUNCI).

    Expanding perspective on open science: communities, cultures and diversity in concepts and practices

    Twenty-one years ago, the term ‘electronic publishing’ promised all manner of potential that the Web and network technologies could bring to scholarly communication, scientific research and technical innovation. Over the last two decades, tremendous developments have indeed taken place across all of these domains. One of the most important of these has been Open Science, perhaps the most widely discussed topic in research communications today. This book presents the proceedings of Elpub 2017, the 21st edition of the International Conference on Electronic Publishing, held in Limassol, Cyprus, in June 2017. Continuing the tradition of bringing together academics, publishers, lecturers, librarians, developers, entrepreneurs, users and all other stakeholders interested in the issues surrounding electronic publishing, this edition of the conference focuses on Open Science, and the 27 research and practitioner papers and 1 poster included here reflect the results and ideas of researchers and practitioners with diverse backgrounds from all around the world with regard to this important subject. Intended to generate discussion and debate on the potential and limitations of openness, the book addresses the current challenges and opportunities in the ecosystem of Open Science, and explores how to move forward in developing an inclusive system that will work for a much broader range of participants. It will be of interest to all those concerned with electronic publishing, and Open Science in particular.

    Data Rescue : defining a comprehensive workflow that includes the roles and responsibilities of the research library.

    Thesis (PhD (Research))--University of Pretoria, 2023. This study, comprising a case study at a selected South African research institute, focused on the creation of a workflow model for data rescue indicating the roles and responsibilities of the research library. Additional outcomes of the study include a series of recommendations addressing the troublesome findings that revealed data at risk to be a prevalent reality at the selected institute, showing the presence of a multitude of factors putting data at risk, disclosing the profusion of data rescue obstacles faced by researchers, and uncovering that data rescue at the institute is rarely implemented. The study consists of four main parts: (i) a literature review, (ii) content analysis of literature resulting in the creation of a data rescue workflow model, (iii) empirical data collection methods, and (iv) the adaptation and revision of the initial data rescue model to present a recommended version of the model. A literature review was conducted and addressed data at risk and data rescue terminology, factors putting data at risk, the nature, diversity and prevalence of data rescue projects, and the rationale for data rescue. The second part of the study entailed the application of content analysis to selected documented data rescue workflows, guidelines and models. Findings of the analysis led to the identification of crucial components of data rescue and brought about the creation of an initial Data Rescue Workflow Model. As a first draft of the model, it was crucial that the model be reviewed by institutional research experts during the next main stage of the study. The section containing the study methodology culminates in the implementation of four different empirical data collection methods. Data collected via a web-based questionnaire distributed to a sample of research group leaders (RGLs), one-on-one virtual interviews with a sample of the aforementioned RGLs, feedback supplied by RGLs after reviewing the initial Data Rescue Workflow Model, and a focus group session held with institutional research library experts resulted in findings producing insight into the institute's data at risk and the state of data rescue. Feedback supplied by RGLs after examining the initial Data Rescue Workflow Model produced a list of concerns linked to the model and contained suggestions for changes to the model. RGL feedback was at times unrelated to the model or to data and necessitated the implementation of a mini focus group session involving institutional research library experts. The mini focus group session comprised discussions around requirements for a data rescue workflow model. The consolidation of RGL feedback and feedback supplied by research library experts enabled the creation of a recommended Data Rescue Workflow Model, with the model also indicating the various roles and responsibilities of the research library. The contribution of this research lies primarily in the increase in theoretical knowledge regarding data at risk and data rescue, and culminates in the presentation of a recommended Data Rescue Workflow Model. The model not only portrays crucial data rescue activities and outputs, but also indicates the roles and responsibilities of a sector that can enhance and influence the prevalence and execution of data rescue projects.
    In addition, participation in data rescue and an understanding of the activities and steps portrayed via the model can contribute towards an increase in the skills base of the library and information services sector and enhance collaboration projects with relevant research sectors. It is also anticipated that the study recommendations and exposure to the model may influence the viewing and handling of data by researchers and the accompanying research procedures. Information Science. PhD (Research). Unrestricted.