772 research outputs found

    Apoio à avaliação da qualidade de dados em eScience: uma abordagem baseada em proveniência (Supporting data quality assessment in eScience: a provenance-based approach)

    Advisor: Claudia Maria Bauzer Medeiros. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Abstract: Data quality is a recurrent concern in all scientific domains. Experiments analyze and manipulate several kinds of datasets, and generate data to be (re)used by other experiments. The basis for obtaining good scientific results is strongly associated with the degree of quality of such datasets. However, the data involved in experiments are manipulated by a wide range of users, with distinct research interests, using their own vocabularies, work methodologies, models, and sampling needs. Given this scenario, a challenge in computer science is to come up with solutions that help scientists assess the quality of their data. Different efforts have been proposed to address quality assessment. Some of them note that data provenance attributes could be used to evaluate quality. However, most of these initiatives address the evaluation of a specific quality attribute, frequently focusing on atomic data values, which reduces the applicability of these approaches. Despite these efforts, there is a need for new solutions that scientists can adopt to assess how good their data are. In this PhD research, we present an approach to address this problem based on the notion of data provenance. Unlike other similar approaches, our proposal combines quality attributes specified within a context by specialists with metadata on the provenance of a data set.
The main contributions of this work are: (i) the specification of a framework that takes advantage of data provenance to derive quality information; (ii) a methodology associated with this framework that outlines the procedures to support the assessment of quality; (iii) the proposal of two different provenance models to capture provenance information, for fixed and extensible scenarios; and (iv) validation of items (i) through (iii), with their discussion via case studies in agriculture and biodiversity. Degree: Doctorate, Computer Science (Doctor of Computer Science).

    Scale aware modeling and monitoring of the urban energy chain

    With energy modeling at different complexity levels for smart cities and the concurrent data availability revolution from connected devices, a steady surge in demand for spatial knowledge has been observed in the energy sector. This transformation occurs in population centers focused on efficient energy use and quality of life. Energy-related services play an essential role in this mix, as they facilitate or interact with all other city services. This trend is primarily driven by the current age of the Energiewende (German for "energy transition"), a worldwide push towards renewable energy sources, increased energy use efficiency, and local energy production that requires precise estimates of local energy demand and production. This shift in the energy market occurs as the world becomes aware of human-induced climate change, to which the building stock makes a significant contribution (40% in the European Union). At the current rate of refurbishment and building replacement, 75% of the buildings existing in the European Union in 2050 would not be classified as energy-efficient. This means that substantial structural change in the built environment and the energy chain is required to achieve EU-wide goals concerning environmental and energy policy. These objectives provide strong motivation for this thesis work and are generally made possible by energy monitoring and modeling activities that estimate urban energy needs and quantify the impact of refurbishment measures. To this end, a modeling library called aEneAs, which can perform city-wide building energy modeling, was developed in the scope of this thesis. The library performs its tasks at the level of a single building and was a first in its field, using standardized spatial energy data structures that allow for portability from one city to another. For data input, extensive use was made of digital twins derived from CAD, BIM, GIS, architectural models, and a plethora of energy data sources. 
The library first quantifies primary thermal energy demand and then the impact of refurbishment measures. Lastly, it estimates the potential of renewable energy production from solar radiation. aEneAs also includes network modeling components that consider energy distribution in the given context, showing a path toward data modeling and simulation required for distributed energy production at the neighborhood and district level. In order to validate modeling activities in solar radiation and green façade and roof installations, six spatial models were coupled with sensor installations. These digital twins are included in three experiments that highlight this monitoring side of the energy chain and portray energy-related use cases that utilize the spatially enabled web services SOS-SES-WNS, SensorThingsAPI, and FIWARE. To this author's knowledge, this is the first work that surveys the capabilities of these three solutions in a unifying context, each having its specific design mindset. The modeling and monitoring activity and their corresponding literature review indicated gaps in scientific knowledge concerning data science in urban energy modeling. First, a lack of standardization regarding the spatial scales at which data is stored and used in urban energy modeling was observed. In order to identify the appropriate spatial levels for modeling and data aggregation, scale is explored in-depth in the given context and defined as a byproduct of resolution and extent, with ranges provided for both parameters. To that end, a survey of the encountered spatial scales and actors in six different geographical and cultural settings was performed. The information from this survey was used to put forth a standardized spatial scales definition and create a scale-dependent ontology for use in urban energy modeling. 
The ontology also provides spatially enabled persistent identifiers that resolve issues encountered with object relationships in modeling for inheritance, dependency, and association. The same survey also reveals two significant issues with data in urban energy modeling: data consistency across spatial scales and urban fabric contiguity. The impact of these issues and different solutions, such as data generalization, are explored in the thesis. Further advancement of scientific knowledge is provided specifically with spatial standards and spatial data infrastructure in urban energy modeling. A review of use cases in the urban energy chain and a taxonomy of the standards were carried out. These provide fundamental input for another piece of this thesis: inclusive software architecture methods that promote data integration and allow for external connectivity to modern and legacy systems. In order to reduce time-consuming extraction, transformation, and load processes, databases and web services were used to ferry data to and from separate data sources. As a result, the spatial models become central linking elements of the different types of energy-related data, in a novel perspective that differs from the traditional one, where spatial data tends to be non-interoperable or not linked with other data types. These distinct data fusion approaches provide flexibility in an energy chain environment with inconsistent data structures and software. Furthermore, the knowledge gathered from the experiments presented in this thesis is provided as a synopsis of good practices.

    Design of an E-learning system using semantic information and cloud computing technologies

    Humanity is currently suffering from many difficult problems that threaten the life and survival of the human race. It is very easy for all mankind to be affected, directly or indirectly, by these problems. Education is a key solution for most of them. In this thesis, we use current technologies to enhance and ease the learning process. We have designed an e-learning system based on semantic information and cloud computing, in addition to many other technologies that contribute to improving the educational process and raising the level of students. The design was built after much research on useful technology, its types, and examples of actual systems previously discussed by other researchers. In addition to the proposed design, an algorithm was implemented to identify topics found in large textual educational resources. It was tested and proved to be efficient against other methods. The algorithm can extract the main topics from textual learning resources, link related resources, and generate interactive dynamic knowledge graphs. It accomplishes those tasks accurately and efficiently even for large books. We used Wikipedia Miner, TextRank, and Gensim within our algorithm. When evaluated against Gensim, our algorithm substantially improved on its accuracy. 
Augmenting the system design with the implemented algorithm will produce many useful services for improving the learning process, such as: automatically identifying the main topics of large textual learning resources and connecting them to other well-defined concepts from Wikipedia, enriching current learning resources with semantic information from external sources, providing students with browsable dynamic interactive knowledge graphs, and making use of learning groups to encourage students to share their learning experiences and feedback with other learners. Doctoral Program in Telematics Engineering, Universidad Carlos III de Madrid. Committee chair: Luis Sánchez Fernández. Secretary: Luis de la Fuente Valentín. Member: Norberto Fernández Garcí

    Music Encoding Conference Proceedings 2021, 19–22 July, 2021 University of Alicante (Spain): Onsite & Online

    This document includes the papers and posters presented at the Music Encoding Conference 2021, held in Alicante from 19 to 22 July 2021. Funded by project Multiscore, MCIN/AEI/10.13039/50110001103

    Data Science and Knowledge Discovery

    Data Science (DS) is gaining significant importance in the decision process because it combines various areas, including Computer Science, Machine Learning, Math and Statistics, domain/business knowledge, software development, and traditional research. In the business field, applying DS means using scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data to support the decision process. After collecting the data, it is crucial to discover the knowledge. In this step, Knowledge Discovery (KD) tasks are used to create knowledge from structured and unstructured sources (e.g., text, data, and images). The output needs to be in a readable and interpretable format. It must represent knowledge in a manner that facilitates inferencing. KD is applied in several areas, such as education, health, accounting, energy, and public administration. This book includes fourteen excellent articles which discuss this trending topic and present innovative solutions to show the importance of Data Science and Knowledge Discovery to researchers, managers, industry, society, and other communities. The chapters address several topics, such as Data Mining, Deep Learning, Data Visualization and Analytics, Semantic Data, Geospatial and Spatio-Temporal Data, Data Augmentation, and Text Mining.

    ICSEA 2022: the seventeenth international conference on software engineering advances

    The Seventeenth International Conference on Software Engineering Advances (ICSEA 2022), held between October 16th and October 20th, 2022, continued a series of events covering a broad spectrum of software-related topics. The conference covered fundamentals of designing, implementing, testing, validating, and maintaining various kinds of software. Several tracks were proposed to treat the topics from theory to practice, in terms of methodologies, design, implementation, testing, use cases, tools, and lessons learned. The conference topics covered classical and advanced methodologies, open source, agile software, as well as software deployment and software economics and education. Other advanced aspects are related to on-time practical aspects, such as run-time vulnerability checking, rejuvenation processes, updates, partial or temporary feature deprecation, software deployment and configuration, and on-line software updates. These aspects trigger implications related to patenting, licensing, engineering education, new ways for software adoption and improvement, and, ultimately, software knowledge management. There are many advanced applications requiring robust, safe, and secure software: disaster recovery applications, vehicular systems, biomedical-related software, biometrics-related software, mission-critical software, e-health-related software, and crisis-situation software. These applications require appropriate software engineering techniques, metrics, and formalisms, such as software reuse, appropriate software quality metrics, composition and integration, consistency checking, model checking, provers, and reasoning. The nature of research in software varies slightly with the specific discipline researchers work in, yet there is much common ground and room for sharing best practices, frameworks, tools, languages, and methodologies. 
Despite the number of experts we have available, little work is done at the meta level, that is, examining how we go about our research and how this process can be improved. There are questions related to the choice of programming language, IDEs, and documentation styles and standards. Reuse can be of great benefit to research projects, yet reuse of prior research projects introduces special problems that need to be mitigated. The research environment is a mix of creativity and systematic approach, which leads to a creative tension that needs to be managed or at least monitored. Much of the coding in any university is undertaken by research students or young researchers. Issues of skills training, development, and quality control can have significant effects on an entire department. In an industrial research setting, the environment is not quite that of industry as a whole, nor does it follow the pattern set by the university. The unique approaches and issues of industrial research may hold lessons for researchers in other domains. We take here the opportunity to warmly thank all the members of the ICSEA 2022 technical program committee, as well as all the reviewers. The creation of such a high-quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated much of their time and effort to contribute to ICSEA 2022. We truly believe that, thanks to all these efforts, the final conference program consisted of top-quality contributions. We also thank the members of the ICSEA 2022 organizing committee for their help in handling the logistics of this event. We hope that ICSEA 2022 was a successful international forum for the exchange of ideas and results between academia and industry and for the promotion of progress in software engineering advances.

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, but their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. As a statement of research-in-progress, this paper focuses on business process patterns and proposes an initial methodological framework for their discovery and reuse within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.