772 research outputs found

    A new auditing mechanism for open source NoSQL database: a case study on open source MongoDB database

    MongoDB, as a NoSQL database management system, is relatively new on the database market and is used in many important projects and products. A security analysis of MongoDB revealed that it did not provide any facilities for auditing actions performed in the database. Recently, MongoDB Inc. tried to close this auditing gap with MongoDB Enterprise version 2.6 (released 8 April 2014). Its auditing system logs operation information, including schema data definition language (DDL) operations, operations related to replica sets, authentication and authorization operations, and general operations. Unfortunately, it still cannot record Data Manipulation Language (DML) operations. This study therefore aims to improve the auditing functionality of MongoDB by presenting a new mechanism for auditing the NoSQL MongoDB database that also covers DML/CRUD (Create, Read, Update and Delete) operations.
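
    To illustrate the general direction the abstract describes, the following is a minimal sketch (in Python, using the pymongo driver) of how DML/CRUD auditing can be layered on top of MongoDB by wrapping a collection so that every data operation is mirrored into an audit collection. The wrapper design, collection names and record fields are illustrative assumptions, not the paper's actual implementation.

        # Minimal sketch: mirror every CRUD call into an audit collection.
        # Hypothetical design for illustration, not the paper's mechanism.
        from datetime import datetime, timezone

        from pymongo import MongoClient


        class AuditedCollection:
            """Wraps a pymongo collection and logs CRUD operations."""

            def __init__(self, db, name, audit_name="audit_log"):
                self._coll = db[name]
                self._audit = db[audit_name]
                self._name = name

            def _log(self, op, detail):
                # One audit record per data operation (create/read/update/delete).
                self._audit.insert_one({
                    "ts": datetime.now(timezone.utc),
                    "collection": self._name,
                    "operation": op,
                    "detail": detail,
                })

            def insert_one(self, doc):
                self._log("create", {"document": doc})
                return self._coll.insert_one(doc)

            def find(self, query):
                self._log("read", {"query": query})
                return self._coll.find(query)

            def update_one(self, query, update):
                self._log("update", {"query": query, "update": update})
                return self._coll.update_one(query, update)

            def delete_one(self, query):
                self._log("delete", {"query": query})
                return self._coll.delete_one(query)


        if __name__ == "__main__":
            db = MongoClient()["demo"]
            users = AuditedCollection(db, "users")
            users.insert_one({"name": "alice"})
            users.update_one({"name": "alice"}, {"$set": {"active": True}})
            # db["audit_log"] now holds one record per operation above.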

    I2ECR: Integrated and Intelligent Environment for Clinical Research

    Clinical trials are designed to produce new knowledge about a certain disease, drug or treatment. During these studies, a huge amount of data is collected about participants, therapies, clinical procedures, outcomes, adverse events and so on. A multicenter, randomized, phase III clinical trial in hematology enrolls up to hundreds of subjects and evaluates post-treatment outcomes on stratified sub-groups of subjects for a period of many years. Data collection in clinical trials is therefore becoming complex, with a huge number of clinical and biological variables. Outside the medical field, data warehouses (DWs) are widely employed. A data warehouse is a "collection of integrated, subject-oriented databases designed to support the decision-making process". To verify whether DWs might be useful for data quality and association analysis, a team of biomedical engineers, clinicians, biologists and statisticians developed the "I2ECR" project. I2ECR is an Integrated and Intelligent Environment for Clinical Research where clinical and omics data stand together for clinical use (reporting) and for the generation of new clinical knowledge. I2ECR has been built from the "MCL0208" phase III, prospective clinical trial sponsored by the Fondazione Italiana Linfomi (FIL); this is in fact a translational study, accounting for many clinical data, along with several clinical prognostic indexes (e.g. MIPI - Mantle International Prognostic Index), pathological information, treatment and outcome data, biological assessments of disease (MRD - Minimal Residual Disease), as well as many ancillary biological studies, such as Mutational Analysis, Gene Expression Profiling (GEP) and Pharmacogenomics. Forty-eight Italian medical centers were actively involved in this trial, for a total of 300 enrolled subjects. The main objectives of I2ECR are:
    • to propose an integration project built on clinical and molecular data quality concepts. Clear raw-data analysis and clinical trial monitoring strategies are applied to implement a digital platform where clinical, biological and "omics" data are imported from different sources and well integrated in a data warehouse;
    • to be a dynamic repository of data congruency quality rules. I2ECR makes it possible to monitor, in a semi-automatic manner, the quality of the clinical data imported from eCRFs (electronic Case Report Forms) and of the biologic and mutational datasets edited internally by local laboratories. I2ECR can thereby detect missing data and mistakes arising from non-conventional data-entry activities at the centers (a minimal sketch of such a rule check follows this abstract);
    • to provide clinical stakeholders with a platform where they can easily design statistical and data mining analyses. The term Data Mining (DM) identifies a set of tools for searching for hidden patterns of interest in large and multivariate datasets. Applications of DM techniques in the medical field range from outcome prediction and patient classification to genomic medicine and molecular biology. I2ECR allows clinical stakeholders to propose innovative methods of supervised and unsupervised feature extraction, data classification and statistical analysis on heterogeneous datasets associated with the MCL0208 clinical trial.
    Although the MCL0208 study is the first example of data population of I2ECR, the environment will also be able to import data from clinical studies designed for other onco-hematologic diseases.
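
    The data congruency rule idea mentioned above can be made concrete with a small Python/pandas sketch; the table, fields and rules below are invented for illustration and are not I2ECR's actual rule set.

        # Minimal sketch of semi-automatic data congruency checks on an
        # eCRF-like table. All column names and rules are hypothetical.
        import pandas as pd

        # Hypothetical eCRF extract: one row per enrolled subject.
        ecrf = pd.DataFrame({
            "subject_id":   ["S01", "S02", "S03"],
            "enrolled_on":  pd.to_datetime(["2013-02-01", "2013-05-10", "2014-01-20"]),
            "mrd_assessed": [True, True, False],
            "mrd_result":   ["negative", None, None],
        })

        # Each rule maps a name to a boolean Series marking *violations*.
        rules = {
            "missing_mrd_result": ecrf["mrd_assessed"] & ecrf["mrd_result"].isna(),
            "enrolled_before_trial_start": ecrf["enrolled_on"] < "2012-01-01",
        }

        for name, violated in rules.items():
            bad = ecrf.loc[violated, "subject_id"].tolist()
            print(f"{name}: {bad if bad else 'none'}")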

    Anthropology of/in Circulation: The Future of Open Access and Scholarly Societies

    In a conversation format, seven anthropologists with extensive expertise in new digital technologies, intellectual property, and journal publishing discuss issues related to open access, the anthropology of information circulation, and the future of scholarly societies. Among the topics discussed are current anthropological research on open source and open access; the effects of open access on traditional anthropological topics; the creation of community archives and new networking tools; potentially transformative uses of field notes and materials in new digital ecologies; the American Anthropological Association's recent history with these issues, from the development of AnthroSource to its new publishing arrangement with Wiley-Blackwell; and the political economies of knowledge circulation more generally.

    Learning Collective Behavior in Multi-relational Networks

    With the rapid expansion of the Internet and WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly inter-connected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of the labels between linked instances. However, networked data extracted from social media, web pages, and bibliographic databases can contain entities of multiple classes that are linked for various causal reasons; hence, treating all links in a homogeneous way can limit the performance of relational classifiers. Learning collective behavior and interactions in heterogeneous networks is much more complex. The contributions of this dissertation include: 1) two classification frameworks for identifying human collective behavior in multi-relational social networks; and 2) unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks. Our methods improve on homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure. The work has been evaluated on various real-world social networks. We believe this study will be useful for analyzing human collective behavior and interactions, particularly when the heterogeneous relationships in the network arise for various causal reasons.
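
    The core intuition, differentiating link types instead of treating all links homogeneously, can be sketched in a few lines of Python. The toy graph, labels and per-relation weights below are assumptions for illustration, not the dissertation's actual models.

        # Minimal sketch: a relation-aware neighbour vote for node labels.
        # Edge weights per relation type are assumed, not learned here.
        from collections import defaultdict

        # (node_u, node_v, relation_type) edges in a tiny multi-relational graph.
        edges = [
            ("a", "b", "coauthor"), ("a", "c", "citation"),
            ("b", "d", "coauthor"), ("c", "d", "citation"),
        ]
        labels = {"b": "ML", "c": "DB", "d": "ML"}            # known labels
        relation_weight = {"coauthor": 1.0, "citation": 0.3}  # assumed weights

        def predict(node):
            """Labelled neighbours vote, weighted by the connecting relation."""
            votes = defaultdict(float)
            for u, v, rel in edges:
                if node in (u, v):
                    neigh = v if node == u else u
                    if neigh in labels:
                        votes[labels[neigh]] += relation_weight[rel]
            return max(votes, key=votes.get) if votes else None

        print(predict("a"))  # "ML": the coauthor edge outweighs the citation edge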

    Quantifying, Characterizing, and Leveraging Cross-Disciplinary Dependencies: Empirical Studies from a Video Game Development Setting

    Continuous Integration (CI) is a common practice adopted by modern software organizations. It plays an especially important role for large corporations like Ubisoft, where thousands of build jobs are submitted daily. The CI process of video games, which are developed by studios like Ubisoft, involves assembling artifacts that are produced by personnel with various types of expertise, such as source code produced by developers, graphics produced by artists, and audio produced by musicians and sound experts. To weave these artifacts into a cohesive system, the build system, a key component in CI, processes each artifact while respecting its intra- and inter-artifact dependencies. In such projects, a change produced by any team can impact artifacts from other teams, and may cause defects if the transitive impact of changes is not carefully considered. Therefore, to better understand the potential challenges and opportunities presented by multidisciplinary software projects, we conduct an empirical study of a recently launched video game project, which reveals that code files make up only 2.8% of the nodes in the build dependency graph, and code-to-code dependencies make up only 4.3% of all dependencies. We also observe that the impact of 44% of the studied source code changes crosses disciplinary boundaries, highlighting the importance of analyzing inter-artifact dependencies. A comparative analysis with changes that do not cross boundaries indicates that cross-boundary changes: (1) impact a median of 120,368 files; (2) have a 51% probability of causing build failures; and (3) have a 67% likelihood of introducing defects. All three measurements are larger, to statistically significant degrees, than those of changes that do not cross boundaries. We also find that cross-boundary changes are: (4) more commonly associated with gameplay functionality and feature additions that directly impact the game experience than changes that do not cross boundaries; and (5) disproportionately produced by a single team (74% of the contributors of cross-boundary changes are associated with that team). Next, we set out to explore whether analysis of cross-boundary changes can be leveraged to accelerate CI. Indeed, the cadence of development progress is constrained by the pace at which CI services process build jobs. To provide faster CI feedback, recent work explores how build outcomes can be anticipated. Although early results show plenty of promise, prior work on build outcome prediction has largely focused on open-source projects that are code-intensive, while the distinct characteristics of a AAA video game project at Ubisoft present new challenges and opportunities for build outcome prediction. In the video game setting, changes that do not modify source code also incur build failures. Moreover, we find that code changes whose impact crosses the source-data boundary are more prone to build failures than code changes that do not impact data files. Since such changes are not fully characterized by the existing set of build outcome prediction features, state-of-the-art models tend to underperform. Therefore, to accommodate the data context in build outcome prediction, we propose RavenBuild, a novel approach that leverages context, relevance, and dependency-aware features.
    We apply the state-of-the-art BuildFast model and RavenBuild to the video game project, and observe that RavenBuild improves the F1-score of the failing class by 46%, the recall of the failing class by 76%, and the AUC by 28%. To ease adoption in settings with heterogeneous project sets, we also provide a simplified alternative, RavenBuild-CR, which excludes the dependency-aware features. We apply RavenBuild-CR to 22 open-source projects and the video game project, and observe across-the-board improvements there as well. On the other hand, we find that a naive Parrot approach, which simply echoes the previous build outcome as its prediction, is surprisingly competitive with BuildFast and RavenBuild. Though Parrot fails to predict builds whose outcome differs from that of their immediate predecessor, it serves well as an indicator of the tendency of sequences in build outcome datasets. Therefore, future studies should also consider the Parrot approach as a baseline when evaluating build outcome prediction models.
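
    The Parrot baseline described above is simple enough to state in a few lines of Python. The toy outcome sequence and the scikit-learn F1 evaluation below are illustrative assumptions, not the thesis's experimental setup.

        # Minimal sketch of the Parrot baseline: predict that each build has
        # the same outcome as its immediate predecessor, then score it.
        from sklearn.metrics import f1_score

        # Toy sequence of build outcomes: 1 = failing build, 0 = passing build.
        outcomes = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]

        # Parrot echoes the previous outcome; the first build has no
        # predecessor, so we score from the second build onward.
        parrot_pred = outcomes[:-1]
        truth = outcomes[1:]

        print("F1 (failing class):", round(f1_score(truth, parrot_pred, pos_label=1), 2))

    Because real build outcomes tend to come in streaks of consecutive passes or failures, echoing the previous outcome is right far more often than chance, which is why such a baseline is worth reporting alongside learned models.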

    An Extensive Green Bond Market Analysis from 2015 to 2021

    Climate change is indubitably one of the biggest challenges for humanity in the coming decades. While interest in solving this problem has been increasing recently, the window to limit the temperature increase to 1.5°C has been narrowing. The transition to low-carbon economies needed to reach the Paris Agreement goals is therefore vital for the future of our home. In this transition, finance is crucial in mobilizing capital toward low-carbon investments, and several innovative products are on the market to make this capital shift possible. Green bonds are one of these recent products: they are very similar to conventional (vanilla) bonds but differ by carrying an environmental label, which restricts the use of proceeds exclusively to green projects and assets. The green bond market and research interest in the topic have expanded over the years, and researchers are asking whether this market expansion is an appropriate way to mitigate the adverse effects of environmental pollution. Opinions on the expansion's effect on mitigation efforts differ; however, the details of the expansion of the green bond market remain mostly uncovered. The critical contribution of this research is to explore the details of the expansion of the market between 2015 and 2021. This thesis integrates an extensive literature review with data analysis and concludes with further questions and comments. The research utilizes the database of the Climate Bonds Initiative (CBI) to examine the expansion of the green bond market. Specifically, the study employs a quantitative approach, through descriptive analysis and statistical tests, to analyze 8,111 self-labelled qualified green bonds and similar debt instruments issued from 2015 to 2021. By examining the data by region, country, issuer type, external reviewer, date, issued amount in USD, currency, and use of proceeds, the study aims to characterize the overall expansion of the green bond market, compare the market between regions and countries, identify the types of green bond market participants, and measure the market share of opinion providers. Furthermore, the study uses statistical tests to provide insights into the use of proceeds as well as a regional analysis of green bonds. It was found that the growth of the green debt market did not result in advantages for many countries. Rather, a small number of countries, mainly developed ones, were the primary beneficiaries of the raised capital, a phenomenon we term "concentration". This concentration creates a lack of diversity and leaves the market dependent on a few key players. For instance, in the US, the largest green bond issuer, almost half of the country's total amount was issued by a single entity, while just four second-party opinion providers held 93% of the opinion market. Similarly, in China, a single issuer type was responsible for half of the total amount issued. Overall, the top ten countries accounted for 73.4% of the total capital, further highlighting the market's concentration. In addition, multilateral and national development banks failed to play an intermediary role in the green bond market in less developed regions. The findings of this study may be significant in encouraging key stakeholders to explore means of enhancing the benefits that underdeveloped and developing countries receive from the green bond market.
    In addition to these findings, the comprehensive database presented in this research serves as a crucial resource for further research into the green bond market's structure and dynamics. This database, characterized by its novelty and its detailed picture of the market's expansion, is an important tool for both researchers and policymakers aiming to assess the role of green bonds and policy in fostering sustainable development and climate change mitigation. Moreover, the database lays a solid foundation for further studies examining the relationship between green bond issuances and the actual reduction of greenhouse gas emissions, helping to address the critical question of whether the green bond market is genuinely "green".
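
    The "concentration" finding reduces to a share-of-total calculation over issuance amounts. The following Python/pandas sketch uses invented figures rather than CBI data, purely to illustrate the computation.

        # Minimal sketch: top-10 country share of total green bond issuance.
        # All figures are made up for illustration; the study uses CBI data.
        import pandas as pd

        issuance = pd.Series({   # issued amount in USD billions (toy values)
            "US": 300, "China": 200, "France": 120, "Germany": 100,
            "Netherlands": 60, "Sweden": 40, "Japan": 35, "Canada": 30,
            "Italy": 25, "Spain": 20, "Rest of world": 330,
        })

        top10 = issuance.drop("Rest of world").nlargest(10)
        share = top10.sum() / issuance.sum()
        print(f"Top-10 share of total issuance: {share:.1%}")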

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, but their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting the reusability of recurrent generalized models in the very early stages of development. As a statement of research in progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.