4,063 research outputs found

    Methods for Investigation of Dependencies between Attributes in Databases

    This paper surveys data mining research related to discovering dependencies between attributes in databases. We consider a number of approaches: finding the distribution intervals of association rules, discovering branching dependencies between a given set of attributes and a given attribute in a database relation, finding fractional dependencies between a given set of attributes and a given attribute in a database relation, and collaborative filtering.
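
    As a minimal sketch of the kind of dependency the survey starts from, the code below computes the support and confidence of an association rule X → Y over a toy transaction set. The transactions and item names are illustrative assumptions, not data from the paper.

```python
# Minimal sketch: support and confidence of an association rule X -> Y
# over a list of transactions (each a set of attribute values).
# Item names and data are illustrative, not from the surveyed paper.

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_x, rule_y = {"bread"}, {"milk"}
print(support(rule_x | rule_y, transactions))    # 0.5
print(confidence(rule_x, rule_y, transactions))  # ~0.667
```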

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.
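
    As a small illustration of one MTP setting the framework covers, the sketch below fits a single multi-output regression model that predicts three targets from shared inputs. The use of scikit-learn and the synthetic data are assumptions made for this example, not part of the paper.

```python
# Minimal sketch of multi-target regression, one of the MTP settings the
# survey unifies: a single model predicting several targets at once.
# Library choice and data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # 200 instances, 5 shared features
W = rng.normal(size=(5, 3))               # 3 targets generated from the same inputs
Y = X @ W + 0.1 * rng.normal(size=(200, 3))

model = Ridge(alpha=1.0).fit(X, Y)        # multi-output regression in one fit
print(model.predict(X[:2]).shape)         # (2, 3): one prediction per target
```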

    Learning sparse representations of depth

    This paper introduces a new method for learning and inferring sparse representations of depth (disparity) maps. The proposed algorithm relaxes the usual assumption of a stationary noise model in sparse coding. This enables learning from data corrupted with spatially varying noise or uncertainty, as typically obtained by laser range scanners or structured-light depth cameras. Sparse representations are learned from the Middlebury database disparity maps and then exploited in a two-layer graphical model for inferring depth from stereo, by including a sparsity prior on the learned features. Since they capture higher-order dependencies in the depth structure, these priors can complement smoothness priors commonly used in depth inference based on Markov Random Field (MRF) models. Inference on the proposed graph is achieved using an alternating iterative optimization technique, where the first layer is solved using an existing MRF-based stereo matching algorithm and then held fixed while the second layer is solved using the proposed non-stationary sparse coding algorithm. This leads to a general method for improving solutions of state-of-the-art MRF-based depth estimation algorithms. Our experimental results first show that depth inference using learned representations leads to state-of-the-art denoising of depth maps obtained from laser range scanners and a time-of-flight camera. Furthermore, we show that adding sparse priors improves the results of two depth estimation methods: the classical graph cut algorithm by Boykov et al. and the more recent algorithm of Woodford et al.
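
    The sketch below illustrates the core idea of relaxing the stationary noise assumption: the sparse coding data-fit term is weighted per pixel by a confidence (inverse noise variance), solved here with a plain weighted ISTA iteration. The dictionary, weights, and solver details are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of sparse coding with a non-stationary (per-pixel) noise
# model: reconstruction errors are weighted by each pixel's confidence.
# Dictionary, weights and step size are illustrative assumptions.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_ista(x, D, w, lam=0.1, step=None, n_iter=200):
    """Solve min_z 0.5*||sqrt(w)*(x - D z)||^2 + lam*||z||_1 via ISTA."""
    if step is None:
        # 1 / Lipschitz constant of the weighted quadratic term
        step = 1.0 / np.linalg.norm(D.T @ (w[:, None] * D), 2)
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (w * (D @ z - x))      # weighted data-fit gradient
        z = soft_threshold(z - step * grad, step * lam)
    return z

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128))              # overcomplete dictionary (8x8 patches)
D /= np.linalg.norm(D, axis=0)
x = D[:, :5] @ rng.normal(size=5)           # a patch generated from 5 atoms
w = rng.uniform(0.2, 1.0, size=64)          # spatially varying confidence weights
z = weighted_ista(x, D, w)
print(np.count_nonzero(z), "active atoms")
```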

    Knowledge Engineering Architecture for the Analysis of Organizational Log Data: a software tool for log data analysis

    Organisation log data are generated by software and can help the maintenance team address issues reported by the client. Once an application is in production, defects and other issues arise that need to be handled; these are often caused by customisation and maintenance of the software, which can compromise its integrity and functionality. Because such issues occur in a production environment to which the maintenance team has no access, they are difficult to resolve: each issue must be handled in a development environment, which requires understanding the problem before it can be fixed. Log data from production can help by tracing the actions that led to the issue. The log data contain no private data; they only contain the action events produced by software usage. The main objective of this thesis is to build a framework for an automatic log analyser that assists the maintenance team in addressing software issues. The framework also provides a knowledge management system for turning tacit experience into explicit knowledge. A prototype was developed to produce metrics and serve as a proof of concept of the framework. It was applied in a real environment, in the context of a software migration project, i.e. transferring company business data between databases.
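
    As a rough illustration of the kind of analysis such a log analyser automates, the sketch below parses action events from production-style log lines and summarises action frequencies and error events. The log format and field names are assumptions made for this example, not the thesis's actual schema.

```python
# Minimal sketch: parse action events from log lines and summarise them so a
# maintenance team can trace what led to a reported issue.
# The log format and field names below are illustrative assumptions.
import re
from collections import Counter

LINE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>\w+) (?P<action>\S+)(?: (?P<detail>.*))?$")

def parse(lines):
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            yield m.groupdict()

sample = [
    "2020-01-01 10:00:01 INFO user.login id=42",
    "2020-01-01 10:00:05 INFO record.migrate table=orders",
    "2020-01-01 10:00:06 ERROR record.migrate table=orders reason=constraint",
]
events = list(parse(sample))
print(Counter(e["action"] for e in events))            # action frequencies (metrics)
print([e for e in events if e["level"] == "ERROR"])    # candidate trail for the issue
```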

    Integration in the European Research Area by means of the European Framework Programmes. Findings from Eigenvector filtered spatial interaction models

    One of the main goals of the European Research Area (ERA) concept is to improve coherence and integration across the European research landscape by removing barriers to collaborative knowledge production in a European system of innovation. The cornerstone policy instrument in this context is the European Framework Programme (FP), which supports pre-competitive collaborative R&D projects, creating a pan-European network of actors performing joint R&D. However, we still know little about the contribution of the FPs to the realisation of ERA. The objective of this study is to monitor progress towards ERA by identifying the evolution of separation effects, such as spatial, institutional, cultural or technological barriers, that influence cross-region R&D collaboration intensities between 255 European NUTS-2 regions in the FPs over the period 1999-2006. In doing so, the study builds on recent work by Scherngell and Barber (2009) that addresses this question from a static perspective. We employ Poisson spatial interaction models that account for spatial autocorrelation among residual flows by means of eigenvector spatial filtering. The results show that geographical distance and country border effects gradually decrease over time when correcting for spatial autocorrelation among flows. Thus, the study provides evidence for the contribution of the FPs to the realisation of ERA.
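
    For readers unfamiliar with the model class, the sketch below fits a Poisson spatial interaction model on synthetic region-pair data, with distance and a country-border dummy as separation effects and two stand-in eigenvector filters added as covariates. All data, variable names, and coefficients are invented for illustration and do not reproduce the study's estimation.

```python
# Minimal sketch of a Poisson spatial interaction model: collaboration counts
# between region pairs regressed on origin/destination size, distance and a
# border dummy, with eigenvector spatial filters as extra covariates.
# Data, weights and coefficients are synthetic assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_pairs = 500
mass_o = rng.lognormal(3, 1, n_pairs)        # origin region R&D "mass"
mass_d = rng.lognormal(3, 1, n_pairs)        # destination region mass
dist   = rng.uniform(50, 2000, n_pairs)      # km between region centroids
border = rng.integers(0, 2, n_pairs)         # 1 if the pair crosses a country border

# Stand-in for eigenvector spatial filters: in practice these are eigenvectors
# derived from the spatial weight matrix of the flow network.
ev = rng.normal(size=(n_pairs, 2))

mu = np.exp(0.5*np.log(mass_o) + 0.5*np.log(mass_d) - 0.4*np.log(dist) - 0.3*border)
flows = rng.poisson(mu)                      # observed collaboration counts

X = sm.add_constant(np.column_stack([np.log(mass_o), np.log(mass_d),
                                     np.log(dist), border, ev]))
fit = sm.GLM(flows, X, family=sm.families.Poisson()).fit()
print(fit.params)                            # distance/border effects after filtering
```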

    DolphinNext: a distributed data processing platform for high throughput genomics

    BACKGROUND: The emergence of high-throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS), is transforming biological research. The dramatic increase in the volume of data and the variety and continuous change of data processing tools, algorithms and databases make analysis the main bottleneck for scientific discovery. The processing of high-throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines; unfortunately, they lack the necessary combination of parallelism, portability, flexibility and/or reproducibility required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform simplifies robust and reproducible workflow creation for non-technical users and provides a robust platform to maintain pipelines for large organizations. RESULTS: To simplify the development, maintenance, and execution of complex pipelines we created DolphinNext. DolphinNext facilitates the building and deployment of complex pipelines using a modular approach implemented in a graphical interface that relies on the powerful Nextflow workflow framework, by providing: (1) a drag-and-drop user interface that visualizes pipelines and allows users to create pipelines without familiarity with the underlying programming languages; (2) modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or the cloud; (3) reproducible pipelines with version tracking and stand-alone versions that can be run independently; (4) modular process design with process revisioning support to increase reusability and pipeline development efficiency; (5) pipeline sharing with GitHub and automated testing; and (6) extensive reports with R Markdown and Shiny support for interactive data visualization and analysis. CONCLUSION: DolphinNext is a flexible, intuitive, web-based data processing and analysis platform that enables creating, deploying, sharing, and executing complex Nextflow pipelines with extensive revisioning and interactive reporting to enhance reproducible results.
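
    To make the modular-process idea concrete, the toy sketch below composes versioned processing steps into a pipeline in plain Python. It only illustrates the concept of modular, revisioned processes; it is not the DolphinNext or Nextflow API, and the step names are invented.

```python
# Toy sketch of a modular, versioned pipeline: each process declares a name,
# a version and a function over named channels, and a pipeline is an ordered
# composition of processes. Illustration only; not the DolphinNext/Nextflow API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Process:
    name: str
    version: str                  # revisioning: re-runs with a new version are traceable
    run: Callable[[dict], dict]   # consumes and produces named channels

def pipeline(processes, channels):
    for p in processes:
        print(f"running {p.name} (v{p.version})")
        channels.update(p.run(channels))
    return channels

steps = [
    Process("fastqc", "1.0", lambda ch: {"qc_report": f"qc({ch['reads']})"}),
    Process("align",  "2.1", lambda ch: {"bam": f"aligned({ch['reads']})"}),
    Process("count",  "1.3", lambda ch: {"counts": f"counted({ch['bam']})"}),
]
print(pipeline(steps, {"reads": "sample.fastq"}))
```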