Methods for Investigation of Dependencies between Attributes in Databases
This paper surveys data mining research on discovering dependencies between
attributes in databases. We consider a number of approaches to finding the
distribution intervals of association rules, to discovering branching
dependencies between a given set of attributes and a given attribute in a
database relation, to finding fractional dependencies between a given set of
attributes and a given attribute in a database relation, and to collaborative
filtering.
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
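The simplest MTP setting the abstract mentions, multivariate regression, can be made concrete with a small numpy sketch: multi-output ridge regression, where one closed-form solve produces the weights for all targets at once because every target shares the same regularized normal-equation factor. The data dimensions and regularization strength below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 samples, 5 shared input features, 3 target variables.
X = rng.normal(size=(100, 5))
W_true = rng.normal(size=(5, 3))
Y = X @ W_true + 0.01 * rng.normal(size=(100, 3))

# Multi-output ridge regression: solve (X^T X + lam I) W = X^T Y once;
# the right-hand side has one column per target, so all targets are fit jointly.
lam = 1e-3
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

print(W_hat.shape)  # (5, 3)
print(np.max(np.abs(W_hat - W_true)))
```

More elaborate MTP methods (multi-label classification, matrix completion, zero-shot learning) differ mainly in the loss, the target type, and whether side information about the targets is available, not in this basic shared-representation shape.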
Learning sparse representations of depth
This paper introduces a new method for learning and inferring sparse
representations of depth (disparity) maps. The proposed algorithm relaxes the
usual assumption of the stationary noise model in sparse coding. This enables
learning from data corrupted with spatially varying noise or uncertainty,
typically obtained by laser range scanners or structured light depth cameras.
Sparse representations are learned from the Middlebury database disparity maps
and then exploited in a two-layer graphical model for inferring depth from
stereo, by including a sparsity prior on the learned features. Since they
capture higher-order dependencies in the depth structure, these priors can
complement smoothness priors commonly used in depth inference based on Markov
Random Field (MRF) models. Inference on the proposed graph is achieved using an
alternating iterative optimization technique, where the first layer is solved
using an existing MRF-based stereo matching algorithm, then held fixed as the
second layer is solved using the proposed non-stationary sparse coding
algorithm. This leads to a general method for improving solutions of state of
the art MRF-based depth estimation algorithms. Our experimental results first
show that depth inference using learned representations leads to state of the
art denoising of depth maps obtained from laser range scanners and a time of
flight camera. Furthermore, we show that adding sparse priors improves the
results of two depth estimation methods: the classical graph cut algorithm by
Boykov et al. and the more recent algorithm of Woodford et al.
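The non-stationary sparse coding step can be pictured as l1-regularized least squares with a per-element noise weight, solved by iterative soft thresholding (ISTA). The following numpy sketch is a simplified stand-in under assumed settings: the dictionary, weights, and sparsity level are made up, and the real method learns the dictionary from Middlebury disparity maps rather than sampling it randomly:

```python
import numpy as np

rng = np.random.default_rng(1)

D = rng.normal(size=(20, 10))        # dictionary: 20-dim signals, 10 atoms
D /= np.linalg.norm(D, axis=0)       # unit-norm atoms
a_true = np.zeros(10)
a_true[[2, 7]] = [1.5, -2.0]         # sparse ground-truth code
w = rng.uniform(0.5, 2.0, size=20)   # per-element precision: non-stationary noise
x = D @ a_true + rng.normal(size=20) / np.sqrt(w) * 0.01

# ISTA for: min_a  0.5 * || sqrt(w) * (x - D a) ||^2  +  lam * ||a||_1
lam = 0.01
L = np.linalg.norm(D.T @ (w[:, None] * D), 2)  # Lipschitz constant of the gradient
a = np.zeros(10)
for _ in range(500):
    grad = -D.T @ (w * (x - D @ a))            # gradient of the weighted data term
    z = a - grad / L
    a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold

print(np.nonzero(np.abs(a) > 0.1)[0])  # indices of the recovered nonzero coefficients
```

Setting all weights equal recovers ordinary sparse coding; the spatially varying weights are what let the model downweight unreliable range-scanner measurements.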
Knowledge Engineering Architecture for the Analysis of Organizational Log Data: a software tool for log data analysis
Organisational log data are generated by software and can help the maintenance team address issues reported by clients. Once an application is in production, defects and other issues arise, often caused by customisation and maintenance of the software, and these can compromise its integrity and functionality. Because such issues occur in a production environment to which the maintenance team has no access, they are difficult to resolve: each issue must be reproduced and handled in a development environment, which requires understanding the problem before it can be fixed. Production log data can help by tracing the actions that led up to an issue. The log data contain no private information; they record only the action events produced by software usage. The main objective of this thesis is to build a framework for an automatic log analyser that assists the maintenance team in addressing software issues. The framework also provides a knowledge management system for registering tacit experience as explicit knowledge. A prototype was developed to produce metrics and serve as a proof of concept for the framework; it was evaluated in a real environment, in the context of a software migration project, i.e. transferring company business data between databases.
Integration in the European Research Area by means of the European Framework Programmes. Findings from Eigenvector filtered spatial interaction models
One of the main goals of the European Research Area (ERA) concept is to improve coherence and integration across the European research landscape by removing barriers to collaborative knowledge production in a European system of innovation. The cornerstone of policy instruments in this context is the European Framework Programme (FP), which supports pre-competitive collaborative R&D projects, creating a pan-European network of actors performing joint R&D. However, little is known about the contribution of the FPs to the realisation of ERA. The objective of this study is to monitor progress towards ERA by identifying the evolution of separation effects, such as spatial, institutional, cultural or technological barriers, which influence cross-region R&D collaboration intensities between 255 European NUTS-2 regions in the FPs over the period 1999-2006. In doing so, the study builds on recent work by Scherngell and Barber (2009) that addresses this question from a static perspective. We employ Poisson spatial interaction models that take into account spatial autocorrelation among residual flows by using Eigenvector spatial filtering methods. The results show that geographical distance and country border effects gradually decrease over time when correcting for spatial autocorrelation among flows. Thus, the study provides evidence for the contribution of the FPs to the realisation of ERA.
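The systematic part of such a Poisson spatial interaction model is a gravity-type specification: expected collaboration flows decay with geographical distance and change at country borders. A hedged numpy sketch of that mean structure follows; the coefficients and region-pair data are invented for illustration, and the eigenvector spatial filters the study adds to the linear predictor are omitted:

```python
import numpy as np

# Illustrative region-pair covariates: log distance (km) and a same-country indicator.
log_dist = np.log(np.array([100.0, 400.0, 1200.0]))
same_country = np.array([1.0, 0.0, 0.0])

# Gravity-type Poisson mean: mu_ij = exp(alpha + beta * log d_ij + gamma * border_ij).
# beta < 0 encodes distance decay; gamma > 0 encodes a within-country collaboration bonus.
alpha, beta, gamma = 5.0, -0.8, 0.6  # assumed coefficients, not the study's estimates
mu = np.exp(alpha + beta * log_dist + gamma * same_country)

print(mu)  # expected flows fall as distance grows
```

Monitoring ERA integration then amounts to re-estimating beta and gamma for successive periods and checking whether the distance and border penalties shrink over time.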
DolphinNext: a distributed data processing platform for high throughput genomics
BACKGROUND: The emergence of high throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS) is transforming biological research. The dramatic increase in the volume of data, the variety and continuous change of data processing tools, algorithms and databases make analysis the main bottleneck for scientific discovery. The processing of high throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines. Unfortunately, current platforms lack the necessary combination of parallelism, portability, flexibility and/or reproducibility that are required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform will simplify robust and reproducible workflow creation for non-technical users as well as provide a robust platform to maintain pipelines for large organizations.
RESULTS: To simplify development, maintenance, and execution of complex pipelines we created DolphinNext. DolphinNext facilitates building and deployment of complex pipelines using a modular approach implemented in a graphical interface that relies on the powerful Nextflow workflow framework, by providing: (1) a drag-and-drop user interface that visualizes pipelines and allows users to create pipelines without familiarity with the underlying programming languages; (2) modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or the cloud; (3) reproducible pipelines with version tracking and stand-alone versions that can be run independently; (4) modular process design with process revisioning support to increase reusability and pipeline development efficiency; (5) pipeline sharing with GitHub and automated testing; and (6) extensive reports with R Markdown and Shiny support for interactive data visualization and analysis.
CONCLUSION: DolphinNext is a flexible, intuitive, web-based data processing and analysis platform that enables creating, deploying, sharing, and executing complex Nextflow pipelines with extensive revisioning and interactive reporting to enhance reproducible results.
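The modular, process-based design DolphinNext exposes graphically can be pictured as an ordered composition of steps, each consuming the previous step's output. This is a deliberately simplified Python sketch: the step names are invented placeholders, and DolphinNext itself generates and runs Nextflow processes, not Python functions:

```python
from functools import reduce

# Each "process" is a pure function from the previous step's output to its own.
def trim_reads(reads):
    return [r[:20] for r in reads]                        # stand-in for adapter trimming

def align(reads):
    return [{"read": r, "mapped": True} for r in reads]   # stand-in for an aligner

def count(alignments):
    return sum(a["mapped"] for a in alignments)           # stand-in for read counting

# A pipeline is just the ordered composition of its processes.
pipeline = [trim_reads, align, count]
result = reduce(lambda data, step: step(data), pipeline, ["ACGT" * 10, "TTGA" * 10])
print(result)  # 2
```

Because each process touches only its own inputs and outputs, individual steps can be revised, versioned, and reused across pipelines, which is the property the platform's process-revisioning support builds on.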
- …