Infinite Probabilistic Databases

Author: Martin Grohe, Peter Lindner
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Database Theory (ICDT 2020)
Publication date: 01/01/2020

Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open-world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

Author: Payam Ezatpoor
Publication venue: Digital Scholarship@UNLV
Publication date: 01/05/2017

Incomplete data is a major kind of multi-dimensional dataset that has randomly distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes huge. Finding top-k dominant values in this type of dataset is a challenging procedure. Some algorithms exist to speed up this process, but most are efficient only on small incomplete datasets. One of the algorithms that makes top-k dominance (TKD) queries possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves the performance for incomplete data, but it is not originally capable of finding top-k dominant values in incomplete big data, nor is it designed to do so. Several other algorithms have been proposed to answer TKD queries, such as Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Algorithms developed previously were among the first attempts to apply TKD queries to incomplete data; however, all of them performed poorly or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance queries on huge incomplete datasets. The proposed approach applies MapReduce parallel computing across multiple computing nodes. The framework separates the tasks between several computing nodes that independently and simultaneously work to find the result. This method finds the TKD query result up to two times faster than previously presented algorithms.
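
For readers unfamiliar with the query itself, the following is a minimal brute-force sketch of a top-k dominance query on incomplete data. It is not MRBIG, BIG, or any algorithm from the thesis. It assumes a commonly used definition that the abstract does not spell out: one row dominates another if, on the dimensions observed in both, it is never worse and is strictly better at least once. It also assumes smaller values are better. The names `dominates` and `top_k_dominating` and the sample rows are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing value


def dominates(a: Row, b: Row) -> bool:
    """True if a dominates b on the dimensions observed in both rows.

    Assumption (not from the abstract): smaller values are better.
    """
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # compare only commonly observed dimensions
        if x > y:
            return False
        if x < y:
            strictly_better = True
    return strictly_better


def top_k_dominating(rows: Sequence[Row], k: int) -> List[Tuple[int, int]]:
    """Return (row index, score) for the k rows that dominate the most others."""
    scores = [
        (i, sum(dominates(a, b) for j, b in enumerate(rows) if j != i))
        for i, a in enumerate(rows)
    ]
    # Sort by descending score, breaking ties by row index.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical sample: four rows, three dimensions, some values missing.
rows = [
    (1, None, 2),
    (2, 3, None),
    (None, 1, 3),
    (3, 4, 4),
]
result = top_k_dominating(rows, 2)
```

Each row's score is computed independently of the others, which is the kind of work that can be split across computing nodes; the quadratic pairwise comparison is what makes the brute-force approach impractical on large datasets.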

University of Nevada, Las Vegas Repository

Warranty Data Analysis: A Review

Author: Ahn, Alam, Attardi, Baik, Blischke, Brennan, Buddhakulsomsiri, Chen, Chukova, Davis, Djamaludin, Duchesne, Elkins, Escobar, Fredette, Gertsbakh, Grabert, Honari, Hrycej, Hu, Ion, Iskandar, Jung, Kalbfleisch, Kaminskiy, Karim, Kijima, Kleyner, Krivtsov, Lawless, Majeske, Marcorin, Marshall, Meeker, Moskowitz, Murthy, Oh, Pal, Phillips, Rahman, Rai, Robinson, Sahin, Singpurwalla, Sureka, Suzuki, Thomas, Vinta, Vintr, Vittal, Wang, Wasserman, Wilson, Wu, Yang, Zuo
Publication venue: Wiley
Publication date: 10/01/2012

Warranty claims and supplementary data contain useful information about product quality and reliability. Analysing such data can therefore be of benefit to manufacturers in identifying early warnings of abnormalities in their products, providing useful information about failure modes to aid design modification, estimating product reliability for deciding on warranty policy and forecasting future warranty claims needed for preparing fiscal plans. In the last two decades, considerable research has been conducted in warranty data analysis (WDA) from several different perspectives. This article attempts to summarise and review the research and developments in WDA with emphasis on models, methods and applications. It concludes with a brief discussion on current practices and possible future trends in WDA.

Crossref

Kent Academic Repository

Just how difficult can it be counting up R&D funding for emerging technologies (and is tech mining with proxy measures going to be any better?)

Author: Anon, Archibuigi D., Augsdorfer P., Bijker W. E., Bowker G., Freeman C., Headrick D., Jaffe A. B., Josh Siepel, Martin P., Matthews K., Michael M. Hopkins, Morgan Jones M., NESTA, Nicholaisen J., Nuffield Council on Bioethics, OECD, Porter T., Rafols I., Rosenberg N., Searle J., Veugelers R.
Publication venue: Informa UK Limited
Publication date: 01/01/2013

Decision makers considering policy or strategy related to the development of emerging technologies expect high quality data on the support for different technological options. A natural starting point would be R&D funding statistics. This paper explores the limitations of such aggregated data in relation to the substance and quantification of funding for emerging technologies. Using biotechnology as an illustrative case, we test the utility of a novel taxonomy to demonstrate the endemic weaknesses in the availability and quality of data from public and private sources. Using the same taxonomy, we consider the extent to which tech-mining presents an alternative, or potentially complementary, way to determine support for emerging technologies using proxy measures such as patents and scientific publications.

Crossref

Sussex Research Online

Measuring traffic flow and lane changing from semi-automatic video processing

Author: Marcel Sala Sanmartí, Francesc Soriguera Martí
Publication venue: Universitat Politecnica de Valencia
Publication date: 01/01/2016

Comprehensive databases are needed in order to extend our knowledge of the behavior of vehicular traffic. Nevertheless, data coming from common traffic detectors is incomplete. Detectors only provide vehicle count, detector occupancy and speed at discrete locations. To enrich these databases, additional measurements from other data sources, like video recordings, are used. Extracting data from videos by actually watching the entire length of the recordings and manually counting is extremely time-consuming. The alternative is to set up an automatic video detection system. This is also costly in terms of money and time, and generally does not pay off for sporadic usage on a pilot test. An adaptation of the semi-automatic video processing methodology proposed by Patire (2010) is presented here. It makes it possible to count flow and lane changes 90% faster than actually counting them by looking at the video. The method consists of selecting specific lines of pixels in the video and converting them into a set of space-time images. Manual time is spent only on counting from these images. The method is adaptive, in the sense that the counting is always done at the maximum speed, not constrained by the video playback speed. This allows going faster when there are few counts and slower when many counts happen. This methodology has been used for measuring off-ramp flows and lane changing at several locations on the B-23 freeway (Soriguera & Sala, 2014). Results show that, as long as the video recordings fulfill some minimum requirements in framing and quality, the method is easy to use, fast and reliable. This method is intended for research purposes, when some hours of video recording have to be analyzed, not for long-term use in a Traffic Management Center.
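
The idea of turning a line of pixels into a space-time image can be illustrated with a small sketch. This is not the authors' or Patire's implementation. It assumes grayscale frames stored as nested lists indexed `frame[y][x]`, and the names `space_time_image` and `make_frame` are hypothetical. The pixels under a chosen line are read from every frame and stacked so that one axis is position along the line and the other is time.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # grayscale frame indexed as frame[y][x]


def space_time_image(
    frames: Sequence[Frame], line: Sequence[Tuple[int, int]]
) -> List[List[int]]:
    """Stack the pixels under `line` from every frame into one 2D image.

    Row i of the result is position i along the line; column t is frame t.
    """
    return [[frame[y][x] for frame in frames] for (x, y) in line]


def make_frame(t: int) -> List[List[int]]:
    """Synthetic 4x2 frame with one bright pixel (a 'vehicle') at x = t, y = 1."""
    frame = [[0] * 4 for _ in range(2)]
    frame[1][t] = 255
    return frame


# Three synthetic frames in which the bright pixel moves one step per frame.
frames = [make_frame(t) for t in range(3)]
# Hypothetical line of pixels along row y = 1, given as (x, y) pairs.
line = [(0, 1), (1, 1), (2, 1)]
image = space_time_image(frames, line)
```

In the synthetic example the moving pixel appears as a diagonal streak in `image`, so an object crossing the line can be counted from a single still image instead of from video playback.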

UPCommons. Portal del coneixement obert de la UPC

Avoiding Problems of Traditional Sampling Strategies for Household Surveys in Germany: Some New Suggestions

Author: Rainer Schnell

All of the sampling plans currently in use for general population surveys in Germany suffer from methodological and practical problems. A new sampling plan is thus urgently needed: one with a low cost overhead that can be prepared in a very short time. Germany also lacks a sampling plan covering all institutional populations, immigrants in general, and illegal immigrants in particular. The availability of new databases covering these populations suggests ways of developing, implementing, and testing new sampling plans for population surveys in Germany. One such sampling plan (G-Plan) is proposed here for the first time. The implementation problems of this design must be studied in a number of empirical pretests.

Research Papers in Economics