Smooth Multibidding Mechanisms
We propose a smooth multibidding mechanism for environments where a group of agents have to choose one out of several projects. Our proposal is related to the multibidding mechanism (Pérez-Castrillo and Wettstein, 2002) but it is “smoother” in the sense that small variations in an agent’s bids do not lead to dramatic changes in the probability of selecting a project. This mechanism is shown to possess several interesting properties. First, the equilibrium outcome is unique. Second, it ensures an equal sharing of the surplus that it induces. Finally, it enables reaching an outcome as close to efficiency as is desired.
Keywords: mechanism design, NIMBY
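As a toy illustration of the "smoothness" property only, and not of the mechanism actually proposed in the paper, the sketch below assigns selection probabilities that vary continuously with the projects' aggregate bids (here via a softmax), in contrast with a winner-take-all rule where a tiny change in one bid can flip the selected project. The softmax form and the temperature parameter are assumptions made purely for the example.

```python
import numpy as np

def smooth_selection_probs(aggregate_bids, temperature=1.0):
    """Toy smooth rule: selection probabilities vary continuously with bids.

    Illustrates the 'smoothness' property only (a softmax over the projects'
    aggregate bids); this is NOT the mechanism proposed in the paper.
    """
    b = np.asarray(aggregate_bids, dtype=float) / temperature
    e = np.exp(b - b.max())          # numerically stable softmax
    return e / e.sum()

# A small change in one bid shifts the probabilities only slightly...
print(smooth_selection_probs([10.0, 10.1, 9.5]))
# ...whereas a 'highest aggregate bid wins' rule would flip discretely.
```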
Sketching for Large-Scale Learning of Mixture Models
Learning parameters from voluminous data can be prohibitive in terms of
memory and computational requirements. We propose a "compressive learning"
framework where we estimate model parameters from a sketch of the training
data. This sketch is a collection of generalized moments of the underlying
probability distribution of the data. It can be computed in a single pass on
the training set, and is easily computable on streams or distributed datasets.
The proposed framework shares similarities with compressive sensing, which aims
at drastically reducing the dimension of high-dimensional signals while
preserving the ability to reconstruct them. To perform the estimation task, we
derive an iterative algorithm analogous to sparse reconstruction algorithms in
the context of linear inverse problems. We exemplify our framework with the
compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics
on the choice of the sketching procedure and theoretical guarantees of
reconstruction. We experimentally show on synthetic data that the proposed
algorithm yields results comparable to the classical Expectation-Maximization
(EM) technique while requiring significantly less memory and fewer computations
when the number of database elements is large. We further demonstrate the
potential of the approach on real large-scale data (over 10^8 training samples)
for the task of model-based speaker verification. Finally, we draw some
connections between the proposed framework and approximate Hilbert space
embedding of probability distributions using random features. We show that the
proposed sketching operator can be seen as an innovative method to design
translation-invariant kernels adapted to the analysis of GMMs. We also use this
theoretical framework to derive information preservation guarantees, in the
spirit of infinite-dimensional compressive sensing.
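The single-pass sketch described above lends itself to a short illustration. The following is a minimal sketch of computing such generalized moments, under the assumption that they are random Fourier moments (averaged complex exponentials at random frequencies), in the spirit of random-feature kernel embeddings; the frequency distribution, the sketch size m, and the toy data are placeholder assumptions, and the iterative recovery algorithm used to estimate the GMM from the sketch is not shown.

```python
import numpy as np

def compute_sketch(data_stream, omegas):
    """Empirical sketch: averaged complex exponentials exp(i * x @ omega_j).

    data_stream : iterable of mini-batches, each of shape (batch, d)
    omegas      : (d, m) matrix of random frequency vectors (the sketch design)

    Returns an m-dimensional complex vector of generalized moments, computed
    in a single pass, so it also works on streams or distributed datasets.
    """
    m = omegas.shape[1]
    sketch = np.zeros(m, dtype=complex)
    n = 0
    for batch in data_stream:
        sketch += np.exp(1j * (batch @ omegas)).sum(axis=0)
        n += batch.shape[0]
    return sketch / n

# Toy usage: sketch 10^5 samples drawn from a 2-component mixture in dimension 5.
rng = np.random.default_rng(0)
d, m = 5, 200
omegas = rng.normal(size=(d, m))       # random frequencies (Gaussian, by assumption)
stream = (rng.normal(loc=(i % 2) * 3.0, size=(1000, d)) for i in range(100))
z = compute_sketch(stream, omegas)     # O(m) memory regardless of the number of samples
```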
ALOJA: A benchmarking and predictive platform for big data performance analysis
The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. The development of the project over its first year has resulted in an open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about systems' cost-performance.
This article describes the evolution of the project's focus and research lines from over a year of continuously benchmarking Hadoop under different configuration and deployment options, presents results, and discusses the motivation, both technical and market-based, for such changes. During this time, ALOJA's target has evolved from low-level profiling of the Hadoop runtime, through extensive benchmarking and evaluation of a large body of results via aggregation, to currently leveraging Predictive Analytics (PA) techniques. Modeling benchmark executions allows us to estimate the results of new or untested configurations or hardware set-ups automatically, by applying learning techniques to past observations, saving benchmarking time and costs.
This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).
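A minimal sketch of the predictive-analytics idea described above: learning a regression model over past benchmark executions to estimate the execution time of untested configurations. The file name, feature columns, and the choice of a random-forest regressor are illustrative assumptions, not ALOJA's actual schema or model.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical columns; the real ALOJA repository has its own schema.
runs = pd.read_csv("hadoop_runs.csv")                      # past benchmark executions
features = pd.get_dummies(
    runs[["benchmark", "disk_type", "network", "mappers", "io_buffer_mb"]]
)
target = runs["exec_time_s"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)
model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_train, y_train)

# The fitted model can then estimate runtimes for configurations never benchmarked.
print("held-out R^2:", model.score(X_test, y_test))
```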
Right of asylum and refugee status: Balance of 26 years of legislative development
The right of asylum (art. 13.4 of the Spanish Constitution) has evolved in parallel with the phenomenon of immigration and the legal treatment of foreigners, both in Spain and in neighbouring countries. Moreover, the increase in migration flows in Europe has conditioned state responses on asylum and refugee policy, which have undergone considerable restriction since the eighties of the last century. From the constituent's initial choice of a legal configuration of the right, to the proliferation of temporary protection instruments for humanitarian reasons, the current asylum and refugee regulation and its shortcomings are a consequence of Spain's adaptation to the common asylum policy of the European Union, although each national system obviously presents specific features. Thus, in Spain the asylum system is linked both to the increase in mixed migratory flows and to immigration policy considerations. The new regulation on asylum and subsidiary protection discussed here will require implementing regulations, as well as administrative application and judicial interpretation, to show its full potential, since its recent entry into force has only allowed a theoretical evaluation, in which we should highlight the effort to provide procedural safeguards, the conceptual clarification, and the almost complete equivalence between asylum and subsidiary protection.
ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments
This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of the cost-effectiveness of Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data sets and framework to improve the design and deployment of Big Data applications.
This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.
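One way to read the model-based anomaly detection mentioned above, sketched under the assumption that a fitted run-time model (such as the one sketched after the previous ALOJA entry) is available: flag executions whose observed time deviates strongly from the model's prediction. The residual-based threshold rule is an illustrative choice, not ALOJA-ML's published criterion.

```python
import numpy as np

def flag_anomalies(model, X_observed, y_observed, n_sigma=3.0):
    """Flag executions whose observed time deviates from the prediction
    by more than n_sigma standard deviations of the residuals."""
    residuals = y_observed - model.predict(X_observed)
    threshold = n_sigma * residuals.std()
    return np.abs(residuals) > threshold      # boolean mask of suspect runs

# e.g. suspicious = flag_anomalies(model, features, target)
```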
Evaluation of CNN architectures for gait recognition based on optical flow maps
This work targets people identification in video based on the way they walk (i.e., gait) by using deep learning architectures. We explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e., optical flow components). The low number of training samples for each subject and the use of a test set containing subjects different from the training ones make the search for a good CNN architecture a challenging task.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
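A minimal, hedged sketch of the kind of CNN explored for this task, assuming the input is a stack of optical-flow components (2 channels per frame over T frames) resized to 60x60 pixels; the layer sizes, input resolution, and number of subjects are placeholder assumptions, not the architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class GaitCNN(nn.Module):
    """Small CNN over stacked optical-flow maps (2*T channels, e.g. T=25 frames)."""
    def __init__(self, in_channels=50, n_subjects=150):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 96, kernel_size=7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(96, 192, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(192, 512, kernel_size=3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(512, n_subjects)   # identity softmax at train time

    def forward(self, x):                   # x: (batch, 2*T, H, W) flow stacks
        h = self.features(x).flatten(1)     # high-level gait descriptor
        return self.classifier(h)           # descriptor h can be reused for unseen subjects

# Toy forward pass on a batch of 4 flow volumes of size 60x60.
logits = GaitCNN()(torch.randn(4, 50, 60, 60))
```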
The state of SQL-on-Hadoop in the cloud
Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services like Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies quick entry and on-demand deployment of ready SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark.
The study is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines.
The ALOJA project aims to lower the total cost of ownership (TCO) of big data deployments and study their performance characteristics for optimization.
The study benchmarks cloud providers across a diverse range of instance types, using input data scales from 1GB to 1TB, in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions and thereby establish a common results base upon which subsequent research can be carried out by the project. Initial results already show the main performance trends with respect to hardware and software configuration, pricing, and the similarities and architectural differences of the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others choose to keep the local processing model from Hadoop for high performance, at the cost of flexibility. Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud.
This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051).
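A minimal illustration of the cost-effectiveness comparison described above: price per hour times the total runtime of the derived query set, per provider and cluster configuration. The provider names, prices, and runtimes below are invented placeholders for the example, not results from the study.

```python
# Toy cost-effectiveness comparison for entry-level PaaS SQL-on-Hadoop clusters.
clusters = {
    "provider_A_entry": {"usd_per_hour": 4.0, "tpch_runtime_h": 2.5},
    "provider_B_entry": {"usd_per_hour": 3.2, "tpch_runtime_h": 3.4},
}

for name, c in clusters.items():
    cost = c["usd_per_hour"] * c["tpch_runtime_h"]   # USD to complete the query set once
    print(f"{name}: {cost:.2f} USD per full TPC-H-derived run")
```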
Auditing Cost Overrun Claims
We consider a cost-reimbursement or a cost-sharing procurement contract between the administration and a firm. The firm privately learns the true cost overrun once the project has started and it can manipulate this information. We characterize the optimal auditing policy of cost overrun claims as a function of the initial contractual payment, the share of the cost overrun paid by the administration, the cost and the accuracy of the auditing technology, and the penalty rate that can be imposed on fraudulent firms. We also show that this possibility of misreporting reduces the set of projects carried out and biases the choice of the quality level of those projects that the administration carries out.
Keywords: cost overruns; auditing; procurement
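A toy numeric illustration (not the paper's model) of the trade-off an auditing policy balances: auditing a reported overrun pays off when the expected recovery, driven by the reimbursement share, the audit's accuracy, and the penalty rate, exceeds the audit's cost. The calculation is evaluated ex post for illustration only; in the paper the administration does not observe the true overrun, and all numbers here are invented.

```python
def expected_gain_from_audit(reported_overrun, true_overrun, share,
                             accuracy, penalty_rate, audit_cost):
    """Expected net gain for the administration from auditing a claim.

    share        : fraction of the overrun reimbursed by the administration
    accuracy     : probability the audit detects a misreport
    penalty_rate : fine per unit of overstated overrun if caught
    """
    overstated = max(reported_overrun - true_overrun, 0.0)
    recovered = accuracy * (share * overstated + penalty_rate * overstated)
    return recovered - audit_cost

# Auditing is worthwhile here: expected recovery (43.2) exceeds the audit cost (20).
print(expected_gain_from_audit(reported_overrun=120, true_overrun=80,
                               share=0.7, accuracy=0.9, penalty_rate=0.5,
                               audit_cost=20))
```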
Data modeling as a main source of discrepancies in single and multiple marker association methods
Genome-wide association studies have successfully identified several loci underlying complex diseases in humans. The development of high-density SNP maps in domestic animal species should allow the detection of QTLs for economically important traits through association studies with much higher accuracy than traditional linkage analysis. Here we report the association analysis of the dataset simulated for the XII QTL-MAS meeting (Uppsala). We used two strategies, single marker association (SMA) and haplotype-based association (Blossoc), that were applied to i) the raw data, and ii) the data corrected for infinitesimal, sex and generation effects. Both methods performed similarly in detecting the most strongly associated SNPs, about ten loci in total. The most significant ones were located on chromosomes 1, 4 and 5. Overall, the largest differences were found between corrected and raw data rather than between single and multiple marker analysis. The use of raw data greatly increased the number of significant loci, but possibly also the rate of false positives. Bootstrap model aggregation removed most of the discrepancies between adjusted and raw data when SMA was employed. Model choice should be carefully considered in genome-wide association studies.
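A hedged sketch of the single marker association (SMA) strategy mentioned above, assuming a simple linear regression of the (optionally pre-corrected) phenotype on each SNP's genotype dosage; the simulated QTL-MAS dataset and the Blossoc haplotype method have their own formats and procedures that are not reproduced here.

```python
import numpy as np
from scipy import stats

def single_marker_association(genotypes, phenotype):
    """Per-SNP association: regress the phenotype on genotype dosage (0/1/2).

    genotypes : (n_individuals, n_snps) array of allele counts
    phenotype : (n_individuals,) array, either raw or pre-corrected
                for infinitesimal, sex and generation effects
    Returns one p-value per SNP.
    """
    pvals = []
    for j in range(genotypes.shape[1]):
        slope, intercept, r, p, se = stats.linregress(genotypes[:, j], phenotype)
        pvals.append(p)
    return np.array(pvals)

# Toy usage on simulated data: SNP 10 is the true causal locus.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(500, 1000)).astype(float)
y = 0.8 * G[:, 10] + rng.normal(size=500)
p = single_marker_association(G, y)
print("most significant SNP:", int(p.argmin()))
```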