The Rational Agent Benchmark for Data Visualization
Understanding how helpful a visualization is from experimental results is
difficult because the observed performance is confounded with aspects of the
study design, such as how useful the visualized information is for the task.
We develop a rational agent framework for designing and interpreting
visualization experiments. Our framework conceives of two experiments with the
same setup: one with behavioral agents (human subjects), the other with a
hypothetical rational agent. A visualization is evaluated by comparing the
expected performance of behavioral agents to that of the rational agent under
different assumptions. Using recent visualization decision studies from the
literature, we demonstrate how the framework can be used to pre-experimentally
evaluate the experiment design by bounding the expected improvement in
performance from having access to visualizations, and post-experimentally to
deconfound errors of information extraction from errors of optimization, among
other analyses.
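The core comparison can be illustrated with a small simulation. The Python sketch below contrasts a rational agent that acts optimally on a probability signal with a behavioral agent that reads the same signal with perceptual noise; the task, payoffs, and noise level are hypothetical stand-ins, not the studies analyzed in the paper.

```python
# A minimal sketch of the rational-agent comparison, assuming a binary
# decision task with a known data-generating model (all specifics here
# are hypothetical stand-ins for the paper's studies).
import numpy as np

rng = np.random.default_rng(0)

def expected_payoff(n=100_000, noise=0.0):
    """Mean payoff of an agent acting on a probability signal.

    The signal p is the true probability that acting yields 1 (at cost 0.4).
    A rational agent (noise=0) acts iff the expected payoff is positive;
    a behavioral agent misreads the signal with additive perceptual noise.
    """
    p = rng.uniform(size=n)                    # signal conveyed by the visualization
    outcome = rng.uniform(size=n) < p          # event realization
    perceived = np.clip(p + rng.normal(0, noise, n), 0, 1)
    act = perceived > 0.4                      # act iff perceived P(win) > cost
    return np.where(act, outcome - 0.4, 0.0).mean()

rational = expected_payoff(noise=0.0)     # upper bound: perfect information extraction
behavioral = expected_payoff(noise=0.15)  # hypothetical human-like perceptual noise
print(f"rational {rational:.3f}  behavioral {behavioral:.3f}  gap {rational - behavioral:.3f}")
```

The gap between the two expectations is the quantity the framework attributes to errors of information extraction and optimization.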
Joint stochastic simulation of petrophysical properties with elastic attributes based on parametric copula models
The spatial stochastic co-simulation method based on copulas is a general method that allows simulating variables with any type of dependency and probability distribution function. This flexibility comes from the use of a copula model to represent the joint probability distribution function. The method has mainly been implemented through a non-parametric approach using Bernstein copulas and has been successfully applied to the simulation of petrophysical properties using elastic seismic attributes as secondary variables. In the present work, this method is implemented through two other approaches: parametric and semi-parametric. Specifically, the parametric approach uses the family of Archimedean copulas. First, the parametric approach is validated against a published case, and then the three approaches are compared in terms of accuracy and performance. The results showed that the parametric approach reproduces the data statistics least faithfully and presents the greatest uncertainty, though at the lowest computational cost, while the non-parametric approach best reproduces the dependence in the data, at a high computational cost. The semi-parametric approach reduces the computational cost by 10% compared to the non-parametric approach, but its accuracy is significantly degraded.
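As a concrete illustration of the parametric route, the following Python sketch co-simulates a petrophysical variable with a seismic attribute using a Clayton copula, one member of the Archimedean family; the marginals, units, and dependence parameter are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of parametric copula co-simulation, assuming a Clayton
# (Archimedean) copula and Gaussian marginals; parameter values and units
# are illustrative, not taken from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta = 2.0   # Clayton dependence parameter; Kendall's tau = theta / (theta + 2) = 0.5
n = 5000

# Conditional sampling: draw u, then v | u by inverting the Clayton h-function.
u = rng.uniform(size=n)
t = rng.uniform(size=n)
v = (u ** (-theta) * (t ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)

# Map uniforms to physical marginals via inverse CDFs (hypothetical units).
acoustic_impedance = stats.norm(6000, 800).ppf(u)  # secondary (seismic) variable
porosity = stats.norm(0.18, 0.04).ppf(v)           # co-simulated petrophysical variable

tau, _ = stats.kendalltau(acoustic_impedance, porosity)
print("empirical Kendall tau:", round(tau, 3))     # should be near 0.5
```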
Likelihood Asymptotics in Nonregular Settings: A Review with Emphasis on the Likelihood Ratio
This paper reviews the most common situations where one or more regularity
conditions which underlie classical likelihood-based parametric inference fail.
We identify three main classes of problems: boundary problems, indeterminate
parameter problems -- which include non-identifiable parameters and singular
information matrices -- and change-point problems. The review focuses on the
large-sample properties of the likelihood ratio statistic. We emphasize
analytical solutions and acknowledge software implementations where available.
We furthermore give summary insight into the tools available to derive the
key results. Other approaches to hypothesis testing and connections to
estimation are listed in the annotated bibliography of the Supplementary
Material.
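A classic boundary example conveys the flavor of the nonregular asymptotics: for Gaussian data, testing H0: mu = 0 against H1: mu >= 0 places the null on the boundary of the parameter space, and the likelihood ratio statistic converges to the chi-bar-squared mixture 0.5*delta_0 + 0.5*chi2_1 rather than chi2_1. The short Python simulation below checks this; the example is ours, chosen for illustration.

```python
# A minimal simulation of a boundary problem: X_i ~ N(mu, 1), H0: mu = 0
# against H1: mu >= 0. Under H0 the LR statistic W = n * max(Xbar, 0)^2
# follows the chi-bar-squared mixture 0.5*delta_0 + 0.5*chi2_1, not chi2_1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 50, 20_000
xbar = rng.normal(0, 1 / np.sqrt(n), size=reps)  # sampling distribution of the mean under H0
W = n * np.maximum(xbar, 0.0) ** 2               # constrained MLE is max(Xbar, 0)

print("P(W = 0):", np.mean(W == 0))              # ~0.5: the point mass at zero
print("P(W > 2.706):", np.mean(W > 2.706),       # ~0.05 = 0.5 * P(chi2_1 > 2.706)
      "vs", round(0.5 * stats.chi2(1).sf(2.706), 4))
```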
On the competitive facility location problem with a Bayesian spatial interaction model
The competitive facility location problem arises when businesses plan to enter a new market or expand their presence. We introduce a Bayesian spatial interaction model which provides probabilistic estimates on location-specific revenues and then formulate a mathematical framework to simultaneously identify the location and design of new facilities that maximise revenue. To solve the allocation optimisation problem, we develop a hierarchical search algorithm and associated sampling techniques that explore geographic regions of varying spatial resolution. We demonstrate the approach by producing optimal facility locations and corresponding designs for two large-scale applications in the supermarket and pub sectors of Greater London
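The revenue-scoring step can be sketched with the classical Huff spatial interaction model, a deterministic simplification of the Bayesian model in the paper; all demand zones, candidate sites, and parameters below are toy assumptions.

```python
# A minimal sketch of revenue scoring under a Huff-style spatial interaction
# model (the paper's Bayesian model is richer; all numbers here are toy data).
import numpy as np

rng = np.random.default_rng(3)
demand = rng.uniform(50, 200, size=30)          # spending power per demand zone
zones = rng.uniform(0, 10, size=(30, 2))        # zone centroids
competitors = rng.uniform(0, 10, size=(5, 2))   # existing facilities
candidates = rng.uniform(0, 10, size=(8, 2))    # candidate sites
attract, beta = 1.0, 2.0                        # design attractiveness, distance decay

def utilities(facilities):
    """Zone-by-facility utility matrix: attractiveness over decayed distance."""
    d = np.linalg.norm(zones[:, None, :] - facilities[None, :, :], axis=2)
    return attract / np.maximum(d, 0.1) ** beta

base = utilities(competitors).sum(axis=1)        # utility captured by competitors
for j, site in enumerate(candidates):
    u_new = utilities(site[None, :])[:, 0]
    share = u_new / (base + u_new)               # Huff capture probability per zone
    print(f"candidate {j}: expected revenue {np.dot(demand, share):.1f}")
```

The hierarchical search in the paper can be thought of as running this scoring over coarse regions first, then refining within the most promising ones.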
PreFair: Privately Generating Justifiably Fair Synthetic Data
When a database is protected by Differential Privacy (DP), its usability is
limited in scope. In this scenario, generating a synthetic version of the data
that mimics the properties of the private data allows users to perform any
operation on the synthetic data, while maintaining the privacy of the original
data. Therefore, multiple works have been devoted to devising systems for DP
synthetic data generation. However, such systems may preserve or even magnify
properties of the data that make it unfair, rendering the synthetic data unfit
for use. In this work, we present PreFair, a system that allows for DP fair
synthetic data generation. PreFair extends the state-of-the-art DP data
generation mechanisms by incorporating a causal fairness criterion that ensures
fair synthetic data. We adapt the notion of justifiable fairness to fit the
synthetic data generation scenario. We further study the problem of generating
DP fair synthetic data, showing its intractability and designing algorithms
that are optimal under certain assumptions. We also provide an extensive
experimental evaluation, showing that PreFair generates synthetic data that is
significantly fairer than the data generated by leading DP data generation
mechanisms, while remaining faithful to the private data.
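To make the fairness criterion concrete: justifiable fairness asks, roughly, that the outcome be independent of the protected attribute once admissible attributes are conditioned on. The Python sketch below checks that condition on toy tabular data via within-stratum outcome rates; it is a simplified diagnostic, not PreFair's DP generation mechanism.

```python
# A minimal sketch of the check behind justifiable fairness: within each
# stratum of admissible attributes, the outcome rate should not depend on
# the protected attribute (toy data, generated here to satisfy the criterion).
import pandas as pd
import numpy as np

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "protected": rng.integers(0, 2, 1000),    # e.g., demographic group
    "admissible": rng.integers(0, 3, 1000),   # e.g., qualification level
})
# Outcome depends only on the admissible attribute, so the data is "fair".
df["outcome"] = (rng.uniform(size=1000) < 0.2 + 0.2 * df["admissible"]).astype(int)

# Outcome rate per (admissible, protected) cell; large within-stratum gaps
# would indicate a dependence not explained by admissible attributes.
rates = df.groupby(["admissible", "protected"])["outcome"].mean().unstack()
print(rates)
print("max within-stratum gap:", (rates[1] - rates[0]).abs().max())
```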
Discovering the hidden structure of financial markets through bayesian modelling
What drives the price of a financial asset is a question that remains largely unanswered. In this work we go beyond classic one-step-ahead prediction and instead construct models that create new information on the behaviour of these time series. Our aim is to gain a better understanding of the hidden structures that drive the moves of each financial time series, and thus the market as a whole.
We propose a tool to decompose multiple time series into economically meaningful variables that explain the endogenous and exogenous factors driving their underlying variability. The methodology we introduce goes beyond the direct model forecast. Indeed, since our model continuously adapts its variables and coefficients, we can study the time series of coefficients and selected variables. We also present a model to construct the causal graph of relations between these time series and include them in the exogenous factors.
Hence, we obtain a model able to explain what is driving the move of both each specific time series and the market as a whole. In addition, the obtained graph of the time series provides new information on the underlying risk structure of this environment. With this deeper understanding of the hidden structure, we propose novel ways to detect and forecast risks in the market. We evaluate our results with inferences up to one month into the future using stocks, FX futures, and ETF futures, demonstrating superior performance in terms of accuracy on large moves, longer-term prediction, and consistency over time. We also go into more detail on the economic interpretation of the new variables and discuss the constructed graph structure of the market.
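One building block of such adaptive decompositions is a regression whose coefficients drift over time. The Python sketch below tracks a random-walk coefficient with a scalar Kalman filter on synthetic data; the thesis model is richer (variable selection, causal graph across series), so this only illustrates the "time series of coefficients" idea.

```python
# A minimal sketch of a time-varying coefficient regression, one building
# block of adaptive decompositions of financial series (synthetic data).
import numpy as np

rng = np.random.default_rng(5)
T = 300
x = rng.normal(size=T)                         # exogenous factor (e.g., an FX return)
beta_true = np.cumsum(rng.normal(0, 0.05, T))  # slowly drifting exposure
y = beta_true * x + rng.normal(0, 0.3, T)      # observed asset return

# Kalman filter for y_t = beta_t * x_t + eps, with beta_t = beta_{t-1} + eta.
beta, P, q, r = 0.0, 1.0, 0.05 ** 2, 0.3 ** 2
path = []
for t in range(T):
    P += q                                     # predict: random-walk coefficient
    K = P * x[t] / (x[t] ** 2 * P + r)         # Kalman gain
    beta += K * (y[t] - beta * x[t])           # update with the forecast error
    P *= 1 - K * x[t]
    path.append(beta)

print("tracking RMSE:", np.sqrt(np.mean((np.array(path) - beta_true) ** 2)))
```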
Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond
This thesis is framed at the intersection between modern Machine Learning techniques, such as Deep Neural Networks, and reliable probabilistic modeling. In many machine learning applications, we care not only about the prediction made by a model (e.g. this lung image presents cancer) but also about how confident the model is in making this prediction (e.g. this lung image presents cancer with 67% probability). In such applications, the model assists the decision-maker (in this case a doctor) in making the final decision. As a consequence, the probabilities provided by a model need to reflect the true proportions of outcomes in the set to which they are assigned; otherwise the model is useless in practice. When this happens, we say that a model is perfectly calibrated.
This thesis explores three ways to provide more calibrated models. First, it is shown how to implicitly calibrate models that are decalibrated by data augmentation techniques. A cost function is introduced that resolves this decalibration, taking as its starting point ideas derived from decision making with Bayes' rule. Second, it is shown how to calibrate models using a post-calibration stage implemented with a Bayesian neural network. Finally, based on the limitations observed in the Bayesian neural network, which we hypothesize stem from a misspecified prior, a new stochastic process is introduced that serves as a prior distribution in a Bayesian inference problem.
Maroñas Molano, J. (2022). Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181582
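The notion of calibration used throughout can be made concrete with a small experiment: measure the expected calibration error (ECE) of overconfident softmax outputs, then apply a post-hoc correction. The sketch below uses temperature scaling as a simple stand-in for the thesis's Bayesian post-calibration stage; the logits and labels are synthetic.

```python
# A minimal sketch of calibration measurement and post-hoc correction:
# expected calibration error (ECE) before and after temperature scaling.
# Temperature scaling stands in for the thesis's Bayesian post-calibration.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import softmax

rng = np.random.default_rng(6)
n, k = 5000, 3
logits = rng.normal(size=(n, k)) * 3.0   # overconfident scores
# Labels drawn from the true probabilities softmax(logits / 2): T=2 would be ideal.
labels = np.array([rng.choice(k, p=softmax(l / 2.0)) for l in logits])

def ece(probs, labels, bins=10):
    """Expected calibration error: binned |accuracy - confidence| gap."""
    conf, pred = probs.max(1), probs.argmax(1)
    edges = np.linspace(0, 1, bins + 1)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            err += m.mean() * abs((pred[m] == labels[m]).mean() - conf[m].mean())
    return err

def nll(T):
    p = softmax(logits / T, axis=1)
    return -np.log(p[np.arange(n), labels] + 1e-12).mean()

T = minimize_scalar(nll, bounds=(0.1, 10), method="bounded").x
print("ECE raw:", round(ece(softmax(logits, axis=1), labels), 4))
print(f"ECE after temperature scaling (T={T:.2f}):",
      round(ece(softmax(logits / T, axis=1), labels), 4))
```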
Targeting Fusion Proteins of HIV-1 and SARS-CoV-2
Viruses are disease-causing pathogenic agents that require host cells to replicate. Fusion of host and viral membranes is critical for the lifecycle of enveloped viruses. Studying viral fusion proteins can allow us to better understand how they shape immune responses and inform the design of therapeutics such as drugs, monoclonal antibodies, and vaccines. This thesis discusses two approaches to targeting two fusion proteins: Env from HIV-1 and S from SARS-CoV-2. The first chapter of this thesis is an introduction to viruses with a specific focus on HIV-1 CD4-mimetic drugs and antibodies against SARS-CoV-2. It discusses the architecture of these viruses and fusion proteins and how small molecules, peptides, and antibodies can target these proteins successfully to treat and prevent disease. In addition, a brief overview is included of the techniques involved in structural biology and how they have informed the study of viruses. For the interested reader, chapter 2 contains a review article that serves as a more in-depth introduction to both viruses as well as to how structural biology has informed the study of viral surface proteins and neutralizing antibody responses to them. The subsequent chapters provide a body of work divided into two parts. The first part, in chapter 3, involves a study on conformational changes induced in the HIV-1 Env protein by CD4-mimetic drugs using single-particle cryo-EM. The second part, encompassing chapters 4 and 5, includes two studies on antibodies isolated from convalescent COVID-19 donors. The former involves classification of antibody responses to the SARS-CoV-2 S receptor-binding domain (RBD). The latter discusses an anti-RBD antibody class that binds to a conserved epitope on the RBD and shows cross-binding and cross-neutralization to other coronaviruses in the sarbecovirus subgenus.
Psychographic And Behavioral Segmentation Of Food Delivery Application Customers To Increase Intention To Use
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
This study presents a framework for segmenting Food Delivery Application (FDA) customers based on
psychographic and behavioral variables as an alternative to existing segmentation. Customer segments
are proposed by applying clustering methods to primary data from an electronic survey. Psychographic
and behavioral constructs are formulated as hypotheses based on existing literature, and then
evaluated as segmentation variables regarding their discriminatory power for customer segmentation.
Detected relevant variables are used in the application of clustering techniques to find adequate
boundaries within customer groupings for segmentation purposes. Characterization of customer
segments is performed and enriched with implications of findings in FDA marketing strategies. This
paper contributes to theory by providing new findings on segmentation that are relevant for an online
context. In addition, it contributes to practice by detailing implications of customer segments in an
online sales strategy, allowing marketing managers and FDA businesses to capitalize on this
knowledge in their conversion funnel designs.
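The clustering step described above can be sketched as follows: standardize the construct scores, run k-means over a range of cluster counts, and select the count by silhouette. The data below are synthetic blobs standing in for survey responses, and the four constructs named in the comment are hypothetical.

```python
# A minimal sketch of the segmentation step: standardize construct scores,
# run k-means for several k, and select k by silhouette. Synthetic blobs
# stand in for survey responses; the four constructs are hypothetical
# (e.g., price sensitivity, convenience, health focus, order frequency).
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=3, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X)

for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
```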
Annals [...].
Pedometrics: innovation in the tropics; Legacy data: how to make it useful?; Advances in soil sensing; Pedometric guidelines to systematic soil surveys. Online event. Coordinated by: Waldir de Carvalho Junior, Helena Saraiva Koenow Pinheiro, Ricardo Simão Diniz Dalmolin