Characterizing the Temperature of SAT Formulas
The remarkable advances in SAT solving achieved in recent years have made it possible to apply this technology to many real-world applications, such as planning, formal verification and cryptography, among others. Interestingly, these industrial SAT problems are commonly believed to be easier than classical random SAT formulas, but estimating their actual hardness is still a very challenging question, which in some cases even requires solving them. In this context, realistic pseudo-industrial random SAT generators have emerged with the aim of reproducing the main features of these application problems in order to better understand the success of SAT solving techniques on them. In this work, we present a model to estimate the temperature of real-world SAT instances. This temperature represents the degree of distortion of the expected structure of the formula, from highly structured benchmarks (more similar to real-world SAT instances) to the complete absence of structure (observed in the classical random SAT model). Our solution is based on the popularity–similarity random model for SAT, which was recently introduced to reproduce two crucial features of application SAT benchmarks: scale-free and community structure. This model is able to control the hardness of the generated formula by introducing some randomization into the expected structure. Using our regression model, we observe that the estimated temperature of the application benchmarks used in the last SAT Competitions correlates with their hardness in most cases.
Juan de la Cierva program, fellowship IJC2019-040489-I, funded by MCIN and AE
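Purely as an illustration of this kind of pipeline (the abstract does not spell out the exact features or regressor), the sketch below computes two structural quantities it mentions, the scale-free exponent of the variable-occurrence distribution and the modularity of the variable-interaction graph, and indicates how a regression model could map them to an estimated temperature; the feature choices, the linear regressor and all names are assumptions, not the authors' exact pipeline.

```python
# Illustrative sketch only: feature choices and regressor are assumptions.
import math
from collections import Counter
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

def structural_features(cnf_path):
    """Read a DIMACS CNF file (one clause per line) and return (power-law exponent, modularity)."""
    occurrences = Counter()
    G = nx.Graph()
    with open(cnf_path) as f:
        for line in f:
            if line.startswith(("c", "p")) or not line.strip():
                continue
            lits = [int(t) for t in line.split()[:-1]]    # drop the trailing 0
            vars_ = [abs(l) for l in lits]
            occurrences.update(vars_)
            G.add_edges_from(combinations(set(vars_), 2)) # variables co-occurring in a clause
    # Maximum-likelihood estimate of the power-law exponent (Clauset et al., x_min = 1).
    counts = list(occurrences.values())
    denom = sum(math.log(c) for c in counts) or 1e-9
    alpha = 1.0 + len(counts) / denom
    Q = modularity(G, greedy_modularity_communities(G))
    return alpha, Q

# With generated formulas whose temperature T is known, a regression model
# (linear here, as an assumption) maps features to an estimated T:
# from sklearn.linear_model import LinearRegression
# X = [structural_features(p) for p in training_cnf_paths]   # hypothetical training set
# reg = LinearRegression().fit(X, known_temperatures)
# t_hat = reg.predict([structural_features("some_industrial_instance.cnf")])
```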
Improving Skip-Gram based Graph Embeddings via Centrality-Weighted Sampling
Network embedding techniques inspired by word2vec represent an effective unsupervised relational learning model. Commonly, by means of a Skip-Gram procedure, these techniques learn low-dimensional vector representations of the nodes in a graph by sampling node-context examples. Although many ways of sampling the context of a node have been proposed, the effects of the way a node is chosen have not been analyzed in depth. To fill this gap, we have re-implemented the four main word2vec-inspired graph embedding techniques under the same framework and analyzed how different sampling distributions affect embedding performance on node classification problems. We present a set of experiments on several well-known real-world data sets showing that sampling nodes according to popular centrality distributions leads to improvements, with learning-time speed-ups of up to 2x and increased accuracy in all cases.
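As a minimal sketch of the sampling idea (not the authors' re-implemented framework), the following DeepWalk-style snippet draws the start node of each random walk from a centrality distribution, here PageRank, instead of uniformly, and feeds the walks to a skip-gram model; the graph, the centrality measure and all parameter values are arbitrary choices for illustration.

```python
# Centrality-weighted start-node sampling for skip-gram graph embeddings (sketch).
import numpy as np
import networkx as nx
from gensim.models import Word2Vec   # skip-gram implementation (gensim >= 4 uses vector_size)

rng = np.random.default_rng(42)
G = nx.karate_club_graph()            # small example graph
nodes = list(G.nodes())

# Centrality-weighted distribution over start nodes (PageRank is one possible choice).
pr = nx.pagerank(G)
probs = np.array([pr[n] for n in nodes])
probs /= probs.sum()

def random_walk(start, length=10):
    walk = [start]
    while len(walk) < length:
        nbrs = list(G.neighbors(walk[-1]))
        if not nbrs:
            break
        walk.append(nbrs[rng.integers(len(nbrs))])
    return [str(n) for n in walk]     # Word2Vec expects token strings

# Start nodes are sampled according to centrality rather than uniformly.
walks = [random_walk(nodes[i]) for i in rng.choice(len(nodes), size=200, p=probs)]

model = Word2Vec(sentences=walks, vector_size=64, window=5, sg=1, min_count=0, epochs=5)
embedding_of_node_0 = model.wv["0"]   # 64-dimensional vector for node 0
```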
Knowledge discovery in multi-relational graphs
Given the limited range of methodologies available for relational machine learning tasks, the main goal of this thesis is to analyze the existing methods, modifying or optimizing some of them where possible, and to contribute new methods that open new ways to approach this difficult task. To this end, and leaving aside objectives related to literature reviews or comparisons between models and implementations, a series of concrete objectives is set out:
1. Define flexible and powerful structures that allow phenomena to be modeled in terms of the elements that compose them and the relations established between those elements. These structures must be able to naturally express complex properties of the elements (continuous or categorical values, vectors, matrices, dictionaries, graphs, ...), as well as heterogeneous relations between them that may in turn carry the same level of complex properties. Moreover, these structures must allow modeling phenomena in which the relations between elements are not always binary (involving only two elements) but may involve any number of them (a minimal sketch of such a structure is given after this abstract).
2. Define tools to build, manipulate and measure these structures. However powerful and flexible a structure may be, it will be of little use without adequate tools to manipulate and study it. These tools must be efficiently implemented and cover both construction and query tasks.
3. Develop new black-box relational machine learning algorithms. In tasks where the goal is not to obtain explanatory models, we can afford to use black-box models, sacrificing interpretability in favor of greater computational efficiency.
4. Develop new white-box relational machine learning algorithms. When we are interested in an explanation of how the analyzed systems work, we will look for white-box machine learning models.
5. Improve query, analysis and repair tools for databases. Some long-distance queries in databases have too high a computational cost, which prevents adequate analyses in some information systems. In addition, graph databases lack methods to normalize or repair data automatically or under human supervision. It is worthwhile to develop tools that carry out this kind of task, increasing efficiency and offering a new query and normalization layer that allows the data to be curated for more optimal storage and retrieval.
All of the stated objectives are developed on a solid formal basis, grounded in Information Theory, Learning Theory, Artificial Neural Network Theory and Graph Theory. This basis ensures that the results obtained are formal enough for the contributions made to be easily evaluated. In addition, the abstract models developed can be readily implemented on real machines, so that their behavior can be verified experimentally and useful solutions can be offered to the scientific community within a short time.
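As a rough illustration of the structure asked for in objective 1 (a sketch only, not the formal definition developed in the thesis), the snippet below models a property hypergraph whose nodes and n-ary, typed relations both carry arbitrary complex attributes; all class and attribute names are illustrative.

```python
# Minimal property-hypergraph sketch: nodes and n-ary typed relations with attributes.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class Node:
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)   # scalars, vectors, dicts, ...

@dataclass
class Relation:
    type: str                                # heterogeneous relation types
    members: Tuple[str, ...]                 # any number of participating nodes (n-ary)
    properties: Dict[str, Any] = field(default_factory=dict)

class PropertyHypergraph:
    def __init__(self):
        self.nodes: Dict[str, Node] = {}
        self.relations: List[Relation] = []

    def add_node(self, node_id: str, **props: Any) -> None:
        self.nodes[node_id] = Node(node_id, props)

    def add_relation(self, rel_type: str, *members: str, **props: Any) -> None:
        if any(m not in self.nodes for m in members):
            raise KeyError("all members must be existing nodes")
        self.relations.append(Relation(rel_type, members, props))

    def relations_of(self, node_id: str) -> List[Relation]:
        """Query: every relation in which a node participates."""
        return [r for r in self.relations if node_id in r.members]

# Usage: a ternary relation carrying its own properties.
g = PropertyHypergraph()
g.add_node("alice", age=34, interests=["graphs", "music"])
g.add_node("bob", age=29)
g.add_node("paper-17", vector=[0.1, 0.8, 0.3])
g.add_relation("co-authored", "alice", "bob", "paper-17", year=2020)
print(g.relations_of("paper-17"))
```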
Detecting the ultra low dimensionality of real networks
Reducing dimension redundancy to find simplifying patterns in high-dimensional datasets and complex networks has become a major endeavor in many scientific fields. However, detecting the dimensionality of their latent space is challenging but necessary to generate efficient embeddings to be used in a multitude of downstream tasks. Here, we propose a method to infer the dimensionality of networks without the need for any a priori spatial embedding. Due to the ability of hyperbolic geometry to capture the complex connectivity of real networks, we detect ultra-low dimensionality far below values reported using other approaches. We applied our method to real networks from different domains and found unexpected regularities, including: tissue-specific biomolecular networks being extremely low dimensional; brain connectomes being close to the three dimensions of their anatomical embedding; and social networks and the Internet requiring slightly higher dimensionality. Beyond paving the way towards an ultra-efficient dimensional reduction, our findings help address fundamental issues that hinge on dimensionality, such as universality in critical behavior.
Agencia Estatal de Investigación PID2019-106290GB-C22/AEI/10.13039/501100011033; Generalitat de Catalunya 2017SGR106
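The paper's hyperbolic-geometry method is not reproduced here; as a generic and much cruder baseline for the same question, how many latent dimensions a network needs, the sketch below compares spectral embeddings of increasing dimension by their held-out link-reconstruction AUC, with the dimension at which the AUC plateaus playing the role of the detected dimensionality. It is only meant to make the task concrete; all choices (Laplacian eigenmaps, the 20% hold-out, the example graph) are assumptions.

```python
# Generic dimension-selection baseline, NOT the hyperbolic method of the paper.
import numpy as np
import networkx as nx
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
G = nx.karate_club_graph()                       # small example network
nodes = list(G.nodes())

# Hold out ~20% of the edges plus an equal number of non-edges for scoring.
edges = list(G.edges())
rng.shuffle(edges)
test_pos = edges[: len(edges) // 5]
train_G = G.copy()
train_G.remove_edges_from(test_pos)
non_edges = list(nx.non_edges(G))
test_neg = [non_edges[i] for i in rng.choice(len(non_edges), size=len(test_pos), replace=False)]

# Spectral (Laplacian eigenmap) coordinates of the training graph.
L = nx.normalized_laplacian_matrix(train_G, nodelist=nodes).toarray()
eigvals, eigvecs = np.linalg.eigh(L)             # ascending eigenvalues
idx = {u: i for i, u in enumerate(nodes)}

def auc_at_dim(d):
    X = eigvecs[:, 1 : d + 1]                    # skip the trivial eigenvector
    score = lambda u, v: -np.linalg.norm(X[idx[u]] - X[idx[v]])  # closer => more likely edge
    y_true = [1] * len(test_pos) + [0] * len(test_neg)
    y_score = [score(u, v) for u, v in test_pos + test_neg]
    return roc_auc_score(y_true, y_score)

for d in (1, 2, 3, 5, 10, 20):
    print(f"d={d:>2}  link-reconstruction AUC = {auc_at_dim(d):.3f}")
# A plateau in AUC as d grows suggests the extra dimensions are redundant.
```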
Visualizing type II error in normality tests
This is an Accepted Manuscript of an article published by Taylor & Francis in “Visualizing type II error in normality tests” on 19th January 2017, available online: http://www.tandfonline.com/doi/full/10.1080/00031305.2016.1278035
A Skewed Exponential Power Distribution, with parameters defining kurtosis and skewness, is introduced as a way to visualize Type II error in normality tests. By varying these parameters, a mosaic of distributions is built, ranging from double exponential to uniform or from positive to negative exponential; the normal distribution is a particular case located in the center of the mosaic. Using a sequential color scheme, a different color is assigned to each distribution in the mosaic depending on the probability of committing a Type II error. This graph gives a visual representation of the power of the performed test. This way of representing results facilitates the comparison of the power of various tests and of the influence of sample size. A script to perform this graphical representation, programmed in the R statistical software, is available online as supplementary material.
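A rough Python sketch of the computation behind each cell of such a mosaic (the paper's supplementary script is in R and also varies skewness; here only the kurtosis-like shape parameter of SciPy's symmetric exponential power family, gennorm, is varied, and Shapiro-Wilk is used as one example of a normality test): the Type II error for each non-normal member of the family is estimated by Monte Carlo, which is the quantity mapped to colors in the mosaic.

```python
# Monte Carlo estimate of Type II error of a normality test (illustrative sketch).
import numpy as np
from scipy import stats

def type2_error(beta, n=50, alpha=0.05, reps=2000, seed=0):
    """Fraction of gennorm(beta) samples that Shapiro-Wilk fails to reject."""
    rng = np.random.default_rng(seed)
    fails = 0
    for _ in range(reps):
        x = stats.gennorm.rvs(beta, size=n, random_state=rng)
        _, p = stats.shapiro(x)
        if p >= alpha:          # not rejected => Type II error, since beta != 2 is non-normal
            fails += 1
    return fails / reps

# beta = 1 is the Laplace (double exponential) case, beta -> infinity approaches uniform;
# beta = 2 would be the normal distribution and is therefore excluded here.
for beta in (1.0, 1.5, 2.5, 5.0):
    print(f"shape beta={beta}: estimated Type II error = {type2_error(beta):.2f}")
```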
Blocking versus robustness in industrial contexts
This is the peer reviewed version of the following article: “Grima, P., Marco, L., Tort-Martorell, X. (2017) Blocking versus robustness in industrial contexts, 1-13”, which has been published in final form at doi: 10.1002/qre.2173. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
The paper discusses the similarities and differences between blocking factors (blocked designs) and noise factors (robust designs) in industrial two-level factorial experiments. The discussion ranges from the objectives of both design types and the nature of blocking and noise factors to the types of designs and the assumptions needed in each case. The conclusions are as follows: the nature and characteristics of noise and blocking factors are equal or very similar; the designs used in both situations are also similar; and the main differences lie in the assumptions and the objectives. The paper argues that the objectives are not in conflict and can easily be harmonized. In consequence, we argue in favor of a unified approach that would clarify the issue, especially for students and practitioners.
Perfiles motivacionales de deportistas adolescentes españoles
The aim of this study was to detect possible motivational profiles in a sample of adolescent athletes. Furthermore, the study analysed the differences in the perceived motivational climate's sub-factors and the satisfaction of the basic psychological needs across the profiles that were found. A sample of 608 athletes was used, with a mean age of 14.43 years. The perceived motivational climate (PMCSQ-2), psychological mediators (BPNES), and motivation of the athletes in sport (SMS) were measured. Cluster analysis revealed two profiles. A highly motivated profile, with high scores in both forms of motivation, self-determined (intrinsic motivation and identified regulation) and non-self-determined (introjected and external regulation), save for amotivation; and a moderately motivated profile, with moderate scores (between 3 and 4) in both self-determined and non-self-determined forms of motivation. In the multivariate analysis of the perceived motivational climate's sub-factors and of the basic psychological needs according to profile, significant differences were found in favour of the highly motivated profile both for the task-involving and ego-involving climate sub-factors, as well as for the three psychological mediators. The results are discussed with regard to the importance of encouraging a task-involving climate that tries to satisfy the needs for autonomy, competence, and relationships with others during training sessions, in order to obtain more self-determined motivational profiles.
The time has come: Statistics in bestselling books
Beyond textbooks, statistics is also present in bestselling books, those that appear on the top-10 lists of bookshops and online bookstores. This paper discusses five of those books, highlighting the role of statistics in each one. Besides describing the general topics of the books, we want to show that the knowledge of the world around us, and also of ourselves, advances thanks to the application of the scientific method, of which statistics is a key element. The paper finishes with some thoughts on the desirability of a practical approach to teaching statistics.