Proceedings of the 2022 XCSP3 Competition
This document represents the proceedings of the 2022 XCSP3 Competition. The results of this competition of constraint solvers were presented at the FLOC (Federated Logic Conference) 2022 Olympic Games, held in Haifa, Israel, from 31 July to 7 August 2022.
Recommender system in a non-stationary context: recommending job ads in pandemic times
This paper focuses on the recommendation of job ads to job seekers, exploiting proprietary data from the French Public Employment Service (PES) and focusing more specifically on low-skilled or unskilled workers. Besides the usual challenges of data sparsity, the signal-to-noise ratio is low (few job seekers have diplomas), and scalability requirements are paramount. As a first contribution, a two-tiered approach is designed to handle these requirements; its empirical validation shows significant computational gains with no performance loss compared to boosted tree ensembles representative of the state of the art. A second contribution is a methodology aimed at assessing the impact of the non-stationarity of the item and user distributions. Specifically, across three periods (before, during and after the Covid lock-downs), the numbers of job ads and job seekers vary dramatically in some industries. A normalized recall indicator is proposed to filter out the impact of variations in the number of job ads. This normalization suggests that the same score function adapts to the multi-faceted changes of the environment, resulting in different recommendations but with similar accuracy as before, at least for the job seekers finding a job.
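The exact definition of the paper's normalized recall indicator is not given above. Purely as an illustration, the sketch below assumes a simple form in which recall@k is divided by the recall that a uniform random recommender would achieve over the current pool of job ads, so that a shrinking or growing ad pool does not mechanically inflate or deflate the score. All function names are hypothetical.

```python
# Illustrative sketch only: this assumes the normalization divides recall@k
# by the expected recall of random top-k picks from a pool of n_ads job ads.

def recall_at_k(recommended, relevant, k):
    """Fraction of relevant job ads appearing in the top-k recommendations."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def normalized_recall_at_k(recommended, relevant, k, n_ads):
    """Recall@k divided by the expected recall of a random recommender."""
    baseline = min(k, n_ads) / n_ads  # random top-k picks hit this fraction
    return recall_at_k(recommended, relevant, k) / baseline

# Identical raw recall, but measured against ad pools of different sizes:
few_ads = normalized_recall_at_k([1, 2, 3], [2], k=3, n_ads=10)
many_ads = normalized_recall_at_k([1, 2, 3], [2], k=3, n_ads=1000)
```

Under this assumed form, the same hit list scores higher against the larger pool, which is the intended effect: success against a large, hard pool counts for more than success against a small one.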
Recognition and Exploitation of Gate Structure in SAT Solving
In theoretical computer science, the SAT problem is the archetypal representative of the class of NP-complete problems, which is why efficient SAT solving is generally considered impossible.
Nevertheless, astonishing results are often achieved in practice, where some applications generate problems with millions of variables that recent SAT solvers can solve within reasonable time.
The practical success of SAT solving is due to current implementations of the Conflict-Driven Clause Learning (CDCL) algorithm, whose performance largely depends on the heuristics employed, which implicitly exploit the structure of the instances generated in industrial practice.
In this work, we present a new generic algorithm for the efficient recognition of gate structure in CNF encodings of SAT instances, as well as three approaches in which we exploit this structure explicitly.
Our contributions also include the implementation of these approaches in our SAT solver Candy and the development of a tool for the distributed management of benchmark instances and their attributes, the Global Benchmark Database (GBD).
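As a concrete illustration of what gate recognition in a CNF encoding can look like (a toy sketch, not the algorithm implemented in Candy), the snippet below detects the Tseitin encoding of an AND gate o = a ∧ b, which produces the clauses (¬o ∨ a), (¬o ∨ b) and (o ∨ ¬a ∨ ¬b). Literals are signed integers, as in the DIMACS format.

```python
# Toy sketch of gate recognition: find Tseitin-encoded AND gates in a CNF.
# A gate o = AND(a, b) is present when the three clauses
# (-o, a), (-o, b), (o, -a, -b) all occur in the clause set.

def find_and_gates(clauses):
    """Return (output, input_a, input_b) triples whose Tseitin AND clauses
    all occur in the given clause list."""
    clause_set = {frozenset(c) for c in clauses}
    gates = []
    for c in clause_set:
        if len(c) != 3:
            continue  # only the ternary clause can define the gate
        for o in c:  # try each literal as the candidate gate output
            a, b = sorted(-l for l in c if l != o)  # candidate inputs
            if frozenset({-o, a}) in clause_set and \
               frozenset({-o, b}) in clause_set:
                gates.append((o, a, b))
    return gates

cnf = [(-5, 1), (-5, 2), (5, -1, -2),  # encodes 5 = AND(1, 2)
       (3, 4)]                          # unrelated clause
```

Real recognizers generalize this idea to arbitrary gate types and arities, and must do so without the quadratic clause lookups a naive version would incur.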
On Maximum Weight Clique Algorithms, and How They Are Evaluated
Maximum weight clique and maximum weight independent set solvers are often benchmarked using maximum clique problem instances, with weights allocated to vertices by taking the vertex number mod 200, plus 1. For constraint programming approaches, this rule has clear implications, favouring weight-based rather than degree-based heuristics. We show that similar implications hold for dedicated algorithms, and that additionally, weight distributions affect whether certain inference rules are cost-effective. We look at other families of benchmark instances for the maximum weight clique problem, coming from winner determination problems, graph colouring, and error-correcting codes, and introduce two new families of instances, based upon kidney exchange and the Research Excellence Framework. In each case the weights carry much more interesting structure, and do not in any way resemble the 200 rule. We make these instances available in the hopes of improving the quality of future experiments.
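For reference, the "200 rule" described above fits in two lines of code (vertex numbering from 1 is assumed, as is conventional for DIMACS clique instances):

```python
# The 200 rule: vertex v receives weight (v mod 200) + 1, so weights cycle
# through 2, 3, ..., 200, 1 as the vertex number increases.

def vertex_weight(v):
    """Weight assigned to vertex v under the 200 rule."""
    return (v % 200) + 1

weights = [vertex_weight(v) for v in (1, 199, 200, 201)]
# vertices 1..200 cycle through weights 2..200 then 1; vertex 201 restarts at 2
```

The cyclic, low-range structure this produces is exactly what the abstract contrasts with the richer weight distributions of the newly introduced instance families.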
On the Nature and Types of Anomalies: A Review
Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies, and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.
Artificial cognitive architecture with self-learning and self-optimization capabilities. Case studies in micromachining processes
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defence: 22-09-201
Hierarchical Text Classification: a review of current research
It is often the case that collections of documents are annotated with hierarchically-structured concepts. However, the benefits of this structure are rarely taken into account by commonly-used classification techniques. Conversely, Hierarchical Text Classification methods are devised to take advantage of the labels’ organization to boost classification performance. With this work, we aim to deliver an updated overview of current research in this domain. We begin by defining the task and framing it within the broader text classification area, examining important shared concepts such as text representation. Then, we dive into details regarding the specific task, providing a high-level description of its traditional approaches. We then summarize recently proposed methods, highlighting their main contributions. We additionally provide statistics for the most adopted datasets and describe the benefits of using evaluation metrics tailored to hierarchical settings. Finally, a selection of recent proposals is benchmarked against non-hierarchical baselines on five domain-specific datasets.
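To make the last point concrete, one widely used family of hierarchy-aware evaluation metrics (not necessarily the ones adopted in this survey) is hierarchical precision and recall, in which predicted and gold label sets are first closed under the ancestor relation of the taxonomy, so that near-misses in the correct subtree receive partial credit. The taxonomy and labels below are invented for illustration.

```python
# Hierarchical precision/recall sketch: extend both label sets with all
# taxonomy ancestors, then compute ordinary set precision and recall.

def with_ancestors(labels, parent):
    """Close a label set under the ancestor relation (parent maps
    child -> parent; roots are absent from the map)."""
    closed = set()
    for label in labels:
        while label is not None:
            closed.add(label)
            label = parent.get(label)
    return closed

def hierarchical_pr(predicted, gold, parent):
    """Return (hierarchical precision, hierarchical recall)."""
    p = with_ancestors(predicted, parent)
    g = with_ancestors(gold, parent)
    overlap = len(p & g)
    return overlap / len(p), overlap / len(g)

# Toy taxonomy: science -> {physics, biology}, physics -> quantum
parent = {"physics": "science", "biology": "science", "quantum": "physics"}
# Predicting the sibling branch still earns credit for the shared ancestor:
hp, hr = hierarchical_pr({"biology"}, {"quantum"}, parent)
```

A flat metric would score this prediction zero; the hierarchical variant rewards it for being in the right top-level branch, which is precisely why such metrics are preferred when the label space is a taxonomy.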