2,537 research outputs found

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volume.

    Mining Butterflies in Streaming Graphs

    Get PDF
    This thesis introduces two main-memory systems, sGrapp and sGradd, for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, the growth patterns of bipartite streaming graphs are first mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection. sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale and temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves a mean absolute percentage error of up to 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution, and a processing throughput of 1.5 million data records per second. The throughput of sGrapp is 160x higher than that of baselines, and its estimation error is 0.02x that of baselines. sGradd demonstrates improving performance over time, achieves zero false detection rates when there is no drift and when a drift has already been detected, and detects sequential drifts within zero to a few seconds of their occurrence, regardless of drift intervals.
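
    The abstract does not spell out sGrapp's estimation scheme; for orientation, a butterfly is a (2,2)-biclique, and a minimal exact (batch, non-streaming) counter can be sketched as follows. All names here are illustrative, not sGrapp's API.

        from itertools import combinations

        def count_butterflies(edges):
            """Exact butterfly ((2,2)-biclique) count for a bipartite edge list."""
            adj = {}
            for u, v in edges:                 # u on the left side, v on the right
                adj.setdefault(u, set()).add(v)
            total = 0
            for u, w in combinations(adj, 2):  # every unordered pair of left vertices
                c = len(adj[u] & adj[w])       # shared right-neighbours
                total += c * (c - 1) // 2      # each pair of shared neighbours = one butterfly
            return total

        # one 2x2 biclique plus a stray edge -> exactly 1 butterfly
        print(count_butterflies([(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b'), (2, 'a')]))

    The quadratic pair loop above is precisely what streaming estimators such as sGrapp avoid; their contribution is approximating this count over a stream within an error budget.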

    Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing

    Full text link
    Developers often dedicate significant time to maintaining and refactoring existing code. However, most prior work on generative models for code focuses solely on creating new code, neglecting the unique requirements of editing existing code. In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase. Our model, Coeditor, is a fine-tuned CodeT5 model with enhancements specifically designed for code editing tasks. We encode code changes using a line diff format and employ static analysis to form large customized model contexts, ensuring appropriate information for prediction. We collect a code editing dataset from the commit histories of 1,650 open-source Python projects for training and evaluation. In a simplified single-round, single-edit task, Coeditor significantly outperforms the best code completion approach -- nearly doubling its exact-match accuracy, despite using a much smaller model -- demonstrating the benefits of incorporating editing history for code completion. In a multi-round, multi-edit setting, we observe substantial gains by iteratively prompting the model with additional user edits. We open-source our code, data, and model weights to encourage future research, and release a VSCode extension powered by our model for interactive use.
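
    For a rough idea of a line-diff encoding, Python's standard difflib can render a change as one marker per line; the exact tokenization Coeditor feeds to CodeT5 may differ from this sketch.

        import difflib

        def line_diff(before: str, after: str) -> str:
            """Render an edit one marker per line:
            '- ' removed, '+ ' added, '  ' unchanged, '? ' intraline hints."""
            return "\n".join(difflib.ndiff(before.splitlines(), after.splitlines()))

        before = "def add(a, b):\n    return a + b\n"
        after = "def add(a: int, b: int) -> int:\n    return a + b\n"
        print(line_diff(before, after))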

    Planar Disjoint Paths, Treewidth, and Kernels

    Full text link
    In the Planar Disjoint Paths problem, one is given an undirected planar graph with a set of k vertex pairs (s_i, t_i) and the task is to find k pairwise vertex-disjoint paths such that the i-th path connects s_i to t_i. We study the problem through the lens of kernelization, aiming at efficiently reducing the input size in terms of a parameter. We show that Planar Disjoint Paths does not admit a polynomial kernel when parameterized by k unless coNP ⊆ NP/poly, resolving an open problem by [Bodlaender, Thomassé, Yeo, ESA'09]. Moreover, we rule out the existence of a polynomial Turing kernel unless the WK-hierarchy collapses. Our reduction carries over to the setting of edge-disjoint paths, where the kernelization status remained open even in general graphs. On the positive side, we present a polynomial kernel for Planar Disjoint Paths parameterized by k + tw, where tw denotes the treewidth of the input graph. As a consequence of both our results, we rule out the possibility of a polynomial-time (Turing) treewidth reduction to tw = k^{O(1)} under the same assumptions. To the best of our knowledge, this is the first hardness result of this kind. Finally, combining our kernel with the known techniques [Adler, Kolliopoulos, Krause, Lokshtanov, Saurabh, Thilikos, JCTB'17; Schrijver, SICOMP'94] yields an alternative (and arguably simpler) proof that Planar Disjoint Paths can be solved in time 2^{O(k^2)} · n^{O(1)}, matching the result of [Lokshtanov, Misra, Pilipczuk, Saurabh, Zehavi, STOC'20]. Comment: To appear at FOCS'23, 82 pages, 30 figures.
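
    To make the problem statement concrete, a tiny brute-force solver can be sketched; it is exponential-time and purely a problem-definition illustration, nowhere near the paper's kernelization machinery. The adjacency encoding is this sketch's assumption.

        def disjoint_paths(adj, pairs, used=frozenset()):
            """Brute-force search for pairwise vertex-disjoint paths.

            adj maps a vertex to its neighbour set; pairs is [(s_1, t_1), ...].
            Returns one path per pair, or None if no disjoint routing exists.
            """
            if not pairs:
                return []
            (s, t), rest = pairs[0], pairs[1:]
            if s in used or t in used:
                return None

            def simple_paths(v, path):
                if v == t:
                    yield path
                    return
                for w in adj.get(v, ()):
                    if w not in used and w not in path:
                        yield from simple_paths(w, path + [w])

            for p in simple_paths(s, [s]):            # try every route for pair i...
                tail = disjoint_paths(adj, rest, used | set(p))
                if tail is not None:                  # ...and recurse on the remaining pairs
                    return [p] + tail
            return None

        # 3x3 grid graph; the pairs (1,3) and (7,9) admit vertex-disjoint paths
        adj = {1: {2, 4}, 2: {1, 3, 5}, 3: {2, 6}, 4: {1, 5, 7},
               5: {2, 4, 6, 8}, 6: {3, 5, 9}, 7: {4, 8}, 8: {5, 7, 9}, 9: {6, 8}}
        print(disjoint_paths(adj, [(1, 3), (7, 9)]))  # e.g. [[1, 2, 3], [7, 8, 9]]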

    Private set intersection: A systematic literature review

    Get PDF
    Secure Multi-party Computation (SMPC) is a family of protocols that allow some parties to compute a function on their private inputs, obtaining the output at the end and nothing more. In this work, we focus on a particular SMPC problem named Private Set Intersection (PSI). The challenge in PSI is how two or more parties can compute the intersection of their private input sets, while the elements that are not in the intersection remain private. This problem has attracted the attention of many researchers because of its wide variety of applications, contributing to the proliferation of many different approaches. Despite that, current PSI protocols still require heavy cryptographic assumptions that may be unrealistic in some scenarios. In this paper, we perform a Systematic Literature Review of PSI solutions, with the objective of analyzing the main scenarios where PSI has been studied and giving the reader a general taxonomy of the problem together with a general understanding of the most common tools used to solve it. We also analyze the performance using different metrics, trying to determine if PSI is mature enough to be used in realistic scenarios, identifying the pros and cons of each protocol and the remaining open problems. This work has been partially supported by the projects BIGPrivDATA (UMA20-FEDERJA-082) from the FEDER Andalucía 2014–2020 Program and SecTwin 5.0, funded by the Ministry of Science and Innovation, Spain, and the European Union (Next Generation EU) (TED2021-129830B-I00). The first author has been funded by the Spanish Ministry of Education under the National F.P.U. Program (FPU19/01118). Funding for open access charge: Universidad de Málaga/CBUA.
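
    As a concrete taste of the problem, a classic commutative-exponentiation (DDH-style) PSI can be sketched in a few lines. This is a toy, single-process simulation with unsafe parameters, not any of the reviewed protocols' implementations.

        import hashlib

        P = 2**127 - 1   # toy Mersenne-prime modulus (NOT a safe group for real use)
        G = 3

        def h2g(item: str) -> int:
            """Map an item to a group element via hashing (toy hash-to-group)."""
            e = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
            return pow(G, e % (P - 1), P)

        def psi(set_a, set_b, a=0x1234567, b=0x7654321):
            """Simulate both parties in one process; a and b are their private keys."""
            blinded_a = {x: pow(h2g(x), a, P) for x in set_a}   # A blinds its items
            blinded_b = [pow(h2g(y), b, P) for y in set_b]      # B blinds its items
            # Each side re-blinds the other's values; exponentiation commutes,
            # so equal items collide as G^(e*a*b) while non-matches stay hidden.
            double_b = {pow(v, a, P) for v in blinded_b}
            return {x for x, v in blinded_a.items() if pow(v, b, P) in double_b}

        print(psi({"alice", "bob", "carol"}, {"bob", "dave"}))  # -> {'bob'}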

    Reasoning about quantities and concepts: studies in social learning

    Get PDF
    We live and learn in a ‘society of mind’. This means that we form beliefs not just based on our own observations and prior expectations but also based on the communications from other people, such as our social network peers. Across seven experiments, I study how people combine their own private observations with other people’s communications to form and update beliefs about the environment. I follow the tradition of rational analysis and benchmark human learning against optimal Bayesian inference at Marr’s computational level. To accommodate human resource constraints and cognitive biases, I further contrast human learning with a variety of process-level accounts. In Chapters 2–4, I examine how people reason about simple environmental quantities. I focus on the effect of dependent information sources on the success of group and individual learning across a series of single-player and multi-player judgement tasks. Overall, the results from Chapters 2–4 highlight the nuances of real social network dynamics and provide insights into the conditions under which we can expect collective success versus failures such as the formation of inaccurate worldviews. In Chapter 5, I develop a more complex social learning task which goes beyond estimation of environmental quantities and focuses on inductive inference with symbolic concepts. Here, I investigate how people search compositional theory spaces to form and adapt their beliefs, and how symbolic belief adaptation interfaces with individual and social learning in a challenging active learning task. Results from Chapter 5 suggest that people might explore compositional theory spaces using local incremental search, and that it is difficult for people to use another person’s learning data to improve upon their own hypotheses.
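
    The Bayesian benchmark can be made concrete with a toy model of an agent pooling private observations with peers' reports, which is exactly where source dependence bites. A minimal sketch, assuming a beta-binomial setup not taken from the thesis:

        def posterior(prior_a, prior_b, own_heads, own_tails, peer_reports):
            """Combine private coin flips with peers' reported (heads, tails)
            counts, naively treating all sources as independent data."""
            a = prior_a + own_heads + sum(h for h, _ in peer_reports)
            b = prior_b + own_tails + sum(t for _, t in peer_reports)
            return a, b  # parameters of the Beta posterior

        a, b = posterior(1, 1, own_heads=3, own_tails=1, peer_reports=[(2, 2), (4, 0)])
        print(a / (a + b))  # posterior mean; if peers share sources, this double-counts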

    Behavior quantification as the missing link between fields: Tools for digital psychiatry and their role in the future of neurobiology

    Full text link
    The great behavioral heterogeneity observed between individuals with the same psychiatric disorder, and even within one individual over time, complicates both clinical practice and biomedical research. However, modern technologies are an exciting opportunity to improve behavioral characterization. Existing psychiatry methods that are qualitative or unscalable, such as patient surveys or clinical interviews, can now be collected at a greater capacity and analyzed to produce new quantitative measures. Furthermore, recent capabilities for continuous collection of passive sensor streams, such as phone GPS or smartwatch accelerometer, open avenues of novel questioning that were previously entirely unrealistic. Their temporally dense nature enables a cohesive study of real-time neural and behavioral signals. To develop comprehensive neurobiological models of psychiatric disease, it will be critical to first develop strong methods for behavioral quantification. There is huge potential in what can theoretically be captured by current technologies, but this in itself presents a large computational challenge -- one that will necessitate new data processing tools, new machine learning techniques, and ultimately a shift in how interdisciplinary work is conducted. In my thesis, I detail research projects that take different perspectives on digital psychiatry, subsequently tying ideas together with a concluding discussion on the future of the field. I also provide software infrastructure where relevant, with extensive documentation. Major contributions include scientific arguments and proof-of-concept results for daily free-form audio journals as an underappreciated psychiatry research datatype, as well as novel stability theorems and pilot empirical success for a proposed multi-area recurrent neural network architecture. Comment: PhD thesis.

    Contributions to time series analysis, modelling and forecasting to increase reliability in industrial environments.

    Get PDF
    356 p. The integration of the Internet of Things into the industrial sector is key to achieving business intelligence. This study focuses on improving, or proposing new, approaches to increase the reliability of AI solutions based on time series data in industry. Three phases are addressed: improving the quality of the data, the models, and the errors. A standard definition of quality metrics is proposed and included in the R package dqts. The steps of time series modelling are explored, from feature extraction to the choice and application of the most efficient forecasting model. The KNPTS method, based on searching for patterns in the historical data, is presented as an R package for estimating future data. In addition, the use of elastic similarity measures to evaluate regression models is suggested, as is the importance of suitable metrics for imbalanced-class problems. The contributions were validated in industrial use cases from different fields: product quality, electricity consumption forecasting, porosity detection, and machine diagnostics.
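
    Among elastic similarity measures, dynamic time warping is the canonical example; a minimal sketch for illustration (this is not the thesis' KNPTS or dqts code):

        def dtw(x, y):
            """Dynamic time warping distance between two numeric sequences
            (plain O(len(x)*len(y)) version, no window constraint)."""
            n, m = len(x), len(y)
            INF = float("inf")
            D = [[INF] * (m + 1) for _ in range(n + 1)]
            D[0][0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(x[i - 1] - y[j - 1])
                    # best of match, insertion, deletion -- the "elastic" alignment
                    D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            return D[n][m]

        # a time-shifted pattern stays close under DTW despite large pointwise errors
        print(dtw([0, 1, 2, 3, 2, 1], [0, 0, 1, 2, 3, 2, 1]))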

    2015 GREAT Day Program

    Get PDF
    SUNY Geneseo’s Ninth Annual GREAT Day.

    Algorithms for sparse convolution and sublinear edit distance

    Get PDF
    In this PhD thesis on fine-grained algorithm design and complexity, we investigate output-sensitive and sublinear-time algorithms for two important problems. (1) Sparse Convolution: Computing the convolution of two vectors is a basic algorithmic primitive with applications across all of Computer Science and Engineering. In the sparse convolution problem we assume that the input and output vectors have at most t nonzero entries, and the goal is to design algorithms with running times dependent on t. For the special case where all entries are nonnegative, which is particularly important for algorithm design, it has been known for twenty years that sparse convolutions can be computed in near-linear randomized time O(t log^2 n). In this thesis we develop a randomized algorithm with running time O(t log t), which is optimal (under some mild assumptions), and the first near-linear deterministic algorithm for sparse nonnegative convolution. We also present an application of these results, leading to seemingly unrelated fine-grained lower bounds against distance oracles in graphs. (2) Sublinear Edit Distance: The edit distance of two strings is a well-studied similarity measure with numerous applications in computational biology. While computing the edit distance exactly provably requires quadratic time, a long line of research has led to a constant-factor approximation algorithm in almost-linear time. Perhaps surprisingly, it is also possible to approximate the edit distance k within a large factor O(k) in sublinear time Õ(n/k + poly(k)). We drastically improve the approximation factor of the known sublinear algorithms from O(k) to k^{o(1)} while preserving the Õ(n/k + poly(k)) running time.
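
    For contrast with the thesis' O(t log t) result, the naive output-sensitive baseline simply iterates over nonzero pairs in O(t^2) time; a minimal sketch for illustration only:

        from collections import defaultdict

        def sparse_convolve(a: dict, b: dict) -> dict:
            """a, b map index -> nonzero coefficient; returns the nonzero
            entries of the convolution c[k] = sum_{i+j=k} a[i] * b[j].
            O(t^2) pair loop -- the thesis' near-linear algorithms use
            hashing techniques far beyond this baseline."""
            c = defaultdict(int)
            for i, ai in a.items():
                for j, bj in b.items():
                    c[i + j] += ai * bj
            return dict(c)

        # (x^0 + x^5) * (x^2 + x^5)
        print(sparse_convolve({0: 1, 5: 1}, {2: 1, 5: 1}))  # {2: 1, 5: 1, 7: 1, 10: 1}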