    Longest Increasing Subsequence under Persistent Comparison Errors

    We study the problem of computing a longest increasing subsequence in a sequence $S$ of $n$ distinct elements in the presence of persistent comparison errors. In this model, every comparison between two elements can return the wrong result with some fixed (small) probability $p$, and comparisons cannot be repeated. Computing the longest increasing subsequence exactly is impossible in this model; the objective is therefore to identify a subsequence that (i) is indeed increasing and (ii) has a length that approximates the length of the longest increasing subsequence. We present asymptotically tight upper and lower bounds on both the approximation factor and the running time. In particular, we present an algorithm that computes an $O(\log n)$-approximation in time $O(n \log n)$, with high probability. This approximation relies on the fact that we can approximately sort $n$ elements in $O(n \log n)$ time such that the maximum dislocation of an element is at most $O(\log n)$. For the lower bounds, we prove that (i) there is a set of sequences such that, on a sequence picked randomly from this set, every algorithm must return an $\Omega(\log n)$-approximation with high probability, and (ii) any $O(\log n)$-approximation algorithm for the longest increasing subsequence requires $\Omega(n \log n)$ comparisons, even in the absence of errors.
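
    To make the selection step concrete, here is a minimal Python sketch, assuming the approximate-sorting phase has already produced a rank estimate with maximum dislocation at most d (that routine is not shown). The function name, the quadratic DP, and the gap rule of 2d are illustrative assumptions rather than the paper's exact procedure.

```python
def gap_constrained_lis(seq, approx_rank, d):
    """Longest subsequence of seq (in original order) whose approximate ranks
    increase by more than 2*d at every step.  With maximum dislocation <= d,
    such a subsequence is guaranteed to be truly increasing.  Quadratic DP for
    clarity only; the paper's overall algorithm runs in O(n log n)."""
    n = len(seq)
    best = [1] * n            # best[i]: longest valid subsequence ending at i
    prev = [-1] * n           # predecessor pointers for reconstruction
    for i in range(n):
        for j in range(i):
            if approx_rank[seq[i]] > approx_rank[seq[j]] + 2 * d and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    if not n:
        return []
    i = max(range(n), key=best.__getitem__)
    out = []
    while i != -1:
        out.append(seq[i])
        i = prev[i]
    return out[::-1]
```

    Because any two kept elements differ by more than $2d$ in estimated rank, their true ranks, and therefore their values, are ordered the same way, which yields property (i) above; the $O(\log n)$ loss in length comes from the gap constraint.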

    Is there only one way out of in-work poverty? Difference by gender and race in the US

    The persistence of in-work poverty during the last decades challenges the idea that employment is sufficient to escape poverty. Research has focused on the risk factors associated with in-work poverty, but scholars know little about individuals' experiences after exiting it. The Sequence Analysis Multistate Model procedure is applied to three high-quality longitudinal data sources (NLSY79, NLSY97, and PSID) to establish a typology of employment pathways out of in-work poverty and to estimate how gender and race are associated with each pathway. We identify five distinct pathways characterized by varying degrees of labor market attachment, economic vulnerability, and volatility. White men are most likely to exit in-work poverty into stable employment outside of poverty, while Black men and women are likely to remain vulnerable and at risk of social exclusion as well as recurrent spells of in-work poverty. Gender and race differences persist even after controlling for labor-market-related characteristics and family demographic behavior.

    A Group-Based Ring Oscillator Physical Unclonable Function

    A silicon Physical Unclonable Function (PUF) is a physical structure on a chip whose functional characteristics are hard to predict before fabrication but are expected to be unique after fabrication, owing to random fabrication variations. The secret characteristics can only be extracted through physical measurement and vanish immediately when the chip is powered down. PUFs promise a more secure means for cryptographic key generation and storage, among many other security applications. However, many practical challenges remain in cost-effectively building secure and reliable PUF secrecy. This dissertation proposes new architectures for ring oscillator (RO) PUFs to address these challenges. First, our temperature-aware cooperative (TAC) RO PUF can utilize certain ROs that would otherwise be discarded due to their instability. Second, our novel group-based algorithm can generate more secrecy than the theoretical upper bound of the conventional pairwise-comparison approach. Third, we build the first regression-based entropy distiller that makes the PUF secrecy statistically random and robust, meeting the NIST standards. Fourth, we develop a unique Kendall syndrome coding (KSC) that makes the PUF secrecy error-resilient against potential environmental fluctuations. Each of these methods can improve the hardware efficiency of the RO PUF implementation by 1.5x to 8x while improving the security and reliability of the PUF secrecy.
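
    To give a rough sense of why ordering a whole group of ring oscillators can extract more secret bits than disjoint pairwise comparisons, here is a minimal Python sketch; the frequencies, group size, and index-based encoding are illustrative assumptions and not the dissertation's actual grouping, distiller, or coding scheme.

```python
import math
from itertools import permutations

def pairwise_bits(freqs):
    """Conventional scheme sketch: compare disjoint RO pairs; each pair
    contributes one bit, so k ROs yield floor(k/2) bits."""
    return [int(freqs[i] > freqs[i + 1]) for i in range(0, len(freqs) - 1, 2)]

def group_rank_bits(freqs):
    """Group-based scheme sketch: encode the full frequency ordering of the
    group.  A group of k ROs has k! possible orderings, i.e. up to log2(k!)
    bits, more than the floor(k/2) bits of disjoint pairwise comparisons."""
    k = len(freqs)
    ranking = tuple(sorted(range(k), key=freqs.__getitem__))
    index = list(permutations(range(k))).index(ranking)   # ordering -> integer
    width = math.floor(math.log2(math.factorial(k)))
    index %= 2 ** width                                    # keep floor(log2(k!)) bits
    return [int(b) for b in format(index, f"0{width}b")]

# example: 4 ROs -> 2 bits from disjoint pairs vs. up to log2(4!) = ~4.58 bits from the ordering
freqs = [101.3, 99.8, 102.7, 100.5]   # hypothetical measured RO frequencies (MHz)
print(pairwise_bits(freqs), group_rank_bits(freqs))
```

    Encoding orderings within groups rather than isolated pairwise outcomes is what lets a group-based scheme squeeze more entropy out of the same hardware.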

    The user attribution problem and the challenge of persistent surveillance of user activity in complex networks

    In telecommunication networks, the user attribution problem refers to the challenge of recognizing communication traffic as belonging to a given user when the information needed to identify the user is missing. This problem becomes more difficult to tackle as users move across many mobile networks (complex networks) owned and operated by different providers. The traditional approach of using the source IP address as a tracking identifier does not work when used to identify mobile users. Recent efforts to address this problem by relying exclusively on web browsing behavior to identify users brought to light the difficulty of linking multiple user sessions together when the approach relies exclusively on the frequency of web sites visited by the user. This study tackles the problem by utilizing behavior-based identification while accounting for time and the sequential order of web visits by a user. Hierarchical Temporal Memories (HTMs) were used to classify historical navigational patterns for different users. This approach enables linking multiple user sessions together, forgoing the need for a tracking identifier such as the source IP address. Results are promising: HTMs outperform traditional Markov-chain-based approaches and can provide high levels of identification accuracy.

    Speckle statistics in adaptive optics images at visible wavelengths

    Residual speckles in adaptive optics (AO) images represent a well-known limitation on the achievement of the contrast needed for faint-source detection. Speckles in AO imagery can be the result of either residual atmospheric aberrations not corrected by the AO, or slowly evolving aberrations induced by the optical system. We take advantage of the high temporal cadence (1 ms) of the data acquired by the System for Coronagraphy with High-order Adaptive Optics from R to K bands-VIS forerunner experiment at the Large Binocular Telescope to characterize the AO residual speckles at visible wavelengths. An accurate knowledge of the speckle pattern and its dynamics is of paramount importance for the application of methods aimed at their mitigation. By means of both automatic identification software and information theory, we study the main statistical properties of AO residuals and their dynamics. We therefore provide a speckle characterization that can be incorporated into numerical simulations to increase their realism and to optimize the performance of both real-time and post-processing techniques aimed at the reduction of speckle noise.

    Client service capability matching

    In order to tailor web content to the requirements of a device, it is necessary to access information about the attributes of both the device and the web content. Profiles containing such information from heterogeneous sources may use many different terms to represent the same concept (e.g. Resolution/Screen_Res/Res). This can present problems for applications which try to interpret the semantics of these terms. In this thesis, we present an architecture which, when given profiles describing a device and a web service, can identify terms that are present in an ontology of recognised terms in the domain of device capabilities and web service requirements. The architecture can semi-automatically identify unknown terms by combining the results of several schema-matching applications. The ontology can be expanded based on the end-user's interaction with the semi-automatic matchers, and thus over time the application's ontology will grow to include previously unknown terms.
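
    A minimal Python sketch of the matching idea follows, assuming two simple string-similarity matchers whose scores are averaged; the matcher choice, threshold, and ontology contents are illustrative assumptions, not the thesis's actual components.

```python
from difflib import SequenceMatcher

ONTOLOGY_TERMS = ["ScreenResolution", "ColorDepth", "AudioCodec"]   # hypothetical recognised terms

def prefix_score(a, b):
    """Crude matcher 1: length of the shared prefix relative to the longer term."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b))

def edit_score(a, b):
    """Crude matcher 2: difflib's ratio of matching characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def map_term(unknown, threshold=0.5):
    """Average the matchers' scores and map the unknown profile term to the
    best-scoring ontology term; below the threshold, defer to the end-user
    (the semi-automatic step) and later add the confirmed term to the ontology."""
    scored = [(t, (prefix_score(unknown, t) + edit_score(unknown, t)) / 2)
              for t in ONTOLOGY_TERMS]
    best, score = max(scored, key=lambda ts: ts[1])
    return best if score >= threshold else None

print(map_term("Screen_Res"))   # -> "ScreenResolution" with these toy matchers
```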

    Utilizing Big Data in Identification and Correction of OCR Errors

    In this thesis, we report on our experiments in the detection and correction of OCR errors using web data. More specifically, we utilize Google search to access the big data resources available in order to identify possible candidates for correction. We then use a combination of Longest Common Subsequence (LCS) matching and Bayesian estimates to automatically pick the proper candidate. Our experimental results on a small set of historical newspaper data show a recall and precision of 51% and 100%, respectively. The work in this thesis further provides a detailed classification and analysis of all errors. In particular, we point out the shortcomings of our approach in its ability to suggest proper candidates to correct the remaining errors.
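
    Here is a minimal Python sketch of the candidate-selection idea, assuming candidate corrections have already been gathered (e.g. from search results); the scoring combination and the frequency-based prior are illustrative assumptions rather than the thesis's exact estimator.

```python
def lcs_len(a, b):
    """Classic dynamic-programming length of the longest common subsequence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def pick_correction(ocr_token, candidates, freq):
    """Rank candidates by the product of (i) LCS similarity to the OCR token
    (a likelihood stand-in) and (ii) a frequency-based prior; keep the best."""
    total = sum(freq.values()) or 1
    def score(c):
        sim = lcs_len(ocr_token.lower(), c.lower()) / max(len(ocr_token), len(c))
        prior = freq.get(c, 1) / total
        return sim * prior
    return max(candidates, key=score)

# hypothetical usage: "tbe" misread for "the"
print(pick_correction("tbe", ["the", "tube", "tbe"], {"the": 500, "tube": 20, "tbe": 1}))
```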

    Predicting Flavonoid UGT Regioselectivity with Graphical Residue Models and Machine Learning.

    Machine learning is applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary protein sequence. Novel indices characterizing graphical models of protein residues are introduced. The indices are compared with existing amino acid indices and found to cluster residues appropriately. A variety of models employing the indices are then investigated by examining their performance when analyzed using nearest neighbor, support vector machine, and Bayesian neural network classifiers. Improvements over nearest neighbor classifications relying on standard alignment similarity scores are reported.
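
    A minimal Python sketch of this classification setup, assuming each protein is already represented as a fixed-length residue string encoded with per-residue numeric indices; the toy index table, the encoding, and the 1-nearest-neighbor rule are illustrative assumptions, not the paper's actual indices or models.

```python
import math

# hypothetical residue index: one number per amino acid (a stand-in for the
# graphical-model-derived indices described above)
TOY_INDEX = {"A": 0.31, "R": -1.01, "N": -0.60, "D": -0.77, "C": 1.54,
             "G": 0.00, "H": 0.13, "K": -0.99, "L": 1.70, "S": -0.04}

def encode(seq, length=8):
    """Map a (pre-aligned, fixed-length) residue string to a numeric vector."""
    return [TOY_INDEX.get(r, 0.0) for r in seq[:length].ljust(length, "G")]

def predict_1nn(query, training):
    """1-nearest-neighbor: return the label of the training sequence whose
    encoded vector is closest in Euclidean distance to the query's vector."""
    q = encode(query)
    return min(training, key=lambda item: math.dist(q, encode(item[0])))[1]

training = [("ACDLKRNS", "3-O"), ("GHKNRDSA", "7-O")]   # (sequence, regioselectivity label)
print(predict_1nn("ACDLKRNG", training))                # -> "3-O" with this toy data
```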

    The User Attribution Problem and the Challenge of Persistent Surveillance of User Activity in Complex Networks

    In the context of telecommunication networks, the user attribution problem refers to the challenge of recognizing communication traffic as belonging to a given user when the information needed to identify the user is missing. This is analogous to trying to recognize a nameless face in a crowd. The problem worsens as users move across many mobile networks (complex networks) owned and operated by different providers. The traditional approach of using the source IP address, which indicates where a packet comes from, does not work when used to identify mobile users. Recent efforts to address this problem by relying exclusively on web browsing behavior to identify users were limited to a small number of users (28 and 100 users). This was due to the inability of those solutions to link multiple user sessions together when they relied exclusively on the web sites visited by the user. This study has tackled the problem by utilizing behavior-based identification while accounting for time and the sequential order of web visits by a user. Hierarchical Temporal Memories (HTMs) were used to classify historical navigational patterns for different users. Each layer of an HTM contains variable-order Markov chains of connected nodes which represent clusters of web sites visited in time order by the user (user sessions). HTM layers enable inference generalization by linking Markov chains within and across layers and thus allow matching longer sequences of visited web sites (multiple user sessions). This approach enables linking multiple user sessions together without the need for a tracking identifier such as the source IP address. Results are promising. HTMs can provide high levels of accuracy on synthetic data, with 99% recall for up to 500 users, and good levels of recall of 95% and 87% for 5 and 10 users, respectively, when using cellular network data. This research confirmed that the presence of long-tail web sites (rarely visited) among many repeated destinations can create unique differentiation. What was not anticipated prior to this research was the very high degree of repetitiveness of some web destinations found in real network data.
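
    As a simplified stand-in for the HTM classifier, here is a minimal Python sketch that attributes a new session to a user by scoring it against per-user first-order (bigram) transition models of previously visited sites; the smoothing constant and data are illustrative assumptions, and a real HTM links variable-order sequences across layers rather than using plain bigrams.

```python
from collections import Counter, defaultdict

def train(models, user, sessions):
    """Count site-to-site transitions observed in the user's past sessions."""
    for session in sessions:
        for a, b in zip(session, session[1:]):
            models[user][(a, b)] += 1

def score(model, session, alpha=0.1):
    """Additively smoothed likelihood-style score of a session under one model."""
    total = sum(model.values()) or 1
    s = 1.0
    for a, b in zip(session, session[1:]):
        s *= (model[(a, b)] + alpha) / (total + alpha)
    return s

def identify(models, session):
    """Attribute the session to the user whose transition model scores highest."""
    return max(models, key=lambda u: score(models[u], session))

models = defaultdict(Counter)
train(models, "user_a", [["news", "mail", "forum", "mail"]])
train(models, "user_b", [["shop", "pay", "shop", "video"]])
print(identify(models, ["mail", "forum", "mail"]))   # -> "user_a"
```

    Rarely visited long-tail sites make such transition profiles highly distinctive, which is the intuition behind the differentiation observed in the study.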

    Accelerating Event Stream Processing in On- and Offline Systems

    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomenon. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it, together with the measurement time, whenever it changes substantially from the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the cost of reading the entire stream from non-volatile storage quickly dominates the overall processing cost. Second, modern SPEs focus on scaling out computations across the nodes of a cluster but use only a fraction of the available resources of individual nodes. This thesis tackles these problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data read from non-volatile storage, and we show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize the resource utilization of single nodes by exploiting the capabilities of modern hardware. To this end, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that, regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores.
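
    To ground the notion of windowed aggregation over an event stream, here is a minimal Python sketch of a tumbling-window average over timestamped temperature events; the window length, event format, and synthetic generator are illustrative assumptions and not part of the thesis's actual engine.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

@dataclass
class Event:
    timestamp: float   # seconds since some epoch
    value: float       # e.g. a temperature reading

def tumbling_avg(events: Iterable[Event], window: float) -> Iterator[Tuple[float, float]]:
    """Emit (window_start, average_value) once all events of a window have arrived.
    Assumes events are processed in timestamp order, as in an online engine."""
    start, total, count = None, 0.0, 0
    for e in events:
        if start is None:
            start = e.timestamp - (e.timestamp % window)
        while e.timestamp >= start + window:        # close finished window(s)
            if count:
                yield (start, total / count)
            start, total, count = start + window, 0.0, 0
        total, count = total + e.value, count + 1
    if count:                                       # flush the last open window
        yield (start, total / count)

stream = [Event(t, 20.0 + 0.1 * t) for t in range(0, 30, 3)]   # synthetic sensor events
for window_start, avg in tumbling_avg(stream, window=10.0):
    print(window_start, round(avg, 2))
```

    An offline, index-based variant would skip this full replay by reading only the stored events whose timestamps fall inside the queried windows, which is the gist of the storage-cost savings described above.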