
    A New Approach for Text String Detection from Natural Scenes By Grouping & Partition

    In this paper we review and analyze different methods for finding character strings in natural scene images. We review techniques such as extraction of character string regions from scenery images based on contours and the thickness of characters; an efficient binarization and enhancement technique followed by a suitable connected component analysis procedure; text string detection from natural scenes by structure-based partition and grouping; and a robust algorithm for text detection in images. It is assumed that characters have closed contours and that a character string consists of characters which, in most cases, lie on a straight line. Character string regions can therefore be extracted by finding closed contours and searching their neighbors. Image binarization successfully processes natural scene images with shadows, non-uniform illumination, low contrast, and large signal-dependent noise. Connected component analysis is used to define the final binary images, which mainly consist of text regions. One technique chooses candidate text characters from connected components using gradient and color features. The text line grouping method performs a Hough transform to fit a text line among the centroids of the text candidates; each fitted text line describes the orientation of a potential text string. A detected text string is presented as a rectangular region covering all characters whose centroids are cascaded in its text line. To improve efficiency and accuracy, our algorithms are carried out at multiple scales. The proposed methods outperform state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation. Furthermore, the effectiveness of our methods in detecting text strings with arbitrary orientations is evaluated on the Oriented Scene Text Dataset, collected by ourselves, which contains text strings in non-horizontal orientations.
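
    To make the grouping step concrete, the sketch below shows how candidate character centroids can vote in a (theta, rho) Hough accumulator, with the centroids on the best-supported line grouped into one text string. This is a minimal illustration in Python; the bin sizes, tolerance, and example coordinates are assumptions for illustration, not values from the paper.

        # Minimal sketch of Hough-based text line grouping: each centroid votes
        # in a (theta, rho) accumulator; centroids on the strongest line form
        # one text string. All parameters below are illustrative assumptions.
        import numpy as np

        def group_centroids_hough(centroids, n_theta=180, rho_res=2.0, tol=3.0):
            """Return indices of centroids lying on the strongest Hough line."""
            pts = np.asarray(centroids, dtype=float)
            thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
            # rho = x*cos(theta) + y*sin(theta) for every point/angle pair
            rhos = pts[:, 0:1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)
            rho_bins = np.round(rhos / rho_res).astype(int)
            best_count, best = 0, (0, 0)
            for j in range(n_theta):  # find the accumulator cell with most votes
                vals, counts = np.unique(rho_bins[:, j], return_counts=True)
                k = counts.argmax()
                if counts[k] > best_count:
                    best_count, best = counts[k], (j, vals[k])
            j, rho_bin = best
            # keep centroids within tol pixels of the winning line
            dist = np.abs(pts[:, 0] * np.cos(thetas[j])
                          + pts[:, 1] * np.sin(thetas[j]) - rho_bin * rho_res)
            return np.where(dist <= tol)[0]

        # five collinear character centroids plus one outlier
        idx = group_centroids_hough([(10, 50), (30, 50), (50, 50),
                                     (70, 50), (90, 50), (40, 120)])
        print(idx)  # -> indices of the five collinear centroids: [0 1 2 3 4]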

    Energy Efficient Hardware Accelerators for Packet Classification and String Matching

    This thesis focuses on the design of new algorithms and energy efficient high throughput hardware accelerators that implement packet classification and fixed string matching. These computationally heavy and memory intensive tasks are used by networking equipment to inspect all packets at wire speed. The constant growth in Internet usage has made them increasingly difficult to implement at core network line speeds. Packet classification is used to sort packets into different flows by comparing their headers to a list of rules. A flow is used to decide a packet’s priority and the manner in which it is processed. Fixed string matching is used to inspect a packet’s payload to check if it contains any strings associated with known viruses, attacks or other harmful activities. The contributions of this thesis towards the area of packet classification are hardware accelerators that allow packet classification to be implemented at core network line speeds when classifying packets using rulesets containing tens of thousands of rules. The hardware accelerators use modified versions of the HyperCuts packet classification algorithm. An adaptive clocking unit is also presented that dynamically adjusts the clock speed of a packet classification hardware accelerator so that its processing capacity matches the processing needs of the network traffic, keeping dynamic power consumption to a minimum. Contributions made towards the area of fixed string matching include a new algorithm that builds a state machine used to search for strings with the aid of default transition pointers. The use of default transition pointers keeps memory consumption low, allowing state machines capable of searching for thousands of strings to be small enough to fit in the on-chip memory of devices such as FPGAs. A hardware accelerator is also presented that uses these state machines to search through the payloads of packets for strings at core network line speeds.
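
    As a rough illustration of the decision-tree idea behind HyperCuts, the Python sketch below recursively cuts the rule space into equal sub-regions until each leaf holds few enough rules for a linear scan. For brevity it cuts one dimension per node, in the style of the earlier HiCuts algorithm, whereas HyperCuts cuts several dimensions at each node; the two-field rule format and all parameters are illustrative assumptions, not the thesis's modified algorithm.

        from dataclasses import dataclass

        @dataclass
        class Rule:
            rid: int      # lower id = higher priority
            ranges: list  # one inclusive (lo, hi) interval per header field

        def overlaps(rule, box):
            # a rule belongs to a region if every field interval intersects it
            return all(rlo <= bhi and blo <= rhi
                       for (rlo, rhi), (blo, bhi) in zip(rule.ranges, box))

        def build(rules, box, leaf_size=1, cuts=4, depth=8):
            if len(rules) <= leaf_size or depth == 0:
                return ("leaf", rules)
            # cut the widest dimension into `cuts` equal children
            dim = max(range(len(box)), key=lambda d: box[d][1] - box[d][0])
            lo, hi = box[dim]
            step = (hi - lo + 1) // cuts or 1
            children = []
            for i in range(cuts):
                clo = lo + i * step
                chi = hi if i == cuts - 1 else clo + step - 1
                cbox = list(box)
                cbox[dim] = (clo, chi)
                children.append(build([r for r in rules if overlaps(r, cbox)],
                                      cbox, leaf_size, cuts, depth - 1))
            return ("node", dim, lo, step, cuts, children)

        def classify(tree, pkt):
            # walk the tree by arithmetic on header fields, then scan the leaf
            while tree[0] == "node":
                _, dim, lo, step, cuts, children = tree
                tree = children[min((pkt[dim] - lo) // step, cuts - 1)]
            hits = [r.rid for r in tree[1]
                    if all(lo <= v <= hi for v, (lo, hi) in zip(pkt, r.ranges))]
            return min(hits) if hits else None

        rules = [Rule(0, [(0, 1023), (80, 80)]),     # low src port, dst port 80
                 Rule(1, [(0, 65535), (0, 65535)])]  # catch-all default rule
        tree = build(rules, box=[(0, 65535), (0, 65535)])
        print(classify(tree, (443, 80)))  # -> 0 (specific rule wins)
        print(classify(tree, (443, 22)))  # -> 1 (falls to the default rule)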

    Regular Expression Search on Compressed Text

    We present an algorithm for searching regular expression matches in compressed text. The algorithm reports the number of matching lines in the uncompressed text in time linear in the size of its compressed version. We define efficient data structures that yield nearly optimal complexity bounds and provide a sequential implementation, zearch, that requires up to 25% less time than the state of the art. Comment: 10 pages, published in Data Compression Conference (DCC'19).
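
    For contrast with the paper's algorithm, which runs in time linear in the size of the compressed text rather than the uncompressed one, the Python sketch below is the naive baseline it improves on: decompress, then scan line by line. The file name and pattern are illustrative, and gzip merely stands in for whatever compressor is applied.

        # Naive baseline: decompress and scan every line of the original text.
        # The paper's contribution is avoiding exactly this full decompression.
        import gzip
        import re

        def count_matching_lines(path, pattern):
            """Count decompressed lines containing a regex match."""
            rx = re.compile(pattern)
            count = 0
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
                for line in f:
                    if rx.search(line):
                        count += 1
            return count

        # e.g. count_matching_lines("corpus.txt.gz", r"str(ing|ucture)s?")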

    Ultra-high throughput string matching for deep packet inspection

    Deep Packet Inspection (DPI) involves searching a packet's header and payload against thousands of rules to detect possible attacks. The increase in Internet usage and the growing number of attacks that must be searched for have made hardware acceleration essential to prevent DPI from becoming a bottleneck when performed on an edge or core router. In this paper we present a new multi-pattern matching algorithm which can search for the fixed strings contained within these rules at a guaranteed rate of one character per cycle, independent of the number of strings or their length. Our algorithm is based on the Aho-Corasick string matching algorithm, with our modifications resulting in a memory reduction of over 98% on the strings tested from the Snort ruleset. This allows the search structures needed for matching thousands of strings to be small enough to fit in the on-chip memory of an FPGA. Combined with a simple hardware architecture, this leads to high throughput and low power consumption. Our hardware implementation uses multiple string matching engines working in parallel to search through packets. It can achieve a throughput of over 40 Gbps (OC-768) when implemented on a Stratix 3 FPGA and over 10 Gbps (OC-192) when implemented on the lower power Cyclone 3 FPGA.
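
    The sketch below shows, in Python, the classic Aho-Corasick automaton such engines build on: a trie of the patterns plus failure links acting as default transitions, consuming exactly one input character per step, the property the hardware guarantees per clock cycle. The memory-reduction encoding described in the paper is not reproduced here.

        from collections import deque

        def build_automaton(patterns):
            # goto: per-state edge maps; fail: default transitions; out: matches
            goto, fail, out = [{}], [0], [set()]
            for p in patterns:                      # build the trie
                s = 0
                for ch in p:
                    if ch not in goto[s]:
                        goto.append({}); fail.append(0); out.append(set())
                        goto[s][ch] = len(goto) - 1
                    s = goto[s][ch]
                out[s].add(p)
            q = deque(goto[0].values())             # BFS to set failure links
            while q:
                s = q.popleft()
                for ch, t in goto[s].items():
                    q.append(t)
                    f = fail[s]
                    while f and ch not in goto[f]:
                        f = fail[f]
                    g = goto[f].get(ch, 0)
                    fail[t] = g if g != t else 0
                    out[t] |= out[fail[t]]
            return goto, fail, out

        def search(text, automaton):
            goto, fail, out = automaton
            s, hits = 0, []
            for i, ch in enumerate(text):           # one character per step
                while s and ch not in goto[s]:
                    s = fail[s]                     # follow default transitions
                s = goto[s].get(ch, 0)
                hits += [(i, p) for p in out[s]]
            return hits

        ac = build_automaton(["he", "she", "hers"])
        # reports each (end index, pattern) match found in the text,
        # e.g. (1, 'he'), (4, 'he'), (6, 'hers'), (8, 'she'), (8, 'he')
        print(search("hexhershe", ac))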

    Corpus access for beginners: the W3Corpora project


    Searching by approximate personal-name matching

    We discuss the design, building and evaluation of a method to access the information of a person, using their name as a search key, even if the name contains deformations. We present a similarity function, the DEA function, based on the probabilities of the edit operations according to the letters involved and their positions, and using a variable threshold. The efficacy of DEA is quantitatively evaluated, without human relevance judgments, and found to be far superior to that of known methods. A very efficient approximate search technique for the DEA function is also presented, based on a compacted trie-tree structure.
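
    The idea behind such a similarity function can be sketched as a dynamic-programming edit distance whose operation costs depend on the letters involved and their positions, compared against a length-dependent threshold. The Python below uses an invented cost model as a stand-in for the probability-derived weights of the actual DEA function; all weights and example names are illustrative assumptions.

        def weighted_edit_distance(a, b, sub_cost, ins_cost, del_cost):
            # classic DP table, but every operation has a letter- and
            # position-dependent cost instead of a flat cost of 1
            m, n = len(a), len(b)
            d = [[0.0] * (n + 1) for _ in range(m + 1)]
            for i in range(1, m + 1):
                d[i][0] = d[i - 1][0] + del_cost(a[i - 1], i - 1)
            for j in range(1, n + 1):
                d[0][j] = d[0][j - 1] + ins_cost(b[j - 1], j - 1)
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    d[i][j] = min(
                        d[i - 1][j] + del_cost(a[i - 1], i - 1),
                        d[i][j - 1] + ins_cost(b[j - 1], j - 1),
                        d[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else
                                           sub_cost(a[i - 1], b[j - 1], i - 1)))
            return d[m][n]

        # invented cost model: errors at the first letter cost more, and
        # swapping phonetically close letters (b/v, s/z, c/k) is cheap
        CLOSE = {("b", "v"), ("v", "b"), ("s", "z"), ("z", "s"),
                 ("c", "k"), ("k", "c")}
        sub = lambda x, y, pos: (0.3 if (x, y) in CLOSE else 1.0) * (1.5 if pos == 0 else 1.0)
        ins = lambda ch, pos: 0.8
        dele = lambda ch, pos: 0.8

        def name_matches(query, candidate, per_char=0.25):
            # variable threshold: longer names tolerate more total error
            limit = per_char * max(len(query), len(candidate))
            return weighted_edit_distance(query, candidate, sub, ins, dele) <= limit

        print(name_matches("vazquez", "basquez"))   # True: b/v, s/z are close
        print(name_matches("vazquez", "martinez"))  # False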

    ESPOON_ERBAC: Enforcing Security Policies in Outsourced Environments

    Data outsourcing is a growing business model offering services to individuals and enterprises for processing and storing a huge amount of data. It is not only economical but also promises higher availability, scalability, and more effective quality of service than in-house solutions. Despite all its benefits, data outsourcing raises serious security concerns for preserving data confidentiality. There are solutions for preserving confidentiality of data while supporting search on the data stored in outsourced environments. However, such solutions do not support access policies to regulate access to a particular subset of the stored data. For complex user management, large enterprises employ Role-Based Access Control (RBAC) models for making access decisions based on the role in which a user is active. However, RBAC models cannot be deployed in outsourced environments as they rely on trusted infrastructure in order to regulate access to the data. The deployment of RBAC models may reveal private information about the sensitive data they aim to protect. In this paper, we aim at filling this gap by proposing ESPOON_ERBAC for enforcing RBAC policies in outsourced environments. ESPOON_ERBAC enforces RBAC policies in an encrypted manner, where a curious service provider may learn only very limited information about the RBAC policies. We have implemented ESPOON_ERBAC and provided a performance evaluation showing limited overhead, thus confirming the viability of our approach. Comment: The final version of this paper has been accepted for publication in Elsevier Computers & Security 2013. arXiv admin note: text overlap with arXiv:1306.482
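
    The access decision being protected can be sketched in plaintext Python: a user is active in a role, roles map to permissions (possibly through inheritance), and a request is granted only if the active role, or one it inherits from, holds the permission. ESPOON_ERBAC evaluates this kind of policy in encrypted form at the service provider; the roles and permissions below are illustrative assumptions.

        # Plaintext RBAC decision; ESPOON_ERBAC performs the equivalent check
        # over encrypted policies. Roles and permissions here are made up.
        ROLE_PERMS = {
            "doctor": {("patient_record", "read"), ("patient_record", "write")},
            "nurse": {("patient_record", "read")},
        }
        ROLE_INHERITS = {"doctor": {"nurse"}}  # doctor inherits nurse's permissions

        def permitted(active_role, resource, action):
            pending, seen = [active_role], set()
            while pending:                      # walk up the role hierarchy
                role = pending.pop()
                if role in seen:
                    continue
                seen.add(role)
                if (resource, action) in ROLE_PERMS.get(role, set()):
                    return True
                pending.extend(ROLE_INHERITS.get(role, ()))
            return False

        print(permitted("nurse", "patient_record", "write"))   # False
        print(permitted("doctor", "patient_record", "write"))  # True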

    Direct Observation of Cosmic Strings via their Strong Gravitational Lensing Effect: II. Results from the HST/ACS Image Archive

    We have searched 4.5 square degrees of archival HST/ACS images for cosmic strings, identifying close pairs of similar, faint galaxies and selecting groups whose alignment is consistent with gravitational lensing by a long, straight string. We find no evidence for cosmic strings in five large-area HST treasury surveys (covering a total of 2.22 square degrees), or in any of 346 multi-filter guest observer images (1.18 square degrees). Assuming that simulations accurately predict the number of cosmic strings in the universe, this non-detection allows us to place upper limits on the unitless universal cosmic string tension of G mu/c^2 < 2.3 x 10^-6, and cosmic string density of Omega_s < 2.1 x 10^-5 at the 95% confidence level (marginalising over the other parameter in each case). We find four dubious cosmic string candidates in 318 single-filter guest observer images (1.08 square degrees), which we are unable to conclusively eliminate with existing data. The confirmation of any one of these candidates as a cosmic string would imply G mu/c^2 ~ 10^-6 and Omega_s ~ 10^-5. However, we estimate that there is at least a 92% chance that these string candidates are random alignments of galaxies. If we assume that these candidates are indeed false detections, our final limits on G mu/c^2 and Omega_s fall to 6.5 x 10^-7 and 7.3 x 10^-6. Due to the extensive sky coverage of the HST/ACS image archive, the above limits are universal. They are quite sensitive to the number of fields being searched, and could be further reduced by more than a factor of two using forthcoming HST data. Comment: 21 pages, 18 figures
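
    A toy version of the search geometry, with made-up coordinates and thresholds, might find close pairs of galaxies and then test whether the pair midpoints line up, as lensing by a long, straight string would require; the Python below is purely illustrative and reflects none of the survey's actual selection criteria.

        import numpy as np

        def close_pairs(pos, max_sep):
            """All index pairs of objects separated by less than max_sep."""
            d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
            i, j = np.where((d > 0) & (d < max_sep))
            return [(int(a), int(b)) for a, b in zip(i, j) if a < b]

        def aligned(points, tol):
            """True if points scatter about their best-fit line by < tol (RMS)."""
            m = np.asarray(points, float)
            m = m - m.mean(axis=0)
            # smallest singular value captures the perpendicular scatter
            return np.linalg.svd(m, compute_uv=False)[-1] / np.sqrt(len(m)) < tol

        # three close pairs whose midpoints are nearly collinear
        pos = np.array([[0, 0], [0, 1], [5, 0.2], [5, 1.2],
                        [10, -0.1], [10, 0.9]], float)
        pairs = close_pairs(pos, max_sep=1.5)
        mids = [(pos[a] + pos[b]) / 2 for a, b in pairs]
        print(pairs, aligned(mids, tol=0.2))  # -> [(0, 1), (2, 3), (4, 5)] True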