310,990 research outputs found

    Query Enrichment for Image Collections by Reuse of Classification Rules

    User queries over image collections, based on semantic similarity, can be processed in several ways. In this paper, we propose to reuse the rules that rule-based classifiers produce in their recognition models as query pattern definitions for searching image collections.

    tacg – a grep for DNA

    BACKGROUND: Pattern matching is the core of bioinformatics; it is used in database searching, restriction enzyme mapping, and finding open reading frames. It is done repeatedly over increasingly long sequences, so code must be efficient and insensitive to sequence length. Patterns of interest include simple motifs with IUPAC degeneracies, regular expressions, patterns allowing mismatches, and probability matrices. RESULTS: I describe a small application that searches for each of these pattern types individually and also allows these atomic motifs to be assembled into logical rules for more sophisticated analysis. CONCLUSION: tacg is small, portable, faster and more capable than most alternatives, relatively easy to modify, and freely available in source code.
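
As an illustration of the first pattern type mentioned in the abstract (motifs with IUPAC degeneracies), here is a minimal Python sketch that expands a degenerate motif into a regular expression and reports every match position. It is not tacg's implementation; the function names `iupac_to_regex` and `find_motif` are hypothetical, and only the standard IUPAC nucleotide codes are assumed.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into an equivalent regular expression."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # raises KeyError on a non-IUPAC character
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlaps."""
    # Wrapping the pattern in a lookahead lets the scan report
    # overlapping occurrences as well as disjoint ones.
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # W stands for A or T, so GGWCC matches both GGACC and GGTCC.
    print(find_motif("GGACCGGTCC", "GGWCC"))
```

A regex engine is convenient for a sketch like this, but its running time depends on the pattern; a dedicated tool would typically use purpose-built matching code.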

    ANALISIS IMPLEMENTASI ALGORITMA PATTERN DECOMPOSITION UNTUK MENCARI POLA ASOSIASI PADA DATA MINING (Analysis of Pattern Decomposition Algorithm Implementation for Searching Association Rules in Data Mining)

    ABSTRACT: Data mining is a rapidly growing field, driven by the need to extract added value from the large-scale databases that accumulate as information technology grows. Applying data mining can make an important contribution in business: the association patterns it yields can inform decision making within a company. Association is one data mining technique, used to find association rules among combinations of items. Various algorithms have been developed to find association patterns with attention to effectiveness and efficiency. This final project analyzes data mining for association patterns in transaction data, using an application that implements the Pattern Decomposition algorithm. The analysis is based on test results. The tests show that the larger the minimum support, the fewer frequent itemsets are generated. The relationship between minimum support and frequent itemsets, which is linear in 1/x^2, is described by the formula f(x) = a*1/x^2 + c. The dataset used in testing has a strong influence: a dataset with a large number of records takes a long time to process, especially when the minimum support is small. Keywords: data mining, association, pattern decomposition, minimum support, frequent itemsets
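
The reported trend (a larger minimum support yields fewer frequent itemsets) can be seen on a toy example. The sketch below uses plain brute-force enumeration, not the Pattern Decomposition algorithm the thesis evaluates; the function name `frequent_itemsets` and the transaction data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    Brute-force enumeration: suitable only for tiny examples.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            # Support never increases when an itemset grows, so if no
            # itemset of this size is frequent, no larger one can be.
            break
    return result


# Hypothetical toy transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

if __name__ == "__main__":
    for s in (0.25, 0.5, 0.75):
        print(s, len(frequent_itemsets(transactions, s)))
```

On this data the number of frequent itemsets drops from 7 to 5 to 2 as the minimum support rises from 0.25 to 0.5 to 0.75.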

    Ultra-high throughput string matching for deep packet inspection

    Deep Packet Inspection (DPI) involves searching a packet's header and payload against thousands of rules to detect possible attacks. With Internet usage increasing and the number of attacks to search for growing, hardware acceleration has become essential to keep DPI from becoming a network bottleneck when it is used on an edge or core router. In this paper we present a new multi-pattern matching algorithm which can search for the fixed strings contained within these rules at a guaranteed rate of one character per cycle, independent of the number of strings or their length. Our algorithm is based on the Aho-Corasick string matching algorithm, with our modifications resulting in a memory reduction of over 98% on the strings tested from the Snort ruleset. This allows the search structures needed for matching thousands of strings to be small enough to fit in the on-chip memory of an FPGA. Combined with a simple architecture for hardware, this leads to high throughput and low power consumption. Our hardware implementation uses multiple string matching engines working in parallel to search through packets. It can achieve a throughput of over 40 Gbps (OC-768) when implemented on a Stratix 3 FPGA and over 10 Gbps (OC-192) when implemented on the lower power Cyclone 3 FPGA.
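
For readers unfamiliar with the base algorithm, the following is a minimal Python sketch of textbook Aho-Corasick multi-pattern matching. It does not include the paper's memory-reducing modifications or its hardware architecture, and because it follows failure links at search time it does not reproduce the one-character-per-cycle guarantee. The function names `build_automaton` and `search` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of fixed strings."""
    goto = [{}]   # goto[state][char] -> next state (state 0 is the root)
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognized on reaching each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first pass that sets failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


if __name__ == "__main__":
    print(search("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned once regardless of how many patterns are loaded, which is the property that makes this family of algorithms attractive for rule sets with thousands of strings.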

    Algorithmic and technical improvements: Optimal solutions to the (Generalized) Multi-Weber Problem

    Rosing has recently demonstrated a new method for obtaining optimal solutions to the (Generalized) Multi-Weber Problem and proved the optimality of the results. The method develops all convex hulls and then covers the destinations with disjoint convex hulls. This paper seeks to improve the implementation of the algorithm to make such solutions economically attractive. Four areas are considered: sharper decision rules to eliminate unnecessary searching, bit pattern matching as a method of recording a history and eliminating duplication, vector intrinsic functions to speed up comparisons, and profiling a program to maximize operating efficiency. Computational experience is also presented.

    siEDM: an efficient string index and search algorithm for edit distance with moves

    Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm with editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string into another. Although the problem of computing EDM is intractable, it has a wide range of potential applications, especially in approximate string retrieval. Despite the importance of computing EDM, there has been no efficient method for indexing and searching large text collections based on the EDM measure. We propose the first algorithm, named string index for edit distance with moves (siEDM), for indexing and searching strings with EDM. The siEDM algorithm builds an index structure by leveraging the idea behind edit sensitive parsing (ESP), an efficient algorithm that approximately computes EDM with guaranteed upper and lower bounds on the exact EDM. siEDM efficiently prunes the search space for query strings by the proposed method, which enables fast query searches with the same guarantee as ESP. We experimentally tested the ability of siEDM to index and search strings on benchmark datasets, and we showed siEDM's efficiency. Comment: 23 pages