2,585 research outputs found

Decrypting The Java Gene Pool: Predicting Objects' Lifetimes with Micro-patterns

Author: Jones Richard
Marion Sebastien
Ryder Chris
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/10/2007

Pretenuring long-lived and immortal objects into infrequently or never collected regions reduces garbage collection costs significantly. However, extant approaches either require computationally expensive, application-specific, off-line profiling, or consider only allocation sites common to all programs, i.e. invoked by the virtual machine rather than application programs. In contrast, we show how a simple program analysis, combined with an object lifetime knowledge bank, can be exploited to match both runtime system and application program structure with object lifetimes. The complexity of the analysis is linear in the size of the program, so it need not be run ahead of time. We obtain performance gains of between 6% and 77% in GC time against a generational copying collector for several SPEC jvm98 programs.

DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets

Author: Deng Zhi-Hong
Publication venue: 'Elsevier BV'
Publication date: 06/07/2015

Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed, which have been shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, for mining frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and, in some cases, directly enumerates frequent itemsets without candidate generation. To evaluate the performance of dFIN, we have conducted extensive experiments comparing it against existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms. Comment: 22 pages, 13 figures

arXiv.org e-Print Archive
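
For readers unfamiliar with the problem the abstract above addresses, the following is a minimal sketch of what "mining frequent itemsets" means. It is a naive level-by-level enumeration, not DiffNodeset or dFIN, and the function name, the toy transactions, and the `min_support` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset whose count
    reaches min_support. Naive enumeration, for illustration only."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = set(candidate)
            # Support = number of transactions containing every item.
            count = sum(1 for t in transactions if cset <= set(t))
            if count >= min_support:
                result[frozenset(candidate)] = count
                found_any = True
        if not found_any:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return result


# Hypothetical toy data: four shopping baskets.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
freq = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

This brute-force version rescans every transaction for every candidate, which is the cost that compact itemset representations such as the one described in the abstract aim to avoid.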

DCADE: divide and conquer alignment with dynamic encoding for full page data extraction

Author: Chang Chia-Hui
Yuliana Oviliani Yenty
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/07/2019

In this paper, we consider the problem of full schema induction from either multiple list pages or singleton pages with the same template. Existing approaches do not work well for this problem because they use fixed abstraction schemes that are suitable for data-rich detection but are not appropriate for the small records and complex data found in other sections. We propose an unsupervised full-schema web data extraction method via Divide-and-Conquer Alignment with Dynamic Encoding (DCADE for short). We define the Content Equivalence Class (CEC) and Typeset Equivalence Class (TEC) based on leaf node content. We then combine HTML attributes (i.e., id and class) in the paths for various levels of encoding, so that the proposed algorithm can align leaf nodes by exploring patterns at various levels from specific to general. We conducted experiments on 49 real-world websites used in TEX and ExAlg. The proposed DCADE achieved a 0.962 F1 measure for non-recordset data extraction (denoted by FD), and a 0.936 F1 measure for recordset data extraction (denoted by FS), which outperformed other page-level web data extraction methods, i.e., DCA (FD=0.660), TEX (FD=0.454 and FS=0.549), RoadRunner (FD=0.396 and FS=0.330), and UWIDE (FD=0.260 and FS=0.081).

Feature Model Extraction from Large Collections of Informal Product Descriptions

Author: Davril Jean-Marc
Delfosse Edouard
Publication date: 03/09/2013

Evolving rules for document classification

Author: A. Bergström
C. Apté
C.M. Tan
D. Montana
D.R. Tauritz
F. Sebastiani
G. Salton
H. Lodhi
J.R. Koza
K. Bennet
M. Damashek
T. Joachims
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that, because the induced rules are meaningful to a human analyst, they may have a number of other uses beyond classification and provide a basis for text mining applications.

CiteSeerX
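
To make the fitness idea in the abstract above concrete, here is a minimal sketch of scoring one hand-written character N-gram rule by precision and recall. It does not implement Genetic Programming or the paper's function and terminal set; the rule format (`required` and `forbidden` N-grams), the helper names, and the four toy documents are assumptions made for illustration.

```python
def rule_matches(text, required, forbidden=()):
    """A hypothetical rule: fire when every required character N-gram
    occurs in the text and no forbidden one does. Testing whether an
    N-gram is a substring is the same as testing whether it belongs
    to the text's set of character N-grams of that length."""
    lowered = text.lower()
    return (all(g in lowered for g in required)
            and not any(g in lowered for g in forbidden))


def precision_recall(predictions, labels):
    """Compute (precision, recall) for boolean predictions and labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical training documents labelled for a "grain" category.
docs = [
    ("wheat prices rise", True),
    ("corn and wheat exports", True),
    ("oil prices fall", False),
    ("grain harvest report", True),
]
labels = [label for _, label in docs]

# Candidate rule: the document must contain the 3-gram "whe".
predictions = [rule_matches(text, required=("whe",)) for text, _ in docs]
precision, recall = precision_recall(predictions, labels)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In an evolutionary setting, a score built from these two numbers would be computed for each candidate rule so that better classifiers are more likely to be selected. How the two are combined into a single fitness value is left open here.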