Non-derivable itemset mining

Author: A Bykowski
A Dobra
AA Melkman
B Goethals
Bart Goethals
C Bonferroni
C Jordan
J Boulicaut
J Kahn
J Pei
M Fréchet
Toon Calders
Y Bastide
Publication venue: 'Springer Science and Business Media LLC'

Towards Rare Itemset Mining

Author: Napoli Amedeo
Szathmary Laszlo
Valtchev Petko
Publication venue: HAL CCSD
Publication date: 29/10/2007

Conference site: http://ictai07.ceid.upatras.gr/
International audience

We describe here a general approach to rare itemset mining. While the mining literature has focused almost exclusively on frequent itemsets, in many practical situations rare ones are of greater interest (e.g., in medical databases, rare combinations of symptoms might provide useful insights for physicians). Based on an examination of the relevant substructures of the mining space, our approach splits the rare itemset mining task into two steps: traversal of the frequent itemset part and listing of the rare itemsets. We propose two algorithms for step one, a naive one and an optimized one, and a further algorithm for step two. We also provide some empirical evidence of the performance gains due to the optimized traversal.

INRIA a CCSD electronic archive server

On utilising change over time in data mining

Author: Böttcher Mirko
Publication venue: Universitätsbibl.

Magdeburg, Univ., Faculty of Computer Science, dissertation, 2013, by Mirko Böttcher

Digital University Library Saxony-Anhalt

A Survey of Symbolic Execution Techniques

Author: Baldoni Roberto
Coppa Emilio
D'Elia Daniele Cono
Demetrescu Camil
Finocchi Irene
Publication date: 01/01/2018

Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience. The present survey has been accepted for publication at ACM Computing Surveys. If you are considering citing this survey, we would appreciate it if you could use the following BibTeX entry: http://goo.gl/Hf5Fvc

Comment: This is the authors' pre-print copy.

arXiv.org e-Print Archive

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Archivio della ricerca- Università di Roma La Sapienza

Darstellung und stochastische Auflösung von Ambiguität in constraint-basiertem Parsing

Author: Eisele Andreas
Publication date: 05/02/2013

This thesis investigates two complementary approaches to coping with ambiguity in natural language processing. It first presents methods that allow many competing interpretations to be stored compactly in one shared data structure. It then suggests approaches for scoring the different interpretations using stochastic models. This leads to the problem of estimating the probabilities of rare events that were not observed in the training data, for which novel methods are proposed.

A heuristic-based approach to code-smell detection

Author: Kirk D.
Roper M.
Wood M.
Publication venue: Nova Science Publishers, Inc.
Publication date: 01/01/2007

Encapsulation and data hiding are central tenets of the object-oriented paradigm. Deciding what data and behaviour to form into a class, and where to draw the line between its public and private details, can make the difference between a class that is an understandable, flexible and reusable abstraction and one that is not. This decision is a difficult one and may easily result in poor encapsulation, which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late), and attempting to perform such analysis manually can also be tedious and error-prone. Two of the common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together – data classes are lacking in functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool, developed as a plug-in for the Eclipse IDE, that automatically detects data and god classes. The technique has been evaluated in a controlled study on two large open-source systems that compares the tool's results with similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approaches.

University of Strathclyde Institutional Repository