Discovering Knowledge from Local Patterns with Global Constraints
It is well known that local patterns are at the core of a lot of
knowledge which may be discovered from data. Nevertheless, use of local
patterns is limited by
their huge number and computational costs. Several approaches (e.g.,
condensed representations, pattern set discovery) aim at grouping or
synthesizing local patterns to provide a global view of the data. A
global pattern is a pattern which is a set or a synthesis of local
patterns coming from the data. In this paper, we propose the idea of
global constraints to write queries addressing global patterns. A key
point is the ability to bias the design of global patterns according
to the expectations of the user. For instance, a global pattern can be
oriented towards the search for exceptions or towards a clustering. This
requires writing queries that take such biases into account. Open issues are
the design of a generic framework to express powerful global constraints and
of solvers to mine them. We think that global constraints are a promising
way to discover relevant global patterns.
Unexpected rules using a conceptual distance based on fuzzy ontology
One of the major drawbacks of data mining methods is that they generate a notably large number of rules that are often obvious or useless or, occasionally, outside the user’s interest. To address such drawbacks, we propose in this paper an approach that detects a set of unexpected rules in a discovered association rule set. Generally speaking, the proposed approach investigates the discovered association rules using the user’s domain knowledge, which is represented by a fuzzy domain ontology. Next, we rank the discovered rules according to the conceptual distances of the rules
Discovering Unexpected Patterns in Temporal Data Using Temporal Logic
There has been much attention given recently to the task
of finding interesting patterns in temporal databases. Since there are so
many different approaches to the problem of discovering temporal patterns,
we first present a characterization of different discovery tasks and
then focus on one task of discovering interesting patterns of events in
temporal sequences. Given an (infinite) temporal database or a sequence
of events one can, in general, discover an infinite number of temporal
patterns in this data. Therefore, it is important to specify some measure
of interestingness for discovered patterns and then select only the patterns
interesting according to this measure. We present a probabilistic
measure of interestingness based on unexpectedness, whereby a pattern P
is deemed interesting if the ratio of the actual number of occurrences of
P to the expected number of occurrences of P exceeds some user-defined
threshold. We then make use of a subset of the propositional, linear temporal
logic and present an efficient algorithm that discovers unexpected
patterns in temporal data. Finally, we apply this algorithm to synthetic
data, UNIX operating system calls, and Web logfiles and present the
results of these experiments.
Information Systems Working Papers Series
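
The unexpectedness test described in this abstract (actual occurrences of a pattern P compared with its expected occurrences, against a user-defined threshold) can be sketched as follows. This is only an illustration, not the authors' algorithm: it assumes that a pattern is a run of consecutive events, and that the expected count comes from treating events as independent draws with their empirical frequencies. The function names `actual_count`, `expected_count` and `is_unexpected` are hypothetical.

```python
from collections import Counter


def actual_count(events, pattern):
    """Count the positions where `pattern` occurs as a run of consecutive events."""
    k = len(pattern)
    target = tuple(pattern)
    return sum(
        1
        for i in range(len(events) - k + 1)
        if tuple(events[i:i + k]) == target
    )


def expected_count(events, pattern):
    """Expected count under an independence model (an illustrative assumption).

    Each event is treated as an independent draw with its empirical frequency,
    so the probability of the pattern at one position is the product of the
    frequencies of its events.
    """
    n, k = len(events), len(pattern)
    if n < k:
        return 0.0
    freq = Counter(events)
    prob = 1.0
    for event in pattern:
        prob *= freq[event] / n
    return (n - k + 1) * prob


def is_unexpected(events, pattern, threshold):
    """Return True if actual / expected exceeds the user-defined threshold."""
    expected = expected_count(events, pattern)
    if expected == 0:
        return False  # the pattern cannot occur under the model; nothing to compare
    return actual_count(events, pattern) / expected > threshold


if __name__ == "__main__":
    events = list("abababcc")
    for pattern in [("a", "b"), ("a", "c")]:
        print(pattern, is_unexpected(events, pattern, threshold=2.0))
```

In this sketch, raising `threshold` keeps only patterns that occur far more often than the independence model predicts. A different model of expectation would change `expected_count` but leave the ratio test itself unchanged.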
Virus evolution : the emergence of new ideas (and re-emergence of old ones)
Reputedly intractable, the question of the origin of viruses has long been
neglected. In the modern literature 'Virus evolution' has come to refer to
studies more akin to population genetics, such as the world-wide scrutiny of new
polymorphisms appearing daily in the H5N1 avian flu virus [1], than to the
fundamental interrogation: where do viruses come from? This situation is now
rapidly changing, due to the coincidence of bold new ideas (and sometimes the
revival of old ones), the unexpected features exhibited by recently isolated
spectacular viruses [2] (see www.giantvirus.org), as well as the steady
increase of genomic sequences for 'regular' viruses and cellular organisms
enhancing the power of comparative genomics [3]. After being considered
non-living and relegated to the wings by a majority of biologists, viruses are
now pushed back to center stage: they might have been at the origin of DNA,
of the eukaryotic cell, and even of today's partition of biological organisms
into 3 domains of life: bacteria, archaea and eukarya. Here, I quickly survey
some of the recent discoveries and the new evolutionary thoughts they have
prompted, before adding to the confusion with one interrogation of my own: what
if we totally missed the true nature of (at least some) viruses?
No wisdom in the crowd: genome annotation at the time of big data - current status and future prospects
Science and engineering rely on the accumulation
and dissemination of knowledge to make discoveries
and create new designs. Discovery-driven genome
research rests on knowledge passed on via gene
annotations. In response to the deluge of sequencing
big data, standard annotation practice employs automated
procedures that rely on majority rules. We
argue this hinders progress through the generation
and propagation of errors, leading investigators into
blind alleys. More subtly, this inductive process discourages
the discovery of novelty, which remains
essential in biological research and reflects the nature
of biology itself. Annotation systems, rather than
being repositories of facts, should be tools that support
multiple modes of inference. By combining
deduction, induction and abduction, investigators can
generate hypotheses when accurate knowledge is
extracted from model databases. A key stance is to
depart from ‘the sequence tells the structure tells the
function’ fallacy, placing function first. We illustrate
our approach with examples of critical or unexpected
pathways, using MicroScope to demonstrate how
tools can be implemented following the principles we
advocate. We end with a challenge to the reader
- …