3,148 research outputs found
A New Quartet Tree Heuristic for Hierarchical Clustering
We consider the problem of constructing an an optimal-weight tree from the
3*(n choose 4) weighted quartet topologies on n objects, where optimality means
that the summed weight of the embedded quartet topologiesis optimal (so it can
be the case that the optimal tree embeds all quartets as non-optimal
topologies). We present a heuristic for reconstructing the optimal-weight tree,
and a canonical manner to derive the quartet-topology weights from a given
distance matrix. The method repeatedly transforms a bifurcating tree, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. This contrasts to other heuristic search methods
from biological phylogeny, like DNAML or quartet puzzling, which, repeatedly,
incrementally construct a solution from a random order of objects, and
subsequently add agreement values.Comment: 22 pages, 14 figure
The Google Similarity Distance
Words and phrases acquire meaning from the way they are used in society, from
their relative semantics to other words and phrases. For computers the
equivalent of `society' is `database,' and the equivalent of `use' is `way to
search the database.' We present a new theory of similarity between words and
phrases based on information distance and Kolmogorov complexity. To fix
thoughts we use the world-wide-web as database, and Google as search engine.
The method is also applicable to other search engines and databases. This
theory is then applied to construct a method to automatically extract
similarity, the Google similarity distance, of words and phrases from the
world-wide-web using Google page counts. The world-wide-web is the largest
database on earth, and the context information entered by millions of
independent users averages out to provide automatic semantics of useful
quality. We give applications in hierarchical clustering, classification, and
language translation. We give examples to distinguish between colors and
numbers, cluster names of paintings by 17th century Dutch masters and names of
books by English novelists, the ability to understand emergencies, and primes,
and we demonstrate the ability to do a simple automatic English-Spanish
translation. Finally, we use the WordNet database as an objective baseline
against which to judge the performance of our method. We conduct a massive
randomized trial in binary classification using support vector machines to
learn categories based on our Google distance, resulting in an a mean agreement
of 87% with the expert crafted WordNet categories.Comment: 15 pages, 10 figures; changed some text/figures/notation/part of
theorem. Incorporated referees comments. This is the final published version
up to some minor changes in the galley proof
Normalized Web Distance and Word Similarity
There is a great deal of work in cognitive psychology, linguistics, and
computer science, about using word (or phrase) frequencies in context in text
corpora to develop measures for word similarity or word association, going back
to at least the 1960s. The goal of this chapter is to introduce the
normalizedis a general way to tap the amorphous low-grade knowledge available
for free on the Internet, typed in by local users aiming at personal
gratification of diverse objectives, and yet globally achieving what is
effectively the largest semantic electronic database in the world. Moreover,
this database is available for all by using any search engine that can return
aggregate page-count estimates for a large range of search-queries. In the
paper introducing the NWD it was called `normalized Google distance (NGD),' but
since Google doesn't allow computer searches anymore, we opt for the more
neutral and descriptive NWD. web distance (NWD) method to determine similarity
between words and phrases. ItComment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural
Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau
Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN
978-142008592
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the weighted quartet topologies on objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic has been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available, as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing,Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
Worrying and rumination are both associated with reduced cognitive control
Persistent negative thought is a hallmark feature of both major depressive disorder and generalized anxiety disorder. Despite its clinical significance, little is known about the underlying mechanisms of persistent negative thought. Recent studies suggest that reduced cognitive control might be an explanatory factor. We investigated the association between persistent negative thought and switching between internal representations in working memory, using the internal shift task (IST). The IST was administered to a group of undergraduates, classified as high-ruminators versus low-ruminators, or high-worriers versus low-worriers. Results showed that high-ruminators and high-worriers have more difficulties to switch between internal representations in working memory as opposed to low-ruminators and low-worriers. Importantly, results were only significant when the negative stimuli used in the IST reflected personally relevant worry themes for the participants. The results of this study indicate that rumination and worrying are both associated with reduced cognitive control for verbal information that is personally relevant
Normalized Information Distance
The normalized information distance is a universal distance measure for
objects of all kinds. It is based on Kolmogorov complexity and thus
uncomputable, but there are ways to utilize it. First, compression algorithms
can be used to approximate the Kolmogorov complexity if the objects have a
string representation. Second, for names and abstract concepts, page count
statistics from the World Wide Web can be used. These practical realizations of
the normalized information distance can then be applied to machine learning
tasks, expecially clustering, to perform feature-free and parameter-free data
mining. This chapter discusses the theoretical foundations of the normalized
information distance and both practical realizations. It presents numerous
examples of successful real-world applications based on these distance
measures, ranging from bioinformatics to music clustering to machine
translation.Comment: 33 pages, 12 figures, pdf, in: Normalized information distance, in:
Information Theory and Statistical Learning, Eds. M. Dehmer, F.
Emmert-Streib, Springer-Verlag, New-York, To appea
Diffusion-Induced Oscillations of Extended Defects
From a simple model for the driven motion of a planar interface under the
influence of a diffusion field we derive a damped nonlinear oscillator equation
for the interface position. Inside an unstable regime, where the damping term
is negative, we find limit-cycle solutions, describing an oscillatory
propagation of the interface. In case of a growing solidification front this
offers a transparent scenario for the formation of solute bands in binary
alloys, and, taking into account the Mullins-Sekerka instability, of banded
structures
A20 deficiency sensitizes pancreatic beta cells to cytokine-induced apoptosis in vitro but does not influence type 1 diabetes development in vivo
SCOPUS: ar.jinfo:eu-repo/semantics/publishe
Implementing ‘PATRIOT’ As an Integrated Model of Instruction to Rebuild the Culture of Entrepreneurship in Higher Education
This research was aimed to find an instructional technology program of Entrepreneurship from theory to practice. Those integrated events were showed through mastering theoretical knowledge acquisition, and then applied in the business firms and completed by action. The research activities start from prototyping four instructional packet programs based on the PATRIOT’s model of instruction, and then offered to students through integrated instruction to increase abilities to conduct the Totally Entrepreneurship actions (
- …