Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular standout success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.
Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Modeling the effects of combining diverse software fault detection techniques
The software engineering literature contains many studies of the efficacy of fault finding techniques. Few of these, however, consider what happens when several different techniques are used together. We show that the effectiveness of such multitechnique approaches depends upon quite subtle interplay between their individual efficacies and dependence between them. The modelling tool we use to study this problem is closely related to earlier work on software design diversity. The earliest of these results showed that, under quite plausible assumptions, it would be unreasonable even to expect software versions that were developed ‘truly independently’ to fail independently of one another. The key idea here was a ‘difficulty function’ over the input space. Later work extended these ideas to introduce a notion of ‘forced’ diversity, in which it became possible to obtain system failure behaviour better even than could be expected if the versions failed independently. In this paper we show that many of these results for design diversity have counterparts in diverse fault detection in a single software version. We define measures of fault finding effectiveness, and of diversity, and show how these might be used to give guidance for the optimal application of different fault finding procedures to a particular program. We show that the effects upon reliability of repeated applications of a particular fault finding procedure are not statistically independent - in fact such an incorrect assumption of independence will always give results that are too optimistic. For diverse fault finding procedures, on the other hand, things are different: here it is possible for effectiveness to be even greater than it would be under an assumption of statistical independence. We show that diversity of fault finding procedures is, in a precisely defined way, ‘a good thing’, and should be applied as widely as possible. 
The new model and its results are illustrated using some data from an experimental investigation into diverse fault finding on a railway signalling application.
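The abstract's claim about independence can be illustrated with a small numeric sketch. This is not the paper's model. It assumes a hypothetical per-fault "difficulty" value (the probability that one application of a technique misses that fault), that faults are drawn uniformly at random, and that applications miss a given fault independently once the fault is fixed. All names and numbers below are invented for illustration.

```python
from statistics import mean

# Hypothetical difficulty values: for each of three faults, the probability
# that a single application of the technique fails to detect it.
theta_a = [0.1, 0.2, 0.9]  # technique A finds fault 3 hard
theta_b = [0.9, 0.2, 0.1]  # technique B finds fault 1 hard


def miss_probability(first, second):
    """Chance that both applications miss a fault drawn uniformly at random."""
    return mean(x * y for x, y in zip(first, second))


# Naive estimates that treat the two applications as statistically independent.
independent_guess_aa = mean(theta_a) ** 2
independent_guess_ab = mean(theta_a) * mean(theta_b)

# Estimates that account for the variation in difficulty across faults.
repeated_a = miss_probability(theta_a, theta_a)
diverse_ab = miss_probability(theta_a, theta_b)

print(f"A twice: {repeated_a:.3f} (independence guess {independent_guess_aa:.3f})")
print(f"A then B: {diverse_ab:.3f} (independence guess {independent_guess_ab:.3f})")
```

With these made-up numbers, repeating technique A misses more faults than the independence guess predicts, while pairing A with B misses fewer, because the two techniques find different faults hard.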
New Czechoslovak Hyphenation Patterns, Word Lists, and Workflow
Space- and time-effective segmentation and hyphenation of natural languages lie at the core of every document preparation system, web browser, or mobile rendering system. We use the unreasonable effectiveness of pattern generation with patgen. It is possible to use hyphenation patterns to solve the dictionary problem for closely related languages as well, without compromise. In this article, we show how we applied the marvelous effectiveness of patgen to generate the new Czechoslovak hyphenation patterns, which cover both the Czech and Slovak languages. We show that developing universal, up-to-date, high-coverage and high-generalization hyphenation patterns is feasible when they are generated from semi-automatically prepared word lists based on actual language usage. We evaluate the new approach and argue that the new Czechoslovak hyphenation patterns bring significant coverage and generalization improvements, and space savings. We share all the data, word lists, and workflow for reproducibility and usage.
Mathematics Is Physics
In this essay, I argue that mathematics is a natural science---just like
physics, chemistry, or biology---and that this can explain the alleged
"unreasonable" effectiveness of mathematics in the physical sciences. The main
challenge for this view is to explain how mathematical theories can become
increasingly abstract and develop their own internal structure, whilst still
maintaining an appropriate empirical tether that can explain their later use in
physics. In order to address this, I offer a theory of mathematical
theory-building based on the idea that human knowledge has the structure of a
scale-free network and that abstract mathematical theories arise from a
repeated process of replacing strong analogies with new hubs in this network.
This allows mathematics to be seen as the study of regularities, within
regularities, within ..., within regularities of the natural world. Since
mathematical theories are derived from the natural world, albeit at a much
higher level of abstraction than most other scientific theories, it should come
as no surprise that they so often show up in physics.
This version of the essay contains an addendum responding to Sylvia
Wenmackers' essay and comments that were made on the FQXi website.
Comment: 15 pages, LaTeX. Second prize winner in 2015 FQXi Essay Contest (see
http://fqxi.org/community/forum/topic/2364)
The utterly prosaic connection between physics and mathematics
Eugene Wigner famously argued for the "unreasonable effectiveness of
mathematics" for describing physics and other natural sciences in his 1960
essay. That essay has now led to some 55 years of (sometimes anguished) soul
searching --- responses range from "So what? Why do you think we developed
mathematics in the first place?", through to extremely speculative ruminations
on the existence of the universe (multiverse) as a purely mathematical entity
--- the Mathematical Universe Hypothesis. In the current essay I will steer an
utterly prosaic middle course: Much of the mathematics we develop is informed
by physics questions we are trying to solve; and those physics questions for
which the most utilitarian mathematics has successfully been developed are
typically those where the best physics progress has been made.
Comment: 12 pages. Minor edits on an essay written for the 2015 FQXi essay
contest: "Trick or truth: The mysterious connection between physics and
mathematics"