3,238 research outputs found
Digital Image Access & Retrieval
The 33th Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation with the bulk of the conference focusing on indexing and retrieval.published or submitted for publicatio
A computer vision system for detecting and analysing critical events in cities
Whether for commuting or leisure, cycling is a growing transport mode in many cities worldwide. However, it is still perceived as a dangerous activity. Although serious incidents related to cycling leading to major injuries are rare, the fear of getting hit or falling hinders the expansion of cycling as a major transport mode. Indeed, it has been shown that focusing on serious injuries only touches the tip of the iceberg. Near miss data can provide much more information about potential problems and how to avoid risky situations that may lead to serious incidents. Unfortunately, there is a gap in the knowledge in identifying and analysing near misses. This hinders drawing statistically significant conclusions to provide measures for the built-environment that ensure a safer environment for people on bikes. In this research, we develop a method to detect and analyse near misses and their risk factors using artificial intelligence. This is accomplished by analysing video streams linked to near miss incidents within a novel framework relying on deep learning and computer vision. This framework automatically detects near misses and extracts their risk factors from video streams before analysing their statistical significance. It also provides practical solutions implemented in a camera with embedded AI (URBAN-i Box) and a cloud-based service (URBAN-i Cloud) to tackle the stated issue in the real-world settings for use by researchers, policy-makers, or citizens. The research aims to provide human-centred evidence that may enable policy-makers and planners to provide a safer built environment for cycling in London, or elsewhere. More broadly, this research aims to contribute to the scientific literature with the theoretical and empirical foundations of a computer vision system that can be utilised for detecting and analysing other critical events in a complex environment. Such a system can be applied to a wide range of events, such as traffic incidents, crime or overcrowding
LSST Science Book, Version 2.0
A survey that can cover the sky in optical bands over wide fields to faint
magnitudes with a fast cadence will enable many of the exciting science
opportunities of the next decade. The Large Synoptic Survey Telescope (LSST)
will have an effective aperture of 6.7 meters and an imaging camera with field
of view of 9.6 deg^2, and will be devoted to a ten-year imaging survey over
20,000 deg^2 south of +15 deg. Each pointing will be imaged 2000 times with
fifteen second exposures in six broad bands from 0.35 to 1.1 microns, to a
total point-source depth of r~27.5. The LSST Science Book describes the basic
parameters of the LSST hardware, software, and observing plans. The book
discusses educational and outreach opportunities, then goes on to describe a
broad range of science that LSST will revolutionize: mapping the inner and
outer Solar System, stellar populations in the Milky Way and nearby galaxies,
the structure of the Milky Way disk and halo and other objects in the Local
Volume, transient and variable objects both at low and high redshift, and the
properties of normal and active galaxies at low and high redshift. It then
turns to far-field cosmological topics, exploring properties of supernovae to
z~1, strong and weak lensing, the large-scale distribution of galaxies and
baryon oscillations, and how these different probes may be combined to
constrain cosmological models and the physics of dark energy.Comment: 596 pages. Also available at full resolution at
http://www.lsst.org/lsst/sciboo
Fine-Grained Image Analysis with Deep Learning: A Survey
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem
in computer vision and pattern recognition, and underpins a diverse set of
real-world applications. The task of FGIA targets analyzing visual objects from
subordinate categories, e.g., species of birds or models of cars. The small
inter-class and large intra-class variation inherent to fine-grained image
analysis makes it a challenging problem. Capitalizing on advances in deep
learning, in recent years we have witnessed remarkable progress in deep
learning powered FGIA. In this paper we present a systematic survey of these
advances, where we attempt to re-define and broaden the field of FGIA by
consolidating two fundamental fine-grained research areas -- fine-grained image
recognition and fine-grained image retrieval. In addition, we also review other
key issues of FGIA, such as publicly available benchmark datasets and related
domain-specific applications. We conclude by highlighting several research
directions and open problems which need further exploration from the community.Comment: Accepted by IEEE TPAM
Recommended from our members
Exploiting multimodality and structure in world representations
An essential aim of artificial intelligence research is to design agents that will eventually cooperate with humans within the real world. To this end, embodied learning is emerging as one of the most important efforts contributed by the machine learning community towards this goal. Recently developing sub-fields concern various aspects of such systems---visual reasoning, language representations, causal mechanisms, robustness to out-of-distribution inputs, to name only a few.
In particular, multimodal learning and language grounding are vital to achieving a strong understanding of the real world. Humans build internal representations via interacting with their environment, learning complex associations between visual, auditory and linguistic concepts. Since the world abounds with structure, graph-based encodings are also likely to be incorporated in reasoning and decision-making modules. Furthermore, these relational representations are rather symbolic in nature---providing advantages over other formats, such as raw pixels---and can encode various types of links (temporal, causal, spatial) which can be essential for understanding and acting in the real world.
This thesis presents three research works that study and develop likely aspects of future intelligent agents. The first contribution centers on vision-and-language learning, introducing a challenging embodied task that shifts the focus of an existing one to the visual reasoning problem. By extending popular visual question answering (VQA) paradigms, I also designed several models that were evaluated on the novel dataset. This produced initial performance estimates for environment understanding, through the lens of a more challenging VQA downstream task. The second work presents two ways of obtaining hierarchical representations of graph-structured data. These methods either scaled to much larger graphs than the ones processed by the best-performing method at the time, or incorporated theoretical properties via the use of topological data analysis algorithms. Both approaches competed with contemporary state-of-the-art graph classification methods, even outside social domains in the second case, where the inductive bias was PageRank-driven. Finally, the third contribution delves further into relational learning, presenting a probabilistic treatment of graph representations in complex settings such as few-shot, multi-task learning and scarce-labelled data regimes. By adding relational inductive biases to neural processes, the resulting framework can model an entire distribution of functions which generate datasets with structure. This yielded significant performance gains, especially in the aforementioned complex scenarios, with semantically-accurate uncertainty estimates that drastically improved over the neural process baseline. This type of framework may eventually contribute to developing lifelong-learning systems, due to its ability to adapt to novel tasks and distributions.
The benchmark, methods and frameworks that I have devised during my doctoral studies suggest important future directions for embodied and graph representation learning research. These areas have increasingly proved their relevance to designing intelligent and collaborative agents, which we may interact with in the near future. By addressing several challenges in this problem space, my contributions therefore take a few steps towards building machine learning systems to be deployed in real-life settings.DREAM CD
BlogForever: D3.1 Preservation Strategy Report
This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design
- …