Thick 2D Relations for Document Understanding
We use a propositional language of qualitative rectangle relations to detect the reading order from document images. To this end, we define the notion of a document encoding rule and analyze possible formalisms for expressing document encoding rules, such as LaTeX and SGML. Document encoding rules expressed in the propositional language of rectangles are used to build a reading-order detector for document images. To achieve robustness and avoid brittleness when applying the system to real-life document images, we introduce the notion of a thick boundary interpretation for a qualitative relation. The framework is tested on a collection of heterogeneous document images, showing recall rates of up to 89%.
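The thick boundary idea can be illustrated with a small sketch. The helper below is hypothetical (not the paper's implementation): it classifies the qualitative relation between two 1-D intervals, such as the projections of two document blocks onto one axis, with a thickness tolerance `t`. With `t = 0` this reduces to crisp Allen-style relations; a positive `t` absorbs the small misalignments typical of real scanned pages.

```python
def thick_relation(a, b, t=0.0):
    """Coarse qualitative relation between intervals a and b.

    a, b -- (start, end) tuples; t -- boundary thickness (tolerance).
    Only a few of the relations are shown; the rest fall into "other".
    """
    a0, a1 = a
    b0, b1 = b
    if abs(a1 - b0) <= t:
        return "meets"      # a ends where b starts, up to tolerance t
    if a1 < b0 - t:
        return "before"     # a clearly precedes b
    if abs(a0 - b0) <= t and abs(a1 - b1) <= t:
        return "equals"     # endpoints coincide up to tolerance
    if a0 < b0 - t and b0 + t < a1 < b1 - t:
        return "overlaps"   # a starts first and ends strictly inside b
    return "other"          # remaining relations, omitted for brevity

# With a crisp boundary (t=0), a 2-pixel scanning gap breaks "meets";
# a thick boundary (t=3) recovers it.
print(thick_relation((0, 100), (102, 200), t=0))   # -> before
print(thick_relation((0, 100), (102, 200), t=3))   # -> meets
```

Applying such a classifier to both axes of two rectangles gives the 2D relations the abstract refers to; the tolerance is what trades brittleness for robustness.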
Big Data and Reliability Applications: The Complexity Dimension
Big data features not only large volumes of data but also data with
complicated structures. Complexity imposes unique challenges in big data
analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an
extensive discussion of the opportunities and challenges in big data and
reliability, and described engineering systems that can generate big data that
can be used in reliability analysis. Meeker and Hong (2014) focused on large
scale system operating and environment data (i.e., high-frequency multivariate
time series data), and provided examples on how to link such data as covariates
to traditional reliability responses such as time to failure, time to
recurrence of events, and degradation measurements. This paper intends to
extend that discussion by focusing on how to use data with complicated
structures to do reliability analysis. Such data types include high-dimensional
sensor data, functional curve data, and image streams. We first provide a
review of recent developments in those directions, and then we provide a
discussion on how analytical methods can be developed to tackle the challenging
aspects that arise from the complexity feature of big data in reliability
applications. The use of modern statistical methods such as variable selection,
functional data analysis, scalar-on-image regression, spatio-temporal data
models, and machine learning techniques will also be discussed.
Comment: 28 pages, 7 figures
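Among the methods the abstract lists, variable selection is the easiest to sketch. The toy example below (not from the paper) shows lasso-style selection: under an assumed orthonormal design, the lasso solution is just a soft-thresholded least-squares fit, which makes the selection effect visible without any libraries — small coefficients, likely noise from high-dimensional sensor covariates, are zeroed out.

```python
def soft_threshold(z, lam):
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_orthonormal(ols_coefs, lam):
    """Lasso under an orthonormal design: soft-threshold each OLS coefficient."""
    return [soft_threshold(b, lam) for b in ols_coefs]

# OLS coefficients for five candidate covariates (e.g., sensor summaries);
# the small ones are shrunk to exactly zero, i.e., selected away.
ols = [2.5, -0.25, 0.125, -1.75, 0.25]
print(lasso_orthonormal(ols, lam=0.5))
# -> [2.0, 0.0, 0.0, -1.25, 0.0]
```

In reliability applications the surviving covariates would then enter a time-to-failure or degradation model; general (non-orthonormal) designs require an iterative solver such as coordinate descent.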
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL systems.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing
The Biodiversity and Climate Change Virtual Laboratory: Where ecology meets big data
Advances in computing power and infrastructure, increases in the number and size of ecological and environmental datasets, and growth in the number and types of data collection methods are revolutionizing the field of ecology. To integrate these advances, virtual laboratories offer a unique tool to facilitate, expedite, and accelerate research into the impacts of climate change on biodiversity. We introduce the uniquely cloud-based Biodiversity and Climate Change Virtual Laboratory (BCCVL), which provides access to numerous species distribution modelling tools; a large and growing collection of biological, climate, and other environmental datasets; and a variety of experiment types for conducting research into the impact of climate change on biodiversity. Users can upload and share datasets, potentially increasing collaboration, cross-fertilisation of ideas, and innovation among the user community. Feedback confirms that the BCCVL's goals of lowering the technical requirements for species distribution modelling, and reducing the time spent on such research, are being met.
Forecasting of commercial sales with large scale Gaussian Processes
This paper argues that applications of Gaussian Processes to the fast-moving
consumer goods industry have not received enough discussion. Yet the technique
can be valuable: it provides, for example, automatic feature relevance
determination, and the posterior mean can unlock insights into the data.
Significant challenges are the large size and high dimensionality of commercial
data at the point of sale. The study reviews approaches to Gaussian Process
modeling for large data sets, evaluates their performance on commercial sales
data, and shows the value of this type of model as a decision-making tool for
management.
Comment: 10 pages, 5 figures
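The posterior mean the abstract mentions can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: for a zero-mean GP with an RBF kernel, the posterior mean at a test point x* is k*ᵀ(K + σ²I)⁻¹y. The exact solve costs O(n³), which is precisely why large-scale approximations (e.g., sparse or inducing-point methods of the kind such studies review) are needed for point-of-sale data.

```python
import math

def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel on scalars."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (tiny n)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior_mean(X, y, x_star, noise=0.1):
    """Posterior mean of a zero-mean GP with RBF kernel at x_star."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)                    # (K + sigma^2 I)^-1 y
    return sum(rbf(x_star, X[i]) * alpha[i] for i in range(n))

# Toy weekly sales at three time points; predict between observations.
X, y = [0.0, 1.0, 2.0], [10.0, 12.0, 11.0]
print(gp_posterior_mean(X, y, 1.5))        # interpolates near the data
```

In a large-scale setting the dense n×n system is replaced by a low-rank approximation over m « n inducing points, reducing the cost to O(nm²).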