Time frequency analysis in terahertz pulsed imaging
Recent advances in laser and electro-optical technologies have made the previously under-utilized terahertz frequency band of the electromagnetic spectrum
accessible for practical imaging. Applications are emerging, notably in the biomedical domain. In this chapter the technique of terahertz pulsed imaging is
introduced in some detail. The need for special computer vision methods, which arises from the use of pulses of radiation and the acquisition of a time series at
each pixel, is described. The nature of the data is a challenge since we are interested not only in the frequency composition of the pulses, but also in how this composition varies across different parts of the pulse. Conventional and short-time Fourier transforms and wavelets were used in preliminary experiments on the analysis of terahertz
pulsed imaging data. Measurements of refractive index and absorption coefficient were compared, wavelet compression assessed and image classification by multidimensional
clustering techniques demonstrated. It is shown that the time-frequency methods perform as well as conventional analysis for determining material properties. Wavelet compression gave results that remained robust even when only 20% of the wavelet coefficients were retained. It is concluded that time-frequency methods hold great promise for optimizing the extraction of the spectroscopic information contained in each terahertz pulse, and for the analysis of more complex signals comprising multiple pulses or arising from recently introduced acquisition techniques.
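The wavelet-compression experiment summarized above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes a simulated pulse (a Gaussian-windowed oscillation standing in for a measured terahertz waveform) and a plain orthonormal Haar wavelet implemented directly in NumPy; real analyses would use measured pulses and richer wavelet families.

```python
import numpy as np

def haar_decompose(signal, levels):
    """Orthonormal Haar decomposition (length must be divisible by 2**levels)."""
    coeffs, approx = [], signal
    for _ in range(levels):
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs

def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    approx = coeffs[-1]
    for detail in reversed(coeffs[:-1]):
        out = np.empty(approx.size * 2)
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx

def wavelet_compress(signal, keep=0.2, levels=4):
    """Zero all but the largest `keep` fraction of wavelet coefficients."""
    coeffs = haar_decompose(signal, levels)
    magnitudes = np.abs(np.concatenate(coeffs))
    threshold = np.quantile(magnitudes, 1 - keep)
    kept = [np.where(np.abs(c) >= threshold, c, 0.0) for c in coeffs]
    return haar_reconstruct(kept)

# Simulated terahertz-like pulse: a Gaussian-windowed oscillation.
t = np.linspace(0, 1, 512, endpoint=False)
pulse = np.exp(-((t - 0.5) / 0.05) ** 2) * np.cos(2 * np.pi * 20 * t)
approx = wavelet_compress(pulse, keep=0.2)
rel_error = np.linalg.norm(approx - pulse) / np.linalg.norm(pulse)
```

Because the pulse energy is concentrated in a short time window, most wavelet coefficients are near zero and the 20% retained here reproduce the pulse with small error, which is the behavior the abstract reports.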
Impliance: A Next Generation Information Management Appliance
ably successful in building a large market and adapting to the changes of the
last three decades, its impact on the broader market of information management
is surprisingly limited. If we were to design an information management system
from scratch, based upon today's requirements and hardware capabilities, would
it look anything like today's database systems?" In this paper, we introduce
Impliance, a next-generation information management system consisting of
hardware and software components integrated to form an easy-to-administer
appliance that can store, retrieve, and analyze all types of structured,
semi-structured, and unstructured information. We first summarize the trends
that will shape information management for the foreseeable future. Those trends
imply three major requirements for Impliance: (1) to be able to store, manage,
and uniformly query all data, not just structured records; (2) to be able to
scale out as the volume of this data grows; and (3) to be simple and robust in
operation. We then describe four key ideas that are uniquely combined in
Impliance to address these requirements, namely the ideas of: (a) integrating
software and off-the-shelf hardware into a generic information appliance; (b)
automatically discovering, organizing, and managing all data - unstructured as
well as structured - in a uniform way; (c) achieving scale-out by exploiting
simple, massive parallel processing, and (d) virtualizing compute and storage
resources to unify, simplify, and streamline the management of Impliance.
Impliance is an ambitious, long-term effort to define simpler, more robust, and
more scalable information systems for tomorrow's enterprises. Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/). You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR), January
7-10, 2007, Asilomar, California, USA.
Segmentation of skin lesions in 2D and 3D ultrasound images using a spatially coherent generalized Rayleigh mixture model
This paper addresses the problem of jointly estimating the statistical distribution and segmenting lesions in multiple-tissue high-frequency skin ultrasound images. The distribution of multiple-tissue images is modeled as a spatially coherent finite mixture of heavy-tailed Rayleigh distributions. Spatial coherence inherent to biological tissues is modeled by enforcing local dependence between the mixture components. An original Bayesian algorithm combined with a Markov chain Monte Carlo method is then proposed to jointly estimate the mixture parameters and a label vector associating each voxel with a tissue. More precisely, a hybrid Metropolis-within-Gibbs sampler is used to draw samples that are asymptotically distributed according to the posterior distribution of the Bayesian model. The Bayesian estimators of the model parameters are then computed from the generated samples. Simulations are conducted on synthetic data to illustrate the performance of the proposed estimation strategy. The method is then successfully applied to the segmentation of in vivo skin tumors in high-frequency 2-D and 3-D ultrasound images.
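The core ingredient above, fitting a finite mixture of Rayleigh distributions to tissue intensities, can be illustrated with a much simpler sketch. The following is an assumption-laden toy, not the authors' method: it drops the spatial coherence and the heavy-tailed extension, and fits a plain two-component Rayleigh mixture by expectation-maximization instead of their Bayesian Metropolis-within-Gibbs sampler. The Rayleigh scale has a closed-form M-step update from weighted second moments, which makes EM convenient here.

```python
import numpy as np

def rayleigh_pdf(x, sigma):
    """Rayleigh density with scale parameter sigma."""
    return (x / sigma**2) * np.exp(-x**2 / (2 * sigma**2))

def em_rayleigh_mixture(x, k=2, iters=200):
    """Fit a k-component Rayleigh mixture by expectation-maximization."""
    sigmas = np.quantile(x, np.linspace(0.25, 0.75, k))  # crude initialization
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        dens = np.stack([w * rayleigh_pdf(x, s) for w, s in zip(weights, sigmas)])
        resp = dens / dens.sum(axis=0)
        # M-step: closed-form Rayleigh scale update from weighted second moments.
        nk = resp.sum(axis=1)
        weights = nk / x.size
        sigmas = np.sqrt((resp * x**2).sum(axis=1) / (2 * nk))
    order = np.argsort(sigmas)
    return weights[order], sigmas[order]

# Two synthetic "tissues" with different backscatter statistics.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.rayleigh(1.0, 3000), rng.rayleigh(4.0, 3000)])
weights, sigmas = em_rayleigh_mixture(samples)
```

With well-separated scales the fitted sigmas land near the true values (1.0 and 4.0); the paper's contribution is precisely the harder setting where spatial dependence between neighboring voxels must also be estimated.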
Determining the Intrinsic Structure of Public Software Development History
Background. Collaborative software development has produced a wealth of
version control system (VCS) data that can now be analyzed in full. Little is
known about the intrinsic structure of the entire corpus of publicly available
VCS as an interconnected graph. Understanding its structure is needed to
determine the best approach to analyze it in full and to avoid methodological
pitfalls when doing so. Objective. We intend to determine the most salient
network topology properties of public software development history as captured
by VCS. We will explore: degree distributions, determining whether they are
scale-free or not; the distribution of connected component sizes; and the
distribution of shortest path lengths. Method. We will use Software Heritage,
the largest corpus of public VCS data, compress it using webgraph compression
techniques, and analyze it in memory using classic graph algorithms. Analyses
will be performed both on the full graph and on relevant subgraphs.
Limitations. The study is exploratory in nature; as such, no hypotheses on the
findings are stated at this time. The chosen graph algorithms are expected to
scale to the corpus size, but this will need to be confirmed experimentally.
External validity will depend on how representative Software Heritage is of the
software commons. Comment: MSR 2020 - 17th International Conference on Mining
Software Repositories, Oct 2020, Seoul, South Korea.
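One of the analyses proposed above, the distribution of connected component sizes, reduces to a classic disjoint-set computation. As a hedged toy sketch (the study itself relies on webgraph-compressed, in-memory representations at billions-of-nodes scale, not this naive structure), a union-find over an edge list yields the size histogram directly:

```python
from collections import Counter

class UnionFind:
    """Disjoint-set forest with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def component_size_distribution(n_nodes, edges):
    """Map component size -> number of components of that size."""
    uf = UnionFind(n_nodes)
    for a, b in edges:
        uf.union(a, b)
    members = Counter(uf.find(i) for i in range(n_nodes))
    return Counter(members.values())

# Toy graph: one 3-node component, one 2-node component, one isolated node.
dist = component_size_distribution(6, [(0, 1), (1, 2), (3, 4)])
# dist == Counter({3: 1, 2: 1, 1: 1})
```

A heavy-tailed size histogram (one giant component plus many small ones) versus a flat one is exactly the kind of structural property the protocol sets out to measure.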
The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development
We introduce the Software Heritage filesystem (SwhFS), a user-space
filesystem that integrates large-scale open source software archival with
development workflows. SwhFS provides a POSIX filesystem view of Software
Heritage, the largest public archive of software source code and version
control system (VCS) development history. Using SwhFS, developers can quickly
"checkout" any of the 2 billion commits archived by Software Heritage, even
after they disappear from their previously known location and without incurring
the performance cost of repository cloning. SwhFS works across unrelated
repositories and different VCS technologies. Other source code artifacts
archived by Software Heritage (individual source code files and trees,
releases, and branches) can also be accessed using common programming tools
and custom scripts, as if they were locally available. A screencast of SwhFS
is available online at dx.doi.org/10.5281/zenodo.4531411.
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified
single-cell genomes, and metagenomes has enabled investigation of a wide range
of organisms and ecosystems. However, sampling variation in short-read data
sets and high sequencing error rates of modern sequencers present many new
computational challenges in data interpretation. These challenges have led to
the development of new classes of mapping tools and {\em de novo} assemblers.
These algorithms are challenged by the continued improvement in sequencing
throughput. We here describe digital normalization, a single-pass computational
algorithm that systematizes coverage in shotgun sequencing data sets, thereby
decreasing sampling variation, discarding redundant data, and removing the
majority of errors. Digital normalization substantially reduces the size of
shotgun data sets and decreases the memory and time requirements for {\em de
novo} sequence assembly, all without significantly impacting content of the
generated contigs. We apply digital normalization to the assembly of microbial
genomic data, amplified single-cell genomic data, and transcriptomic data. Our
implementation is freely available for use and modification.
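The single-pass idea described above can be sketched compactly: a read is kept only if the median count of its k-mers, accumulated over previously kept reads, is still below a coverage cutoff. This is a deliberately simplified toy, not the published implementation, which uses probabilistic counting structures and much larger k to fit real sequencing data in memory; the tiny k, the cutoff value, and the example reads here are illustrative assumptions.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def digital_normalize(reads, k=4, cutoff=3):
    """Keep a read only if its median k-mer count so far is below the cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        km = kmers(read, k)
        if median(counts[m] for m in km) < cutoff:
            kept.append(read)
            counts.update(km)  # only kept reads contribute to coverage
    return kept

# Ten redundant copies of one read plus one rare read.
reads = ["ACGTACGTAC"] * 10 + ["TTTTGGGGCC"]
kept = digital_normalize(reads)
```

Redundant copies beyond the coverage cutoff are discarded while the rare read survives, which is how normalization shrinks the data set without losing novel content.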
Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems
Unveiling the community structure of networks is a powerful methodology to
comprehend interconnected systems across the social and natural sciences. To
identify different types of functional modules in interaction data aggregated
in a single network layer, researchers have developed many powerful methods.
For example, flow-based methods have proven useful for identifying modular
dynamics in weighted and directed networks that capture constraints on flow in
the systems they represent. However, many networked systems consist of agents
or components that exhibit multiple layers of interactions. Inevitably,
representing this intricate network of networks as a single aggregated network
leads to information loss and may obscure the actual organization. Here we
propose a method based on compression of network flows that can identify
modular flows in non-aggregated multilayer networks. Our numerical experiments
on synthetic networks show that the method can accurately identify modules that
cannot be identified in aggregated networks or by analyzing the layers
separately. We capitalize on our findings and reveal the community structure of
two multilayer collaboration networks: scientists affiliated with the Pierre
Auger Observatory and scientists publishing works on networks on the arXiv.
Compared to conventional aggregated methods, the multilayer method reveals
smaller modules with more overlap that better capture the actual organization
- …