109 research outputs found
PARMA-CC: Parallel Multiphase Approximate Cluster Combining
Clustering is a common component in data analysis applications. Despite the extensive literature, the continuously increasing volumes of data produced by sensors (e.g. rates of several MB/s by 3D scanners such as LIDAR sensors) and the time-sensitivity of the applications that use the clustering outcomes (e.g. detecting critical situations, which is known to be accuracy-dependent) demand novel approaches that respond faster while coping with large data sets. This is the challenge we address in this paper. We propose an algorithm, PARMA-CC, that complements existing density-based and distance-based clustering methods. PARMA-CC is based on approximate, data-parallel cluster combining, where parallel threads compute summaries of clusters of data (sub)sets and, through combining, together construct a comprehensive summary of the sets of clusters. By approximating clusters with their respective geometrical summaries, our technique scales well with increased data volumes, and, by computing and efficiently combining the summaries in parallel, it enables latency improvements. PARMA-CC combines the summaries using special data structures that enable parallelism through in-place data processing. As we show in our analysis and evaluation, PARMA-CC can complement and outperform well-established methods, with significantly better scalability, while still providing highly accurate results on a variety of data sets, even with skewed data distributions, which cause the traditional approaches to exhibit their worst-case behaviour. In the paper we also describe how PARMA-CC can facilitate time-critical applications through appropriate use of the summaries.
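The general idea of computing cluster summaries on data subsets in parallel and then combining them can be illustrated with a minimal sketch. It is not the PARMA-CC algorithm: the abstract does not specify the summary shape or the combining data structures, so the axis-aligned bounding boxes, the `eps` distance threshold, the round-robin partitioning, and the function names `combine` and `parallel_cluster_summaries` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def combine(boxes, eps):
    """Merge boxes [xmin, ymin, xmax, ymax] whose gap is at most eps (assumed rule)."""
    out = []
    for box in boxes:
        box = list(box)
        changed = True
        while changed:
            changed = False
            for other in out:
                # Two boxes are "close" if their eps-expanded extents overlap on both axes.
                if (box[0] - eps <= other[2] and other[0] - eps <= box[2]
                        and box[1] - eps <= other[3] and other[1] - eps <= box[3]):
                    box = [min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3])]
                    out.remove(other)
                    changed = True  # the grown box may now touch further boxes
                    break
        out.append(box)
    return out


def summarise_subset(points, eps):
    """Summarise one data subset: every point starts as a degenerate box."""
    return combine([[x, y, x, y] for x, y in points], eps)


def parallel_cluster_summaries(points, eps, workers=4):
    """Summarise subsets in parallel threads, then combine the partial summaries."""
    chunks = [points[i::workers] for i in range(workers)]  # hypothetical partitioning
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda chunk: summarise_subset(chunk, eps), chunks))
    return combine([box for part in partial for box in part], eps)
```

Calling `parallel_cluster_summaries(points, eps=1.0, workers=2)` on a list of `(x, y)` tuples returns one bounding box per approximate cluster. Because boxes rather than points are compared, the result is only an approximation of a point-wise density-based clustering.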
Distinguishing N-acetylneuraminic acid linkage isomers on glycopeptides by ion mobility-mass spectrometry
Differentiating the structure of isobaric glycopeptides represents a major
challenge for mass spectrometry-based characterisation techniques. Here we
show that the regiochemistry of the most common N-acetylneuraminic acid
linkages of N-glycans can be identified in a site-specific manner from
individual glycopeptides using ion mobility-mass spectrometry analysis of
diagnostic fragment ions.
On the reliability of the theoretical internal conversion coefficients
Possible sources of uncertainties in the calculations of the internal
conversion coefficients are studied. The uncertainties induced by them are
estimated.
Comment: 16 pages (including 3 figures inserted by 'epsfig' macro)
FEBUKO and MODMEP: Field measurements and modelling of aerosol and cloud multiphase processes
An overview of the two FEBUKO aerosol–cloud interaction field experiments in the Thüringer Wald (Germany) in October 2001 and 2002 and the corresponding modelling project MODMEP is given. Experimentally, a variety of measurement methods were deployed to probe the gas phase, particles and cloud droplets at three sites upwind, downwind and within an orographic cloud, with special emphasis on the budgets and interconversions of organic gas and particle phase constituents. Out of a total of 14 sampling periods within 30 cloud events, three events (EI, EII and EIII) are selected for detailed analysis. On various occasions, an impact of the cloud process on particle chemical composition (such as the content of organic compounds, sulphate and nitrate) and also on particle size distributions and particle mass is observed. Moreover, direct phase transfer of polar organic compounds from the gas phase is found to be very important for the understanding of cloudwater composition. On the modelling side, a main result of the MODMEP project is the development of a cloud model which combines complex multiphase chemistry with detailed microphysics. Both components are described in a fine-resolved particle/drop spectrum. New numerical methods are developed for an efficient solution of the entire complex model. A further development of the CAPRAM mechanism has led to a more detailed description of tropospheric aqueous phase organic chemistry. In parallel, effective tools for the reduction of highly complex reaction schemes are provided. Techniques are provided and tested which allow the description of complex multiphase chemistry and of detailed microphysics in multidimensional chemistry-transport models.
A Parameterization of Heterogeneous Hydrolysis of N2O5 for 3-D Atmospheric Modelling
During night-time, the heterogeneous hydrolysis of N2O5 on the surface of deliquescent aerosol particles represents a major source for the formation of HNO3 and leads to an important reduction of NOx in the atmosphere. In Chen et al., Atmos. Chem. Phys. 18:673–689, 2018 [5], we investigate an improved parameterization of the heterogeneous N2O5 hydrolysis. This approach is based on laboratory experiments and takes into account the temperature, relative humidity, aerosol particle composition and surface area concentration. The parameterization was implemented in the online coupled model system COSMO-MUSCAT (Consortium for Small-scale Modelling and Multi-Scale Chemistry Aerosol Transport, https://cosmo-muscat.tropos.de). In Chen et al., Atmos. Chem. Phys. 18:673–689, 2018 [5], the modified model was applied to the simulation of the HOPE-Melpitz campaign (10–25 September 2013), where especially the nitrate prediction over western and central Europe was analysed. The modelled particulate nitrate concentrations were compared with filter measurements over Germany. In this first study, the particulate nitrate results are significantly improved by using the developed N2O5 parameterization, particularly when the particulate nitrate was dominated by local chemical formation (September 12, 17–18 and 25). The aim of the current study is an evaluation over a longer time period covering different meteorological conditions and emission situations. For this reason, we have simulated the period from March to November 2010. The results were compared with other approaches and evaluated against filter measurements. The improvement was confirmed for the results in spring and autumn, but nitrate is still strongly over-predicted during summer, even with the new parameterization.
Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics
Background: Network communities help the functional organization and
evolution of complex networks. However, developing a method that is both
fast and accurate and that provides modular overlaps and partitions of a
heterogeneous network has proven to be rather difficult. Methodology/Principal
Findings: Here we introduce the novel concept of ModuLand, an integrative
method family determining overlapping network modules as hills of an influence
function-based, centrality-type community landscape, and including several
widely used modularization methods as special cases. As various adaptations of
the method family, we developed several algorithms, which provide an efficient
analysis of weighted and directed networks, and (1) determine pervasively
overlapping modules with high resolution; (2) uncover a detailed hierarchical
network structure allowing an efficient, zoom-in analysis of large networks;
(3) allow the determination of key network nodes and (4) help to predict
network dynamics. Conclusions/Significance: The concept opens a wide range of
possibilities to develop new approaches and applications including network
routing, classification, comparison and prediction.
Comment: 25 pages with 6 figures and a Glossary + Supporting Information
containing pseudo-codes of all algorithms used, 14 Figures, 5 Tables (with 18
module definitions, 129 different modularization methods, 13 module
comparison methods) and 396 references. All algorithms can be downloaded
from this web-site: http://www.linkgroup.hu/modules.ph
Relating gene expression data on two-component systems to functional annotations in Escherichia coli
Background: Obtaining physiological insights from microarray experiments requires computational techniques that relate gene expression data to functional information. Traditionally, this has been done in two consecutive steps. The first step identifies important genes through clustering or statistical techniques, while the second step assigns biological functions to the identified groups. Recently, techniques have been developed that identify such relationships in a single step. Results: We have developed an algorithm that relates patterns of gene expression in a set of microarray experiments to functional groups in one step. Our only assumption is that patterns co-occur frequently. The effectiveness of the algorithm is demonstrated as part of a study of regulation by two-component systems in Escherichia coli. The significance of the relationships between expression data and functional annotations is evaluated based on density histograms that are constructed using product similarity among expression vectors. We present a biological analysis of three of the resulting functional groups of proteins, develop hypotheses for further biological studies, and test one of these hypotheses experimentally. A comparison with other algorithms and a different data set is presented. Conclusion: Our new algorithm is able to find interesting and biologically meaningful relationships, not found by other algorithms, in previously analyzed data sets. Scaling of the algorithm to large data sets can be achieved based on a theoretical model.
Clustering Algorithms: Their Application to Gene Expression Data
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a valuable opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful for revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.
Calculated wind climatology of the South-Saxonian/North-Czech mountain topography including improved resolution of mountains
A mesoscale model has been applied to
calculate climatological means of the surface wind. A reliable average requires
more than 40 model runs, which are differentiated by the direction and speed of
the geostrophic wind under the assumption of neutral stratification. The
frequency distributions of the geostrophic wind have been taken from
observations of the 850-hPa winds at the radiosonde station in Prague for a
10-year period. The simulation results have been averaged over all sectors and
speed classes of the geostrophic wind according to their frequencies. A
comparison of the calculated mean wind speeds with observed ones shows
deviations of about 0.4 m s⁻¹ outside the mountains. The
representation of steep topography and isolated mountains on the basis of a 3-km
horizontal resolution of the simulations needs special treatment in order to
reduce the gap of up to 4 m s⁻¹ between observed and simulated mean
wind speeds over mountains. Therefore, an empirical speed-up formula has been
applied to the isolated mountains that otherwise would fall through the 3-km
meshes. The corresponding deviations have been reduced to 1.5 m s⁻¹
- …
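The frequency-weighted averaging described in the wind climatology abstract above can be sketched as follows. This is only an illustration of the averaging step: the nested-list layout (one row per geostrophic wind sector, one column per speed class), the example numbers, and the function name `climatological_mean` are assumptions and do not come from the paper.

```python
def climatological_mean(wind_speed, frequency):
    """Average simulated surface wind speeds over all model runs.

    wind_speed[s][c] is the simulated surface wind speed (m/s) at one grid
    point for geostrophic wind sector s and speed class c; frequency[s][c]
    is the observed relative frequency of that sector/class combination.
    """
    total = sum(sum(row) for row in frequency)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    weighted = sum(
        speed * freq
        for speed_row, freq_row in zip(wind_speed, frequency)
        for speed, freq in zip(speed_row, freq_row)
    )
    # Dividing by the total lets the caller pass counts as well as fractions.
    return weighted / total


# Hypothetical example: 2 sectors x 2 speed classes at a single grid point.
speeds = [[2.0, 4.0], [3.0, 5.0]]
freqs = [[0.4, 0.1], [0.3, 0.2]]
mean_speed = climatological_mean(speeds, freqs)
```

In a full climatology the same weighted sum would be evaluated at every grid point, with one entry per model run.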