    Fat polygonal partitions with applications to visualization and embeddings

    Let T be a rooted and weighted tree, where the weight of any node is equal to the sum of the weights of its children. The popular Treemap algorithm visualizes such a tree as a hierarchical partition of a square into rectangles, where the area of the rectangle corresponding to any node in T is equal to the weight of that node. The aspect ratio of the rectangles in such a rectangular partition necessarily depends on the weights and can become arbitrarily high. We introduce a new hierarchical partition scheme, called a polygonal partition, which uses convex polygons rather than just rectangles. We present two methods for constructing polygonal partitions, both having guarantees on the worst-case aspect ratio of the constructed polygons; in particular, both methods guarantee a bound on the aspect ratio that is independent of the weights of the nodes. We also consider rectangular partitions with slack, where the areas of the rectangles may differ slightly from the weights of the corresponding nodes. We show that this makes it possible to obtain partitions with constant aspect ratio. This result generalizes to hyper-rectangular partitions in R^d. We use these partitions with slack for embedding ultrametrics into d-dimensional Euclidean space: we give a polylog(Δ)-approximation algorithm for embedding n-point ultrametrics into R^d with minimum distortion, where Δ denotes the spread of the metric. The previously best-known approximation ratio for this problem was polynomial in n. This is the first algorithm for embedding a non-trivial family of weighted-graph metrics into a space of constant dimension that achieves polylogarithmic approximation ratio.
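
    As a rough illustration of why rectangle aspect ratios depend on the weights, the sketch below performs a single-level slice-and-dice split of a rectangle into strips whose areas are proportional to the child weights. This is a simplified stand-in for a Treemap layout, not the construction from the paper; the function names `slice_and_dice` and `aspect_ratio` are hypothetical.

```python
def slice_and_dice(weights, x, y, w, h, vertical=True):
    """Split the rectangle (x, y, w, h) into strips, one per weight.

    Each strip's area is proportional to its weight. This is a one-level
    simplification of a treemap layout (an assumption for illustration).
    """
    total = sum(weights)
    rects = []
    offset = 0.0
    for wt in weights:
        frac = wt / total
        if vertical:
            # Cut along the x-axis: strips share the full height.
            rects.append((x + offset, y, w * frac, h))
            offset += w * frac
        else:
            # Cut along the y-axis: strips share the full width.
            rects.append((x, y + offset, w, h * frac))
            offset += h * frac
    return rects


def aspect_ratio(rect):
    """Ratio of the longer side to the shorter side of a rectangle."""
    _, _, w, h = rect
    return max(w, h) / min(w, h)


if __name__ == "__main__":
    # A very unbalanced pair of weights forces a very thin strip.
    for rect in slice_and_dice([1, 99], 0.0, 0.0, 1.0, 1.0):
        print(rect, aspect_ratio(rect))
```

    Making the weights more unbalanced (for example 1 versus 999) makes the thin strip arbitrarily skinny, which is the behavior the polygonal partitions are meant to avoid.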

    Temporal Clustering

    We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological surveys and contagious disease surveillance. In this more general setting, existing algorithms for classical (i.e. static) clustering problems are no longer applicable. We propose a set of optimization problems which we collectively refer to as temporal clustering. The quality of a solution to a temporal clustering instance can be quantified using three parameters: the number of clusters k, the spatial clustering cost r, and the maximum cluster displacement delta between consecutive time steps. We consider spatial clustering costs which generalize the well-studied k-center, discrete k-median, and discrete k-means objectives of classical clustering problems. We develop new algorithms that achieve trade-offs between the three objectives k, r, and delta. Our upper bounds are complemented by inapproximability results.
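
    To make the three parameters concrete, the sketch below evaluates (k, r, delta) for a given candidate solution under one possible formalization, which is an assumption and not taken from the abstract: points live in Euclidean space, r is a k-center style cost (the largest distance from any point to its nearest center, over all time steps), and delta is the largest distance moved by any center between consecutive time steps, with center i at one step matched to center i at the next. The function names are hypothetical.

```python
import math


def spatial_cost(points, centers):
    """k-center style cost: farthest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def temporal_cost(point_sets, center_seqs):
    """Return (k, r, delta) for a candidate temporal clustering.

    point_sets[t]  : list of points observed at time step t
    center_seqs[t] : list of k centers at time step t; center i at step t
                     is assumed to correspond to center i at step t + 1
    """
    k = len(center_seqs[0])
    # Worst spatial cost over all time steps.
    r = max(spatial_cost(P, C) for P, C in zip(point_sets, center_seqs))
    # Largest displacement of any center between consecutive steps.
    delta = max(
        (math.dist(a, b)
         for C, D in zip(center_seqs, center_seqs[1:])
         for a, b in zip(C, D)),
        default=0.0,
    )
    return k, r, delta


if __name__ == "__main__":
    point_sets = [[(0, 0), (1, 0)], [(0, 0), (3, 0)]]
    center_seqs = [[(0, 0)], [(1, 0)]]
    print(temporal_cost(point_sets, center_seqs))
```

    Moving the second-step center to follow the points lowers r but raises delta, which is the kind of trade-off the abstract refers to.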

    Temporal Hierarchical Clustering

    We study hierarchical clusterings of metric spaces that change over time. This is a natural geometric primitive for the analysis of dynamic data sets. Specifically, we introduce and study the problem of finding a temporally coherent sequence of hierarchical clusterings from a sequence of unlabeled point sets. We encode the clustering objective by embedding each point set into an ultrametric space, which naturally induces a hierarchical clustering of the set of points. We enforce temporal coherence among the embeddings by finding correspondences between successive pairs of ultrametric spaces which exhibit small distortion in the Gromov-Hausdorff sense. We present both upper and lower bounds on the approximability of the resulting optimization problems.
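
    The statement that an ultrametric induces a hierarchical clustering can be illustrated as follows. In an ultrametric, d(x, z) <= max(d(x, y), d(y, z)), so for any threshold t the relation "d(x, y) <= t" is an equivalence relation, and its classes get coarser as t grows. The sketch below assumes the ultrametric is given as a full distance matrix; the function names are hypothetical.

```python
def is_ultrametric(dist):
    """Check the strong triangle inequality on a symmetric distance matrix."""
    n = len(dist)
    return all(
        dist[i][k] <= max(dist[i][j], dist[j][k])
        for i in range(n) for j in range(n) for k in range(n)
    )


def clusters_at(dist, t):
    """Partition point indices into the classes of the relation d(x, y) <= t.

    Correct only when dist is an ultrametric, because then closed balls of
    radius t are either identical or disjoint.
    """
    n = len(dist)
    assigned = [False] * n
    clusters = []
    for i in range(n):
        if not assigned[i]:
            members = [j for j in range(n) if dist[i][j] <= t]
            for j in members:
                assigned[j] = True
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Points 0 and 1 merge at height 1; point 2 joins at height 3.
    D = [[0, 1, 3],
         [1, 0, 3],
         [3, 3, 0]]
    assert is_ultrametric(D)
    for t in (0, 1, 3):
        print(t, clusters_at(D, t))
```

    Sweeping t from 0 upward yields a nested sequence of partitions, which is the hierarchy associated with the ultrametric.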

    In silico assessment of nanoparticle toxicity powered by the Enalos Cloud Platform: Integrating automated machine learning and synthetic data for enhanced nanosafety evaluation

    The rapid advance of nanotechnology has led to the development and widespread application of nanomaterials, raising concerns regarding their potential adverse effects on human health and the environment. Traditional (experimental) methods for assessing the safety of nanoparticles (NPs) are time-consuming, expensive, and resource-intensive, and raise ethical concerns due to their reliance on animals. To address these challenges, we propose an in silico workflow that incorporates state-of-the-art computational methodologies and serves as an alternative or complementary approach to conventional hazard and risk assessment strategies. In this study we present an automated machine learning (autoML) scheme that employs dose-response toxicity data for silver (Ag), titanium dioxide (TiO2), and copper oxide (CuO) NPs. This model is further enriched with atomistic descriptors to capture the NPs' underlying structural properties. To overcome the issue of limited data availability, synthetic data generation techniques are used. These techniques broaden the dataset, thus improving the representation of different NP classes. A key aspect of this approach is a novel three-step applicability domain method (which includes the development of a local similarity approach) that enhances user confidence in the results by evaluating the reliability of each prediction. We anticipate that this approach will significantly expedite the nanosafety assessment process, enabling regulation to keep pace with innovation, and will provide valuable insights for the design and development of safe and sustainable NPs. The ML model developed in this study is made available to the scientific community as an easy-to-use web service through the Enalos Cloud Platform (www.enaloscloud.novamechanics.com/sabydoma/safenanoscope/), facilitating broader access and collaborative advancements in nanosafety.

    Tensor Regression with Applications in Neuroimaging Data Analysis

    Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form such as multidimensional arrays (tensors). Traditional statistical and computational methods are proving insufficient for analysis of these high-throughput data due to their ultrahigh dimensionality as well as complex structure. In this article, we propose a new family of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. A fast and highly scalable estimation algorithm is proposed for maximum likelihood estimation and its associated asymptotic properties are studied. Effectiveness of the new methods is demonstrated on both synthetic and real MRI data. Comment: 27 pages, 4 figures
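
    One common way to exploit array structure, shown here only as an illustration and not stated in the abstract, is to constrain the coefficient array to low rank. For a p1 x p2 matrix covariate X and a rank-1 coefficient B = b1 b2^T, the linear predictor <B, X> equals b1^T X b2, so only p1 + p2 parameters are needed instead of p1 * p2. The sketch below checks this identity in plain Python; all names are hypothetical.

```python
def outer(b1, b2):
    """Rank-1 coefficient matrix B[i][j] = b1[i] * b2[j]."""
    return [[u * v for v in b2] for u in b1]


def full_inner(B, X):
    """Inner product <B, X> using the full p1 x p2 coefficient matrix."""
    return sum(B[i][j] * X[i][j]
               for i in range(len(B)) for j in range(len(B[0])))


def rank1_inner(b1, b2, X):
    """Same inner product computed as b1^T X b2, without forming B."""
    return sum(b1[i] * X[i][j] * b2[j]
               for i in range(len(b1)) for j in range(len(b2)))


if __name__ == "__main__":
    b1 = [1.0, 2.0]            # p1 = 2 parameters
    b2 = [3.0, 4.0, 5.0]       # p2 = 3 parameters
    X = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 1.0]]      # one 2 x 3 matrix covariate
    print(full_inner(outer(b1, b2), X), rank1_inner(b1, b2, X))
    print("parameters:", len(b1) + len(b2), "vs", len(b1) * len(b2))
```

    The saving is negligible for a 2 x 3 covariate but grows quickly with the array dimensions, for example 3 x 64 parameters instead of 64^3 for a rank-1 coefficient on a 64 x 64 x 64 image.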
