18 research outputs found

    A distributed workload-aware approach to partitioning geospatial big data for cyberGIS analytics

    Numerous applications and scientific domains have contributed to tremendous growth of geospatial data during the past several decades. To cope with the volume and velocity of such big data, distributed system approaches have been extensively studied to partition data for scalable analytics and associated applications. However, previous work on partitioning large geospatial data focuses on bulk ingestion and static partitioning, and is hence unable to handle the dynamic variability in both data and computation that is particularly common for streaming data. To eliminate this limitation, this thesis holistically addresses computational intensity and dynamic data workload to achieve optimal data partitioning for scalable geospatial applications. Specifically, novel data partitioning algorithms have been developed to support scalable geospatial and temporal data management with new data models designed to represent dynamic data workload. Optimal partitions are realized by formulating a fine-grained spatial optimization problem that is solved using an evolutionary algorithm with spatially explicit operations. As an overarching approach to integrating the algorithms, data models and spatial optimization problem solving, GeoBalance is established as a workload-aware framework for supporting scalable cyberGIS (i.e., geographic information science and systems based on advanced cyberinfrastructure) analytics.

    Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families

    In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds, whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such a transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest that some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects in precisely defining protein families, which are absent in conventional sequence models. Comment: 13 pages, 7 figures, 2 tables (a new subsection added

    Computational Assessment of the Impact of Social Justice Documentaries

    Documentaries are meant to tell a story, that is, to create memory, imagination and sharing (Rose, 2012). Moreover, documentaries aim to lead to change in people's knowledge and/or behavior (Barrett & Leddy, 2008). How can we know if a documentary has achieved these goals? We report on a research project where we have been developing, applying and evaluating a theoretically grounded, empirical and computational solution for assessing the impact of social justice documentaries in a scalable, robust and rigorous fashion. We leverage cutting-edge methods from socio-technical data analytics, namely natural language processing and network analysis, for this purpose and provide a publicly available technology (ConText) that supports these routines. In this paper, we focus on the theoretical foundations of this project, address our methodological and technical framework, and provide an illustrative example of the introduced solution.

    Twitter temporal signatures.

    A-D: Twitter users’ temporal signatures aggregated by land use type for all users during weekdays (A-B) and weekends (C-D). Weekdays were defined as Mondays to Fridays, while weekends include Saturdays and Sundays. Signatures were normalized by the total number of tweets in a land use class to allow comparisons.
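The normalization described in this caption can be illustrated with a minimal sketch. It assumes tweets arrive as hypothetical `(land_use, hour)` pairs and that a signature is a 24-value hourly profile; the function name, the input layout and the hourly resolution are illustrative assumptions, not details taken from the study.

```python
from collections import Counter, defaultdict


def temporal_signatures(tweets):
    """Build one normalized hourly signature per land use class.

    tweets: iterable of (land_use, hour) pairs with hour in 0..23
    (a hypothetical input layout chosen for this sketch).
    """
    counts = defaultdict(Counter)
    for land_use, hour in tweets:
        counts[land_use][hour] += 1

    signatures = {}
    for land_use, by_hour in counts.items():
        # Divide by the class total so classes with very different
        # tweet volumes become directly comparable.
        total = sum(by_hour.values())
        signatures[land_use] = [by_hour[h] / total for h in range(24)]
    return signatures


# Made-up sample data, for illustration only.
sample = [
    ("residential", 8),
    ("residential", 20),
    ("residential", 20),
    ("commercial", 12),
]
sigs = temporal_signatures(sample)
```

Each resulting signature sums to 1, so a class with few tweets can be compared with a class with many.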

    Scatter plots of temporal signatures of individual key locations.

    A-B: Distribution of individual clusters in a 2D space defined by the temporal activity (percentage of tweets relative to the total number of tweets in the cluster) during different hours of the day. A: morning vs. evening. B: morning vs. afternoon. Clusters with similar land use attributes have a similar distribution of tweets within the twenty-four-hour cycle. Hexagonal binning was used to display the most common (mode) land use attribute in each bin.