A distributed workload-aware approach to partitioning geospatial big data for cybergis analytics
Numerous applications and scientific domains have contributed to the tremendous growth of geospatial data during the past several decades. To handle the volume and velocity of such big data, distributed system approaches have been extensively studied for partitioning data to support scalable analytics and associated applications. However, previous work on partitioning large geospatial data focuses on bulk ingestion and static partitioning, and hence cannot handle the dynamic variability in both data and computation that is particularly common for streaming data.
To address this limitation, this thesis holistically accounts for computational intensity and dynamic data workload to achieve optimal data partitioning for scalable geospatial applications. Specifically, novel data partitioning algorithms have been developed to support scalable geospatial and temporal data management, together with new data models designed to represent dynamic data workload. Optimal partitions are obtained by formulating a fine-grained spatial optimization problem that is solved using an evolutionary algorithm with spatially explicit operations. As an overarching approach to integrating the algorithms, data models and spatial optimization problem solving, GeoBalance is established as a workload-aware framework for supporting scalable cyberGIS (i.e., geographic information science and systems based on advanced cyberinfrastructure) analytics.
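As a rough illustration of workload-aware partitioning solved by an evolutionary search with spatially explicit operations, the Python sketch below assigns grid cells to `k` partitions and repeatedly moves one boundary cell into a neighboring partition, keeping the change whenever the heaviest partition's workload does not grow. This is not GeoBalance: the grid model, the `max_load` objective, the strip initialization, the (1+1)-style acceptance rule and all function names are assumptions for illustration, and partition contiguity is not enforced.

```python
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Assignment = Dict[Cell, int]


def neighbors(cell: Cell, rows: int, cols: int) -> List[Cell]:
    """4-connected grid neighbors of a cell (the spatial adjacency used by mutation)."""
    r, c = cell
    out = []
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            out.append((nr, nc))
    return out


def max_load(assign: Assignment, workload: Dict[Cell, float], k: int) -> float:
    """Objective to minimize: total workload of the heaviest partition."""
    loads = [0.0] * k
    for cell, part in assign.items():
        loads[part] += workload[cell]
    return max(loads)


def mutate(assign: Assignment, rows: int, cols: int, rng: random.Random) -> Assignment:
    """Spatially explicit mutation: move one boundary cell into an adjacent partition."""
    child = dict(assign)
    boundary = [
        (cell, nb)
        for cell in child
        for nb in neighbors(cell, rows, cols)
        if child[nb] != child[cell]
    ]
    if boundary:
        cell, nb = rng.choice(boundary)
        child[cell] = child[nb]
    return child


def evolve(workload: Dict[Cell, float], rows: int, cols: int, k: int,
           generations: int = 2000, seed: int = 0) -> Tuple[Assignment, float]:
    """(1+1)-style evolutionary search starting from vertical strips of columns."""
    rng = random.Random(seed)
    assign = {(r, c): c * k // cols for r in range(rows) for c in range(cols)}
    best = max_load(assign, workload, k)
    for _ in range(generations):
        child = mutate(assign, rows, cols, rng)
        score = max_load(child, workload, k)
        if score <= best:  # accept non-worsening moves only
            assign, best = child, score
    return assign, best


if __name__ == "__main__":
    work = {(r, c): 1.0 for r in range(4) for c in range(4)}
    work[(0, 0)] = work[(1, 0)] = 9.0  # a workload hotspot in the first strip
    assign, best = evolve(work, rows=4, cols=4, k=2)
    print(best)
```

With a hotspot in one strip, the search shifts lighter boundary cells out of the overloaded partition. A fuller treatment would keep a population, add crossover and penalize non-contiguous partitions.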
Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families
In the protein sequence space, natural proteins form clusters of families, which are characterized by their unique native folds, whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such a transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand-binding sites. These results suggest that selective pressure on the key residues, such as for ligand-binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical standpoint, the present results highlight an essential role of long-range effects, which are absent in conventional sequence models, in precisely defining protein families.
Comment: 13 pages, 7 figures, 2 tables (a new subsection added
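The Monte Carlo idea can be illustrated with a much simpler model than the one described above. The sketch below samples sequences with the Metropolis rule under a toy Potts-style energy: per-site fields `h` reward consensus residues, and a coupling table `J` stands in for long-range pair interactions. Variable-length insertions, fitting to alignment statistics and all parameter values are omitted or invented for illustration; `energy` and `metropolis` are hypothetical names, not the authors' code.

```python
import math
import random
from typing import Dict, List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

Fields = List[Dict[str, float]]                                  # h[i][a]: reward for residue a at site i
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]  # J[(i, j)][(a, b)]: pair reward


def energy(seq: str, h: Fields, J: Couplings) -> float:
    """Toy Potts-style energy; lower values are more 'natural-like' under this convention."""
    e = -sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for (i, j), table in J.items():
        e -= table.get((seq[i], seq[j]), 0.0)
    return e


def metropolis(h: Fields, J: Couplings, length: int, temperature: float,
               steps: int, seed: int = 0) -> Tuple[str, float]:
    """Single-site Metropolis sampling from a random starting sequence.
    The full energy is recomputed each step for clarity, not speed."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    e = energy("".join(seq), h, J)
    for _ in range(steps):
        i = rng.randrange(length)
        old = seq[i]
        seq[i] = rng.choice(ALPHABET)        # propose a point substitution
        e_new = energy("".join(seq), h, J)
        delta = e_new - e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e = e_new                        # accept the substitution
        else:
            seq[i] = old                     # reject and restore the old residue
    return "".join(seq), e
```

At a low `temperature` the chain settles on the consensus sequence, while at a high one random-like sequences dominate; sweeping `temperature` is one simple way to probe a crossover between the two regimes in such a toy.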
Computational Assessment of the Impact of Social Justice Documentaries
Documentaries are meant to tell a story, that is, to create memory, imagination and sharing (Rose, 2012). Moreover, documentaries aim to change people's knowledge and/or behavior (Barrett & Leddy, 2008). How can we know whether a documentary has achieved these goals? We report on a research project in which we have been developing, applying and evaluating a theoretically grounded, empirical and computational solution for assessing the impact of social justice documentaries in a scalable, robust and rigorous fashion. For this purpose, we leverage cutting-edge methods from socio-technical data analytics, namely natural language processing and network analysis, and provide a publicly available technology (ConText) that supports these routines. In this paper, we focus on the theoretical foundations of this project, describe our methodological and technical framework, and provide an illustrative example of the introduced solution.
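The abstract does not detail ConText's routines, so the following is only a generic sketch of one common way to combine natural language processing with network analysis: tokenize texts (for example, viewer comments about a documentary), link words that co-occur within a short window, and rank words by weighted degree. The stopword list, the window size and the function names are assumptions, not ConText's API.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical minimal stopword list; real pipelines use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "this", "it"}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def cooccurrence_network(docs: Iterable[str], window: int = 3) -> Counter:
    """Undirected weighted edges between distinct words that fall inside
    a span of `window` consecutive tokens."""
    edges: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                a, b = sorted((tokens[i], tokens[j]))  # canonical order for undirected edges
                if a != b:
                    edges[(a, b)] += 1
    return edges


def weighted_degree(edges: Counter) -> Dict[str, int]:
    """Sum of edge weights per word, a simple centrality measure."""
    deg: Counter = Counter()
    for (a, b), w in edges.items():
        deg[a] += w
        deg[b] += w
    return dict(deg)


if __name__ == "__main__":
    comments = ["The film changed my view of prison reform", "Prison reform needs change"]
    degrees = weighted_degree(cooccurrence_network(comments))
    print(sorted(degrees.items(), key=lambda kv: -kv[1])[:3])
```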
Twitter temporal signatures.
A-D: Twitter users' temporal signatures aggregated by land use type for all users during weekdays (A-B) and weekends (C-D). Weekdays were defined as Monday to Friday, while weekends include Saturday and Sunday. Signatures were normalized by the total number of tweets in a land use class to allow comparisons.
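The normalization described in this caption can be sketched as follows, assuming each tweet is already tagged with the land use class of its location. Following the caption literally, each hourly count is divided by the class's total tweet count (weekdays and weekends combined); dividing by the per-day-type total would be an equally plausible reading. The function name and input shape are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def temporal_signatures(
    tweets: Iterable[Tuple[str, datetime]],
) -> Dict[Tuple[str, str], List[float]]:
    """Hourly signatures keyed by (land use class, 'weekday' or 'weekend')."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0] * 24)
    class_totals: Counter = Counter()  # all tweets per land use class
    for land_use, ts in tweets:
        # datetime.weekday(): Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        counts[(land_use, day_type)][ts.hour] += 1
        class_totals[land_use] += 1
    # Divide by the class total so classes with different tweet volumes are comparable.
    return {
        (land_use, day_type): [n / class_totals[land_use] for n in hourly]
        for (land_use, day_type), hourly in counts.items()
    }
```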
Scatter plots of temporal signatures of individual key locations.
A-B: Distribution of individual clusters in a 2D space defined by the temporal activity (percentage of tweets relative to the total number of tweets in the cluster) during different periods of the day. A: morning vs. evening. B: morning vs. afternoon. Clusters with similar land use attributes have a similar distribution of tweets over the 24-hour cycle. Hexagonal binning was used to display the most common (modal) land use attribute in each bin.
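A minimal sketch of how such a plot could be built: compute each cluster's percentage of tweets in two periods of the day, assign the resulting point to a hexagon, and report the modal land use label per hexagon. The hour ranges in `PERIODS`, the hexagon `size` and all function names are assumptions, since the caption does not specify them; the hexagon lookup uses the standard axial coordinates with cube rounding for pointy-top hexagons.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical period boundaries as [start, end) hours; not given in the caption.
PERIODS = {"morning": (6, 12), "afternoon": (12, 18), "evening": (18, 24)}


def period_share(hourly: List[int], period: str) -> float:
    """Percentage of a cluster's tweets that fall in the given period."""
    start, end = PERIODS[period]
    return 100.0 * sum(hourly[start:end]) / sum(hourly)


def hex_bin(x: float, y: float, size: float) -> Tuple[int, int]:
    """Axial (q, r) coordinates of the pointy-top hexagon containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    # Fix the coordinate with the largest rounding error so that rx + ry + rz == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def modal_land_use(clusters: List[Tuple[str, List[int]]], x_period: str,
                   y_period: str, size: float = 5.0) -> Dict[Tuple[int, int], str]:
    """Map each occupied hexagon to the most common land use label among its clusters."""
    bins: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for land_use, hourly in clusters:
        x = period_share(hourly, x_period)
        y = period_share(hourly, y_period)
        bins[hex_bin(x, y, size)][land_use] += 1
    return {cell: labels.most_common(1)[0][0] for cell, labels in bins.items()}
```

Panel A would correspond to `modal_land_use(clusters, "morning", "evening")` and panel B to `modal_land_use(clusters, "morning", "afternoon")`.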
Confusion matrix of Twitter land use classification.
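For reference, a confusion matrix like the one captioned here can be computed as below. The orientation (rows are reference land use, columns are predicted land use), the label names and the helper functions are assumptions for illustration, since the caption does not specify them.

```python
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: List[str]) -> List[List[int]]:
    """Rows = reference (actual) land use, columns = predicted land use."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Share of all samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix: List[List[int]], labels: List[str]) -> Dict[str, float]:
    """Correct predictions divided by the row total, for each non-empty class."""
    return {label: matrix[i][i] / sum(matrix[i])
            for i, label in enumerate(labels) if sum(matrix[i])}
```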