Guidance in the human–machine analytics process
Recent advances in artificial intelligence, particularly in machine learning, have raised hopes of using automatic techniques to perform some of the tasks that data analysts currently carry out manually using visualization. However, visual analytics remains a complex activity combining many different subtasks. Some of these tasks are relatively low-level, and it is clear how automation could play a role (for example, classification and clustering of data). Other tasks are much more abstract and require significant human creativity, for example, linking insights gleaned from a variety of disparate and heterogeneous data artifacts to build support for decision making. In this paper, we list the goals for guidance and its pros and cons, and we discuss the role guidance can play not only in key low-level visualization tasks but also in the more sophisticated model-generation tasks of visual analytics. We outline the potential applications of guidance; discuss challenges in implementing it, including what inputs guidance systems require and how guidance should be delivered to users; propose methods for evaluating the quality of guidance at different phases of the analytic process; and introduce the potential negative effects of guidance as a source of bias in analytic decision making.
Statistical Algorithms and Bioinformatics Tools Development for Computational Analysis of High-throughput Transcriptomic Data
Next-Generation Sequencing technologies allow for a substantial increase in the amount of data available for various biological studies. In order to effectively and efficiently analyze this data, computational approaches combining mathematics, statistics, computer science, and biology are implemented. Even with the substantial efforts devoted to the development of these approaches, numerous issues and pitfalls remain. One of these issues is mapping uncertainty, in which read alignment results are biased due to the inherent difficulties associated with accurately aligning RNA-Sequencing reads. GeneQC is an alignment quality control tool that provides insight into the severity of mapping uncertainty in each annotated gene from alignment results. GeneQC uses feature extraction to identify three levels of information for each gene and implements elastic net regularization and mixture model fitting to provide insight into the severity of mapping uncertainty and the quality of read alignment. In combination with GeneQC, the Ambiguous Reads Mapping (ARM) algorithm works to re-align ambiguous reads through the integration of motif prediction from metabolic pathways to establish coregulatory gene modules for re-alignment using a negative binomial distribution-based probabilistic approach. These two tools work in tandem to address the issue of mapping uncertainty and provide more accurate read alignments, and thus more accurate expression estimates. Also presented in this dissertation are two approaches to interpreting the expression estimates. The first is IRIS-EDA, an integrated Shiny web server that combines numerous analyses to investigate gene expression data generated from RNA-Sequencing data. The second is ViDGER, an R/Bioconductor package that quickly generates high-quality visualizations of differential gene expression results to assist users in the non-trivial task of comprehensively interpreting those results.
These four tools cover a variety of aspects of modern RNA-Seq analyses and aim to address bottlenecks related to algorithmic and computational issues, as well as to provide more efficient and effective implementation methods.
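GeneQC's use of elastic net regularization can be sketched in outline: per-gene alignment features are regressed against a mapping-uncertainty response, with the combined L1/L2 penalty shrinking uninformative features while tolerating correlated ones. The features, coefficients, and score below are synthetic placeholders for illustration, not GeneQC's actual model.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)

# Hypothetical per-gene alignment features (GeneQC's real features differ):
# fraction of multi-mapped reads, mean mapping quality, gene-family similarity.
n_genes = 200
X = rng.random((n_genes, 3))

# Synthetic "mapping uncertainty" response, for illustration only.
y = 0.7 * X[:, 0] - 0.4 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 0.05, n_genes)

# Elastic net combines L1 and L2 penalties; l1_ratio balances sparsity
# against grouped shrinkage of correlated features.
model = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X, y)
scores = model.predict(X)  # per-gene mapping-uncertainty estimate
```

In a real pipeline the response would come from annotated alignment quality rather than a simulated formula, and the fitted scores could then feed a mixture model to classify genes by uncertainty severity.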
Dwelling Quietly in the Rich Club: Brain Network Determinants of Slow Cortical Fluctuations
For more than a century, cerebral cartography has been driven by investigations of structural and morphological properties of the brain across spatial scales and of the temporal/functional phenomena that emerge from these underlying features. The next era of brain mapping will be driven by studies that consider both of these components of brain organization simultaneously, elucidating their interactions and dependencies. Using this guiding principle, we explored the origin of slowly fluctuating patterns of synchronization within the topological core of brain regions known as the rich club, implicated in the regulation of mood and introspection. We find that a constellation of densely interconnected regions that constitute the rich club (including the anterior insula, amygdala, and precuneus) plays a central role in promoting a stable, dynamical core of spontaneous activity in the primate cortex. The slow time scales are well matched to the regulation of internal visceral states, corresponding to the somatic correlates of mood and anxiety. In contrast, the topology of the surrounding "feeder" cortical regions shows unstable, rapidly fluctuating dynamics likely crucial for fast perceptual processes. We discuss these findings in relation to psychiatric disorders and the future of connectomics. (35 pages, 6 figures)
Improved Computational Prediction of Function and Structural Representation of Self-Cleaving Ribozymes with Enhanced Parameter Selection and Library Design
Biomolecules could be engineered to solve many societal challenges, including disease diagnosis and treatment, environmental sustainability, and food security. However, our limited understanding of how mutational variants alter molecular structures and functional performance has constrained the potential of important technological advances, such as high-throughput sequencing and gene editing. Ribonucleic Acid (RNA) sequences are thought to play a central role within many of these challenges. Their continual discovery throughout all domains of life is evidence of their significant biological importance (Weinreb et al., 2016). The self-cleaving ribozyme is a class of noncoding Ribonucleic Acid (ncRNA) that has been useful for relating sequence variants to structural features and their associated catalytic activities. Self-cleaving ribozymes possess tractable sequence spaces, perform easily identifiable catalytic functions, and have well documented structures. The determination of a self-cleaving ribozyme's structure and catalytic activity in the laboratory is typically a slow and expensive process, and most current explorations of structure and function come from these empirical processes. Computational approaches to the prediction of catalytic activity and structure are fast and inexpensive, but have failed both to achieve atomic accuracy and to correctly identify all base-pair interactions (Watkins et al., 2018). One prominent impediment to computational approaches is the lack of existing structural and functional data typically required by predictive models (Jumper et al., 2021). Using data from deep-mutational scanning experiments and high-throughput sequencing technology, it is possible to computationally map mutational variants to their observed catalytic activity for a range of self-cleaving ribozymes. The resulting map reveals important base-pairing relationships that, in turn, facilitate accurate predictions of higher-order variants.
Using sequence data from three experimental replicates of five model self-cleaving ribozymes, I will identify and map all single and double mutation variants to their observed cleavage activity. These mappings will be used to identify structural features within each ribozyme. Next, I will show, within a training tool, how observed cleavage at multiple reaction times can be used to identify the catalytic rates of our model ribozymes. Finally, I will predict the functional activity of model ribozyme variants of various mutational orders using machine learning models trained only on functionally labeled sequence variants. Together, these three dissertation chapters represent the kind of analysis needed to further the implementation of more accurate structural and functional prediction algorithms.
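The catalytic-rate step can be illustrated with a minimal sketch: under a first-order cleavage model, the cleaved fraction follows f(t) = 1 - exp(-k t), so k can be recovered from multi-timepoint data by linearizing to -ln(1 - f) = k t and fitting through the origin. The rate and data below are synthetic, not the dissertation's measurements.

```python
import numpy as np

def fit_cleavage_rate(times, fractions):
    """Estimate the first-order rate k from f(t) = 1 - exp(-k t)
    by linearizing (-ln(1 - f) = k t) and least squares through the origin."""
    t = np.asarray(times, dtype=float)
    g = -np.log(1.0 - np.asarray(fractions, dtype=float))
    return float(np.sum(t * g) / np.sum(t * t))

# Synthetic time course with a true rate of k = 0.5 per minute.
t = np.array([1.0, 2.0, 5.0, 10.0])
f = 1.0 - np.exp(-0.5 * t)
k = fit_cleavage_rate(t, f)  # recovers 0.5
```

With noisy sequencing-derived fractions, the same linearization can be fit by weighted least squares, down-weighting late timepoints where f approaches 1 and -ln(1 - f) becomes unstable.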
Improving the Ribozyme Toolbox: From Structure-Function Insights to Synthetic Biology Applications
Self-cleaving ribozymes are a naturally occurring class of catalytically active RNA molecules that cleave their own phosphate backbone. In nature, self-cleaving ribozymes are best known for their role in processing concatemers of viral genomes into monomers during viral replication in some RNA viruses, but to a lesser degree they have also been implicated in mRNA regulation and processing in bacteria and eukaryotes. In addition to their biological relevance, these RNA enzymes have been harnessed as important biomolecular tools with a variety of applications in fields such as bioengineering. Self-cleaving ribozymes are relatively small and easy to generate in the lab using common molecular biology approaches, and have therefore been accessible and well-exploited model systems for interrogating RNA sequence-structure-function relationships. Furthermore, self-cleaving ribozymes are also being implemented as parts in the development of various biomolecular tools such as biosensors and gene regulatory elements. While much progress has been made in these areas, there are still challenges associated with the performance and implementation of such tools.
The work contained in this dissertation aims to address several of these challenges and improve the ribozyme toolbox in several diverse areas. Chapter one provides an introduction to pertinent background information for this dissertation. Chapter two aims to improve the ribozyme toolbox by providing and analyzing new high-throughput sequence-structure-function data sets on five different self-cleaving ribozymes, and by identifying how trends in epistasis relate to distinct structural elements. Chapter three uses such high-throughput data to train machine learning models that accurately predict the historically difficult-to-predict functional effects of higher order mutations in functional RNAs. Finally, in chapter four, I developed a biologically relevant platform to study the real-time performance and kinetics of self-cleaving ribozyme-based gene regulatory elements directly at the site of transcription in mammalian cells.
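The supervised-prediction idea in chapter three can be sketched in a minimal, hypothetical form: sequence variants are one-hot encoded and a regularized linear model is trained on functionally labeled variants, then used to score unseen ones. Ridge regression and the synthetic additive activity landscape below are stand-ins for illustration; the dissertation's actual models and data differ.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
BASES = "ACGU"
L = 12  # toy sequence length; real ribozymes are longer

def one_hot(seq):
    """Flatten an RNA sequence into a length-4L binary feature vector."""
    x = np.zeros((L, 4))
    for i, b in enumerate(seq):
        x[i, BASES.index(b)] = 1.0
    return x.ravel()

# Synthetic additive per-position activity effects, for illustration only.
effects = rng.normal(0, 1, (L, 4))
def activity(seq):
    return sum(effects[i, BASES.index(b)] for i, b in enumerate(seq))

# Train on labeled variants, then predict held-out variants.
train = ["".join(rng.choice(list(BASES), L)) for _ in range(300)]
model = Ridge(alpha=1.0).fit(
    np.array([one_hot(s) for s in train]),
    np.array([activity(s) for s in train]),
)
test = ["".join(rng.choice(list(BASES), L)) for _ in range(50)]
pred = model.predict(np.array([one_hot(s) for s in test]))
truth = np.array([activity(s) for s in test])
```

A purely additive landscape is learnable by a linear model; capturing the epistatic (higher-order) effects that chapter two characterizes would require pairwise features or nonlinear models.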
Quantification of C-type lectin receptors signal transduction
Ubiquitous glycans facilitate a plethora of important interactions, namely cancer–host, host–pathogen, and host–self interactions. Interaction with these carbohydrates is enabled by lectins, and the effects of these interactions can range from redundant to essential. Lectins are exposed on mammalian cell surfaces, where they identify the information encoded in glycans and transfer it into signal transduction pathways. Such signal transduction pathways are complex and difficult to analyse. However, quantitative data with single-cell resolution provide a means to disentangle the associated signalling cascades. C-type lectin receptors (CLRs) expressed on immune cells were chosen as a model system to study their capacity to transmit information encoded in glycans of incoming particles. To this end, monocytic cell lines expressing DC-SIGN, MCL, dectin-1, dectin-2, and mincle, as well as TNFAR and TLR-1&2, were established. Based on the study of Cheong et al. (2011), the amount of transmitted information was quantified by following NFκB-dependent GFP expression. While most receptors had a channel capacity of at least 1 bit, it was found that dectin-2 has a lower capacity to transmit information than other lectins. The comparison to the related lectin mincle is especially interesting, since mincle uses the same pathway effectively. Furthermore, information transmission of dectin-2 could not be enhanced by other lectins or signalling molecules. Yet, upon closer analysis, it was found that the sensitivity of the dectin-2 signal transduction pathway can be enhanced by overexpression of its co-receptor FcRγ, but surprisingly its transmitted information cannot. Moreover, it is suggested that potential autoimmunity might be a cause of dectin-2's inefficient signalling. The question of signal integration was also approached: how do cells combine the flow of information from multiple receptors? It was shown that the signals of dectin-2 and dectin-1 are integrated as a compromise between the two receptors. The reason for this compromise might be the activity of the phosphoprotein SYK, present in both the dectin-1 and dectin-2 signal transduction pathways.
Using the established assays and cell lines, soluble beta-glucans (SBGs) were discovered to be potent stimulators of dectin-1, where sensitivity to the SBGs was highly variable and dependent on their β-glucan side chains. Various ligands for mincle, on the other hand, resulted in similar signalling behaviour. Building on insights into targeted delivery to lectins, it was shown how nucleic acids can be delivered to Langerin-expressing cells and used to reprogramme them, a technology of tremendous potential for vaccination strategies and (non-germline) genetic editing.
Taken together, the concepts of information theory combined with single-cell-resolved data enabled the quantification of CLR signalling behaviour and signal integration. Using dectin-2 and other lectins as examples, it was demonstrated how the receptor itself determines the efficiency, and therefore the outcome, of the signal transduction pathways. Moreover, the potential to exploit glycan–lectin interactions in drug targeting was exemplified by delivering mRNA via Langerin and by demonstrating the dependency of dectin-1 sensitivity on the β-glucan side chains of its ligands.
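The channel capacity quantified in this work, following the information-theoretic framing of Cheong et al. (2011), measures the maximum mutual information (in bits) between ligand inputs and the NFκB-driven GFP response. As a hedged sketch (not the thesis's actual pipeline), the capacity of a discretized receptor-to-reporter channel can be estimated from its conditional response matrix with the Blahut–Arimoto algorithm:

```python
import numpy as np

def channel_capacity(P, tol=1e-9, max_iter=1000):
    """Capacity (bits) of a discrete channel P[x, y] = P(output y | input x),
    computed with the Blahut-Arimoto algorithm."""
    P = np.asarray(P, dtype=float)
    p = np.full(P.shape[0], 1.0 / P.shape[0])  # input distribution, start uniform
    for _ in range(max_iter):
        q = p @ P  # induced output distribution
        # KL divergence D(P(.|x) || q) per input, in nats
        with np.errstate(divide="ignore", invalid="ignore"):
            d = np.sum(np.where(P > 0, P * np.log(P / q), 0.0), axis=1)
        c = np.exp(d)
        p_new = p * c / np.dot(p, c)
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    q = p @ P
    with np.errstate(divide="ignore", invalid="ignore"):
        d = np.sum(np.where(P > 0, P * np.log(P / q), 0.0), axis=1)
    return float(np.dot(p, d) / np.log(2))

# A noiseless two-input reporter channel carries 1 bit; a fully noisy one, 0 bits.
c_clean = channel_capacity(np.eye(2))             # 1.0
c_noisy = channel_capacity(np.full((2, 2), 0.5))  # 0.0
```

In practice the response matrix would be built by binning single-cell GFP intensities per ligand condition; a receptor like dectin-2 with overlapping response distributions across ligands would then yield a capacity well below 1 bit.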
Data Mining On Retail Banking Simulation Model To Devise Productivity Improvement Strategies
Retail banking is a service-oriented business. Customers visit different branches to procure services such as transaction inquiries and bank account processing. Operators in various positions at a branch have to render counter services and meet certain customer service levels, while customers arrive at the counter in different patterns. Information can be collected, such as customer waiting time, operator service time, and the number of tickets received at a particular time. Such information allows data mining to be performed to discover patterns that could segregate the types of services. The study aims to demonstrate this possibility through computer simulation. Specifically, data mining is investigated to determine the clustering of branches and the classification of operator productivity. The research methodology involves seven steps. First, business understanding is performed to gain insight into the motive for initiating the data mining exercise; two levels, macro and micro, were defined to differentiate inter- and intra-branch comparisons. Second, a computer simulation was constructed in WITNESS Horizon V.21, largely based on the description of a real branch; different scenarios were built reflecting the operating behaviors of different branches, and the simulation stores data (about six thousand records) in a database. The following steps involve different data selection and processing strategies (selection, cleaning, transformation). Next is the data mining itself, primarily using Python Orange V.3.20. The last step is pattern evaluation to develop suitable productivity improvement strategies. In this research, a variety of data mining tools were deployed and multiple insights were generated. Notably, branches that adopted lean management showed improved general productivity, and operator performance could be differentiated based on years of experience. This research provides opportunities for researchers to examine the productivity of branches with other data mining techniques. It also helps bankers focus on the right areas for improvement and increases their ability in decision making. There are two main limitations. First, the simulation model does not capture the whole intricacy of retail banking front-end operations. Second, software limitations prevented more complex data mining tasks from being performed.
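The study performed its mining in Orange; an equivalent branch-clustering step, sketched here in scikit-learn with hypothetical per-branch metrics (not the simulation's actual fields), might look like the following. Metrics are standardized first so that waiting times and throughput contribute on comparable scales.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Hypothetical per-branch metrics, for illustration only:
# mean customer waiting time (min), mean service time (min), tickets per hour.
lean = rng.normal([3.0, 4.0, 30.0], [0.5, 0.5, 3.0], (10, 3))
conventional = rng.normal([8.0, 6.0, 20.0], [0.5, 0.5, 3.0], (10, 3))
X = np.vstack([lean, conventional])

# Standardize, then cluster branches into two operating profiles.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xs)
```

With simulation output, the cluster assignments would then be compared against known branch attributes (such as lean-management adoption) to interpret what each operating profile represents.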