UMSL Bulletin 2023-2024
The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis.
Determinantal Beam Search
Beam search is a go-to strategy for decoding neural sequence models. The
algorithm can naturally be viewed as a subset optimization problem, albeit one
where the corresponding set function does not reflect interactions between
candidates. Empirically, this leads to sets often exhibiting high overlap,
e.g., strings may differ by only a single word. Yet in use-cases that call for
multiple solutions, a diverse or representative set is often desired. To
address this issue, we propose a reformulation of beam search, which we call
determinantal beam search. Determinantal beam search has a natural relationship
to determinantal point processes (DPPs), models over sets that inherently
encode intra-set interactions. By posing iterations in beam search as a series
of subdeterminant maximization problems, we can turn the algorithm into a
diverse subset selection process. In a case study, we use the string
subsequence kernel to explicitly encourage n-gram coverage in text generated
from a sequence model. We observe that our algorithm offers competitive
performance against other diverse set generation strategies in the context of
language generation, while providing a more general approach to optimizing for
diversity.
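The subdeterminant-maximization step can be sketched as a greedy MAP selection over a quality-weighted DPP kernel. This is a minimal illustration only: the kernel construction (quality scores times a similarity matrix) and the greedy selection are standard DPP machinery standing in for the paper's exact formulation.

```python
import numpy as np

def greedy_subdeterminant_selection(quality, similarity, k):
    """Greedily pick k candidates maximizing the log-determinant of the
    kernel submatrix L = diag(q) @ S @ diag(q). High-quality items raise
    the determinant; similar items shrink it, so the selection trades off
    candidate score against intra-set redundancy."""
    n = len(quality)
    L = np.outer(quality, quality) * similarity  # quality-weighted kernel
    selected, remaining = [], list(range(n))
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best, best_gain = i, logdet
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate high-quality candidates and one dissimilar lower-quality one, the selection skips the duplicate in favor of the diverse item, which is the behavior the reformulated beam step is after.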
Using machine learning to predict pathogenicity of genomic variants throughout the human genome
More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many possible ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. It is necessary to investigate all of these processes in order to evaluate which variant may be causal for the deleterious phenotype. A great help in this regard are variant effect scores. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity.
Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and ultimately deployment of a selected model via genome-wide scoring of genomic variants.
The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that is scoring SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep-neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data that is based on variants selected by allele frequency.
In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
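The annotate-train-validate-deploy loop described above can be sketched with a generic classifier. Everything here is a stand-in: the synthetic feature matrix, the logistic-regression model, and the grid-searched hyperparameter are illustrative assumptions, not CADD's real annotations, model family, or tuning grid.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic "annotated variants": rows are variants, columns play the role
# of annotations (conservation, regulatory, splice scores, ...).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameter optimization via cross-validated grid search
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.01, 0.1, 1.0, 10.0]},
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# Validation on held-out variants before genome-wide scoring
auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
```

The final step of the real workflow, genome-wide scoring, would amount to calling the selected model's `predict_proba` over all annotated variants.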
Parametric Approaches to Balance Stormwater Management and Human Wellbeing within Urban Green Space
Through rapid urbanisation, urban green spaces (UGS) have become increasingly limited and valuable in high-density urban environments. However, meeting the diverse requirements of sustainable urban development often leads to conflicts in UGS usage. For example, the presence of stormwater treatment facilities may hinder residents' access to adjacent UGS.
Traditional approaches to UGS design typically focus on separate evaluations of human wellbeing and stormwater management. However, using questionnaires, interviews, and surveys for human wellbeing evaluation can be challenging to generalise across different projects and cities. Additionally, professional hydrological models used for stormwater management require extensive knowledge of hydrology and struggle to integrate their 2D evaluation methods with 3D models.
To address these challenges, this thesis proposes a novel framework to integrate the two types of analysis within a system for balancing the needs of human wellbeing and stormwater management in UGS design. The framework incorporates criteria and parameters for evaluating human wellbeing and stormwater management in a 3D model and introduces an approach to compare these two needs in terms of UGS area and suitable location. The contributions of this thesis to multi-objective UGS design are as follows: (1) defining human wellbeing evaluation through Accessibility and Usability assessment, which considers factors such as connectivity, walking distance, space enclosure, and space availability; (2) simplifying stormwater evaluation using particle systems and design curves to streamline complex hydrological models; (3) integrating the two evaluations by comparing their quantified requirements for UGS area and location; and (4) incorporating parameters to provide flexibility and accommodate various design scenarios and objectives.
The advantages of this evaluation framework are demonstrated through two case studies: (1) the human wellbeing analysis based on spatial parameters in the framework is sensitive to site variations, including UGS quantity and distribution, population density, terrain, road context, height of void space, and more; (2) the simplified stormwater analysis effectively captures site variations in UGS quantity and distribution, building distribution, and terrain, providing recommendations for each UGS with different types and sizes of stormwater facilities; (3) owing to its spatial-parameter evaluation, the framework can adjust relevant thresholds and include additional parameters to respond to specific project needs; and (4) by quantifying and comparing the two different requirements for UGS, any UGS with high usage conflicts can be easily identified. By evaluating all proposed criteria for UGSs in the 3D model, designers can conveniently observe simulations and adjust design scenarios to address identified usage conflicts. Thus, the proposed evaluation framework would be valuable in effectively supporting further multi-objective UGS design.
2023-2024 Boise State University Undergraduate Catalog
This catalog is primarily for and directed at students. However, it serves many audiences, such as high school counselors, academic advisors, and the public. In this catalog you will find an overview of Boise State University and information on admission, registration, grades, tuition and fees, financial aid, housing, student services, and other important policies and procedures. However, most of this catalog is devoted to describing the various programs and courses offered at Boise State.
Contested environmental futures: rankings, forecasts and indicators as sociotechnical endeavours
In a world where numbers and science are often taken as the voice of truth and reason, Quantitative Devices (QDs) represent the epitome of policy driven by facts rather than hunches. Despite the scholarly interest in understanding the role of quantification in policy, the actual production of rankings, forecasts, indexes and other QDs has, to a great extent, been left unattended. While appendixes and technical notebooks offer an explanation of how these devices are produced, they exclude aspects of their making that are arbitrarily considered "mundane." It is in the everyday performances at research centres that the micropolitics of knowledge production, imaginaries, and frustrations merge. These are vital dimensions to understand the potential, limitations and ethical consequences of QDs.
Using two participant observations as the starting point, this thesis offers a comprehensive critical analysis of the processes through which university-based research centres create QDs that represent the world. It addresses how researchers conceive quantitative data. It pays attention to the discourses of hope and expectation embedded in the devices. Finally, it considers the ethics of creating devices that cannot be replicated independently of their place of production.
Two QDs were analysed: the Violence Early Warning System (ViEWS) and the Environmental Performance Index (EPI). At Uppsala University, researchers created ViEWS to forecast the probability of drought-driven conflicts within the next 100 years. The EPI, produced at the Yale Centre for Environmental Law and Policy, ranks the performance of countries' environmental policies. This thesis challenges existing claims within Science and Technology Studies and the Sociology of Quantification that QDs co-produce knowledge within their realms. I argue that these devices act as vehicles for sociotechnical infrastructures to be consolidated with little debate among policymakers, given their understanding as scientific and objective tools. Moreover, for an indicator to be incorporated within a QD, it needs to be deemed as relevant for those making the devices but also valuable enough to have been previously quantified by data providers. Even more, existing sociotechnical inequalities, power relations and epistemic injustices could impede disadvantaged communities' (e.g., in the Global South) ability to challenge metrics originated in centres in the Global North. This thesis, therefore, demonstrates how the future QDs propose is unilateral and does not acknowledge the myriad possibilities that might arise from a diversity of worldviews. In other words, they cast a future designed to fit under the current status quo.
In sum, through two environment-related QDs, this thesis launches an inquiry into the elements that make up the imaginaries they propose by following the everyday life of their producers. To achieve this, I discuss two core elements. First, the role of tacit knowledge and sociotechnical inequalities in reinforcing power relations between those with the means to quantify and those who might only accommodate proposed futures. Second, the dynamics between research centres and data providers in relation to what is quantified. By scrutinising mundanity, this work is a step forward in understanding the construction of sociotechnical imaginaries and infrastructures.
ManyDG: Many-domain Generalization for Healthcare Applications
The vast amount of health data has been continuously collected for each
patient, providing opportunities to support diverse healthcare predictive tasks
such as seizure detection and hospitalization prediction. Existing models are
mostly trained on other patients' data and evaluated on new patients. Many of
them might suffer from poor generalizability. One key reason can be overfitting
due to the unique information related to patient identities and their data
collection environments, referred to as patient covariates in the paper. These
patient covariates usually do not contribute to predicting the targets but are
often difficult to remove. As a result, they can bias the model training
process and impede generalization. In healthcare applications, most existing
domain generalization methods assume a small number of domains. In this paper,
considering the diversity of patient covariates, we propose a new setting by
treating each patient as a separate domain (leading to many domains). We
develop a new domain generalization method, ManyDG, that can scale to such
many-domain problems. Our method identifies the patient domain covariates by
mutual reconstruction and removes them via an orthogonal projection step.
Extensive experiments show that ManyDG can boost the generalization performance
on multiple real-world healthcare tasks (e.g., 3.7% Jaccard improvements on
MIMIC drug recommendation) and support realistic but challenging settings such
as insufficient data and continuous learning.
Comment: The paper has been accepted at ICLR 2023; see
https://openreview.net/forum?id=lcSfirnflpW. The data and source code will be
released at https://github.com/ycq091044/ManyD
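The orthogonal projection step can be illustrated in isolation: given a sample embedding and an estimated patient-covariate direction, the covariate component is removed by projecting onto the direction's orthogonal complement. This is a minimal single-direction sketch under assumed shapes, not the authors' released code (which pairs the projection with mutual reconstruction to identify the covariates).

```python
import numpy as np

def remove_domain_component(z, d):
    """Remove the component of embedding z that lies along the
    patient-covariate direction d, leaving only the part orthogonal
    to d (and hence, ideally, free of patient-identity information)."""
    d = d / np.linalg.norm(d)       # unit-normalize the covariate direction
    return z - np.dot(z, d) * d     # orthogonal-complement projection
```

After the projection, the returned embedding has zero inner product with the covariate direction, which is the property the method relies on to stop covariates from biasing training.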
Monetizing Explainable AI: A Double-edged Sword
Algorithms used by organizations increasingly wield power in society as they
decide the allocation of key resources and basic goods. In order to promote
fairer, more just, and more transparent uses of such decision-making power,
explainable artificial intelligence (XAI) aims to provide insights into the
logic of algorithmic decision-making. Despite much research on the topic,
consumer-facing applications of XAI remain rare. A central reason may be that a
viable platform-based monetization strategy for this new technology has yet to
be found. We introduce and describe a novel monetization strategy for fusing
algorithmic explanations with programmatic advertising via an explanation
platform. We claim the explanation platform represents a new,
socially-impactful, and profitable form of human-algorithm interaction and
estimate its potential for revenue generation in the high-risk domains of
finance, hiring, and education. We then consider possible undesirable and
unintended effects of monetizing XAI and simulate these scenarios using
real-world credit lending data. Ultimately, we argue that monetizing XAI may be
a double-edged sword: while monetization may incentivize industry adoption of
XAI in a variety of consumer applications, it may also conflict with the
original legal and ethical justifications for developing XAI. We conclude by
discussing whether there may be ways to responsibly and democratically harness
the potential of monetized XAI to provide greater consumer access to
algorithmic explanations.
Modelling, Monitoring, Control and Optimization for Complex Industrial Processes
This reprint includes 22 research papers and an editorial, collected from the Special Issue "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes", highlighting recent research advances and emerging research directions in complex industrial processes. This reprint aims to promote the research field and benefit readers from both academic communities and industrial sectors.
ProGAP: Progressive Graph Neural Networks with Differential Privacy Guarantees
Graph Neural Networks (GNNs) have become a popular tool for learning on
graphs, but their widespread use raises privacy concerns as graph data can
contain personal or sensitive information. Differentially private GNN models
have been recently proposed to preserve privacy while still allowing for
effective learning over graph-structured datasets. However, achieving an ideal
balance between accuracy and privacy in GNNs remains challenging due to the
intrinsic structural connectivity of graphs. In this paper, we propose a new
differentially private GNN called ProGAP that uses a progressive training
scheme to improve such accuracy-privacy trade-offs. Combined with the
aggregation perturbation technique to ensure differential privacy, ProGAP
splits a GNN into a sequence of overlapping submodels that are trained
progressively, expanding from the first submodel to the complete model.
Specifically, each submodel is trained over the privately aggregated node
embeddings learned and cached by the previous submodels, leading to an
increased expressive power compared to previous approaches while limiting the
incurred privacy costs. We formally prove that ProGAP ensures edge-level and
node-level privacy guarantees for both training and inference stages, and
evaluate its performance on benchmark graph datasets. Experimental results
demonstrate that ProGAP can achieve up to 5%-10% higher accuracy than existing
state-of-the-art differentially private GNNs.
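The aggregation-perturbation idea underlying each submodel can be sketched as follows. This is an assumption-laden illustration of a Gaussian mechanism over norm-bounded neighbor sums, in the spirit of the GAP family of methods; it is not ProGAP's actual implementation, caching scheme, or privacy accounting.

```python
import numpy as np

def private_aggregate(X, adj, sigma, rng):
    """Privately aggregate node embeddings X (n x d) over an adjacency
    matrix adj (n x n). Normalizing each embedding to unit norm bounds
    the contribution of any single edge to the neighbor sum, so adding
    Gaussian noise of scale sigma yields a differentially private
    aggregate (for a sigma chosen by the privacy analysis)."""
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    Xn = X / norms                  # clip per-node sensitivity to 1
    agg = adj @ Xn                  # sum of neighbor embeddings
    return agg + rng.normal(scale=sigma, size=agg.shape)
```

In a progressive scheme, each submodel would be trained on such privately aggregated (and cached) embeddings produced by its predecessor, so later stages gain expressive power without re-querying the raw graph.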