MOLIERE: Automatic Biomedical Hypothesis Generation System
Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24.5 million documents. At the heart of our approach lies a multi-modal and multi-relational network of biomedical objects extracted from several heterogeneous datasets from the National Center for Biotechnology Information (NCBI). These objects include but are not limited to scientific papers, keywords, genes, proteins, diseases, and diagnoses. We model hypotheses using Latent Dirichlet Allocation applied to abstracts found near shortest paths discovered within this network, and demonstrate the effectiveness of MOLIERE by performing hypothesis generation on historical data. Our network, implementation, and resulting data are all publicly available for the broad scientific community.
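The path-then-abstracts step described above can be illustrated with a minimal sketch. It assumes an unweighted toy network and a plain breadth-first search; the abstract does not say how edges are weighted or how "near" is defined, so the one-hop neighbourhood, the node names, and the helper functions below are all hypothetical. The topic-modelling step (Latent Dirichlet Allocation) is not shown.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Unweighted shortest path by breadth-first search (an assumption;
    the real network may use weighted edges). Returns a node list or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def abstracts_near(graph, path, is_abstract):
    """Collect abstract nodes lying on the path or one hop away from it."""
    near = set()
    for node in path:
        for candidate in (node, *graph.get(node, ())):
            if is_abstract(candidate):
                near.add(candidate)
    return sorted(near)


# Hypothetical toy network mixing keyword ("kw:") and document ("doc:") nodes.
TOY = {
    "kw:a": ["doc:1"],
    "doc:1": ["kw:a", "kw:b"],
    "kw:b": ["doc:1", "doc:2"],
    "doc:2": ["kw:b", "kw:c", "doc:3"],
    "kw:c": ["doc:2"],
    "doc:3": ["doc:2"],
}

path = shortest_path(TOY, "kw:a", "kw:c")
corpus = abstracts_near(TOY, path, lambda n: n.startswith("doc:"))
```

In this sketch, `corpus` would be the set of documents handed to a topic model; `doc:3` is included because it sits one hop from the path even though it is not on it.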
Psychopower and Ordinary Madness: Reticulated Dividuals in Cognitive Capitalism
Despite the seemingly neutral vantage of using nature for widely-distributed computational purposes, neither post-biological nor post-humanist teleology simply concludes with the real “end of nature” as entailed in the loss of the specific ontological status embedded in the identifier “natural.” As evinced by the ecological crises of the Anthropocene (of which the 2019 Brazil Amazon rainforest fires are only the most recent), our epoch has transfixed the “natural order” and imposed entropic artificial integration, producing living species that become “anoetic,” made to serve as automated exosomatic residues, or digital flecks. I further develop Gilles Deleuze’s description of control societies to upturn Foucauldian biopower, replacing its spatio-temporal bounds with the exographic excesses in psycho-power; culling and further detailing Bernard Stiegler’s framework of transindividuation and hyper-control, I examine how becoming-subject is predictively facilitated within cognitive capitalism and what Alexander Galloway terms “deep digitality.” Despite the loss of material vestiges qua virtualization (which I seek to trace in a historical review of industrialization to postindustrialization), the drive-based and reticulated “internet of things” facilitates a closed loop from within the brain to the outside environment, such that the aperture of thought is mediated and compressed.
The human brain, understood through its material constitution, is susceptible to total datafication’s laminated process of “becoming-mnemotechnical,” and, as neuroplasticity is now a valid description for deep-learning and neural nets, we are privy to the rebirth of the once-discounted metaphor of the “cybernetic brain.” Probing algorithmic governmentality while posing noetic dreaming as both technical and pharmacological, I seek to analyze how spirit is blithely confounded with machine-thinking’s gelatinous cognition, as prosthetic organ-adaptation becomes probabilistically molded, networked, and agentially inflected (rather than simply externalized)
Model-based quality assurance of instrumented context-free systems
The ever-growing complexity of today’s software and hardware systems makes quality assurance (QA) a challenging task. Abstraction is a key technique for dealing with this complexity because it allows one to skip non-essential properties of a system and focus on the important ones. Crucial for the success of this approach is the availability of adequate abstraction models that strike a fine balance between simplicity and expressiveness.
This thesis presents the formalisms of systems of procedural automata (SPAs), systems of behavioral automata (SBAs), and systems of procedural Mealy machines (SPMMs). The three model types describe systems which consist of multiple procedures that can mutually call each other, including recursion. While the individual procedures are described by regular automata and therefore are easy to understand, the aggregation of procedures towards systems captures the semantics of context-free systems, offering the expressiveness necessary for representing procedural systems.
A central concept of the proposed model types is an instrumentation that exposes the internal structure of systems by making calls to and returns from procedures observable. This instrumentation allows for a notion of rigorous (de-) composition which enables a translation between local (procedural) views and global (holistic) views on a system. On the basis of this translation, this thesis presents algorithms for the verification, testing, and learning of (instrumented) context-free systems, covering a broad spectrum of practical QA tasks. Starting with SPAs as a “base” formalism for context-free systems, the flexibility of this concept is shown by including features such as prefix-closure (SBAs) and dialog-based transductions (SPMMs).
In a comparison with related formalisms, this thesis shows that the simplicity of the proposed model types not only increases the understandability of models but can also improve the performance of QA tasks. This makes SPAs, SBAs, and SPMMs a powerful tool for tackling the practical challenges of assuring the quality of today’s software and hardware systems
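The idea of procedures modelled as regular automata whose calls and returns are made observable can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's formal definition: the dictionary layout, the return symbol `R`, and the convention that a call symbol is simply the callee's name are all hypothetical choices made for the example.

```python
RETURN = "R"  # hypothetical symbol marking a return from a procedure


def accepts(procedures, main, word):
    """Check an instrumented word against a system of procedure DFAs.

    procedures maps a procedure name to a dict with keys
    "init" (start state), "accept" (set of states) and
    "delta" ((state, symbol) -> state). A call symbol is the callee's name.
    """
    stack = []  # frames of [procedure name, current state]
    for pos, sym in enumerate(word):
        if pos == 0:
            if sym != main:
                return False  # a run must start by calling main
            stack.append([main, procedures[main]["init"]])
            continue
        if not stack:
            return False  # symbols after main has already returned
        name, state = stack[-1]
        if sym == RETURN:
            if state not in procedures[name]["accept"]:
                return False  # returning from a non-accepting state
            stack.pop()
            continue
        nxt = procedures[name]["delta"].get((state, sym))
        if nxt is None:
            return False  # no local transition for this symbol
        stack[-1][1] = nxt  # caller resumes here after the callee returns
        if sym in procedures:
            stack.append([sym, procedures[sym]["init"]])
    return bool(word) and not stack


# One recursive procedure F whose local DFA accepts "a F a" or "b";
# globally this describes the context-free language a^n b a^n.
SYSTEM = {
    "F": {
        "init": 0,
        "accept": {3},
        "delta": {(0, "a"): 1, (1, "F"): 2, (2, "a"): 3, (0, "b"): 3},
    }
}

ok = accepts(SYSTEM, "F", ["F", "a", "F", "b", RETURN, "a", RETURN])
```

Each local automaton stays finite and easy to read, while the stack of frames supplies the context-free behaviour; the observable call and return symbols are what let the checker know when to push and pop.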
End-user feature labeling: a locally-weighted regression approach
When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions - especially in early stages, when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose a new learning algorithm based on locally weighted regression for feature labeling by end users, enabling them to point out which features are important for a class, rather than provide new training instances. In our user study, the first allowing ordinary end users to freely choose features to label directly from text documents, our algorithm was both more effective than others at leveraging end users' feature labels to improve the learning algorithm, and more robust to real users' noisy feature labels. These results strongly suggest that allowing users to freely choose features to label is a promising method for allowing end users to improve learning algorithms effectively
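For readers unfamiliar with the underlying technique, the following is a minimal sketch of generic one-dimensional locally weighted linear regression. It is not the authors' algorithm: the abstract does not describe how end-user feature labels enter the model, and the Gaussian kernel, the `bandwidth` parameter, and the function name are assumptions made for illustration.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Predict y at `query` by fitting a weighted least-squares line,
    with training points weighted by a Gaussian kernel centred on the query."""
    weights = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to the weighted mean
    return mean_y + slope * (query - mean_x)


# Points close to the query dominate the fit; distant points barely matter.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
estimate = lwr_predict(xs, ys, 2.5)
```

The defining property is that a separate model is fitted for every query, which is what allows the fit to adapt to a particular region of the input space.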
End-user feature labeling: Supervised and semi-supervised approaches based on locally-weighted logistic regression
When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions, especially in early stages when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class, rather than provide new training instances.
We first evaluate our algorithms against other feature labeling algorithms under idealized conditions using feature labels generated by an oracle. In addition, another of our contributions is an evaluation of feature labeling algorithms under real world conditions using feature labels harvested from actual end users in our user study. Our user study is the first statistical user study for feature labeling involving a large number of end users (43 participants), all of whom have no background in machine learning.
Our supervised and semi-supervised algorithms were among the best performers when compared to other feature labeling algorithms in the idealized setting and they are also robust to poor quality feature labels provided by ordinary end users in our study. We also perform an analysis to investigate the relative gains of incorporating the different sources of knowledge available in the labeled training set, the feature labels and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithm behind their customized applications.