Deep learning-based cross-classifications reveal conserved spatial behaviors within tumor histological images.
Histopathological images are a rich but incompletely explored data type for studying cancer. Manual inspection is time consuming, making it challenging to use for image data mining. Here we show that convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNN architectures to analyze 27,815 hematoxylin and eosin scanned images from The Cancer Genome Atlas for tumor/normal, cancer subtype, and mutation classification. Our CNNs are able to classify TCGA pathologist-annotated tumor/normal status of whole slide images (WSIs) in 19 cancer types with consistently high AUCs (0.995 ± 0.008), as well as subtypes with lower but significant accuracy (AUC 0.87 ± 0.1). Remarkably, tumor/normal CNNs trained on one tissue are effective in others (AUC 0.88 ± 0.11), with classifier relationships also recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45 ± 0.16 between classifier pairs. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. Patterns for TP53 mutations can also be detected, with WSI self- and cross-tissue AUCs ranging from 0.65 to 0.80. Finally, we comparatively evaluate CNNs on 170 breast and colon cancer images with pathologist-annotated nuclei, finding that both cellular and intercellular regions contribute to CNN accuracy. These results demonstrate the power of CNNs not only for histopathological classification, but also for cross-comparisons to reveal conserved spatial behaviors across tumors.
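As a rough, hypothetical illustration of the tile-level comparison described above (stand-in classifiers and synthetic scores, not the paper's trained CNNs or TCGA data), the following Python sketch scores the tiles of one slide with two classifiers and summarizes their spatial agreement as a Pearson correlation:

```python
# Hypothetical sketch: two tumor/normal classifiers score the same WSI tiles;
# their intra-slide spatial agreement is summarized as a Pearson correlation.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

n_tiles = 500                                # tiles cut from one whole slide image
shared_signal = rng.random(n_tiles)          # spatial pattern both classifiers pick up
scores_a = np.clip(shared_signal + 0.2 * rng.standard_normal(n_tiles), 0, 1)
scores_b = np.clip(shared_signal + 0.2 * rng.standard_normal(n_tiles), 0, 1)

r, _ = pearsonr(scores_a, scores_b)          # tile-level correlation for this pair
print(f"tile-level correlation between classifier pair: {r:.2f}")
```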
Sequential Adaptation through Prediction of Structured Climate Risk
Infrastructure systems around the world face immediate crises and smoldering long-term challenges. Consequently, system owners and managers must balance the need to repair and replace the aging and deteriorating systems already in place against the need for transformative investments in deep decarbonization, climate adaptation, and transportation that will enable long-term competitiveness. Complicating these decisions are deep uncertainties, finite resources, and competing objectives.
These challenges motivate the integration of “hard” investments in physical infrastructure with “soft” instruments like insurance, land use policy, and ecosystem restoration that can improve service, shrink costs, scale up or down as future needs require, and reduce vulnerability to population loss and economic contraction. A critical advantage of soft instruments is that they enable planners to adjust, expand, or reduce them at regular intervals, unlike hard instruments which are difficult to modify once in place. As a result, soft instruments can be precisely tailored to meet near-term needs and conditions, including projections of the quasi-oscillatory, regime-like climate processes that dominate seasonal to decadal hydro-climate variability, thereby reducing the need to guess the needs and hazards of the distant future. The objective of this dissertation is to demonstrate how potentially predictable modes of structured climate variability can inform the design of soft instruments and the formulation of adaptive infrastructure system plans.
Using climate information for sequential adaptation requires developing credible projections of climate variables at relevant time scales. Part I considers the drivers of river floods in large river basins, a high-impact hydroclimate extreme used as a running example throughout this dissertation. First, chapter 2 opens by exploring the strengths and limitations of existing methodologies, and by developing a statistical-dynamical causal chain framework within which to consider flood risk on interannual to secular time scales. Next, chapter 3 describes the physical mechanisms responsible for heavy rainfall (90th percentile exceedance) and flooding in the Lower Paraguay River Basin (LPRB), focusing on a November-February (NDJF) 2015-16 flood event that displaced over 170,000 people. This chapter shows that:
1. persistent large-scale conditions over the South American continent during NDJF 2015-16 strengthened the South American Low-Level Jet (SALLJ), bringing warm air and moisture to Southeastern South America (SESA), and steered the jet towards the LPRB, leading to repeated heavy rainfall events and large-scale flooding;
2. while the observed El Niño event contributed to a stronger SALLJ, the Madden-Julian Oscillation (MJO) and the Atlantic Ocean steered the jet over the LPRB; and
3. while numerical sub-seasonal to seasonal (S2S) and seasonal models projected an elevated risk of flooding consistent with the observed El Niño event, they had limited skill at lead times greater than two weeks, suggesting that improved representation of MJO and Atlantic teleconnections could improve regional forecast skill.
Finally, chapter 4 shows how mechanistic understanding of the physical causal chain that leads to a particular hazard of interest – in this case heavy rainfall over a large area in the Ohio River Basin (ORB) – can inform future risks. Taking the GFDL coupled model, version 3 (CM3) as a representative general circulation model (GCM), this chapter shows that:
1. the GCM simulates too many regional extreme precipitation (REP) events but under-simulates the occurrence of back-to-back REP days;
2. REP days show consistent large-scale climate anomalies leading up to the event;
3. indices describing these large-scale anomalies are well simulated by the GCM; and
4. a statistical model describing this causal chain and exploiting simulated large-scale indices from the GCM can be used to inform the future occurrence of REP days, as sketched below.
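A minimal sketch of the statistical causal-chain idea in point 4, assuming a simple logistic regression on synthetic large-scale indices (the dissertation's actual model, indices, and data may differ):

```python
# Sketch: fit REP-day occurrence to large-scale climate indices on a synthetic
# "observed" record, then apply the fitted model to simulated GCM indices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

n_days = 2000
indices_obs = rng.standard_normal((n_days, 2))      # e.g. moisture flux, jet position
logit = -2.0 + 1.5 * indices_obs[:, 0] + 0.8 * indices_obs[:, 1]
rep_day = rng.random(n_days) < 1.0 / (1.0 + np.exp(-logit))

model = LogisticRegression().fit(indices_obs, rep_day)

# Apply to (synthetic) GCM-simulated indices to inform future REP-day frequency.
indices_gcm = rng.standard_normal((n_days, 2)) + 0.3   # shifted future climate
print("projected REP-day probability:",
      model.predict_proba(indices_gcm)[:, 1].mean())
```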
Even the best climate projections must confront epistemic uncertainties. Part II of this dissertation explores how intrinsically flawed projections should inform sequential adaptation. First, chapter 5 reviews approaches for planning under uncertainty, considering the role of classical decision theory, optimization, probability, and non-probabilistic approaches. Next, chapter 6 considers how different physical mechanisms impart predictability at different timescales and the implications of secular, low-frequency cyclical, and high-frequency cyclical variability for selection between instruments with long and short planning periods. In particular, this chapter builds from three assertions regarding the nature of climate risk:
1. different climate risk mitigation instruments have different project lifespans;
2. climate risk varies on many scales; and
3. the processes which dominate this risk over the planning period depend on the planning period itself.
Defining M as the nominal design life of a structural or financial instrument and N as the length of the observational record (a proxy for total informational uncertainty), chapter 7 presents a series of stylized computational experiments to probe the implications of these premises. Key findings are that:
1. quasi-periodic and secular climate signals, with different identifiability and predictability, control future uncertainty and risk;
2. adaptation strategies need to consider how uncertainties in risk projections influence the success of decision pathways; and
3. stylized experiments reveal how bias and variance of climate risk projections influence risk mitigation over a finite planning period.
Chapter 7 elaborates these findings through a didactic case study of levee heightening in the Netherlands. Integrating a conceptual model of low-frequency variability with credible projections of sea level rise, chapter 7 uses dynamic programming to co-optimize hard (levee heightening) and soft (insurance) instruments (a stylized sketch follows the findings listed below). Key findings are that:
1. large but distant and uncertain changes (e.g., sea level rise) do not necessarily motivate immediate investment in structural risk protection;
2. soft adaptation strategies are robust to different model structures and assumptions, while hard instruments perform poorly under conditions for which they were not designed; and
3. increasing the hypothetical predictability of near-term climate extremes significantly lowers long-term adaptation costs.
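As a purely illustrative sketch of this kind of hard/soft co-optimization (all costs, probabilities, discretizations, and the sea level rise path are invented assumptions, not the case study's calibration), a small backward-induction dynamic program over levee height and insurance coverage might look like:

```python
# Stylized finite-horizon DP: each period, choose a levee increment ("hard")
# and an insurance coverage level ("soft") to minimize expected discounted
# cost under rising flood levels. All numbers are illustrative assumptions.
import numpy as np

T = 5                        # planning periods (e.g., decades)
heights = np.arange(0, 6)    # discrete levee height states (m above datum)
increments = [0, 1, 2]       # levee heightening options per period (m)
insurance = [0.0, 0.5, 1.0]  # fraction of flood damage covered

sea_level = 0.3 * np.arange(T)            # assumed mean sea level rise per period

def flood_prob(h, t):
    # Exceedance probability falls exponentially with freeboard (toy model).
    return min(1.0, np.exp(-(h - sea_level[t])))

DAMAGE, BUILD_COST, LOAD = 10.0, 1.0, 1.3  # flood damage, cost per m, premium loading
beta = 0.97 ** 10                           # per-period (decadal) discount factor

V = np.zeros(len(heights))                  # terminal value = 0
for t in reversed(range(T)):
    V_new = np.empty_like(V)
    for i, h in enumerate(heights):
        best = np.inf
        for dh in increments:
            j = min(i + dh, len(heights) - 1)
            p = flood_prob(heights[j], t)
            for cov in insurance:
                premium = LOAD * cov * p * DAMAGE    # loaded actuarial premium
                residual = (1 - cov) * p * DAMAGE    # uninsured expected damage
                cost = BUILD_COST * dh + premium + residual + beta * V[j]
                best = min(best, cost)
        V_new[i] = best
    V = V_new

print(f"expected discounted cost starting from a bare levee: {V[0]:.2f}")
```

The structure, not the numbers, is the point: insurance coverage is re-chosen every period while levee height persists, which is what lets soft instruments adapt as conditions and projections change.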
Finally, part III seeks to unpack the conceptual experiments of parts I and II to inform policy and future research. Chapter 8 describes how constructive narratives about climate change can discourage climate fatalism. Instead, chapter 8 emphasizes that while climate change is and will be a critical stressor of infrastructure systems, individuals, communities, and regions have agency and can mitigate its consequences. Finally, chapter 9 concludes by discussing the key findings of this dissertation and exploring how future work on decision under uncertainty, technology, and earth systems science can aid the design and management of effective infrastructure services.
Importance of selecting research stimuli: a comparative study of the properties, structure and validity of both standard and novel emotion elicitation techniques
The principal aim of this doctoral research has been to investigate whether various popular methods of emotion elicitation perform differently in terms of self-reported participant affect, and if so, whether any of them is better able to mimic real-life emotional situations. A secondary goal has been to understand how continuous affect can be classified into discrete categories, whether by using clustering algorithms or by resorting to human participants for creating the classifications. A variety of research directions subserved these main goals: firstly, developing data-driven strategies for selecting 'appropriate' stimuli and matching them across various stimulus modalities (i.e., words, sounds, images, films and virtual environments / VEs); secondly, comparing the chosen modalities on various self-report measures (with VEs assessed both with and without a head-mounted display / HMD); thirdly, comparing how humans classify emotional information vs. a clustering algorithm; and finally, comparing all five lab-based stimulus modalities to emotional data collected via an experience sampling phone app. Findings / outputs discussed will include a matched database of stimuli geared towards lab use, how the choice of stimulus modality may affect research results, the links (or discrepancies) between human and machine classification of emotional information, as well as range restriction affecting lab stimuli relative to 'real-life' emotional phenomena.
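As a toy illustration of the machine side of the human-versus-clustering comparison (synthetic valence/arousal ratings and an assumed cluster count, not the study's data or method), continuous affect reports can be grouped with k-means:

```python
# Toy sketch: k-means groups continuous (valence, arousal) self-reports
# into discrete affect categories. Data and cluster count are placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Synthetic ratings scattered around three loose affective regions.
centers = np.array([[0.7, 0.6], [-0.6, 0.5], [-0.2, -0.6]])  # (valence, arousal)
ratings = np.vstack([c + 0.15 * rng.standard_normal((100, 2)) for c in centers])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(ratings)
for k in range(3):
    v, a = ratings[labels == k].mean(axis=0)
    print(f"cluster {k}: mean valence={v:+.2f}, mean arousal={a:+.2f}")
```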
Dimension-reduction and discrimination of neuronal multi-channel signals
Applying Kernel Change Point Detection To Financial Markets
The widespread use of computers in everyday living has created a newfound reliance on data systems to support the decisions people make. From wristwatches that monitor health to fridges that notify users of potential problems, data is constantly being streamed to help users make more informed choices. Because the data has immediate importance to users, techniques that analyse live data quickly and efficiently are necessary. One such group of methods is online change point detection, which is concerned with identifying statistical change points in a datastream as they occur, as quickly as possible.
The focus of this thesis is online kernel change point detection methods. Combining kernel two-sample testing with classic change point algorithms, kernel change point methods provide a robust, non-parametric way to measure changes in probability distributions across a variety of datasets and applications. We compare several kernel change point algorithms on synthetic datasets across a range of measurements that assess online performance. We also provide a novel way to select the kernel bandwidth hyperparameter that adapts to the data in an online fashion.
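A minimal sketch (not the thesis's algorithm) of a sliding-window kernel change point detector whose RBF bandwidth adapts online via the median heuristic; the window length, threshold, and data are illustrative assumptions:

```python
# Sketch: compare the last two windows of a stream with a biased squared-MMD
# statistic under an RBF kernel; the bandwidth is re-estimated online with
# the median heuristic over the current windows.
import numpy as np

def rbf(x, y, bw):
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2 * bw ** 2))

def mmd2(x, y, bw):
    # Biased squared MMD estimate between samples x and y.
    return rbf(x, x, bw).mean() + rbf(y, y, bw).mean() - 2 * rbf(x, y, bw).mean()

rng = np.random.default_rng(3)
stream = np.concatenate([rng.normal(0, 1, 300), rng.normal(2, 1, 300)])

w = 50                                       # window length
for t in range(2 * w, len(stream)):
    ref, cur = stream[t - 2 * w : t - w], stream[t - w : t]
    both = np.concatenate([ref, cur])
    bw = np.median(np.abs(both[:, None] - both[None, :])) or 1.0  # median heuristic
    if mmd2(ref, cur, bw) > 0.3:             # illustrative fixed threshold
        print(f"change flagged at t={t}")
        break
```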
Additionally, we examine the intraday market liquidity changes of several financial markets, focusing on futures instruments of different asset classes from the Chicago Mercantile Exchange. Data is sampled for the first four months of 2020, during which the world fell into an economic recession due to a global pandemic. An online kernel change point detection algorithm is applied to detect changes in the market liquidity distribution that are indicative of important macroeconomic events.
Classifying complex topics using spatial-semantic document visualization: An evaluation of an interaction model to support open-ended search tasks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In this dissertation we propose, test and develop a novel search interaction model to address two key problems associated with conducting an open-ended search task within a classical information retrieval system: (i) the need to reformulate the query within the context of a shifting conception of the problem, and (ii) the need to integrate relevant results across a number of separate results sets. In our model the user issues just one high-recall query and then performs a sequence of more focused, distinct aspect searches by browsing the static structured context of a spatial-semantic visualization of this retrieved document set. Our thesis is that unsupervised spatial-semantic visualization can automatically classify retrieved documents into a two-level hierarchy of relevance. In particular we hypothesise that the locality of any given aspect exemplar will tend to comprise a sufficient proportion of same-aspect documents to support a visually guided strategy for focused, same-aspect searching that we term the aspect cluster growing strategy. We examine spatial-semantic classification and potential aspect cluster growing performance across three scenarios derived from topics and relevance judgements from the TREC test collection. Our analyses show that the expected classification can be represented in spatial-semantic structures created from document similarities computed by a simple vector space text analysis procedure. We compare two diametrically opposed approaches to layout optimisation: a global approach that focuses on preserving all similarities and a local approach that focuses only on the strongest similarities. We find that the local approach, based on a minimum spanning tree of similarities, produces a better classification and, as observed from strategy simulation, more efficient aspect cluster growing performance in most situations, compared to the global approach of multidimensional scaling. We show that a small but significant proportion of aspect cluster growing cases can be problematic, regardless of the layout algorithm used. We identify the characteristics of these cases and, on this basis, demonstrate a set of novel interactive tools that provide additional semantic cues to aid the user in locating same-aspect documents.
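As a rough sketch of the local, strongest-links layout idea (placeholder documents; the thesis's text analysis and layout pipeline may differ), TF-IDF cosine similarities can be reduced to a minimum spanning tree as follows:

```python
# Sketch: compute vector-space document similarities (TF-IDF + cosine) and
# keep only the strongest links via a minimum spanning tree over distances.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse.csgraph import minimum_spanning_tree

docs = [
    "query reformulation in information retrieval",
    "document clustering and retrieval interfaces",
    "semantic visualization of search results",
    "interactive visualization of document collections",
]

tfidf = TfidfVectorizer().fit_transform(docs)
sim = cosine_similarity(tfidf)

# An MST minimizes total edge weight, so use distance = 1 - similarity
# to retain the strongest similarities as the tree's edges.
mst = minimum_spanning_tree(1 - sim)
rows, cols = mst.nonzero()
for i, j in zip(rows, cols):
    print(f"link: doc {i} <-> doc {j} (similarity {sim[i, j]:.2f})")
```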
A quality metric to improve wrapper feature selection in multiclass subject invariant brain computer interfaces
Title from PDF of title page, viewed on June 5, 2012. Dissertation advisor: Reza Derakhshani. Includes vita and bibliographical references (p. 116-129). Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2012. Contents: Introduction to brain computer interface systems -- Historical perspective and state of the art -- Experimental design -- Degeneracy in support vector machines -- Discussion of research -- Results -- Conclusion -- Appendix A. Information transfer rate -- Appendix B. Additional surface plots for individual tasks and subjects.
Brain computer interface systems based on electroencephalograph (EEG) signals have limitations which challenge their application as practical devices for general use. The signal features generated by the brain states we wish to detect possess a high degree of inter-subject and intra-subject variation. Additionally, these features usually exhibit low variation across each of the target states. Collection of EEG signals using low-resolution, non-invasive scalp electrodes further degrades the spatial resolution of these signals. The majority of brain computer interface systems to date require extensive training prior to use by each individual user. The discovery of subject invariant features could reduce or even eliminate individual training requirements. Obtaining suitable subject invariant features requires searching through a high-dimensional feature space consisting of combinations of spatial, spectral and temporal features. Poorly separable features can prevent the search from converging to a usable solution as a result of degenerate classifiers. In such instances the system must detect and compensate for degenerate classifier behavior. This dissertation presents a method to accomplish this search using a wrapper architecture comprised of a sequential forward floating search algorithm coupled with a support vector machine classifier. This is achieved by the introduction of a scalar Quality (Q)-factor metric, calculated from the ratio of sensitivity to specificity of the confusion matrix. This method is successfully applied to a multiclass subject-independent BCI with 10 untrained subjects performing 4 motor tasks.
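As an illustration of the kind of metric described (the exact multiclass formulation here is an assumption, not necessarily the dissertation's), a scalar Q-factor can be computed from a confusion matrix as the ratio of mean per-class sensitivity to mean per-class specificity:

```python
# Sketch: scalar Q-factor from a multiclass confusion matrix, taken here as
# mean per-class sensitivity divided by mean per-class specificity.
import numpy as np

def q_factor(cm):
    cm = np.asarray(cm, dtype=float)      # rows = actual class, cols = predicted
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity.mean() / specificity.mean()

# 4-class confusion matrix for, e.g., four motor tasks (synthetic counts).
cm = [[30, 3, 4, 3],
      [4, 28, 4, 4],
      [5, 3, 29, 3],
      [3, 4, 3, 30]]
print(f"Q-factor: {q_factor(cm):.3f}")
```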
The diffusion of university spinoffs: Institutional and ecological perspectives
Spinoffs are companies based on university intellectual property, established to commercialize university technology in the marketplace. The objective of this study was to examine the reasons for the rapid diffusion of spinoffs in the UK, as well as the potential effects of these companies on university resource acquisition. The study used two broad theoretical perspectives from the sociology of organizations: institutional theory and organizational ecology. It blended elements from other related perspectives such as organizational evolution and social exchange theory. Driven by the need to establish a full database of spinoffs for the first time, quantitative data collection and analysis techniques were predominantly employed. The emerging database comprised nearly 9 million datapoints capturing the full population of university spinoffs (and their demographics) formed by all English and Scottish universities over a period of 15 years (1993-2007). Qualitative exploratory data collection methods were also used to supplement the design and structure of the study, including hypothesis formation. In total, 6 in-depth interviews with Technology Transfer Managers were conducted at a representative sample of universities across England and Scotland. The study identified the role of certain environmental and institutional factors in shaping the decision by universities to adopt spinoff formation as a standard practice. These factors included networking, social compliance, industry associations, and media information providers. The study also demonstrated that spinoff formation gradually but significantly enhanced university financial resources over time. Finally, it discussed the process of coevolution of universities and spinoffs as distinct populations of organizations within the community of academic entrepreneurship. Specifically, the discussion moved towards building a new theory of “reciprocal legitimacy”.