Data- and Expert-Driven Feature Selection for Predictive Models in Healthcare: Towards Increased Interpretability in Underdetermined Machine Learning Problems
Modern data acquisition techniques in healthcare generate large collections of data from multiple sources, such as novel diagnosis and treatment methodologies. Concrete examples include electronic health record systems, genomics, and medical images. The resulting patient cohort data are often unstructured, high-dimensional, and heterogeneous, and classical statistical methods may not be sufficient to exploit them optimally for informed decision-making. Investigating such data structures with modern machine learning techniques instead promises to improve the understanding of patients' health issues and may give clinicians a better platform for informed decision-making. Key requirements for this purpose include (a) sufficiently accurate predictions and (b) model interpretability. Achieving both at the same time is difficult, particularly for datasets with few patients, which are common in the healthcare domain. In such cases, machine learning models face mathematically underdetermined systems (more features than samples) and may easily overfit the training data. An important approach to overcome this issue is feature selection, i.e., determining a subset of features from the original set that is informative with respect to the target variable. Besides potentially raising predictive performance, feature selection fosters model interpretability by identifying a small number of relevant model parameters, which helps to better understand the underlying biological processes that lead to health issues.
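As a toy illustration of the underdetermined case (hypothetical numbers, not data from the thesis): with two patients and three features, the linear system below has infinitely many exact solutions. Two coefficient vectors fit the training data perfectly, yet they disagree on a new patient, which is the kind of ambiguity that feature selection is meant to reduce.

```python
# Hypothetical toy data: n = 2 patients (rows), p = 3 features (columns).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]


def predict(rows, beta):
    """Linear model: dot product of each row with the coefficient vector."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in rows]


# Two different coefficient vectors that both reproduce y exactly.
beta_a = [1.0, 2.0, 0.0]
beta_b = [0.0, 1.0, 1.0]
print(predict(X, beta_a), predict(X, beta_b))  # [1.0, 2.0] [1.0, 2.0]

# For a new patient, the two "perfect" models disagree.
x_new = [[1.0, 0.0, 0.0]]
print(predict(x_new, beta_a), predict(x_new, beta_b))  # [1.0] [0.0]
```

Their difference, [1, 1, -1], lies in the null space of X, so adding any multiple of it to one solution yields yet another perfect fit.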
Interpretability requires that feature selection be stable, i.e., that small changes in the dataset do not lead to changes in the selected feature set. One concept to address instability is ensemble feature selection, i.e., repeating the feature selection multiple times on subsets of samples of the original dataset and aggregating the results in a meta-model. This thesis presents two approaches to ensemble feature selection tailored to high-dimensional data in healthcare: the Repeated Elastic Net Technique for feature selection (RENT) and the User-Guided Bayesian Framework for feature selection (UBayFS). While RENT is purely data-driven and builds on elastic net regularized models, UBayFS is a general framework for ensembles that can include expert knowledge in the feature selection process via prior weights and side constraints. A case study modeling the overall survival of cancer patients compares these novel feature selectors and demonstrates their potential in clinical practice.
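The general ensemble idea can be sketched in a few lines. This is neither RENT nor UBayFS: as an assumption, it swaps in a simple correlation-based top-k selector as the base model (RENT uses elastic net models instead), aggregates by selection frequency with a hypothetical threshold `tau`, and measures stability as the mean pairwise Jaccard similarity of the selected sets, which is only one of several stability measures in use. All function names are hypothetical.

```python
import random
import statistics
from itertools import combinations


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_selector(X, y, k):
    """Hypothetical base selector: indices of the k features most correlated with y."""
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    return set(sorted(range(p), key=lambda j: scores[j], reverse=True)[:k])


def ensemble_select(X, y, k=2, n_models=50, frac=0.7, tau=0.8, seed=0):
    """Run the base selector on random subsamples (without replacement) and keep
    the features selected in at least a fraction tau of the models."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(2, int(frac * n))
    subsets = []
    for _ in range(n_models):
        idx = rng.sample(range(n), m)
        subsets.append(top_k_selector([X[i] for i in idx], [y[i] for i in idx], k))
    freq = [sum(j in s for s in subsets) / n_models for j in range(p)]
    selected = [j for j in range(p) if freq[j] >= tau]
    return selected, freq, subsets


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity of non-empty selected sets (1.0 = identical)."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Synthetic data: the target depends on features 0 and 1; features 2 to 5 are noise.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(30)]
y = [2 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.3) for r in X]
selected, freq, subsets = ensemble_select(X, y)
print(selected, round(jaccard_stability(subsets), 2))
```

The threshold `tau` trades off how many features survive against how consistently they must be chosen across the ensemble; the selection frequencies play the role of the aggregated meta-model.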
Beyond selecting single features, UBayFS also allows selecting whole feature groups (feature blocks) acquired from multiple data sources, such as those mentioned above. Quantifying the importance of such feature blocks plays a key role in tracing information about the target variable back to the acquisition modalities. If systematically integrated into the planning of patient treatment, such information on feature block importance can improve the use of human, technical, and financial resources by avoiding the acquisition of non-informative features. Since generalizing feature importance measures to block importance is not trivial, this thesis also investigates and compares approaches for ranking feature block importance.
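To illustrate why the generalization is not trivial: with hypothetical per-feature importance scores, three naive aggregation rules (sum, mean, and maximum over a block's features) rank the same three blocks in three different orders. The feature names, block names, and scores are invented, and these rules are not the block importance measures studied in the thesis.

```python
# Hypothetical per-feature importance scores (e.g. selection frequencies).
importance = {
    "age": 0.9, "tumor_stage": 0.8,                 # clinical: few, strong features
    "gene_1": 0.3, "gene_2": 0.3, "gene_3": 0.3,
    "gene_4": 0.3, "gene_5": 0.3,                   # genomics: many, weak features
    "texture": 0.95, "shape": 0.05,                 # imaging: one strong, one weak
}
# Each block groups the features acquired from one data source.
blocks = {
    "clinical": ["age", "tumor_stage"],
    "genomics": ["gene_1", "gene_2", "gene_3", "gene_4", "gene_5"],
    "imaging": ["texture", "shape"],
}

aggregators = {"sum": sum, "mean": lambda v: sum(v) / len(v), "max": max}


def block_ranking(agg):
    """Rank blocks by aggregating the importances of their member features."""
    scores = {b: agg([importance[f] for f in feats]) for b, feats in blocks.items()}
    return sorted(scores, key=scores.get, reverse=True)


for name, agg in aggregators.items():
    print(name, block_ranking(agg))
# sum ['clinical', 'genomics', 'imaging']
# mean ['clinical', 'imaging', 'genomics']
# max ['imaging', 'clinical', 'genomics']
```

Summing favors large blocks, averaging penalizes blocks diluted by weak features, and the maximum rewards a single strong feature, so the choice of rule decides which data source appears most informative.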
This thesis demonstrates that high-dimensional datasets from multiple data sources in the medical domain can be successfully tackled by the presented feature selection approaches. Experimental evaluations demonstrate favorable predictive performance, stability, and interpretability of the results, which carries a high potential for better data-driven decision support in clinical practice.
Survey on Leveraging Uncertainty Estimation Towards Trustworthy Deep Neural Networks: The Case of Reject Option and Post-training Processing
Although neural networks (especially deep neural networks) have achieved better-than-human performance in many fields, their real-world deployment is still questionable because they lack awareness of the limitations of their knowledge. To incorporate such awareness into machine learning models, prediction with a reject option (also known as selective classification or classification with abstention) has been proposed in the literature. In this paper, we present a systematic review of prediction with a reject option in the context of various neural networks. To the best of our knowledge, this is the first study focusing on this aspect of neural networks. Moreover, we discuss novel loss functions related to the reject option and post-training processing (if any) of network outputs for generating suitable measures of the model's knowledge awareness. Finally, we address the application of the reject option to reducing prediction time in real-time problems and present a comprehensive summary of reject-option techniques for an extensive variety of neural networks. Our code is available on GitHub: https://github.com/MehediHasanTutul/Reject_option
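A minimal sketch of the simplest reject rule, assuming a softmax classifier: abstain whenever the maximum softmax probability falls below a threshold, a confidence-threshold rule in the spirit of Chow's rule. The threshold values, toy logits, and function names are hypothetical; the survey covers many more elaborate loss-based and post-training approaches. Coverage is the fraction of accepted samples, and selective risk is the error rate on those samples.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_reject(logits, threshold):
    """Return the predicted class index, or None (abstain) if the top
    softmax probability is below the threshold."""
    probs = softmax(logits)
    conf = max(probs)
    return probs.index(conf) if conf >= threshold else None


def coverage_and_selective_risk(batch_logits, labels, threshold):
    """Coverage = accepted fraction; selective risk = error rate on accepted samples."""
    preds = [predict_with_reject(z, threshold) for z in batch_logits]
    accepted = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(accepted) / len(labels)
    if not accepted:
        return coverage, 0.0
    risk = sum(p != y for p, y in accepted) / len(accepted)
    return coverage, risk


# Toy two-class batch; the second sample is uncertain, the last one is confidently wrong.
batch = [[4.0, 0.0], [0.2, 0.0], [0.0, 3.0], [2.5, 0.0]]
labels = [0, 1, 1, 1]
for t in (0.0, 0.9, 0.95):
    print(t, coverage_and_selective_risk(batch, labels, t))
```

Raising the threshold lowers coverage (1.0, 0.75, 0.5 here) while the selective risk drops (0.5, 1/3, 0.0), which is the basic coverage/risk trade-off that reject-option methods try to improve.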
Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches
Traditional networking devices support only fixed features and limited configurability.
Network softwarization leverages programmable software and hardware platforms to remove those limitations.
In this context, programmable data planes make it possible to program the packet processing pipeline of networking devices directly and to create custom control plane algorithms.
This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet the high demands of next-generation networks such as 5G, the Internet of Things, cloud computing, and Industry 4.0.
P4 is the most popular technology to implement programmable data planes.
However, programmable data planes, and the P4 technology in particular, emerged only recently.
Thus, P4 support for some well-established networking concepts is still lacking, and several issues remain unsolved due to the different characteristics of programmable data planes compared to traditional networking.
The research of this thesis focuses on two open issues of programmable data planes.
First, it develops resilient and efficient forwarding mechanisms for the P4 data plane, as no satisfactory state-of-the-art best practices exist yet.
Second, it enables BIER in high-performance P4 data planes.
BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic that so far has only very limited support on high-performance forwarding platforms.
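The core of BIER forwarding, as specified in RFC 8279, is a short bit-manipulation loop: for each destination bit still set in the packet's BitString, the router looks up the neighbor and its forwarding bit mask (F-BM) in the Bit Index Forwarding Table (BIFT), sends one copy whose BitString is ANDed with the F-BM, and clears those bits so that no destination receives a duplicate. The Python sketch below uses 0-based bit positions and a hypothetical two-neighbor topology; it omits the BIER header encapsulation and the resilience extensions developed in the thesis.

```python
def bier_forward(bitstring, bift):
    """Replicate a BIER packet following the forwarding procedure of RFC 8279.

    bitstring: int whose set bits mark the destination BFERs
               (bit position k here stands for BFR-id k + 1).
    bift:      hypothetical Bit Index Forwarding Table mapping each bit
               position to (neighbor, forwarding bit mask F-BM); every
               F-BM must contain the bit positions that map to it.
    Returns the (neighbor, bitstring of the copy) pairs to transmit.
    """
    copies = []
    remaining = bitstring
    while remaining:
        bit = (remaining & -remaining).bit_length() - 1  # position of lowest set bit
        neighbor, fbm = bift[bit]
        copies.append((neighbor, remaining & fbm))  # copy carries only BFERs behind this neighbor
        remaining &= ~fbm  # clear them so no BFER receives a duplicate
    return copies


# Hypothetical topology: bit positions 0 and 1 are reached via neighbor "A",
# positions 2 and 3 via neighbor "B".
bift = {0: ("A", 0b0011), 1: ("A", 0b0011), 2: ("B", 0b1100), 3: ("B", 0b1100)}
print(bier_forward(0b1011, bift))  # [('A', 3), ('B', 8)]
```

Because the packet itself carries the destination set, transit routers need no per-multicast-group state, which is what makes BIER scalable.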
The main results of this thesis are published in eight peer-reviewed publications and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and extensive evaluations of all developed and implemented mechanisms. Furthermore, the results include a comprehensive P4 literature study.
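One common building block for resilient forwarding is local fast reroute: the switch keeps a precomputed backup next hop per destination and switches to it as soon as it detects that the primary port is down, without waiting for the control plane. The Python sketch below only models that decision logic; the table layout, names, and port numbers are hypothetical and are not taken from the thesis or from the P4 language.

```python
from dataclasses import dataclass


@dataclass
class FrrEntry:
    primary_port: int
    backup_port: int  # precomputed alternative next hop (assumed loop-free)


def forward(table, port_up, dst):
    """Pick the egress port for dst: the primary port if it is up, otherwise
    the precomputed backup port (local fast reroute); None means drop."""
    entry = table.get(dst)  # exact match for simplicity; IP forwarding uses longest-prefix match
    if entry is None:
        return None
    if port_up.get(entry.primary_port, False):
        return entry.primary_port
    if port_up.get(entry.backup_port, False):
        return entry.backup_port
    return None


table = {"10.0.1.0/24": FrrEntry(primary_port=1, backup_port=2)}
print(forward(table, {1: True, 2: True}, "10.0.1.0/24"))   # 1
print(forward(table, {1: False, 2: True}, "10.0.1.0/24"))  # 2 (rerouted locally)
```

Keeping the failover decision in the data plane is what makes it fast; the harder design problem, which this sketch assumes away, is computing backup paths that do not create forwarding loops.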
Two further peer-reviewed papers contain additional content that is not directly related to the main results.
They implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts.
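The abstract does not describe the scheduling method, so the following is only a hypothetical illustration of cost-optimized scheduling from a day-ahead forecast. In the simplest case, a load needs a fixed number of hours at constant power and can be interrupted freely; choosing the cheapest forecast hours then minimizes cost. The prices, power, and function name are invented, and real schedules would add constraints such as deadlines or contiguous run times.

```python
def cheapest_schedule(prices, hours_needed, power_kw):
    """Pick the hours_needed cheapest hours of a day-ahead price forecast
    (price per kWh) for an interruptible load with constant power.
    Returns (sorted list of hour indices, total cost)."""
    if not 0 <= hours_needed <= len(prices):
        raise ValueError("hours_needed must fit into the forecast horizon")
    cheapest = sorted(range(len(prices)), key=lambda h: prices[h])[:hours_needed]
    hours = sorted(cheapest)
    cost = sum(prices[h] * power_kw for h in hours)
    return hours, cost


# Hypothetical 6-hour forecast; the load needs 3 hours at 2 kW.
hours, cost = cheapest_schedule([0.30, 0.12, 0.10, 0.25, 0.08, 0.40], 3, 2.0)
print(hours, round(cost, 2))  # [1, 2, 4] 0.6
```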
Coventry UK City of Culture 2021 through a Headphone Verbatim Lens: A Study into Civic Pride
Through a headphone verbatim ‘lens’, this study investigates the impact that a UK City of Culture (UKCC) programme of work may have on the behavioural manifestations of ‘civic pride’ amongst residents of Coventry before and during its titular year in 2021, using a practice research methodology. In particular, the thesis explores civic pride and how it manifests as a measurement of ‘success’ when considering indicators of social change (Collins 2017). As a result, I have identified key behavioural dimensions that indicate how levels of civic pride amongst citizens might shift and change during a UKCC year. I discuss how evaluation reports and studies into past UKCC programmes are rooted in discourses from research fields such as sociology, human geography, and economics (Myerscough 1992, Garcia 2005, Derry City & Strabane District Council 2016, Culture, Place & Policy Institute 2019). Of these, the majority have used quantitative methods, with few examples of qualitative studies. Further, the few qualitative studies appear to focus on key stakeholders, whereas community-based participants and citizens are rarely given a voice. This, I argue, limits the evidence that could better inform cultural policymakers and programme evaluators alike. Crucially, to date there are virtually no arts-based practice research studies that explore the impact of a UKCC programme on host city residents. This thesis responds directly to that gap by arguing for, and giving an account of the development of, an experimental practice research design and output model. I introduce this model as ‘Evaluative Performance’, a term that encapsulates this project’s use of theatre practice, and specifically headphone verbatim, as a useful and innovative way to collect, analyse and communicate the personal stories of citizens that are currently lacking in research evaluations and wider policymaking agendas (UK Civil Service 2021).
I provide an account of my testing of this model through the development of a piece of practice research in the form of a headphone verbatim performance. Throughout this testing, I question contemporary considerations of ‘authenticity’ by drawing on scholarly accounts of philosophical thought (Lyotard 1984) and headphone verbatim practice (Fisher 2011, Kinghorn 2017, Schulze 2017). The project makes a novel contribution to knowledge of evaluation practices, specifically on the importance of the affectual experience of citizens when investigating civic pride through arts-based methodologies. Reciprocally, by taking headphone verbatim out of a traditional storytelling mode, I offer new insights into the application of practice research and into headphone verbatim scholarship. Briefly, these contributions are: i) an understanding of what civic pride means within the context of UKCC programmes; ii) an understanding of the wider affordances and potential applications of headphone verbatim within an experimental evaluative context; and iii) the value and importance of participatory engagement in the production of performance for public engagement in evaluation processes.
Territorial Stigmatisation: Urban Renewal and Displacement in a Central Istanbul Neighbourhood
In Tarlabasi, an Istanbul neighbourhood facing massive redevelopment and displacement, marginalised residents speak about belonging, stigma, and what their community means to them. Based on a long-term ethnographic study that includes interviews, photographs, and archival research, Constanze Letsch examines how territorial stigmatisation is weaponised by the state and how differently stigmatised groups try to fight against the vilification of their neighbourhood. The contested plans of urban renewal threaten not only their homes and workplaces but also a rapidly vanishing Istanbul: socio-demographic interdependencies and networks that have developed over decades.
Optimization of 5G Second Phase Heterogeneous Radio Access Networks with Small Cells
Due to the exponential increase in high-data-rate applications and services per coverage area, it is becoming challenging for existing cellular networks to handle the massive number of users and their demands. Network operators acknowledge that current wireless networks may not be able to cope with future traffic demands. To overcome these challenges, operators are turning to the efficient deployment of heterogeneous networks. 5G is currently in its commercialization phase. Network evolution through the addition of small cells will extend existing wireless networks with enhanced capabilities and innovative features. Global 5G standardization within the 3rd Generation Partnership Project (3GPP) has introduced 5G New Radio (NR), which supports a wide range of frequency bands (<6 GHz to 100 GHz).
Functional splits and their costs are evaluated for the different trends and verticals that 5G NR addresses. Aspects ranging from network slicing to the assessment of business opportunities and the allied standardization efforts are illustrated. The study explores carrier aggregation (CA) for 4G picocells to achieve high spectral efficiency, supported by small-cell massification and the associated statistical multiplexing gain. With CA in LTE-Sim (4G), goodput values of 40 Mbps for a cell radius of 500 m and 29 Mbps for a cell radius of 50 m were obtained, three times higher than in the scenario without CA (2.6 GHz plus 3.5 GHz frequency bands).
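An idealized upper bound shows why carrier aggregation raises throughput: a user scheduled on several component carriers at once can, at best, use the sum of their Shannon capacities. The bandwidths and SNRs below are hypothetical, and the Shannon bound ignores scheduling, protocol overhead, and packet loss, so these numbers are not comparable to the LTE-Sim goodput values reported above.

```python
import math


def shannon_capacity_mbps(bandwidth_mhz, snr_db):
    """Shannon bound of one carrier in Mbit/s (MHz times bit/s/Hz)."""
    return bandwidth_mhz * math.log2(1 + 10 ** (snr_db / 10))


def aggregated_capacity_mbps(carriers):
    """Ideal carrier aggregation: the capacities of all component carriers add up."""
    return sum(shannon_capacity_mbps(bw, snr) for bw, snr in carriers)


# Hypothetical component carriers: (bandwidth in MHz, SNR in dB).
band_2600 = (20.0, 10.0)
band_3500 = (20.0, 6.0)  # assumed lower SNR in the higher band at the same distance
single = shannon_capacity_mbps(*band_2600)
with_ca = aggregated_capacity_mbps([band_2600, band_3500])
print(round(single, 1), round(with_ca, 1))
```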
Heterogeneous networks have been under investigation for many years. Compared to homogeneous networks, they can improve users' service quality and resource utilization. Quality of service can be enhanced by placing small cells (femtocells or picocells) inside the coverage area of microcells or macrocells. Deploying indoor 5G femtocells inside the macrocellular network can reduce network cost. Some service providers have launched solutions for indoor users, but many challenges remain. The 5G air-simulator was updated to deploy indoor femtocells under the proposed assumptions, with a uniform spatial distribution. For all possible combinations of apartment side length and transmitter power, the maximum number of supported users exceeded the numbers reported in the literature by more than a factor of two. For outdoor environments, this study also proposes small-cell optimization by placing picocells within a macrocell to obtain low latency and high data rates through the statistical multiplexing gain of the associated users.
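Statistical multiplexing gain can be illustrated with a simple on/off traffic model, which is an assumption and not the traffic model used in the thesis: when bursty users share one small cell, the cell only needs capacity for the number of simultaneously active users that is exceeded with small probability ε, rather than peak capacity for every user. With users active independently, that number follows from the binomial tail. The function name and parameters are hypothetical.

```python
from math import comb


def pooled_users_needed(n_users, p_active, epsilon):
    """Smallest m such that P(more than m users are active at once) <= epsilon,
    assuming each of n_users is active independently with probability p_active."""
    tail = 1.0
    for m in range(n_users + 1):
        # Subtract P(X = m) for X ~ Binomial(n_users, p_active); tail is now P(X > m).
        tail -= comb(n_users, m) * p_active**m * (1 - p_active) ** (n_users - m)
        if tail <= epsilon:
            return m
    return n_users


# Hypothetical cell: 20 users, each active 10 % of the time, 1 % overload target.
m = pooled_users_needed(20, 0.1, 0.01)
print(m, 20 / m)  # pooled peak-rate streams needed, and the gain over dedicated capacity
```

For these parameters the function returns 6, so pooling needs capacity for 6 peak-rate streams instead of 20, roughly a 3.3-fold gain.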
Results are presented for 5G NR functional splits six and seven in three frequency bands (2.6 GHz, 3.5 GHz, and 5.62 GHz). For shorter cell radii, the analysis shows that the 2.6 GHz band is the best choice: it achieves a lower packet loss ratio (PLR), supports a higher number of users, and provides better goodput and higher profit (for cell radii up to 400 m). In 4G with CA, the analysis of the economic trade-off for picocells shows that the Enhanced Multi-band Scheduler (EMBS) provides higher revenue than the scenario without CA. The profit with CA is more than four times that of the scenario without CA, which means that the slight increase in cost incurred by CA yields more than a fourfold profit relative to the scenario without CA.
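One physical intuition for why the lowest band performs best at short radii, offered as an illustration and not as the propagation model used in the thesis: under free-space propagation (Friis), path loss grows by 20·log10(f2/f1) dB when the carrier frequency rises, so at the same distance 5.62 GHz loses about 6.7 dB more than 2.6 GHz, and 3.5 GHz about 2.6 dB more. The 400 m distance below is a hypothetical example value.

```python
import math


def fspl_db(distance_m, freq_mhz):
    """Free-space path loss in dB (Friis formula with d in km and f in MHz)."""
    return 20 * math.log10(distance_m / 1000) + 20 * math.log10(freq_mhz) + 32.44


# Path loss at a hypothetical 400 m distance for the three bands studied.
for f_mhz in (2600, 3500, 5620):
    print(f_mhz, round(fspl_db(400, f_mhz), 1))
```

Real urban and indoor channels add shadowing and penetration losses that also grow with frequency, which generally widens rather than closes this gap.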
- …