
    Redefining Disproportionate Arrest Rates: An Exploratory Quasi-Experiment that Reassesses the Role of Skin Tone

    The New York Times reported that Black Lives Matter was the third most-read subject of 2020. These articles brought to the forefront the question of disparity in arrest rates for darker-skinned people. Questioning arrest disparity is understandable because virtually everything known about disproportionate arrest rates has been a guess, and virtually all prior research on disproportionate arrest rates is questionable because of improper benchmarking (the denominator effect). Current research has highlighted the need to switch from demographic data to skin tone data and start over on disproportionate arrest rate research; therefore, this study explored the relationship between skin tone and disproportionate arrest rates. This study also sought to determine which of the three theories surrounding disproportionate arrests is most predictive of disproportionate rates. The current theories are that disproportionate arrests increase as skin tone gets darker (stereotype threat theory), disproportionate rates are different for Black and Brown people (self-categorization theory), or disproportionate rates apply equally across all darker skin colors (social dominance theory). This study used a quantitative, exploratory, quasi-experimental design with linear spline regression to analyze arrest rates in Alachua County, Florida, before and after the county’s mandate to reduce arrests as much as possible during the COVID-19 pandemic to protect the prison population. The study was exploratory as no previous study has used skin tone analysis to examine arrest disparity. The findings of this study redefine the understanding of the existence and nature of disparities in arrest rates and offer a solid foundation for additional studies about the relationship between disproportionate arrest rates and skin color.
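
    To illustrate the before-and-after analysis the abstract describes, the sketch below fits a linear spline (piecewise-linear) regression with a single knot at the week an arrest-reduction mandate takes effect. The weekly counts, knot location, and variable names are hypothetical placeholders for demonstration only, not data or code from the study.

```python
# Illustrative linear spline regression with one knot at the mandate date.
# All numbers and names are synthetic assumptions, not the study's data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

weeks = np.arange(104)            # two years of weekly observations
knot = 60                         # hypothetical week the mandate took effect
trend = 50 - 0.05 * weeks - 0.4 * np.maximum(weeks - knot, 0)
arrests = trend + rng.normal(0, 2, size=weeks.size)

# Design matrix: intercept, baseline trend, and change in trend after the knot.
X = sm.add_constant(np.column_stack([weeks, np.maximum(weeks - knot, 0)]))
model = sm.OLS(arrests, X).fit()

print(model.params)   # [intercept, pre-mandate slope, post-mandate change in slope]
```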

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Data- and expert-driven variable selection for predictive models in healthcare: towards increased interpretability in underdetermined machine learning problems

    Modern data acquisition techniques in healthcare generate large collections of data from multiple sources, such as novel diagnosis and treatment methodologies. Some concrete examples are electronic healthcare record systems, genomics, and medical images. This leads to situations with often unstructured, high-dimensional heterogeneous patient cohort data where classical statistical methods may not be sufficient for optimal utilization of the data and informed decision-making. Instead, investigating such data structures with modern machine learning techniques promises to improve the understanding of patient health issues and may provide a better platform for informed decision-making by clinicians. Key requirements for this purpose include (a) sufficiently accurate predictions and (b) model interpretability. Achieving both aspects in parallel is difficult, particularly for datasets with few patients, which are common in the healthcare domain. In such cases, machine learning models encounter mathematically underdetermined systems and may overfit easily on the training data. An important approach to overcome this issue is feature selection, i.e., determining a subset of informative features from the original set of features with respect to the target variable. While potentially raising the predictive performance, feature selection fosters model interpretability by identifying a low number of relevant model parameters to better understand the underlying biological processes that lead to health issues. Interpretability requires that feature selection is stable, i.e., small changes in the dataset do not lead to changes in the selected feature set. A concept to address instability is ensemble feature selection, i.e. the process of repeating the feature selection multiple times on subsets of samples of the original dataset and aggregating results in a meta-model. This thesis presents two approaches for ensemble feature selection, which are tailored towards high-dimensional data in healthcare: the Repeated Elastic Net Technique for feature selection (RENT) and the User-Guided Bayesian Framework for feature selection (UBayFS). While RENT is purely data-driven and builds upon elastic net regularized models, UBayFS is a general framework for ensembles with the capabilities to include expert knowledge in the feature selection process via prior weights and side constraints. A case study modeling the overall survival of cancer patients compares these novel feature selectors and demonstrates their potential in clinical practice. Beyond the selection of single features, UBayFS also allows for selecting whole feature groups (feature blocks) that were acquired from multiple data sources, as those mentioned above. Importance quantification of such feature blocks plays a key role in tracing information about the target variable back to the acquisition modalities. Such information on feature block importance may lead to positive effects on the use of human, technical, and financial resources if systematically integrated into the planning of patient treatment by excluding the acquisition of non-informative features. Since a generalization of feature importance measures to block importance is not trivial, this thesis also investigates and compares approaches for feature block importance rankings. This thesis demonstrates that high-dimensional datasets from multiple data sources in the medical domain can be successfully tackled by the presented approaches for feature selection. 
Experimental evaluations demonstrate favorable predictive performance, stability, and interpretability of the results, which carries high potential for better data-driven decision support in clinical practice.
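
    As a rough illustration of the ensemble idea behind RENT, the sketch below repeatedly fits elastic-net models on random subsamples and keeps the features that are selected in a large fraction of the repetitions. The synthetic data, regularization settings, and the 80% frequency threshold are illustrative assumptions, not the published RENT or UBayFS algorithms.

```python
# Minimal sketch of ensemble feature selection with elastic-net base models,
# in the spirit of RENT. All parameters and data are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import ShuffleSplit

X, y = make_regression(n_samples=80, n_features=200, n_informative=10,
                       noise=1.0, random_state=0)

n_repeats = 50
counts = np.zeros(X.shape[1])
splitter = ShuffleSplit(n_splits=n_repeats, train_size=0.8, random_state=0)
for train_idx, _ in splitter.split(X):
    model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000)
    model.fit(X[train_idx], y[train_idx])
    counts += model.coef_ != 0            # track how often each feature is selected

selection_frequency = counts / n_repeats
selected = np.flatnonzero(selection_frequency >= 0.8)   # keep consistently selected features
print(f"{selected.size} features selected consistently:", selected)
```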

    AI: Limits and Prospects of Artificial Intelligence

    The emergence of artificial intelligence has triggered enthusiasm and promise of boundless opportunities as much as uncertainty about its limits. The contributions to this volume explore the limits of AI, describe the necessary conditions for its functionality, reveal its attendant technical and social problems, and present some existing and potential solutions. At the same time, the contributors highlight the societal and attendant economic hopes and fears, utopias and dystopias that are associated with the current and future development of artificial intelligence.

    The Active CryoCubeSat Technology: Active Thermal Control for Small Satellites

    Modern CubeSats and Small Satellites have advanced in capability to tackle science and technology missions that would usually be reserved for more traditional, large satellites. However, this rapid growth in capability is only possible through the fast-to-production, low-cost, and advanced technology approach used by modern small satellite engineers. Advanced technologies in power generation, energy storage, and high-power-density electronics have naturally led to a thermal bottleneck, where CubeSats and Small Satellites can generate more power than they can easily reject. The Active CryoCubeSat (ACCS) is an advanced active thermal control (ATC) technology for Small Satellites and CubeSats, which aims to help solve this thermal problem. The ACCS technology is based on a two-stage design. An integrated miniature cryocooler forms the first stage, and a single-phase mechanically pumped fluid loop heat exchanger the second. The ACCS leverages advanced 3D manufacturing techniques to integrate the ATC directly into the satellite structure, which helps to improve the performance while simultaneously miniaturizing and simplifying the system. The ACCS system can easily be scaled to mission requirements and can control zonal temperature, bulk thermal rejection, and dynamic heat transfer within a satellite structure. The integrated cryocooler supports cryogenic science payloads such as advanced LWIR electro-optical detectors. The ACCS hopes to enable future advanced CubeSat and Small Satellite missions in Earth science, heliophysics, and deep space operations. This dissertation will detail the design, development, and testing of the ACCS system technology.

    Explainable Predictive and Prescriptive Process Analytics of customizable business KPIs

    Recent years have witnessed a growing adoption of machine learning techniques for business improvement across various fields. Among other emerging applications, organizations are exploiting opportunities to improve the performance of their business processes by using predictive models for runtime monitoring. Predictive analytics leverages machine learning and data analytics techniques to predict the future outcome of a process based on historical data. Therefore, the goal of predictive analytics is to identify future trends, and discover potential issues and anomalies in the process before they occur, allowing organizations to take proactive measures to prevent them from happening, optimizing the overall performance of the process. Prescriptive analytics systems go beyond purely predictive ones, by not only generating predictions but also advising the user if and how to intervene in a running process in order to improve the outcome of a process, which can be defined in various ways depending on the business goals; this can involve measuring process-specific Key Performance Indicators (KPIs), such as costs, execution times, or customer satisfaction, and using this data to make informed decisions about how to optimize the process. This Ph.D. thesis research work has focused on predictive and prescriptive analytics, with particular emphasis on providing predictions and recommendations that are explainable and comprehensible to process actors. In fact, while the priority remains on giving accurate predictions and recommendations, the process actors need to be provided with an explanation of the reasons why a given process execution is predicted to behave in a certain way and they need to be convinced that the recommended actions are the most suitable ones to maximize the KPI of interest; otherwise, users would not trust and follow the provided predictions and recommendations, and the predictive technology would not be adopted.
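
    As a toy illustration of outcome-oriented predictive monitoring (a generic approach, not the method developed in the thesis), the sketch below encodes each case of a small, made-up event log with simple aggregate features and fits an interpretable classifier whose coefficients can serve as a basic explanation of the prediction. The log layout, feature names, and KPI label are hypothetical.

```python
# Toy sketch of outcome-oriented predictive process monitoring: encode each
# case of a (made-up) event log with aggregate features, then fit a linear
# model whose coefficients offer a simple explanation of the predicted KPI.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical event log: one row per event, with case id, activity, duration.
log = pd.DataFrame({
    "case_id":  [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "activity": ["A", "B", "C", "A", "C", "A", "B", "B", "C"],
    "duration": [5, 3, 7, 4, 9, 6, 2, 3, 8],   # minutes spent on each event
})
kpi = pd.Series({1: 1, 2: 0, 3: 1})            # hypothetical label: KPI met (1) or not (0)

# Aggregate encoding: activity counts plus total elapsed time per case.
features = (
    pd.crosstab(log["case_id"], log["activity"])
    .join(log.groupby("case_id")["duration"].sum().rename("elapsed"))
)

clf = LogisticRegression(max_iter=1000).fit(features, kpi.loc[features.index])
print(dict(zip(features.columns, clf.coef_[0])))   # per-feature contribution to the prediction
```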

    Behavior quantification as the missing link between fields: Tools for digital psychiatry and their role in the future of neurobiology

    The great behavioral heterogeneity observed between individuals with the same psychiatric disorder and even within one individual over time complicates both clinical practice and biomedical research. However, modern technologies are an exciting opportunity to improve behavioral characterization. Data from existing psychiatry methods that are qualitative or unscalable, such as patient surveys or clinical interviews, can now be collected at greater capacity and analyzed to produce new quantitative measures. Furthermore, recent capabilities for continuous collection of passive sensor streams, such as phone GPS or smartwatch accelerometer, open avenues of novel questioning that were previously entirely unrealistic. Their temporally dense nature enables a cohesive study of real-time neural and behavioral signals. To develop comprehensive neurobiological models of psychiatric disease, it will be critical to first develop strong methods for behavioral quantification. There is huge potential in what can theoretically be captured by current technologies, but this in itself presents a large computational challenge -- one that will necessitate new data processing tools, new machine learning techniques, and ultimately a shift in how interdisciplinary work is conducted. In my thesis, I detail research projects that take different perspectives on digital psychiatry, subsequently tying ideas together with a concluding discussion on the future of the field. I also provide software infrastructure where relevant, with extensive documentation. Major contributions include scientific arguments and proof-of-concept results for daily free-form audio journals as an underappreciated psychiatry research datatype, as well as novel stability theorems and pilot empirical success for a proposed multi-area recurrent neural network architecture.

    Mathematical Methods and Operation Research in Logistics, Project Planning, and Scheduling

    In the last decade, Industry 4.0 brought flexible supply chains and flexible design projects to the forefront. Nevertheless, the recent pandemic, the accompanying economic problems, and the resulting supply problems have further increased the role of logistics and supply chains. Therefore, planning and scheduling procedures that can respond flexibly to changed circumstances have become more valuable in both logistics and projects. There are already several competing criteria for project and logistics process planning and scheduling that need to be reconciled. At the same time, the COVID-19 pandemic has shown that even more emphasis needs to be placed on taking potential risks into account. Flexibility and resilience are emphasized in all decision-making processes, including the scheduling of logistics processes, activities, and projects.

    Data Collection in Two-Tier IoT Networks with Radio Frequency (RF) Energy Harvesting Devices and Tags

    The Internet of things (IoT) is expected to connect physical objects and end-users using technologies such as wireless sensor networks and radio frequency identification (RFID). In addition, it will employ a wireless multi-hop backhaul to transfer data collected by a myriad of devices to users or applications such as digital twins operating in a Metaverse. A critical issue is that the number of packets collected and transferred to the Internet is bounded by limited network resources such as bandwidth and energy. In this respect, IoT networks have adopted technologies such as time division multiple access (TDMA), successive interference cancellation (SIC), and multiple-input multiple-output (MIMO) in order to increase network capacity. Another fundamental issue is energy. To this end, researchers have exploited radio frequency (RF) energy-harvesting technologies to prolong the lifetime of energy-constrained sensors and smart devices. Specifically, devices with RF energy-harvesting capabilities can rely on ambient RF sources such as access points, television towers, and base stations. Further, an operator may deploy dedicated power beacons that serve as RF-energy sources. Apart from that, in order to reduce energy consumption, devices can adopt ambient backscattering communication technologies. Advantageously, backscattering allows devices to communicate using a negligible amount of energy by modulating ambient RF signals. To address the aforementioned issues, this thesis first considers data collection in a two-tier MIMO ambient RF energy-harvesting network. The first tier consists of routers with MIMO capability and a set of source-destination pairs/flows. The second tier consists of energy-harvesting devices that rely on RF transmissions from routers for energy supply. The problem is to determine a minimum-length TDMA link schedule that satisfies the traffic demand of source-destination pairs and the energy demand of energy-harvesting devices. The thesis formulates the problem as a linear program (LP) and outlines a heuristic to construct transmission sets that are then used by the LP. In addition, it outlines a new routing metric that considers the energy demand of energy-harvesting devices to cope with the routing requirements of IoT networks. The simulation results show that the proposed algorithm on average achieves 31.25% shorter schedules compared to competing schemes. In addition, the routing metric results in link schedules that are at most 24.75% longer than those computed by the LP.
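
    The following sketch illustrates, in highly simplified form, the kind of linear program described above: given a few precomputed transmission sets, choose how long each set is activated so that link traffic demands and device energy demands are met while the total schedule length is minimized. The sets, rates, harvesting coefficients, and demands are made-up numbers, not the thesis's formulation.

```python
# Simplified minimum-length TDMA schedule as a linear program: choose an
# activation time t_s >= 0 for each precomputed transmission set so that link
# traffic demands and device energy demands are met while the total schedule
# length (sum of t_s) is minimized. All numbers are illustrative assumptions.
from scipy.optimize import linprog

# 3 hypothetical transmission sets, 2 links with traffic demand, 2 harvesting devices.
rate = [[10, 0, 6],     # packets/slot delivered on link 0 by each set
        [0, 8, 6]]      # packets/slot delivered on link 1 by each set
harvest = [[2, 1, 1],   # energy harvested per slot by device 0 under each set
           [1, 3, 2]]   # energy harvested per slot by device 1 under each set
traffic_demand = [40, 32]
energy_demand = [20, 30]

# linprog minimizes c @ t subject to A_ub @ t <= b_ub; the ">=" demand
# constraints are negated to fit the "<=" form.
c = [1, 1, 1]
A_ub = [[-r for r in row] for row in rate] + [[-h for h in row] for row in harvest]
b_ub = [-d for d in traffic_demand] + [-e for e in energy_demand]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print("schedule length:", res.fun, "set activation times:", res.x)
```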

    Applications of graph theory to wireless networks and opinion analysis

    Graph theory is an important branch of discrete mathematics. Its use has grown recently because graphs are well suited to structuring data, analyzing it, and generating it through models. The aim of this thesis is to apply graph theory to the optimization of wireless networks and to opinion analysis. The first set of contributions of this thesis concerns the application of graph theory to wireless networks. The performance of these networks depends on the correct distribution of frequency channels in a shared space. To optimize these networks, different techniques are proposed, ranging from heuristics such as simulated annealing to automated negotiation. Any of these techniques requires a theoretical model of the wireless network in question. Our Wi-Fi network model uses geometric graphs for this purpose. The vertices represent the devices of the network, whether clients or access points, while the edges represent the signals between those devices. Since these graphs are geometric, the vertices have positions in space and the edges have lengths. With this structure and the application of a propagation and usage model, we can simulate wireless networks and contribute to their optimization. Using this graph-based model, we have studied the effect of co-channel interference in Wi-Fi 4 networks and show a performance improvement associated with the channel bonding technique when it is used in regions where at least 13 channels are available. In addition, in this doctoral thesis we have applied graph theory to opinion analysis within the SensoGraph line of research, a method that performs opinion analysis on a set of items using proximity graphs, which makes it possible to handle large datasets. We have also developed an opinion analysis method that uses the manual assignment of edges and distances in a graph to study the pairwise similarity between samples. Finally, other topics unrelated to graphs, but within the application of mathematics to problems in telematics engineering, have been explored: an electronic voting system based on mixnets, Shamir secret sharing, and finite fields has been developed. This proposal offers a novel numerical verification system while maintaining the essential properties of voting systems.
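
    The geometric graph model described above can be sketched as follows: vertices carry the positions of access points and clients, and an edge is created between devices within a propagation range, storing the link distance and a simple path-loss estimate. The coordinates, range, and path-loss constants below are illustrative assumptions, not the thesis's propagation model.

```python
# Minimal sketch of a geometric graph model of a Wi-Fi deployment: nodes are
# access points or clients with 2D positions, and an edge is added whenever two
# devices are within an assumed propagation range, storing the link distance.
# Positions, range, and the log-distance path-loss constants are illustrative.
import math
import itertools
import networkx as nx

devices = {
    "AP1": (0.0, 0.0), "AP2": (30.0, 0.0),
    "C1": (5.0, 4.0), "C2": (26.0, 3.0), "C3": (15.0, 10.0),
}
RANGE_M = 20.0          # assumed propagation range in metres

G = nx.Graph()
for name, pos in devices.items():
    G.add_node(name, pos=pos, kind="AP" if name.startswith("AP") else "client")

for (a, pa), (b, pb) in itertools.combinations(devices.items(), 2):
    d = math.dist(pa, pb)
    if d <= RANGE_M:
        # Simple log-distance path loss in dB (illustrative constants).
        loss_db = 40.0 + 20.0 * math.log10(max(d, 1.0))
        G.add_edge(a, b, length=d, path_loss_db=loss_db)

print(G.edges(data=True))
```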