2,921 research outputs found

    Holistic analysis of the life course: Methodological challenges and new perspectives

    We survey state-of-the-art approaches to studying trajectories in their entirety, adopting a holistic perspective, and discuss their strengths and weaknesses. We begin by considering sequence analysis (SA), one of the most established holistic approaches. We discuss the problems inherent in SA, particularly in studying the relationship between trajectories and covariates. We describe some recent developments combining SA and Event History Analysis, and illustrate how weakening the holistic perspective, by focusing on sub-trajectories, might result in a more flexible analysis of life courses. We then move to some model-based approaches (included in the broad classes of multistate and mixture latent Markov models) that weaken the holistic perspective further, assuming that the difficult task of predicting and explaining trajectories can be simplified by focusing on the collection of observed transitions. Our goal is twofold. On the one hand, we aim to give social scientists guidance for informed methodological choices and to emphasize issues that require consideration for the proper application of the described approaches. On the other hand, by identifying relevant open methodological challenges, we highlight and encourage promising directions for future research.
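
    Sequence analysis usually compares whole trajectories through a pairwise dissimilarity such as optimal matching (OM). OM is an edit distance that charges a cost for inserting or deleting a state (indel) and for substituting one state with another, and the resulting distance matrix is then typically clustered into a typology of trajectories. The following is a minimal Python sketch of OM, not the authors' code. The constant indel cost of 1, the default substitution cost of 2, and the yearly state codes (E = education, W = work, U = unemployment) are illustrative assumptions. Applied work usually relies on dedicated software such as the R package TraMineR and on substitution costs chosen or estimated by the analyst.

```python
from typing import Dict, Optional, Sequence, Tuple


def optimal_matching(
    a: Sequence[str],
    b: Sequence[str],
    indel: float = 1.0,
    sub_cost: Optional[Dict[Tuple[str, str], float]] = None,
    default_sub: float = 2.0,
) -> float:
    """Optimal matching (edit) distance between two state sequences.

    dp[i][j] is the cheapest way to turn a[:i] into b[:j] using
    insertions/deletions (cost `indel`) and substitutions.
    The indel and substitution costs are assumptions for illustration.
    """
    costs = sub_cost or {}

    def sub(x: str, y: str) -> float:
        # Identical states cost nothing; costs are treated as symmetric.
        if x == y:
            return 0.0
        return costs.get((x, y), costs.get((y, x), default_sub))

    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel  # delete all of a[:i]
    for j in range(1, m + 1):
        dp[0][j] = j * indel  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel,                        # deletion
                dp[i][j - 1] + indel,                        # insertion
                dp[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # substitution / match
            )
    return dp[n][m]


# Two hypothetical five-year trajectories that differ only in year 3.
print(optimal_matching("EEWWW", "EEUWW"))                               # 2.0
print(optimal_matching("EEWWW", "EEUWW", sub_cost={("W", "U"): 1.0}))  # 1.0
```

    Because the distance depends directly on these cost choices, changing them can change which trajectories end up clustered together.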

    Advancing Precision Medicine: Unveiling Disease Trajectories, Decoding Biomarkers, and Tailoring Individual Treatments

    Chronic diseases are not only prevalent but also place a considerable strain on the healthcare system, individuals, and communities. Nearly half of all Americans have at least one chronic disease, a number that is still growing. Advances in machine learning have opened new directions for chronic disease analysis. Many data scientists have devoted themselves to understanding how a disease progresses over time, which can lead to better patient management, identification of disease stages, and targeted interventions. However, because chronic diseases progress slowly, symptoms often go unnoticed until the disease is advanced, which makes early detection difficult. Meanwhile, chronic diseases often have diverse underlying causes and can manifest differently among patients. Besides external factors, the development of chronic disease is also influenced by internal signals: DNA sequence-level differences have been shown to be responsible for a persistent predisposition to chronic diseases. Given these challenges, data must be analyzed at various scales, from single nucleotide polymorphisms (SNPs) to individuals and populations, to better understand disease mechanisms and provide precision medicine. Therefore, this research aimed to develop an automated pipeline that ranges from building predictive models and estimating individual treatment effects based on structured data from general electronic health records (EHRs) to identifying genetic variations (e.g., SNPs) associated with diseases, in order to unravel the genetic underpinnings of chronic diseases. First, we used structured EHRs to uncover chronic disease progression patterns and assess the dynamic contribution of clinical features. In this step, we employed causal inference methods (constraint-based and functional causal models) for feature selection and used Markov chains, attention-based long short-term memory (LSTM) networks, and Gaussian processes (GPs).
SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) further extended the work by identifying important clinical features. Next, we developed a novel counterfactual-based method to predict individual treatment effects (ITE) from observational data. To learn a “balanced” representation in which the treated and control distributions look similar, we disentangled the doctor’s preference from the covariates and rebuilt the representations of the treated and control groups. We used integral probability metrics to measure distances between distributions. The expected ITE estimation error of a representation is bounded by the sum of the standard generalization error of that representation and the distance between the treated and control distributions it induces. Finally, we performed genome-wide association studies (GWAS) based on the stage information extracted from our unsupervised disease progression model to identify biomarkers and explore the genetic correlation between the disease and its phenotypes.
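
    The abstract measures how far apart the treated and control distributions are in representation space using an integral probability metric (IPM). One widely used IPM is the maximum mean discrepancy (MMD). In representation-learning approaches to ITE estimation, an IPM term is commonly added, with a weight, to the factual prediction loss so that the learned representation balances the two groups. The Python sketch below computes a biased estimate of the squared MMD with a Gaussian (RBF) kernel. The kernel choice, the bandwidth parameter `gamma`, and the toy one-dimensional representations are assumptions for illustration, and the sketch does not reproduce the method described above.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    # Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(treated: List[Vector], control: List[Vector], gamma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    MMD^2 = E[k(t, t')] + E[k(c, c')] - 2 E[k(t, c)]; it is zero when the
    two empirical distributions coincide and grows as they drift apart.
    """
    def mean_kernel(us: List[Vector], vs: List[Vector]) -> float:
        return sum(rbf_kernel(u, v, gamma) for u in us for v in vs) / (len(us) * len(vs))

    return (mean_kernel(treated, treated)
            + mean_kernel(control, control)
            - 2.0 * mean_kernel(treated, control))


# Hypothetical learned representations (one dimension for readability).
treated = [[0.0], [0.1], [0.2]]
control_close = [[0.05], [0.15], [0.25]]
control_far = [[2.0], [2.1], [2.2]]
print(mmd2(treated, control_close) < mmd2(treated, control_far))  # True
```

    A representation under which the treated and control samples give a small MMD is "balanced" in the sense used above. In a training loop, this quantity would act as a penalty alongside the outcome-prediction loss.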

    Pivotal Visualization: A Design Method to Enrich Visual Exploration

    Rough Set Soft Computing Cancer Classification and Network: One Stone, Two Birds

    Gene expression profiling provides a wealth of information that can help unravel the complexity of cancer. Selecting the most informative genes from a large amount of noise for cancer classification has taken centre stage, along with predicting the function of the identified genes and constructing direct gene regulatory networks at different system levels with a tuneable parameter. A new study by Wang and Gotoh described a novel robust soft computing method rooted in Variable Precision Rough Sets that successfully addresses these problems and has yielded some new insights. The significance of this progress and its perspectives will be discussed in this article.
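
    Variable precision rough sets (VPRS), Ziarko's extension of Pawlak's rough set model, relax the lower approximation of a class. Instead of requiring every object in an indiscernibility class to belong to the target class, VPRS accepts a class once the fraction of its members that lie in the target reaches a precision threshold β. The Python sketch below shows only this core idea on hypothetical discretized expression levels of two genes. The data, the sample and gene names, and the inclusion-degree convention β in (0.5, 1] are assumptions, and the sketch is not Wang and Gotoh's method.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Row = Tuple[Hashable, ...]


def equivalence_classes(rows: Dict[str, Row]) -> List[Set[str]]:
    """Group objects that are indiscernible on their condition attributes."""
    groups: Dict[Row, Set[str]] = defaultdict(set)
    for obj, values in rows.items():
        groups[values].add(obj)
    return list(groups.values())


def beta_lower(rows: Dict[str, Row], target: Set[str], beta: float) -> Set[str]:
    """VPRS lower approximation of `target` with inclusion threshold beta.

    An equivalence class E is kept when |E & target| / |E| >= beta;
    beta = 1.0 recovers the classical (Pawlak) lower approximation.
    """
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta must lie in (0.5, 1]")
    result: Set[str] = set()
    for block in equivalence_classes(rows):
        if len(block & target) / len(block) >= beta:
            result |= block
    return result


# Hypothetical samples: discretized expression of genes g1 and g2.
rows = {
    "s1": ("high", "low"), "s2": ("high", "low"),
    "s3": ("high", "low"), "s4": ("high", "low"),
    "s5": ("low", "high"), "s6": ("low", "low"),
}
tumour = {"s1", "s2", "s3", "s5"}  # hypothetical class labels
print(sorted(beta_lower(rows, tumour, 1.0)))   # ['s5']
print(sorted(beta_lower(rows, tumour, 0.75)))  # ['s1', 's2', 's3', 's4', 's5']
```

    Lowering β lets the mostly-tumour block {s1, s2, s3, s4} enter the lower approximation despite one inconsistent sample. This tolerance to noisy labels is what distinguishes VPRS from the classical model.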

    Text Mining of Airbnb Reviews: A holistic approach on reviewers’ opinions and topics distribution

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Marketing Intelligence. This thesis aims to perform a holistic investigation of how Airbnb accommodation features and hosts’ attributes influence guests’ reviews and how the main topics are distributed. A dataset containing almost 4 million reviews from major tourist cities around the world (Milan, Lisbon, Amsterdam, Toronto, San Francisco, and Sydney) was used for the text mining analysis to uncover the reviews’ social and market norms, as well as guests’ sentiments and the distribution of topics. This research uses both Mallet LDA (Latent Dirichlet Allocation) and Word2Vec to unveil the semantic structure of, and similarity within, the data. This approach will allow hospitality providers to understand the impact of underlying factors on reviewers’ opinions and to improve their services further. Finally, this study develops an unbiased predictive model to forecast review scores, with an accuracy of 90.70%.
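
    LDA represents each review as a mixture of topics and each topic as a probability distribution over words. MALLET's LDA is sampling-based, and the Python sketch below is a plain, unoptimized collapsed Gibbs sampler written for illustration only. The toy reviews, the number of topics, and the hyperparameters `alpha` and `beta` are assumptions, not values from the thesis.

```python
import random
from typing import Dict, List, Tuple


def lda_gibbs(docs: List[List[str]], k: int, alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0) -> Tuple[List[List[float]], List[Dict[str, float]]]:
    """Collapsed Gibbs sampling for LDA.

    Returns theta (per-document topic proportions) and phi (per-topic word
    probabilities), both estimated from the counts of the final sample.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # tokens of topic j in document d
    nkw = [[0] * V for _ in range(k)]  # tokens of word v assigned to topic j
    nk = [0] * k                       # total tokens assigned to topic j
    z: List[List[int]] = []            # current topic of every token

    def update(d: int, t: int, v: int, delta: int) -> None:
        ndk[d][t] += delta
        nkw[t][v] += delta
        nk[t] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            update(d, t, wid[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = wid[w]
                update(d, z[d][i], v, -1)  # take the token out of the counts
                # Full conditional p(z = j | all other assignments), unnormalized.
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, z[d][i], v, +1)

    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [{w: (nkw[j][wid[w]] + beta) / (nk[j] + V * beta) for w in vocab}
           for j in range(k)]
    return theta, phi


# Hypothetical, pre-tokenized toy reviews.
reviews = [
    ["clean", "room", "bed", "clean"],
    ["host", "friendly", "host", "helpful"],
    ["room", "bed", "clean", "bed"],
    ["friendly", "helpful", "host", "host"],
]
theta, phi = lda_gibbs(reviews, k=2, iters=100, seed=1)
for j, topic in enumerate(phi):
    print(j, sorted(topic, key=topic.get, reverse=True)[:3])
```

    Here `theta` gives each review's topic proportions, which correspond to the kind of topic distribution the thesis studies. `phi` gives each topic's word probabilities, from which the top words used to label topics are read.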

    Explaining Portuguese Public Administration absenteeism through data mining

    Portuguese Public Administration (PPA) is the largest employer in the country, employing 12.8% of Portugal’s active population. Absenteeism and productivity are closely linked, so organizations in both the public and private sectors should keep absenteeism in mind to prevent process failures and profit loss. The main goal of this study is to understand PPA’s absenteeism, particularly the duration of a worker’s next absence and what leads to it, and to explain it by creating a data mining model suited to the problem. To study PPA’s absenteeism, data were collected from a Human Capital Management (HCM) system by extracting the 2016 annual absenteeism report and querying each worker’s profile, absenteeism history, and job characteristics, resulting in around 59,000 absence records. Data mining techniques were used to clean the dataset, and the Recency, Frequency and Monetary (RFM) value methodology was used to derive new variables, yielding richer information about the worker and the absence itself. The Support Vector Machines (SVM) algorithm was then applied to model the absence duration in days, and a 10-fold cross-validation scheme was adopted to assess and confirm the model’s robustness. Finally, the study’s major findings are that features related to the worker’s profile are less relevant than absence-related features; that the RFM methodology was influential, with all of its computed variables ranking among the 25 most important features; and the identification of the most concerning employee profile.
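
    The abstract does not say how the RFM dimensions were mapped onto absence data. One natural adaptation, assumed here purely for illustration, defines Recency as the number of days between a worker's most recent absence and a reference date, Frequency as the number of absences, and Monetary as the total number of days absent. The Python sketch below derives these per-worker features from hypothetical (worker, start date, duration) records. The resulting columns could then be joined to each absence record before fitting a model such as the SVM mentioned above.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def rfm_features(absences: Iterable[Tuple[str, date, int]],
                 reference: date) -> Dict[str, Dict[str, int]]:
    """Per-worker RFM features from (worker_id, start_date, duration_days) records.

    Hypothetical mapping: recency = days from the latest absence start to
    `reference`, frequency = number of absences, monetary = total days absent.
    """
    last_start: Dict[str, date] = {}
    count: Dict[str, int] = defaultdict(int)
    days_absent: Dict[str, int] = defaultdict(int)
    for worker, start, duration in absences:
        if worker not in last_start or start > last_start[worker]:
            last_start[worker] = start
        count[worker] += 1
        days_absent[worker] += duration
    return {
        w: {"recency": (reference - last_start[w]).days,
            "frequency": count[w],
            "monetary": days_absent[w]}
        for w in last_start
    }


# Hypothetical absence records for two workers in 2016.
records = [
    ("w1", date(2016, 3, 1), 2),
    ("w1", date(2016, 11, 20), 5),
    ("w2", date(2016, 6, 15), 1),
]
print(rfm_features(records, reference=date(2016, 12, 31))["w1"])
# {'recency': 41, 'frequency': 2, 'monetary': 7}
```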

    Machine learning and data mining frameworks for predicting drug response in cancer: An overview and a novel in silico screening process based on association rule mining

    An Exploration of Visual Analytic Techniques for XAI: Applications in Clinical Decision Support

    Artificial Intelligence (AI) systems exhibit considerable potential for providing decision support across various domains. In this context, the methodology of eXplainable AI (XAI) becomes crucial, as it aims to enhance the transparency and comprehensibility of AI models' decision-making processes. However, a review of XAI methods and their application in clinical decision support reveals notable gaps in XAI methodology, particularly concerning the effective communication of explanations to users. This thesis aims to bridge these gaps by presenting, in Chapter 3, a framework designed to communicate AI-generated explanations effectively to end-users. This is particularly pertinent in fields like healthcare, where the successful implementation of AI decision support hinges on the ability to convey actionable insights to medical professionals. Building on this framework, subsequent chapters illustrate how visualization and visual analytics can be used with XAI in the context of clinical decision support. Chapter 4 introduces a visual analytic tool designed for ranking and triaging patients in the intensive care unit (ICU). Leveraging various XAI methods, the tool enables healthcare professionals to understand how the ranking model functions and how individual patients are prioritized. Through interactivity, users can explore influencing factors, evaluate alternative scenarios, and make informed decisions for optimal patient care. Chapter 5 explores the pivotal role of transparency and comprehensibility within machine learning models. Leveraging explainable AI techniques and visualization, it investigates the factors contributing to model performance and errors. It also investigates scenarios in which the model performs particularly well, ultimately fostering user trust by shedding light on the model's strengths and capabilities.
Recognizing the ethical concerns associated with predictive models in health, Chapter 6 addresses potential bias and discrimination in ranking systems. Using the proposed visual analytic tool, users can assess the fairness and equity of the system, promoting equal treatment. This research emphasizes the need for unbiased decision-making in healthcare. Having developed the framework and illustrated ways of combining XAI with visual analytics in the service of clinical decision support, the thesis concludes by identifying important future directions of research in this area.
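
    The thesis does not specify the ranking model behind the ICU tool. As a minimal illustration of how an XAI method can explain a patient's rank, the Python sketch below assumes a linear risk score. It uses the closed-form SHAP values of a linear model under feature independence, phi_i = w_i * (x_i - E[x_i]), with expectations taken over a background set. The feature names, weights, and patient values are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def linear_score(weights: Features, bias: float, x: Features) -> float:
    # f(x) = bias + sum_i w_i * x_i
    return bias + sum(w * x[f] for f, w in weights.items())


def rank_patients(weights: Features, bias: float,
                  patients: Dict[str, Features]) -> Tuple[List[str], Dict[str, float]]:
    """Return patient ids sorted from highest to lowest risk score, plus the scores."""
    scores = {pid: linear_score(weights, bias, x) for pid, x in patients.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


def linear_shap(weights: Features, background: List[Features], x: Features) -> Features:
    """Exact SHAP values of a linear model, assuming independent features.

    phi_i = w_i * (x_i - mean of feature i over the background set)
    """
    means = {f: sum(row[f] for row in background) / len(background) for f in weights}
    return {f: w * (x[f] - means[f]) for f, w in weights.items()}


# Hypothetical, already-scaled clinical features for three ICU patients.
weights = {"lactate": 1.0, "heart_rate": 0.5}
patients = {
    "p1": {"lactate": 4.0, "heart_rate": 2.0},
    "p2": {"lactate": 1.0, "heart_rate": 4.0},
    "p3": {"lactate": 1.0, "heart_rate": 0.0},
}
order, scores = rank_patients(weights, 0.0, patients)
print(order)  # ['p1', 'p2', 'p3']
print(linear_shap(weights, list(patients.values()), patients["p1"]))
# {'lactate': 2.0, 'heart_rate': 0.0}
```

    A patient's SHAP values sum to that patient's score minus the average score over the background set. A tool can therefore show exactly how much each feature pushes the patient above or below the typical patient in the ranking.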