218 research outputs found

    Diagnosis and Prognosis of Occupational disorders based on Machine Learning Techniques applied to Occupational Profiles

    Work-related disorders have a global influence on people’s well-being and quality of life and are a financial burden for organizations because they reduce productivity, increase absenteeism, and promote early retirement. Work-related musculoskeletal disorders, in particular, represent a significant fraction of the total in all occupational contexts. In automotive and industrial settings where workers are exposed to risk factors for work-related musculoskeletal disorders, occupational physicians are responsible for monitoring workers’ health protection profiles. Occupational technicians refer to the Occupational Health Protection Profiles database to understand which exposure to work-related musculoskeletal disorder risk factors should be ensured for a given worker. Occupational Health Protection Profiles databases record what the occupational physician states and which exposure the physician considers necessary to ensure the worker’s health protection in terms of their functional work ability. The application of Human-Centered explainable artificial intelligence can support decision making, going from a worker’s Functional Work Ability to explanations, by integrating explainability into medical (restriction) and supporting two decision contexts: prognosis and diagnosis of individual, work-related, and organizational risk conditions. Although previous machine learning approaches provided good predictions, their application in an actual occupational setting is limited because their predictions are difficult to interpret and hence not actionable. This thesis targets the injured body parts for which the ability changed in a worker’s functional work ability status. On the one hand, artificial intelligence algorithms can help technical teams, occupational physicians, and ergonomists determine a worker’s workplace risk via the diagnosis and prognosis of body part injuries; on the other hand, these approaches can help prevent work-related musculoskeletal disorders by identifying which processes are lacking in working condition improvement and which workplaces have a better match between the remaining functional work abilities. For this thesis, a database of Occupational Health Protection Profiles was used, based on Functional Work Ability textual reports in the Portuguese language from an automotive industry factory (Auto Europa). A sample of 2025 files was used for the prognosis part (from 2019 to 2020) and a sample of 7857 files was used for the diagnosis part. Machine learning-based Natural Language Processing methods were implemented to extract standardized information. The prognosis and diagnosis of Occupational Health Protection Profiles factors were developed in a reliable, trustworthy Human-Centered explainable artificial intelligence system (entitled Industrial microErgo application). The most suitable regression models to predict the next medical appointment for the injured body regions were those based on CatBoost regression, with an R square of 0.84 and an RMSLE of 1.23 weeks. In parallel, for the prediction of the next injured body parts, CatBoost was the best regression model for most body parts according to these two errors. This information can help technical industrial teams understand potential risk factors for Occupational Health Protection Profiles and identify warning signs of the early stages of musculoskeletal disorders.
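
    The abstract above reports two regression metrics, R square and RMSLE. As a rough illustration of how such metrics are commonly computed, here is a minimal sketch using the standard textbook definitions. The function names and the sample values are assumptions made for illustration; they are not taken from the thesis, and the thesis may compute its metrics differently.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x)."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up values: weeks until the next appointment (observed vs. predicted).
observed_weeks = [4.0, 8.0, 12.0, 26.0]
predicted_weeks = [5.0, 7.0, 13.0, 24.0]

print("R^2  :", round(r_squared(observed_weeks, predicted_weeks), 3))
print("RMSLE:", round(rmsle(observed_weeks, predicted_weeks), 3))
```

    RMSLE penalizes relative rather than absolute error, which is one common reason to prefer it when targets span a wide range.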

    2023 SDSU Data Science Symposium Presentation Abstracts

    This document contains abstracts for the presentations and posters at the 2023 SDSU Data Science Symposium.

    Large-Scale Pattern-Based Information Extraction from the World Wide Web

    Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization. This work explores the potential of using textual patterns for Information Extraction from the World Wide Web.
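
    To make the idea of a textual pattern concrete, the sketch below turns one hand-written pattern ("X such as Y") into structured triples. The pattern, the `is_a` relation name, and the sample sentences are assumptions chosen for illustration; the abstract does not say which patterns the work actually uses.

```python
import re

# Hypothetical pattern: "<class> such as <Instance>", e.g. "cities such as Paris".
SUCH_AS = re.compile(r"(?P<cls>[a-z]+) such as (?P<inst>[A-Z][a-z]+)")


def extract_facts(text):
    """Return (instance, relation, class) triples for every pattern match."""
    return [
        (match.group("inst"), "is_a", match.group("cls"))
        for match in SUCH_AS.finditer(text)
    ]


sample = "He visited cities such as Paris. She studies languages such as Latin."
facts = extract_facts(sample)
for fact in facts:
    print(fact)
```

    A single surface pattern like this is precise but covers little text; scaling to the Web typically means applying many such patterns to very large corpora.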

    On link predictions in complex networks with an application to ontologies and semantics

    It is assumed that ontologies can be represented and treated as networks and that these networks show properties of so-called complex networks. Just like ontologies, “our current pictures of many networks are substantially incomplete” (Clauset et al., 2008, p. 3ff.). For this reason, networks have been analyzed and methods for identifying missing edges have been proposed. The goal of this thesis is to show how treating and understanding an ontology as a network can be used to extend and improve existing ontologies, and how measures from graph theory and techniques developed in recent years in social network analysis and for other complex networks can be applied to semantic networks in the form of ontologies. Given a large enough amount of data, here data organized according to an ontology, and the relations defined in the ontology, the goal is to find patterns that help reveal implicitly given information in an ontology. Unlike reasoning and methods of inference, the approach does not rely on predefined patterns of relations; it is meant to identify patterns of relations or of other structural information taken from the ontology graph in order to calculate probabilities of yet unknown relations between entities. The methods adopted from network theory and the social sciences presented in this thesis are expected to considerably reduce the work and time necessary to build an ontology by automating it. They are believed to be applicable to any ontology and can be used in either a supervised or an unsupervised fashion to automatically identify missing relations, add new information, and thereby enlarge the data set and increase the information explicitly available in an ontology. As seen in the IBM Watson example, different knowledge bases are applied in NLP tasks. An ontology like WordNet contains lexical and semantic knowledge on lexemes, while general knowledge ontologies like Freebase and DBpedia contain information on entities of the non-linguistic world. In this thesis, examples from both kinds of ontologies are used: WordNet and DBpedia. WordNet is a manually crafted resource that establishes a network of representations of word senses, connected to the word forms used to express them, and connects these senses and forms with lexical and semantic relations in a machine-readable form. As will be shown, although a lot of work has been put into WordNet, it can still be improved. While it already contains many lexical and semantic relations, it is not possible to distinguish between polysemous and homonymous words. As will be explained later, this distinction can be useful for NLP problems regarding word sense disambiguation and hence QA. Using graph- and network-based centrality and path measures, the goal is to train a machine learning model that is able to identify new, missing relations in the ontology and assign this new relation to the whole data set (i.e., WordNet). The approach presented here will be based on a deep analysis of the ontology and the network structure it exposes. Using different measures from graph theory as features and a set of manually created examples, a so-called training set, a supervised machine learning approach will be presented and evaluated that will show the benefit of interpreting an ontology as a network compared to other approaches that do not take the network structure into account. DBpedia is an ontology derived from Wikipedia. The structured information given in Wikipedia infoboxes is parsed, and relations according to an underlying ontology are extracted. Unlike Wikipedia, it only contains the small amount of structured information (e.g., the infoboxes of each page) and not the large amount of unstructured information (i.e., the free text) of Wikipedia pages. Hence DBpedia is missing a large number of possible relations that are described in Wikipedia. Compared to Freebase, an ontology used and maintained by Google, DBpedia is also quite incomplete. This, and the fact that Wikipedia is expected to be usable as a reference against which possible results can be compared, makes DBpedia a good subject of investigation. The approach used to extend DBpedia presented in this thesis will be based on a thorough analysis of the network structure and the assumed evolution of the network, which will point to the locations in the network where information is most likely to be missing. Since the structure of the ontology and the resulting network is assumed to reveal patterns that are connected to certain relations defined in the ontology, these patterns can be used to identify what kind of relation is missing between two entities of the ontology. This will be done using unsupervised methods from the field of data mining and machine learning.
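
    As a rough illustration of scoring "yet unknown relations" from network structure alone, the sketch below ranks unconnected node pairs by the Jaccard overlap of their neighborhoods. This is a deliberately simple stand-in: the abstract mentions centrality and path measures fed into machine learning models, not this particular score, and the toy graph and function names are assumptions.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def jaccard(adjacency, u, v):
    """Shared neighbors divided by all neighbors of either node."""
    union = adjacency[u] | adjacency[v]
    return len(adjacency[u] & adjacency[v]) / len(union) if union else 0.0


def rank_missing_edges(adjacency):
    """Sort currently unconnected pairs by descending Jaccard score."""
    candidates = [
        (u, v)
        for u, v in combinations(sorted(adjacency), 2)
        if v not in adjacency[u]
    ]
    return sorted(candidates, key=lambda pair: jaccard(adjacency, *pair), reverse=True)


# Toy "ontology" graph with made-up relations.
graph = build_graph([
    ("dog", "animal"), ("cat", "animal"),
    ("dog", "pet"), ("cat", "pet"),
    ("car", "vehicle"),
])
ranked = rank_missing_edges(graph)
for u, v in ranked[:2]:
    print(u, v, jaccard(graph, u, v))
```

    In a supervised setting, scores like this would be one feature among several rather than the final prediction.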

    Learning Interpretable Features of Graphs and Time Series Data

    Graphs and time series are two of the most ubiquitous representations of data in modern times. Representation learning of real-world graphs and time-series data is a key component of downstream supervised and unsupervised machine learning tasks such as classification, clustering, and visualization. Because of the inherent high dimensionality, representation learning, i.e., low-dimensional vector-based embedding of graphs and time-series data, is very challenging. Learning interpretable features makes the roles of the features transparent and facilitates downstream analytics tasks, in addition to maximizing the performance of the downstream machine learning models. In this thesis, we leveraged tensor (multidimensional array) decomposition to generate an interpretable and low-dimensional feature space for graphs and time-series data from three domains: social networks, neuroscience, and heliophysics. We present the theoretical models and empirical results on node embedding of social networks, biomarker embedding on fMRI-based brain networks, and prediction and visualization of multivariate time-series-based flaring and non-flaring solar events.
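
    The abstract does not say which tensor decomposition is used. As one small, self-contained illustration of the general idea, the sketch below fits a rank-1 CP (outer-product) approximation to a 3-way tensor by alternating power iteration. The function names and the toy tensor are assumptions, and a real embedding would use more components and a dedicated library.

```python
import math


def normalize(vector):
    """Return the unit-length vector and its original norm (assumed nonzero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector], norm


def rank1_cp(tensor, iterations=20):
    """Approximate tensor[i][j][k] by weight * a[i] * b[j] * c[k]."""
    size_i, size_j, size_k = len(tensor), len(tensor[0]), len(tensor[0][0])
    a, b, c = [1.0] * size_i, [1.0] * size_j, [1.0] * size_k
    weight = 0.0
    for _ in range(iterations):
        # Update each factor in turn while holding the other two fixed.
        a, _unused = normalize([
            sum(tensor[i][j][k] * b[j] * c[k] for j in range(size_j) for k in range(size_k))
            for i in range(size_i)
        ])
        b, _unused = normalize([
            sum(tensor[i][j][k] * a[i] * c[k] for i in range(size_i) for k in range(size_k))
            for j in range(size_j)
        ])
        c, weight = normalize([
            sum(tensor[i][j][k] * a[i] * b[j] for i in range(size_i) for j in range(size_j))
            for k in range(size_k)
        ])
    return weight, a, b, c


# Toy tensor built as an exact outer product of three made-up vectors.
u, v, w = [1.0, 2.0], [1.0, 0.0, 1.0], [3.0, 4.0]
toy = [[[ui * vj * wk for wk in w] for vj in v] for ui in u]
weight, a, b, c = rank1_cp(toy)
print(weight, a, b, c)
```

    Each factor vector has one entry per index along its mode (for example, per node or per time step), which is what makes such factors readable as low-dimensional features.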

    Advancing natural language processing in political science


    A Machine Learning Enhanced Scheme for Intelligent Network Management

    Versatile networking services have a huge influence on daily life, while the number and diversity of services make network systems highly complex. Network scale and complexity grow with the increasing infrastructure equipment, networking functions, network slices, and evolution of the underlying architecture. Conventionally, such large and complex platforms are maintained through manual administration, which makes effective and insightful management troublesome. A feasible and promising scheme is to extract insightful information from the large volume of network data produced. The goal of this thesis is to use learning-based algorithms from the machine learning community to discover valuable knowledge from substantial network data, which directly promotes intelligent management and maintenance. In the thesis, management and maintenance focus on two schemes: network anomaly detection and root cause localization, and critical traffic resource control and optimization. Firstly, the abundant network data carry informative messages, but their heterogeneity and complexity make diagnosis challenging. For unstructured logs, abstract and formatted log templates are extracted to regularize log records. An in-depth analysis framework based on heterogeneous data is proposed to detect the occurrence of faults and anomalies. It employs representation learning methods to map unstructured data into numerical features and fuses the extracted features for network anomaly and fault detection. The representation learning makes use of word2vec-based embedding technologies for semantic expression. Next, fault and anomaly detection only reveals that events have occurred and fails to identify their root causes, which administrators need; fault localization therefore narrows down the source of system anomalies. The extracted features are formed into an anomaly degree and coupled with an importance ranking method to highlight the locations of anomalies in network systems. Two types of ranking modes are instantiated, by PageRank and by operation errors, to jointly highlight the locations of latent issues. Besides fault and anomaly detection, network traffic engineering deals with network communication and computation resources to optimize the efficiency of data traffic transfer. Especially when network traffic is constrained by communication conditions, a proactive path planning scheme is helpful for efficient traffic control actions. A learning-based traffic planning algorithm is then proposed, based on a sequence-to-sequence model, to discover hidden reasonable paths from abundant traffic history data over the Software Defined Network architecture. Finally, traffic engineering based merely on empirical data is likely to result in stale and sub-optimal solutions, or even worse situations. A resilient mechanism is required to adapt network flows, based on context, to a dynamic environment. Thus, a reinforcement learning-based scheme is put forward for dynamic data forwarding that considers network resource status, and it shows a promising performance improvement. In the end, the proposed anomaly processing framework strengthens analysis and diagnosis for network system administrators through combined fault detection and root cause localization. The learning-based traffic engineering supports network flow management using historical data and further shows a promising direction of flexible traffic adjustment for ever-changing environments.
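
    The abstract names PageRank as one of the ranking modes used to highlight anomaly locations. The sketch below shows plain PageRank by power iteration on a small dependency graph. The component names and the graph are made up, and the abstract does not describe how the thesis combines the ranking with the anomaly degree, so this illustrates only the ranking step.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph {node: [targets]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        updated = {node: (1.0 - damping) / count for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    updated[target] += damping * rank[node] / count
        rank = updated
    return rank


# Hypothetical call graph: an edge points from a component to what it depends on.
dependencies = {"web": ["api"], "api": ["db"], "worker": ["db"], "db": []}
scores = pagerank(dependencies)
for component, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{component}: {score:.3f}")
```

    In this toy graph the shared dependency accumulates the most rank, which matches the intuition that a fault there would affect the most components.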