
    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent technology advances, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining, with emphasis on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Generally, association mining is suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical mining; in this paper, we summarize its different uses in dermatology, where it is one of the most important methods for the diagnosis of erythemato-squamous diseases. Approaches in this area include neural networks, genetic algorithms and fuzzy classification. Clustering is a useful method in medical image mining; the purpose of clustering techniques is to find a structure for the given data by identifying similarities according to data characteristics, and clustering has several applications in dermatology. Besides introducing the different mining methods, we investigate some challenges that exist in mining skin data.
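The classification step described above can be illustrated with a minimal sketch. The following k-nearest-neighbour classifier works over hand-made, hypothetical clinical-style feature vectors (the feature names and values are illustrative, not taken from the survey) and shows the basic idea of assigning a diagnosis from coded symptoms:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify a feature vector by majority vote among its k nearest
    labelled neighbours (Euclidean distance)."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical coded symptoms: (erythema, scaling, itching), graded 0-3
train = [
    ((3, 3, 0), "psoriasis"),
    ((2, 3, 1), "psoriasis"),
    ((1, 0, 3), "lichen planus"),
    ((0, 1, 3), "lichen planus"),
]

print(knn_classify(train, (2, 2, 0)))  # nearest cases vote "psoriasis"
```

Real systems surveyed in the paper use far richer feature sets and models (neural networks, genetic algorithms, fuzzy classifiers), but the supervised-learning skeleton is the same.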

    Graph-Based Multi-Label Classification for WiFi Network Traffic Analysis

    Network traffic analysis, and specifically anomaly and attack detection, calls for sophisticated tools relying on a large number of features. Mathematical modeling is extremely difficult, given the ample variety of traffic patterns and the subtle and varied ways in which malicious activity can be carried out in a network. We address this problem by exploiting data-driven modeling and computational intelligence techniques. Sequences of packets captured on the communication medium are considered, along with multi-label metadata. A graph-based modeling of the data is introduced, resorting to the powerful GRALG approach based on feature information granulation, identification of a representative alphabet, embedding and genetic optimization. The obtained classifier is evaluated in terms of both accuracy and complexity on two different supervised problems and compared with state-of-the-art algorithms. We show that the proposed preprocessing strategy is able to describe higher-level relations between data instances in the input domain, thus allowing the algorithms to suitably reconstruct the structure of the input domain itself. Furthermore, the considered Granular Computing approach is able to extract knowledge on multiple semantic levels, effectively describing anomalies as subgraph-based symbols of the whole network graph in a specific time interval. Good performance can thus be achieved in identifying network traffic patterns, in spite of the complexity of the considered traffic classes.
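As a rough illustration of the graph-based preprocessing idea (not the actual GRALG algorithm, which involves information granulation and genetic optimization of the alphabet), a packet sequence can be turned into a transition graph and embedded as a vector of edge frequencies over a fixed alphabet of subgraphs. Here the alphabet is chosen by hand and the packet types are hypothetical:

```python
from collections import Counter

def transition_graph(packets):
    """Summarise a packet sequence as a directed transition graph,
    represented by counts of consecutive packet-type pairs."""
    return Counter(zip(packets, packets[1:]))

def graph_features(packets, alphabet):
    """Embed a packet sequence as a vector of edge frequencies over a
    fixed 'alphabet' of transitions (hand-picked here, learned in GRALG)."""
    g = transition_graph(packets)
    total = sum(g.values()) or 1
    return [g[edge] / total for edge in alphabet]

flow = ["SYN", "SYNACK", "ACK", "DATA", "DATA", "FIN"]
alphabet = [("SYN", "SYNACK"), ("DATA", "DATA"), ("ACK", "DATA")]
print(graph_features(flow, alphabet))  # [0.2, 0.2, 0.2]
```

The resulting fixed-length vectors can then be fed to any standard classifier, which is the spirit of the embedding step described in the abstract.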

    Integrated intelligent systems for industrial automation: the challenges of Industry 4.0, information granulation and understanding agents

    The objective of this paper is to consider the challenges of the new automation paradigm Industry 4.0 and to review the state of the art in the field of its enabling information and communication technologies, including Cyber-Physical Systems, Cloud Computing, the Internet of Things and Big Data. Some ways of representing and analyzing multi-dimensional, multi-faceted industrial Big Data are suggested. The fundamentals of Big Data processing using Granular Computing techniques have been developed. The problem of constructing special cognitive tools to build artificial understanding agents for Integrated Intelligent Enterprises is addressed.

    Semantic Keyword-based Search on Heterogeneous Information Systems

    In recent years, with the spread and use of the Internet, the volume of information available to users has grown exponentially. Moreover, access to this information has been boosted by the levels of connectivity we currently enjoy thanks to new-generation mobile phones and wireless networks (e.g., 3G, Wi-Fi). However, with current access methods, this excess of information is as harmful as its absence, since users have no time to process it all. In addition, this information sits behind information systems of very heterogeneous nature (e.g., Web search engines, Linked Data sources, etc.), and users must know these systems in order to exploit their capabilities fully. This diversity becomes even more evident if we consider any information service as a potential information source for the user (e.g., location-based services, databases exported via Web Services, etc.). Given this level of heterogeneity, the integration of these systems must be done externally, hiding their complexity from users and providing them with mechanisms to express their queries easily. In this regard, keyword-based interfaces have become popular thanks to their simplicity and their adoption by the most widely used Web search engines. However, the simplicity that is their greatest virtue is also their greatest flaw, as it leads to ambiguity in queries: queries expressed as sets of keywords are inherently ambiguous, being a projection of the actual question the user wants to ask. In this thesis, we address the problem of integrating heterogeneous information systems under a search guided by the semantics of keywords, and we present QueryGen, a prototype of our solution.
In this semantic search we advocate establishing the query the user had in mind when writing the keywords, expressed in a formal query language to avoid possible ambiguities. The integration of the underlying systems is achieved through the definition of their query languages and execution models. In particular, our system: - Discovers the meaning of the keywords by consulting a dynamic set of ontologies, and disambiguates those keywords taking their context (the rest of the keywords) into account, since each word influences the meaning of the rest of the input. During this process, meanings that are sufficiently similar are merged, and the system proposes the most likely ones given the user's input. The semantic information obtained in this process is integrated and used in later stages to obtain the correct interpretation of the keyword set. - The same set of keywords can represent several queries even when the individual meaning of each keyword is known. Therefore, once the meanings of the keywords are established, and in order to obtain the user's exact query, our system finds all the possible questions that can be built with the keywords. This translation from keywords to questions is performed using formal query languages to avoid possible ambiguities and to express the query precisely. Our system avoids generating semantically incorrect or duplicated questions with the help of a Description Logics reasoner. In this process, our system is able to react to insufficient inputs (e.g., omitted words) by adding virtual terms, which internally represent words that the user had in mind but omitted when writing the query.
- Finally, after the user validates the query, our system accesses the registered information systems that can answer it and retrieves the answer according to the semantics of the query. To this end, our system implements a modular architecture that allows new systems to be added on the fly, provided their specification is supplied (supported query languages, data models and formats, etc.). Moreover, working with heterogeneous information systems, in particular systems related to Mobile Computing, has allowed the contributions of this thesis to go beyond the field of semantic search. In this respect, we have studied the semantics of location-based queries and, especially, the influence of the semantics of locations on their processing and interpretation. In particular, two ontological models are proposed to model and capture the semantic relationships of locations and to extend the expressiveness of location-based queries. During the development of this thesis, situated between the fields of the Semantic Web and Mobile Computing, a new line of research on the modeling of volatile knowledge has been opened, and the possibility of running Description Logics reasoners on Android devices has been studied. Finally, our work on keyword-based semantic search has been extended to conversational agents, enabling them to exploit the different semantic data sources currently available under the Linked Data principles.

    Approximate Data Mining Techniques on Clinical Data

    The past two decades have witnessed an explosion in the number of medical and healthcare datasets available to researchers and healthcare professionals. These growing data collections prompt the development of appropriate data mining techniques and tools that can automatically extract relevant information from the data and, consequently, provide insights into the various clinical behaviors or processes captured by it. Since these tools should support the decision-making activities of medical experts, all the extracted information must be represented in a human-friendly way, that is, in a concise and easy-to-understand form. To this purpose, we propose a new framework that collects several newly proposed mining techniques and tools. These techniques mainly focus on two aspects: the temporal one and the predictive one. All of these techniques were applied to clinical data, in particular ICU data from the MIMIC-III database, which showed the flexibility of the framework in retrieving different outcomes from the overall dataset. The first two techniques rely on the concept of Approximate Temporal Functional Dependencies (ATFDs). ATFDs, with their suitable treatment of temporal information, have been proposed as a methodological tool for mining clinical data. An example of the knowledge derivable through such dependencies may be "within 15 days, patients with the same diagnosis and the same therapy usually receive the same daily amount of drug". However, current ATFD models do not analyze the temporal evolution of the data, as in "for most patients with the same diagnosis, the same drug is prescribed after the same symptom". To this end, we propose a new kind of ATFD called Approximate Pure Temporally Evolving Functional Dependencies (APEFDs). Another limitation of such dependencies is that they cannot deal with quantitative data when some tolerance can be allowed for numerical values.
In particular, this limitation arises in clinical data warehouses, where analysis and mining have to consider one or more measures related to quantitative data (such as lab test results and vital signs) with respect to multiple dimensional (alphanumeric) attributes (such as patient, hospital, physician, diagnosis) and some time dimensions (such as the day since hospitalization and the calendar date). According to this scenario, we introduce a new kind of ATFD, named Multi-Approximate Temporal Functional Dependency (MATFD), which considers dependencies between dimensions and quantitative measures from temporal clinical data. These new dependencies may provide new knowledge, such as "within 15 days, patients with the same diagnosis and the same therapy receive a daily amount of drug within a fixed range". The other techniques are based on pattern mining, which has also been proposed as a methodological tool for mining clinical data. However, many methods proposed so far focus on mining temporal rules that describe relationships between data sequences or instantaneous events, without considering the presence of more complex temporal patterns in the dataset. These patterns, such as trends of a particular vital sign, are often very relevant for clinicians. Moreover, it is of great interest to discover whether some sort of event, such as a drug administration, is capable of changing these trends, and how. To this end, we propose a new kind of temporal pattern, called Trend-Event Patterns (TEPs), that focuses on events and their influence on trends that can be retrieved from measures such as vital signs. With TEPs we can express concepts such as "the administration of paracetamol to a patient with an increasing temperature leads to a decreasing trend in temperature after the administration occurs". We also analyze another pattern mining technique that includes prediction.
This technique discovers a compact set of patterns that aim to describe the condition (or class) of interest. Our framework relies on a classification model that considers and combines various predictive pattern candidates and selects only those that are important for improving the overall class prediction performance. We show that our classification approach achieves a significant reduction in the number of extracted patterns, compared to state-of-the-art methods based on the minimal predictive pattern mining approach, while preserving the overall classification accuracy of the model. For each technique described above, we developed a tool to retrieve the corresponding kind of rule. All the results are obtained by pre-processing and mining clinical data, in particular ICU data from the MIMIC-III database.
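The notion of an *approximate* functional dependency underlying the work above can be sketched with the standard g3-style error measure: the smallest fraction of tuples that must be removed for the dependency to hold exactly. This toy example (hypothetical rows, not MIMIC-III data, and ignoring the temporal window for simplicity) checks a dependency like the one quoted in the abstract:

```python
from collections import Counter, defaultdict

def afd_error(rows, lhs, rhs):
    """g3-style error of an approximate functional dependency lhs -> rhs:
    the fraction of rows that must be dropped so the dependency holds
    exactly (keep the majority rhs value inside each lhs group)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[a] for a in lhs)][row[rhs]] += 1
    keep = sum(c.most_common(1)[0][1] for c in groups.values())
    return 1 - keep / len(rows)

# Hypothetical tuples: same diagnosis and therapy should imply same dose
rows = [
    {"diag": "flu", "therapy": "T1", "dose": 10},
    {"diag": "flu", "therapy": "T1", "dose": 10},
    {"diag": "flu", "therapy": "T1", "dose": 20},  # the one violating row
    {"diag": "uti", "therapy": "T2", "dose": 5},
]
print(afd_error(rows, ("diag", "therapy"), "dose"))  # 0.25
```

The ATFD variants proposed in the thesis add temporal grouping (e.g., a 15-day window) and, for MATFDs, tolerance ranges on the measure instead of strict equality; the error computation above is the non-temporal core.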

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book are the integration of computational thinking and spatial thinking and the transformation of abstract ideas and models into concrete data structures and algorithms.

    REVEALING BIOTIC DIVERSITY: HOW DO COMPLEX ENVIRONMENTS INFLUENCE HUMAN SCHISTOSOMIASIS IN A HYPERENDEMIC AREA

    Human schistosomiasis is one of the great neglected tropical diseases (NTDs) of our time, with more than 206 million individuals infected, more than 90% of whom reside in Sub-Saharan Africa (WHO 2017). Chemotherapy-based control programs play an essential role in contributing to the elimination of human schistosomiasis; however, there is an increasing consensus that chemotherapy needs to be supplemented by other means if interruption of transmission and elimination are to be achieved. Given this situation, the focus of this dissertation was to better understand transmission dynamics in a hyperendemic setting in western Kenya and to find alternative measures to supplement ongoing mass drug administration (MDA) using indigenous resources that disrupt the development of Schistosoma mansoni (the causative agent of intestinal schistosomiasis in Africa) within its obligatory aquatic snail intermediate host, Biomphalaria. The discipline of disease ecology emphasizes understanding the biotic context in which disease transmission occurs. S. mansoni and Biomphalaria exist within a complex ecological milieu in streams, ponds and lakes in Kenya. The research in this dissertation combined DNA barcodes, phylogenetics, host use patterns and morphology to determine the diversity of trematodes that use Kenyan Biomphalaria as an intermediate host. Along with S. mansoni, we found 21 additional digenetic trematodes that also use Biomphalaria species in Kenya as an intermediate host. The presence of other trematode species in Biomphalaria affects S. mansoni by causing competition for access to snail resources. Furthermore, we used experimental approaches to understand the competitive dynamics among these trematodes and to generate a dominance hierarchy among them. We found that several trematode species are dominant to S. mansoni and that long-term agricultural practices have created a situation in which an amphistome parasite of cattle relies on a facilitating effect by S. mansoni for its own successful development in the snail host. Coupled with these data are four years of observational survey data used to predict how these trematodes influence the prevalence of S. mansoni in Biomphalaria and, consequently, the likelihood of influencing human infections.

    Explainable temporal data mining techniques to support the prediction task in Medicine

    In the last decades, the increasing amount of data available in all fields has raised the need to discover new knowledge and to explain the hidden information found. On one hand, the rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, results to users. In the biomedical informatics and computer science communities, there is considerable discussion about the "un-explainable" nature of artificial intelligence, where algorithms and systems often leave users, and even developers, in the dark with respect to how results were obtained. Especially in the biomedical context, the need to explain the results of an artificial intelligence system is legitimized by the importance of patient safety. On the other hand, current database systems enable us to store huge quantities of data, and their analysis through data mining techniques makes it possible to extract relevant knowledge and useful hidden information. Relationships and patterns within these data could provide new medical knowledge. The analysis of such healthcare/medical data collections could greatly help to observe the health conditions of the population and to extract useful information that can be exploited in the assessment of healthcare/medical processes. In particular, the prediction of medical events is essential for preventing disease, understanding disease mechanisms, and increasing patient quality of care. In this context, an important aspect is to verify whether the database content supports the capability of predicting future events. In this thesis, we start by addressing the problem of explainability, discussing some of the most significant challenges that need to be addressed with scientific and engineering rigor in a variety of biomedical domains.
We analyze the "temporal component" of explainability, detailing different perspectives such as the use of temporal data, the temporal task, temporal reasoning, and the dynamics of explainability with respect to the user perspective and to knowledge. Starting from this panorama, we focus our attention on two different temporal data mining techniques. The first, based on trend abstractions, starts from the concept of Trend-Event Patterns and moves through the concept of prediction: we propose a new kind of predictive temporal pattern, namely Predictive Trend-Event Patterns (PTE-Ps). This framework aims to combine complex temporal features to extract a compact and non-redundant predictive set of patterns composed of such temporal features. The second is based on functional dependencies: we propose a methodology for deriving a new kind of approximate temporal functional dependency, called Approximate Predictive Functional Dependencies (APFDs), based on a three-window framework. We then discuss the concept of approximation, the data complexity of deriving an APFD, the introduction of two new error measures, and finally the quality of APFDs in terms of coverage and reliability. Exploiting these methodologies, we analyze intensive care unit data from the MIMIC dataset.
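The trend-abstraction idea behind trend-event patterns can be sketched minimally as follows (a simplification of the TEP/PTE-P machinery, with hand-picked window sizes, thresholds, and made-up temperature values):

```python
def trend(values, eps=0.1):
    """Abstract a numeric series into a qualitative trend label using the
    average slope between its endpoints (threshold eps is illustrative)."""
    slope = (values[-1] - values[0]) / (len(values) - 1)
    if slope > eps:
        return "increasing"
    if slope < -eps:
        return "decreasing"
    return "stable"

def trend_event_pattern(series, event_index, window=3):
    """Pair the qualitative trends immediately before and after an event,
    in the spirit of a trend-event pattern."""
    before = series[event_index - window:event_index]
    after = series[event_index + 1:event_index + 1 + window]
    return trend(before), trend(after)

# Hypothetical temperatures around a drug administration at index 3
temps = [37.8, 38.2, 38.6, 38.9, 38.4, 37.9, 37.4]
print(trend_event_pattern(temps, 3))  # ('increasing', 'decreasing')
```

A predictive variant would mine many such before/after trend pairs across patients and keep only those whose presence improves prediction of an outcome class, which is the role of the selection step described in the abstract.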

    Modern roles for an ancient system. Intracellular Complement in the regulation of β cell function

    Type 2 diabetes (T2D) is characterised by defective insulin exocytosis from the pancreatic β cells, accompanied by insulin resistance. Reduced β cell mass is often seen in T2D individuals, caused by enhanced β cell apoptosis. It is now understood that several components drive β cell dysfunction and apoptosis: glucose- and lipotoxicity, ER stress, pro-inflammatory cytokines, dysfunctions in autophagy, and β cell dedifferentiation. Our studies revealed novel functions of intracellular complement components in β cells, presenting a new link between complement and diabetes development. We found that C3 is upregulated in pancreatic islets during T2D as a factor protecting against β cell dysfunction caused by attenuated autophagy. In paper I, we revealed a high expression of C3 in human pancreatic islets. C3 was found intracellularly in isolated human pancreatic β cells. We verified the binding between C3 and ATG16L1 within the cytosol. C3 was required to maintain autophagy activity in β cells, as evidenced by the massive accumulation of LC3-II puncta, indicating that in the absence of C3 autophagosomes do not fuse with lysosomes. Autophagy protects β cells from injuries caused by exposure to stressors, such as lipotoxicity. When exposing C3-knockout INS-1 cells to β cell autophagy inducers (palmitate and IAPP), we observed significantly increased cell death caused by autophagy insufficiency. In paper II, we showed that silencing CD59 expression in rat β cells significantly suppressed insulin secretion. Moreover, removing the membrane-bound CD59 did not affect insulin secretion, suggesting that intracellular CD59 is involved in this function. We found that a CD59 mutant lacking the GPI anchor was present intracellularly in the β cell line. Non-GPI-anchored CD59 interacts with the SNARE protein VAMP2 and rescues insulin secretion.
We showed that the GPI anchor, which is necessary for the complement inhibitory function of CD59, is not necessary for its ability to mediate insulin secretion. Two other mutants, W40E and C64Y, also rescued insulin secretion, even though studies have shown that these mutations result in a loss of the complement inhibitory functions of CD59. Our data suggest that there are different structural requirements for the separate functions of CD59, namely MAC inhibition and insulin secretion. In papers III and IV, using RNA sequencing, we revealed the presence of two CD59 isoforms lacking the GPI-anchoring domain (replaced with unique C-terminal domains). We named these isoforms IRIS-1 and IRIS-2 (Isoforms Rescuing Insulin Secretion 1 and 2). Both isoforms exist in human and mouse pancreatic islets. They colocalize with insulin granules and interact with the SNARE exocytotic machinery, allowing for insulin secretion. Induction of glucotoxicity in primary, healthy human islets led to a significant decrease in IRIS-1 expression at the protein level. We found that the expression of both IRIS-1 and IRIS-2 is markedly reduced in islets isolated from T2D patients compared to healthy controls, suggesting that hyperglycaemia may be one of the factors reducing IRIS-1 and IRIS-2 expression in T2D individuals. Next, an electropositive patch was found in the C-terminal region of IRIS-1, suggesting a potential interaction with DNA. We found that IRIS-1 localizes to the nuclei of pancreatic β cells and confirmed that its C-terminal domain is responsible for this nuclear localisation. Since robust nuclear localisation of IRIS-1 is observed only in some nuclei, IRIS-1 may localise to the nucleus depending on the differentiation state of the cells, or in a subset of cells with different functional relevance.
We found that IRIS-1-expressing cells displayed significantly higher expression levels of Urocortin 3 and Pdx1 (markers of mature β cells, whose loss marks the beginning of β cell dedifferentiation), suggesting that IRIS-1 may be required for maintaining β cell identity and function.