2,258 research outputs found

    Unsupervised Ranking of Numerical Observations based on Magnetic Properties and Correlation Coefficient

    Get PDF
    This paper addresses a novel unsupervised algorithm to rank numerical observations which is important in many applications in computer science, especially in information retrieval (IR). The proposed algorithm shows how correlation coefficients between attribute values and the concept of magnetic properties can be explored to rank multi-attribute numerical objects. One of the main reasons of using correlation coefficients between attribute values and the concept of magnetic properties is that they are easy to compute and interpret. Our proposed Unsupervised Ranking using Magnetic properties and Correlation coefficient (URMC) algorithm can use some or all the numerical attributes of objects and can also handle objects with missing attribute values. The proposed algorithm overcomes a major limitation of the state-of-the-art technique while achieving excellent results

    The Challenge of Machine Learning in Space Weather Nowcasting and Forecasting

    Get PDF
    The numerous recent breakthroughs in machine learning (ML) make imperative to carefully ponder how the scientific community can benefit from a technology that, although not necessarily new, is today living its golden age. This Grand Challenge review paper is focused on the present and future role of machine learning in space weather. The purpose is twofold. On one hand, we will discuss previous works that use ML for space weather forecasting, focusing in particular on the few areas that have seen most activity: the forecasting of geomagnetic indices, of relativistic electrons at geosynchronous orbits, of solar flares occurrence, of coronal mass ejection propagation time, and of solar wind speed. On the other hand, this paper serves as a gentle introduction to the field of machine learning tailored to the space weather community and as a pointer to a number of open challenges that we believe the community should undertake in the next decade. The recurring themes throughout the review are the need to shift our forecasting paradigm to a probabilistic approach focused on the reliable assessment of uncertainties, and the combination of physics-based and machine learning approaches, known as gray-box.Comment: under revie

    Unsupervised learning for vascular heterogeneity assessment of glioblastoma based on magnetic resonance imaging: The Hemodynamic Tissue Signature

    Full text link
    [ES] El futuro de la imagen médica está ligado a la inteligencia artificial. El análisis manual de imágenes médicas es hoy en día una tarea ardua, propensa a errores y a menudo inasequible para los humanos, que ha llamado la atención de la comunidad de Aprendizaje Automático (AA). La Imagen por Resonancia Magnética (IRM) nos proporciona una rica variedad de representaciones de la morfología y el comportamiento de lesiones inaccesibles sin una intervención invasiva arriesgada. Sin embargo, explotar la potente pero a menudo latente información contenida en la IRM es una tarea muy complicada, que requiere técnicas de análisis computacional inteligente. Los tumores del sistema nervioso central son una de las enfermedades más críticas estudiadas a través de IRM. Específicamente, el glioblastoma representa un gran desafío, ya que, hasta la fecha, continua siendo un cáncer letal que carece de una terapia satisfactoria. Del conjunto de características que hacen del glioblastoma un tumor tan agresivo, un aspecto particular que ha sido ampliamente estudiado es su heterogeneidad vascular. La fuerte proliferación vascular del glioblastoma, así como su robusta angiogénesis han sido consideradas responsables de la alta letalidad de esta neoplasia. Esta tesis se centra en la investigación y desarrollo del método Hemodynamic Tissue Signature (HTS): un método de AA no supervisado para describir la heterogeneidad vascular de los glioblastomas mediante el análisis de perfusión por IRM. El método HTS se basa en el concepto de hábitat, que se define como una subregión de la lesión con un perfil de IRM que describe un comportamiento fisiológico concreto. El método HTS delinea cuatro hábitats en el glioblastoma: el hábitat HAT, como la región más perfundida del tumor con captación de contraste; el hábitat LAT, como la región del tumor con un perfil angiogénico más bajo; el hábitat IPE, como la región adyacente al tumor con índices de perfusión elevados; y el hábitat VPE, como el edema restante de la lesión con el perfil de perfusión más bajo. La investigación y desarrollo de este método ha originado una serie de contribuciones enmarcadas en esta tesis. Primero, para verificar la fiabilidad de los métodos de AA no supervisados en la extracción de patrones de IRM, se realizó una comparativa para la tarea de segmentación de gliomas de grado alto. Segundo, se propuso un algoritmo de AA no supervisado dentro de la familia de los Spatially Varying Finite Mixture Models. El algoritmo propone una densidad a priori basada en un Markov Random Field combinado con la función probabilística Non-Local Means, para codificar la idea de que píxeles vecinos tienden a pertenecer al mismo objeto. Tercero, se presenta el método HTS para describir la heterogeneidad vascular del glioblastoma. El método se ha aplicado a casos reales en una cohorte local de un solo centro y en una cohorte internacional de más de 180 pacientes de 7 centros europeos. Se llevó a cabo una evaluación exhaustiva del método para medir el potencial pronóstico de los hábitats HTS. Finalmente, la tecnología desarrollada en la tesis se ha integrado en la plataforma online ONCOhabitats (https://www.oncohabitats.upv.es). La plataforma ofrece dos servicios: 1) segmentación de tejidos de glioblastoma, y 2) evaluación de la heterogeneidad vascular del tumor mediante el método HTS. Los resultados de esta tesis han sido publicados en diez contribuciones científicas, incluyendo revistas y conferencias de alto impacto en las áreas de Informática Médica, Estadística y Probabilidad, Radiología y Medicina Nuclear y Aprendizaje Automático. También se emitió una patente industrial registrada en España, Europa y EEUU. Finalmente, las ideas originales concebidas en esta tesis dieron lugar a la creación de ONCOANALYTICS CDX, una empresa enmarcada en el modelo de negocio de los companion diagnostics de compuestos farmacéuticos.[EN] The future of medical imaging is linked to Artificial Intelligence (AI). The manual analysis of medical images is nowadays an arduous, error-prone and often unaffordable task for humans, which has caught the attention of the Machine Learning (ML) community. Magnetic Resonance Imaging (MRI) provides us with a wide variety of rich representations of the morphology and behavior of lesions completely inaccessible without a risky invasive intervention. Nevertheless, harnessing the powerful but often latent information contained in MRI acquisitions is a very complicated task, which requires computational intelligent analysis techniques. Central nervous system tumors are one of the most critical diseases studied through MRI. Specifically, glioblastoma represents a major challenge, as it remains a lethal cancer that, to date, lacks a satisfactory therapy. Of the entire set of characteristics that make glioblastoma so aggressive, a particular aspect that has been widely studied is its vascular heterogeneity. The strong vascular proliferation of glioblastomas, as well as their robust angiogenesis and extensive microvasculature heterogeneity have been claimed responsible for the high lethality of the neoplasm. This thesis focuses on the research and development of the Hemodynamic Tissue Signature (HTS) method: an unsupervised ML approach to describe the vascular heterogeneity of glioblastomas by means of perfusion MRI analysis. The HTS builds on the concept of habitats. A habitat is defined as a sub-region of the lesion with a particular MRI profile describing a specific physiological behavior. The HTS method delineates four habitats within the glioblastoma: the HAT habitat, as the most perfused region of the enhancing tumor; the LAT habitat, as the region of the enhancing tumor with a lower angiogenic profile; the potentially IPE habitat, as the non-enhancing region adjacent to the tumor with elevated perfusion indexes; and the VPE habitat, as the remaining edema of the lesion with the lowest perfusion profile. The research and development of the HTS method has generated a number of contributions to this thesis. First, in order to verify that unsupervised learning methods are reliable to extract MRI patterns to describe the heterogeneity of a lesion, a comparison among several unsupervised learning methods was conducted for the task of high grade glioma segmentation. Second, a Bayesian unsupervised learning algorithm from the family of Spatially Varying Finite Mixture Models is proposed. The algorithm integrates a Markov Random Field prior density weighted by the probabilistic Non-Local Means function, to codify the idea that neighboring pixels tend to belong to the same semantic object. Third, the HTS method to describe the vascular heterogeneity of glioblastomas is presented. The HTS method has been applied to real cases, both in a local single-center cohort of patients, and in an international retrospective cohort of more than 180 patients from 7 European centers. A comprehensive evaluation of the method was conducted to measure the prognostic potential of the HTS habitats. Finally, the technology developed in this thesis has been integrated into an online open-access platform for its academic use. The ONCOhabitats platform is hosted at https://www.oncohabitats.upv.es, and provides two main services: 1) glioblastoma tissue segmentation, and 2) vascular heterogeneity assessment of glioblastomas by means of the HTS method. The results of this thesis have been published in ten scientific contributions, including top-ranked journals and conferences in the areas of Medical Informatics, Statistics and Probability, Radiology & Nuclear Medicine and Machine Learning. An industrial patent registered in Spain, Europe and EEUU was also issued. Finally, the original ideas conceived in this thesis led to the foundation of ONCOANALYTICS CDX, a company framed into the business model of companion diagnostics for pharmaceutical compounds.[CA] El futur de la imatge mèdica està lligat a la intel·ligència artificial. L'anàlisi manual d'imatges mèdiques és hui dia una tasca àrdua, propensa a errors i sovint inassequible per als humans, que ha cridat l'atenció de la comunitat d'Aprenentatge Automàtic (AA). La Imatge per Ressonància Magnètica (IRM) ens proporciona una àmplia varietat de representacions de la morfologia i el comportament de lesions inaccessibles sense una intervenció invasiva arriscada. Tanmateix, explotar la potent però sovint latent informació continguda a les adquisicions de IRM esdevé una tasca molt complicada, que requereix tècniques d'anàlisi computacional intel·ligent. Els tumors del sistema nerviós central són una de les malalties més crítiques estudiades a través de IRM. Específicament, el glioblastoma representa un gran repte, ja que, fins hui, continua siguent un càncer letal que manca d'una teràpia satisfactòria. Del conjunt de característiques que fan del glioblastoma un tumor tan agressiu, un aspecte particular que ha sigut àmpliament estudiat és la seua heterogeneïtat vascular. La forta proliferació vascular dels glioblastomes, així com la seua robusta angiogènesi han sigut considerades responsables de l'alta letalitat d'aquesta neoplàsia. Aquesta tesi es centra en la recerca i desenvolupament del mètode Hemodynamic Tissue Signature (HTS): un mètode d'AA no supervisat per descriure l'heterogeneïtat vascular dels glioblastomas mitjançant l'anàlisi de perfusió per IRM. El mètode HTS es basa en el concepte d'hàbitat, que es defineix com una subregió de la lesió amb un perfil particular d'IRM, que descriu un comportament fisiològic concret. El mètode HTS delinea quatre hàbitats dins del glioblastoma: l'hàbitat HAT, com la regió més perfosa del tumor amb captació de contrast; l'hàbitat LAT, com la regió del tumor amb un perfil angiogènic més baix; l'hàbitat IPE, com la regió adjacent al tumor amb índexs de perfusió elevats, i l'hàbitat VPE, com l'edema restant de la lesió amb el perfil de perfusió més baix. La recerca i desenvolupament del mètode HTS ha originat una sèrie de contribucions emmarcades a aquesta tesi. Primer, per verificar la fiabilitat dels mètodes d'AA no supervisats en l'extracció de patrons d'IRM, es va realitzar una comparativa en la tasca de segmentació de gliomes de grau alt. Segon, s'ha proposat un algorisme d'AA no supervisat dintre de la família dels Spatially Varying Finite Mixture Models. L'algorisme proposa un densitat a priori basada en un Markov Random Field combinat amb la funció probabilística Non-Local Means, per a codificar la idea que els píxels veïns tendeixen a pertànyer al mateix objecte semàntic. Tercer, es presenta el mètode HTS per descriure l'heterogeneïtat vascular dels glioblastomas. El mètode HTS s'ha aplicat a casos reals en una cohort local d'un sol centre i en una cohort internacional de més de 180 pacients de 7 centres europeus. Es va dur a terme una avaluació exhaustiva del mètode per mesurar el potencial pronòstic dels hàbitats HTS. Finalment, la tecnologia desenvolupada en aquesta tesi s'ha integrat en una plataforma online ONCOhabitats (https://www.oncohabitats.upv.es). La plataforma ofereix dos serveis: 1) segmentació dels teixits del glioblastoma, i 2) avaluació de l'heterogeneïtat vascular dels glioblastomes mitjançant el mètode HTS. Els resultats d'aquesta tesi han sigut publicats en deu contribucions científiques, incloent revistes i conferències de primer nivell a les àrees d'Informàtica Mèdica, Estadística i Probabilitat, Radiologia i Medicina Nuclear i Aprenentatge Automàtic. També es va emetre una patent industrial registrada a Espanya, Europa i els EEUU. Finalment, les idees originals concebudes en aquesta tesi van donar lloc a la creació d'ONCOANALYTICS CDX, una empresa emmarcada en el model de negoci dels companion diagnostics de compostos farmacèutics.En este sentido quiero agradecer a las diferentes instituciones y estructuras de financiación de investigación que han contribuido al desarrollo de esta tesis. En especial quiero agradecer a la Universitat Politècnica de València, donde he desarrollado toda mi carrera acadèmica y científica, así como al Ministerio de Ciencia e Innovación, al Ministerio de Economía y Competitividad, a la Comisión Europea, al EIT Health Programme y a la fundación Caixa ImpulseJuan Albarracín, J. (2020). Unsupervised learning for vascular heterogeneity assessment of glioblastoma based on magnetic resonance imaging: The Hemodynamic Tissue Signature [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/149560TESI

    Mining sequences in distributed sensors data for energy production.

    Get PDF
    Brief Overview of the Problem: The Environmental Protection Agency (EPA), a government funded agency, provides both legislative and judicial powers for emissions monitoring in the United States. The agency crafts laws based on self-made regulations to enforce companies to operate within the limits of the law resulting in environmentally safe operation. Specifically, power companies operate electric generating facilities under guidelines drawn-up and enforced by the EPA. Acid rain and other harmful factors require that electric generating facilities report hourly emissions recorded via a Supervisory Control and Data Acquisition (SCADA) system. SCADA is a control and reporting system that is present in all power plants consisting of sensors and control mechanisms that monitor all equipment within the plants. The data recorded by a SCADA system is collected by the EPA and allows them to enforce proper plant operation relating to emissions. This data includes a lot of generating unit and power plant specific details, including hourly generation. This hourly generation (termed grossunitload by the EPA) is the actual hourly average output of the generator on a per unit basis. The questions to be answered are do any of these units operate in tandem and do any of the units start, stop, or change operation as a result of another\u27s change in generation? These types of questions will be answered for the years of April 2002 through April 2003 for facilities that operate pipeline natural-gas-fired generating units. Purpose of Research The research conducted has dual uses if fruitful. First, the use of a local modeling between generating units would be highly profitable among energy traders. Betting that a plant will operate a unit based on another\u27s current characteristics would be sensationally profitable to energy traders. This profitability is variable due to fuel type. For instance, if the price of coal is extremely high due to shortages, the value of knowing a semioperating characteristic of two generating units is highly valuable. Second, this known characteristic can also be used in regulation and operational modeling. The second use is of great importance to government agencies. If regulatory committees can be aware of past (or current) similarities between power producers, they may be able to avoid a power struggle in a region caused by greedy traders or companies. Not considering profitable motives, the Department of Energy may use something similar to generate a model of power grid generation availability based on previous data for reliability purposes. Type of Problem: The problem tackled within this Master\u27s thesis is of multiple time series pattern recognition. This field is expansive and well studied, therefore the research performed will benefit from previously known techniques. The author has chosen to experiment with conventional techniques such as correlation, principal component analysis, and kmeans clustering for feature and eventually pattern extraction. For the primary analysis performed, the author chose to use a conventional sequence discovery algorithm. The sequence discovery algorithm has no prior knowledge of space limitations, therefore it searches over the entire space resulting in an expense but complete process. Prior to sequence discovery the author applies a uniform coding schema to the raw data, which is an adaption of a coding schema presented by Keogh. This coding and discovery process is deemed USD, or Uniform Sequence Discovery. The data is highly dimensional along with being extremely dynamic and sporadic with regards to magnitude. The energy market that demands power generation is profit and somewhat reliability driven. The obvious factors are more reliability based, for instance to keep system frequency at 60Hz, units may operate in an idle state resulting in a constant or very low value for a period of time (idle time). Also to avoid large frequency swings on the power grid, companies are require

    Non-Negative Discriminative Data Analytics

    Get PDF
    Due to advancements in data acquisition techniques, collecting datasets representing samples from multi-views has become more common recently (Jia et al. 2019). For instance, in genomics, a lymphoma patient’s dataset may include data on gene expression, single nucleotide polymorphism (SNP), and array Comparative genomic hybridization (aCGH) measurements. Learning from multiple views about the same objective, in general, obtains a better understanding of the hidden patterns of the data compared to learning from a single view data. Most of the existing multi-view learning techniques such as canonical correlation analysis (Hotelling et al. 1936) and multi-view support vector machine (Farquhar et al. 2006), multiple kernel learning (Zhang et al. 2016) are focused on extracting the shared information among multiple datasets. However, in some real-world applications, it’s appealing to extract the discriminative knowledge of multiple datasets, namely discriminative data analytics. For example, consider the one dataset as gene-expression measurements of cancer patients, and the other dataset as the gene-expression levels of healthy volunteers and the goal is to cluster cancer patients according to the molecular sub-types. Performing a single view analysis such as principal component analysis (PCA) on any of the dataset yields information related to the common knowledge between the two datasets (Garte et al. 1996). Addressing such challenge, contrastive PCA (Abid et al. 2017) and discriminative (d) PCA in (Jia et al. 2019) are proposed in to extract one dataset-specific information often missed by PCA. Inspired by dPCA, we propose a novel discriminative multi-view learning algorithm, namely Non-negative Discriminative Analysis (DNA), to extract the unique information of one dataset (a.k.a. view) with respect to the other dataset. This boils down to solving a non-negative matrix factorization problem. Furthermore, we apply the proposed DNA framework in various real-world down-stream machine learning applications such as feature selections, dimensionality reduction, classification, and clustering

    Initial Condition Estimation in Flux Tube Simulations using Machine Learning

    Get PDF
    Space weather has become an essential field of study as solar flares, coronal mass ejections, and other phenomena can severely impact Earth's life as we know it. The solar wind is threaded by magnetic flux tubes that extend from the solar atmosphere to distances beyond the solar system boundary. As those flux tubes cross the Earth's orbit, it is essential to understand and predict solar phenomena' effects at 1 AU, but the physical parameters linked to the solar wind formation and acceleration processes are not directly observable. Some existing models, such as MULTI-VP, try to fill this gap by predicting the background solar wind's dynamical and thermal properties from chosen magnetograms and using a coronal field reconstruction method. However, these models take a long time, and their performance increases with good initial guesses regarding the simulation's initial conditions. To address this problem, we propose using varied machine learning techniques to obtain good initial guesses that can accelerate MULTI-VP's computational time
    corecore