7 research outputs found

    Data Fusion by Matrix Factorization

    For most problems in science and engineering we can obtain data sets that describe the observed system from various perspectives and record the behavior of its individual components. Heterogeneous data sets can be collectively mined by data fusion. Fusion can focus on a specific target relation and exploit directly associated data together with contextual data and data about the system's constraints. In this paper we describe a data fusion approach with penalized matrix tri-factorization (DFMF) that simultaneously factorizes data matrices to reveal hidden associations. The approach can directly consider any data that can be expressed in a matrix, including those from feature-based representations, ontologies, associations and networks. We demonstrate the utility of DFMF on a gene function prediction task with eleven different data sources and on the prediction of pharmacologic actions by fusing six data sources. Our data fusion algorithm compares favorably to alternative data integration approaches and achieves higher accuracy than can be obtained from any single data source alone.
    Comment: Short preprint, 13 pages, 3 Figures, 3 Tables. Full paper in 10.1109/TPAMI.2014.234397
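
    The matrix tri-factorization named in this abstract can be illustrated with a minimal sketch. It approximates a single relation matrix R by a product G1 S G2^T using plain gradient descent on the squared error. This is not the DFMF algorithm itself: the penalty terms, any constraints on the factors, and the simultaneous factorization of several matrices are left out, and the function names, the toy matrix, the rank, and the learning rate are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def sq_error(r, g1, s, g2):
    """Squared Frobenius norm of R - G1 S G2^T."""
    approx = matmul(matmul(g1, s), transpose(g2))
    return sum((x - y) ** 2 for rr, ar in zip(r, approx) for x, y in zip(rr, ar))

def tri_factorize(r, rank, steps=2000, lr=0.01, seed=0):
    """Fit R ~ G1 S G2^T by gradient descent (illustrative, unpenalized)."""
    rng = random.Random(seed)
    n, m = len(r), len(r[0])

    def rand(rows, cols):
        return [[rng.random() for _ in range(cols)] for _ in range(rows)]

    def step(param, grad):
        # The factor 2 comes from differentiating the squared error.
        return [[p - lr * 2 * g for p, g in zip(pr, gr)] for pr, gr in zip(param, grad)]

    g1, s, g2 = rand(n, rank), rand(rank, rank), rand(m, rank)
    for _ in range(steps):
        approx = matmul(matmul(g1, s), transpose(g2))
        err = [[a - x for a, x in zip(ar, rr)] for ar, rr in zip(approx, r)]
        grad_g1 = matmul(matmul(err, g2), transpose(s))    # E G2 S^T
        grad_s = matmul(matmul(transpose(g1), err), g2)    # G1^T E G2
        grad_g2 = matmul(matmul(transpose(err), g1), s)    # E^T G1 S
        g1, s, g2 = step(g1, grad_g1), step(s, grad_s), step(g2, grad_g2)
    return g1, s, g2

# Toy relation matrix (made up): 4 row objects, 3 column objects, 2 blocks.
R = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
G1, S, G2 = tri_factorize(R, rank=2)
print("squared reconstruction error:", sq_error(R, G1, S, G2))
```

    In a fusion setting one would fit several relation matrices at once and reuse the same factor (for example G1) wherever the same object type appears, which is what lets the data sets inform one another. The sketch keeps to one matrix so that the three gradient expressions stay easy to check.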

    ENDOMET database – A means to identify novel diagnostic and prognostic tools for endometriosis

    Endometriosis is a common, benign, hormone-dependent inflammatory gynecological disease that affects women of reproductive age and has a considerable economic impact on healthcare systems. Symptoms include intense menstrual pain, persistent pelvic pain, and infertility. It is defined by the presence of endometrium-like tissue in ectopic locations outside the uterine cavity and by inflammation in the peritoneal cavity. Endometriosis has a multifactorial etiology that, despite extensive research, is still poorly understood. The diagnostic delay from the onset of the disease to a conclusive diagnosis is 7–12 years. There is no known cure, although symptoms can be improved with hormonal medications (which often have multiple side effects and prevent pregnancy) or through surgery, which carries its own risks. Current non-invasive diagnostic tools are not sufficiently dependable, and a definite diagnosis is achieved through laparoscopy or laparotomy. This study was based on two prospective cohorts: the ENDOMET study, which included 137 endometriosis patients scheduled for surgery and 62 healthy women, and the PROENDO study, which included 138 endometriosis patients and 33 healthy women. Our long-term goal was to support the discovery of new tools for the efficient diagnosis of endometriosis, as well as tools to further understand the etiology and pathogenesis of the disease. We pursued this goal by creating a database, EndometDB, based on a relational data model and implemented in PostgreSQL. The database allows, for example, the exploration of global genome-wide expression patterns in the peritoneum, endometrium, and endometriosis lesions of endometriosis patients, as well as in the peritoneum and endometrium of healthy control women of reproductive age.
The data collected in the EndometDB were also used to develop and validate a symptom- and biomarker-based predictive model designed for risk evaluation and early prediction of endometriosis without invasive diagnostic methods. Using the data in the EndometDB, we discovered that the WNT signaling pathway is one of the molecular pathways that undergo strong changes in endometriosis compared with the eutopic endometrium. We then evaluated the potential role of secreted frizzled-related protein 2 (SFRP-2, a WNT signaling pathway modulator) in improving the detection of endometriosis lesion borders. SFRP-2 expression visualizes the lesion better than previously used markers and can be used to better define lesion size and to confirm that the surgical excision of the lesions is complete.
FINNISH SUMMARY (translated): ENDOMET database – A means of identifying a new diagnostic and prognostic tool for endometriosis. Endometriosis is a common, benign, hormone-dependent inflammatory gynecological disease of women of reproductive age that places a considerable burden on the healthcare system. Its symptoms include severe menstrual pain, persistent pelvic pain, and infertility. The disease is defined as the presence of endometrium-like tissue outside the uterus together with associated inflammation of the peritoneum. The etiology of endometriosis is multifactorial and, despite extensive research, still poorly understood. The time from disease onset to a final diagnosis is often as long as 7–12 years. There is no known cure, but symptoms can be relieved, for example, with hormonal medications (which often have many side effects and prevent pregnancy) or with surgery, which carries its own known risks. Current non-invasive diagnostic tools are not reliable enough to identify the disease, and a definite diagnosis of endometriosis is obtained through laparoscopy or laparotomy.
This study was based on two prospective cohorts: the ENDOMET study, with 137 endometriosis patients and 62 healthy women, and the PROENDO study, with 138 endometriosis patients and 33 healthy women. Our long-term goal in this study was to find new tools for diagnosing endometriosis and to understand its etiology and pathogenesis. As a first step, we created the EndometDB database using PostgreSQL. This database, part of which has been released for open use, can be used to study genome-wide expression, for example the expression of all known genes, in the peritoneum, the endometrium, and the endometriosis lesions of endometriosis patients. The data collected in EndometDB were used to develop and validate a symptom- and biomarker-based predictive model. The model produces a risk assessment for the early prediction of endometriosis without laparoscopy. Using the EndometDB data, we observed strong changes in gene expression in endometriosis tissue, particularly in genes involved in the regulation of the WNT signaling pathway. The key finding was that expression of the SFRP-2 protein was markedly elevated in endometriosis tissue and that immunohistochemical staining of SFRP-2 distinguishes diseased endometriosis tissue from healthy tissue better than previously used markers. The method can thus be used to determine the extent of the diseased tissue and, where needed, to show that surgery has removed all of it.
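
    As a rough illustration of what a relational data model for tissue-level expression data might look like, the sketch below builds three small tables and runs one grouped query. It is not the EndometDB schema: the table and column names, the gene label GENE_A, and every value are invented placeholders, and Python's built-in sqlite3 module stands in for PostgreSQL only so that the example runs without a database server.

```python
import sqlite3

# In-memory database; a real deployment would connect to a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    id INTEGER PRIMARY KEY,
    cohort TEXT,
    is_patient INTEGER            -- 1 = endometriosis patient, 0 = healthy control
);
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES subject(id),
    tissue TEXT                   -- e.g. 'peritoneum', 'endometrium', 'lesion'
);
CREATE TABLE expression (
    sample_id INTEGER REFERENCES sample(id),
    gene TEXT,
    level REAL
);
""")

# Placeholder rows, invented purely so the query has something to return.
conn.executemany("INSERT INTO subject VALUES (?, ?, ?)",
                 [(1, "ENDOMET", 1), (2, "ENDOMET", 0)])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(10, 1, "lesion"), (11, 1, "endometrium"), (12, 2, "endometrium")])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                 [(10, "GENE_A", 9.0), (11, "GENE_A", 5.0), (12, "GENE_A", 4.0)])

def mean_expression(gene):
    """Average expression of one gene per tissue and patient/control group."""
    rows = conn.execute("""
        SELECT s.tissue, sub.is_patient, AVG(e.level)
        FROM expression e
        JOIN sample s ON s.id = e.sample_id
        JOIN subject sub ON sub.id = s.subject_id
        WHERE e.gene = ?
        GROUP BY s.tissue, sub.is_patient
        ORDER BY s.tissue, sub.is_patient
    """, (gene,))
    return rows.fetchall()

for tissue, is_patient, level in mean_expression("GENE_A"):
    print(tissue, "patient" if is_patient else "control", level)
```

    Keeping subjects, samples, and measurements in separate tables is one common way to let a single query compare tissues within patients and against controls. Whether EndometDB is organised this way is not stated in the abstract.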

    Biomedical image analysis of brain tumours through the use of artificial intelligence

    Thesis (MCom)--Stellenbosch University, 2022.
    ENGLISH SUMMARY: Cancer is one of the leading causes of morbidity and mortality on a global scale, in particular cancer of the brain, which is one of the rarest forms. One of the major challenges is timely diagnosis. In the ongoing fight against cancer, early and accurate detection in combination with effective treatment strategy planning remains one of the best tools for improved patient outcomes and success. Emphasis has been placed on the identification and classification of brain lesions in patients, that is, the absence or presence of brain tumours. In the case of malignant brain tumours it is critical to classify patients into either high-grade or low-grade brain lesion groups: different gradings of brain tumours have different prognoses, and thus different survival rates. The growth in the availability and accessibility of big data due to digitisation has led individuals in the area of bioinformatics, in both academia and industry, to apply and evaluate artificial intelligence techniques. However, one of the most important challenges, not only in the field of bioinformatics but also in other realms, is transforming the raw data into valuable insights and knowledge. This research thesis reviews artificial intelligence techniques that can detect vital and fundamental underlying patterns in the data. The models may provide significant predictive performance to assist with decision making. Many artificial intelligence techniques have been applied to brain tumour classification and segmentation in the research literature. This study discusses the theoretical background of two more traditional machine learning methods, namely k-nearest neighbours and support vector machines. In recent years, deep learning (artificial neural networks) has gained prominence due to its ability to handle copious amounts of data.
The specialised version of the artificial neural network that is reviewed is the convolutional neural network. The rationale behind this particular technique is that it is applied to visual imagery. In addition to making use of the convolutional neural network architecture, the study reviews the training of neural networks with optimisation techniques, which is considered to be one of the most difficult parts. Using only one learning algorithm (optimisation technique) in the architecture of convolutional neural network models for classification tasks may be regarded as insufficient unless the design of the analysis strongly supports a particular technique. Nine state-of-the-art optimisation techniques formed part of a comparative study to determine whether there was any improvement in the classification and segmentation of high-grade or low-grade brain tumours. These machine learning and deep learning techniques have proved to be successful in image classification and, more relevant to this research, in brain tumour classification. To supplement the theoretical knowledge, these artificial intelligence methodologies (models) are applied through the exploration of magnetic resonance imaging scans of brain lesions.
AFRIKAANS SUMMARY (translated): Cancer is one of the leading causes of morbidity and death worldwide, brain cancer in particular, which is one of the rarest types. One of the major challenges is diagnosing it in time. In the ongoing fight against cancer, early and accurate detection, combined with effective treatment strategy planning, is one of the best tools for improved patient outcomes and success. Emphasis is placed on the identification and classification of brain lesions in patients, that is, the presence or absence of brain tumours.
In the case of malignant brain tumours it is essential to classify patients into either high-grade or low-grade groups: different gradings of brain tumours have different prognoses, and thus different survival rates. The growth in the availability and accessibility of big data thanks to digitisation has led individuals in bioinformatics, in both academia and industry, to begin applying and evaluating artificial intelligence techniques. However, one of the most important challenges, not only in bioinformatics but also in other fields, is converting raw data into valuable insights and knowledge. This research thesis reviews artificial intelligence techniques that can detect vital and fundamental underlying patterns in the data. The models may offer significant predictive performance to help with decision making. The research literature covers many applications of artificial intelligence to brain tumour classification and segmentation. This study discusses the theoretical background of more traditional machine learning methods, namely the k-nearest neighbour algorithm and support vector machines. Deep learning (artificial neural networks) has recently come to the fore because of its ability to handle large amounts of data. The specialised version of the artificial neural network that is reviewed is the convolutional neural network architecture. The rationale for this particular technique is that it is applied to visual images. Besides using the convolutional neural network architecture, the study also reviews the training of neural networks with optimisation techniques, which is considered one of the most difficult parts.
Using only one learning algorithm (optimisation technique) in the architecture of convolutional neural network models for classification tasks may be regarded as insufficient unless the design of the analysis strongly supports a particular technique. Nine of the latest optimisation techniques formed part of a comparative study to determine whether there was any improvement in the classification and segmentation of high-grade and low-grade brain tumours. These machine learning and deep learning techniques have been successful in image classification and, more relevant to this research, brain tumour classification. To supplement the theoretical knowledge, these artificial intelligence methodologies (models) are applied through the exploration of magnetic resonance imaging of brain tumours.
Master
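
    Of the traditional methods this summary names, k-nearest neighbours is the simplest to show in a few lines. The sketch below classifies a query point by majority vote among its k closest training points under Euclidean distance. The two-dimensional feature vectors and their values are hypothetical stand-ins for features that would in practice be derived from MRI scans, and the thesis's actual features, value of k, and distance measure are not given in the abstract.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, hand-made training points (not real imaging data).
train = [
    ((0.1, 0.2), "low-grade"),
    ((0.2, 0.1), "low-grade"),
    ((0.3, 0.3), "low-grade"),
    ((0.8, 0.9), "high-grade"),
    ((0.9, 0.8), "high-grade"),
    ((0.7, 0.7), "high-grade"),
]

print(knn_predict(train, (0.75, 0.8), k=3))
```

    An odd k avoids tied votes in a two-class problem such as high-grade versus low-grade. With many more features than samples, distances become less informative, which is one reason feature extraction or learned representations are usually applied first.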

    Learning by Fusing Heterogeneous Data

    It has become increasingly common in science and technology to gather data about systems at different levels of granularity or from different perspectives. This often gives rise to data that are represented in totally different input spaces. A basic premise behind the study of learning from heterogeneous data is that in many such cases, there exists some correspondence among certain input dimensions of different input spaces. In our work we found that a key bottleneck that prevents us from better understanding and truly fusing heterogeneous data at large scales is identifying the kind of knowledge that can be transferred between related data views, entities and tasks. We develop interesting and accurate data fusion methods for predictive modeling, which reduce or entirely eliminate some of the basic feature engineering steps that were needed in the past when inferring prediction models from disparate data. In addition, our work has a wide range of applications of which we focus on those from molecular and systems biology: it can help us predict gene functions, forecast pharmacological actions of small chemicals, prioritize genes for further studies, mine disease associations, detect drug toxicity and regress cancer patient survival data. Another important aspect of our research is the study of latent factor models. We aim to design latent models with factorized parameters that simultaneously tackle multiple types of data heterogeneity, where data diversity spans across heterogeneous input spaces, multiple types of features, and a variety of related prediction tasks. Our algorithms are capable of retaining the relational structure of a data system during model inference, which turns out to be vital for good performance of data fusion in certain applications. Our recent work included the study of network inference from many potentially nonidentical data distributions and its application to cancer genomic data. 
We also model epistasis, an important concept from genetics, and propose algorithms to efficiently find the ordering of genes in cellular pathways. A central topic of our thesis is also the analysis of large data compendia, because predictions about certain phenomena, such as associations between diseases and the involvement of genes in a certain phenotype, are only possible when dealing with large amounts of data. Among others, we analyze 30 heterogeneous data sets to assess drug toxicity and over 40 human gene association data collections, the largest number of data sets considered by a collective latent factor model to date. We also make observations about deciding which data should be considered for fusion and develop a generic approach that can estimate the sensitivities between different data sets.

    MicroRNA Expression Profile Based Cancer Classification Using Default ARTMAP

    High-throughput messenger RNA (mRNA) expression profiling with microarrays has been demonstrated to be a more effective basis for cancer diagnosis and treatment than traditional methods based on morphology or clinical parameters. Recently, the discovery of a category of small non-coding RNAs, named microRNAs (miRNAs), has provided another promising method of cancer classification. miRNAs play a critical role in the tumorigenic process by functioning either as oncogenes or as tumor suppressors. Here, we apply a neural-network-based classifier, Default ARTMAP, to classify broad types of cancers based on their miRNA expression fingerprints. As miRNA expression data usually have high dimensionality, particle swarm optimization (PSO) is used to select important miRNAs that contribute to the discrimination of different cancer types. Experimental results on multiple human cancers show that Default ARTMAP performs consistently well on all the data, and its classification accuracy is better than or comparable to that of other popular classifiers. In addition, the selection of informative miRNAs can further improve the performance of classifiers and provide meaningful insights to cancer researchers.
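
    The idea of using particle swarm optimization to pick a subset of features can be sketched with a common binary variant of PSO, in which each particle is a 0/1 mask over the features and a sigmoid of the velocity gives the probability that a bit is set. This is an illustration under assumptions, not the procedure of the paper: the inertia and acceleration constants are conventional textbook choices, and toy_fitness is a made-up scoring function standing in for the accuracy of a classifier such as Default ARTMAP trained on the selected miRNAs.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=10, iters=30, seed=0):
    """Search for a 0/1 feature mask that maximises fitness(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best mask seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]     # best mask seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                # Inertia plus pulls toward the personal and global best masks.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                # Sigmoid turns the velocity into the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

INFORMATIVE = {0, 2}  # hypothetical: pretend only these features carry signal

def toy_fitness(mask):
    """Made-up score: reward informative features, penalise mask size."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

best_mask, best_fit = binary_pso(toy_fitness, n_features=6)
print(best_mask, best_fit)
```

    In a real pipeline the fitness function would train and validate the classifier on the masked data, so each evaluation is expensive and the swarm size and iteration count trade search quality against run time.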