1,990 research outputs found

    Patent Analytics Based on Feature Vector Space Model: A Case of IoT

    The number of approved patents worldwide increases rapidly each year, which calls for new patent analytics that can efficiently mine the valuable information attached to these patents. The vector space model (VSM) represents documents as high-dimensional vectors, where each dimension corresponds to a unique term. While originally proposed for information retrieval systems, VSM has also seen wide application in patent analytics, where it is used as a fundamental tool to map patent documents to structured data. However, the VSM method suffers from several limitations when applied to patent analysis tasks, such as the loss of sentence-level semantics and the curse of dimensionality. To address these limitations, we propose a patent analytics approach based on a feature vector space model (FVSM), where the FVSM is constructed by mapping patent documents to feature vectors extracted by convolutional neural networks (CNN). The applications of FVSM to three typical patent analysis tasks, i.e., patent similarity comparison, patent clustering, and patent map generation, are discussed. A case study using patents related to Internet of Things (IoT) technology demonstrates the performance and effectiveness of FVSM. The proposed FVSM can be adopted by other patent analysis studies to replace VSM, and various big data learning tasks can then be performed on top of it.
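As a rough illustration of the plain VSM baseline described in the abstract above (not the proposed CNN-based FVSM), the sketch below maps each document to a term-count vector and compares two documents by cosine similarity. The whitespace tokenization, the raw term counts, the function names, and the sample texts are all assumptions made for illustration; none of them come from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a document to a sparse term-count vector (term -> count).

    Assumption: lowercase whitespace tokenization, raw counts (no TF-IDF).
    """
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical patent snippets, used only to exercise the functions.
doc_a = term_vector("sensor network node")
doc_b = term_vector("sensor network gateway")
doc_c = term_vector("pharmaceutical compound synthesis")

sim_ab = cosine_similarity(doc_a, doc_b)  # two of three terms shared
sim_ac = cosine_similarity(doc_a, doc_c)  # no shared terms
```

Because every distinct term adds a dimension, the vectors grow with the vocabulary and ignore word order, which is consistent with the dimensionality and sentence-level semantics limitations the abstract attributes to VSM.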

    Molecular Approaches for Analyzing Organismal and Environmental Interactions

    Our planet is undergoing rapid change due to the expanding human population and climate change, which leads to extreme weather events and habitat loss. It is more important than ever to develop methods which can monitor the impact we are having on the biodiversity of our planet. To influence policy changes in wildlife and resource management practices we need to provide measurable evidence of how we are affecting animal health and fitness and the ecosystems needed for their survival. We also need to pool our resources and work in interdisciplinary teams to find common threads which can help preserve biodiversity and vital habitats. This dissertation showcases how improved molecular biology assays and data analysis approaches can help monitor the fitness of animal populations within changing ecosystems. Chapter 1 details the development of a universal telomere assay for vertebrates. Recent work has shown the utility of telomere assays in tracking animal health. Telomere lengths can predict extinction events in animal populations, life span, and fitness consequences of anthropogenic activity. Telomere length assays are an improvement over other methods of measuring animal stress, such as cortisol levels, since they are stable during capture and sampling of animals. This dissertation provides a telomere length assay which can be used for any vertebrate. The assay was developed using a quantitative polymerase chain reaction platform which requires low DNA input and is rapid. This dissertation also demonstrates how this assay improves on current telomere assays developed for mice and can be used in a vertebrate not previously assayed for telomere lengths, the American kestrel. This work has the potential to propel research in vertebrate systems forward as it alleviates the need to develop new reference primers for each species of interest. 
This improved assay has shown promise in studies of mouse cell lines, American kestrels, golden eagles, five species of passerine birds, osprey, northern goshawks and bighorn sheep. Chapter 2 presents a machine learning analysis, using a topic model approach, to integrate big data from remote sensing, leaf area index surveys, metabolomics and metagenomics to analyze community composition in cross-disciplinary datasets. Topic models were applied to understand community organization across a range of distinct, but connected, biological scales within the sagebrush steppe. The sagebrush steppe is home to several threatened species, including the pygmy rabbit (Brachylagus idahoensis) and sage-grouse (Centrocercus urophasianus). It covers vast swaths of the western United States and is subject to habitat fragmentation and land use conversion for both farming and rangeland use. It is also threatened by increases in fire events which can dramatically alter the landscape. Restoration efforts have been hampered by a lack of resources and often by inadequate collaboration between stakeholders and scientists. This work brought together scientists from four disciplines (remote sensing, field ecology, metabolomics and metagenomics) to provide a framework for designing and analyzing studies that integrate patterns of biodiversity across multiple scales, from the molecular to the landscape scale. The topic model approach groups features (chemicals, bacterial and plant taxa, and light spectrum) into "communities", which in turn can be analyzed for their presence within individual samples and time points. Within the landscape, I found communities which contain encroaching plant species, such as juniper (Juniperus spp.) and cheatgrass (Bromus tectorum). Within plants, I found chemicals which are known toxins to herbivores. Within herbivores, I identified differences in bacterial taxonomic communities associated with changes in diet. This work will help to inform restoration efforts and provide a road map for designing interdisciplinary studies.

    Computational approaches for single-cell omics and multi-omics data

    Single-cell omics and multi-omics technologies have enabled the study of cellular heterogeneity with unprecedented resolution and the discovery of new cell types. Identifying heterogeneous cell types, both existing and novel ones, relies on efficient computational approaches, especially cluster analysis. Additionally, gene regulatory network analysis and various integrative approaches are needed to combine data across studies and different multi-omics layers. This thesis comprehensively compared Bayesian clustering models for single-cell RNA-sequencing (scRNA-seq) data, and selected integrative approaches were used to study the cell-type-specific gene regulation of the uterus. Additionally, single-cell multi-omics data integration approaches for cell heterogeneity analysis were investigated. Article I investigated analytical approaches for cluster analysis in scRNA-seq data, particularly latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) models. The comparison of LDA and HDP with existing state-of-the-art methods revealed that topic-modeling-based models can be useful in scRNA-seq cluster analysis. Evaluation of the cluster quality for LDA and HDP with intrinsic and extrinsic cluster quality metrics indicated that the clustering performance of these methods is dataset dependent. Article II and Article III focused on cell-type-specific integrative analysis of uterine or decidual stromal (dS) and natural killer (dNK) cells, which are important for successful pregnancy. Article II integrated the existing preeclampsia RNA-seq studies of the decidua with recent scRNA-seq datasets in order to investigate cell-type-specific contributions to early onset preeclampsia (EOP) and late onset preeclampsia (LOP). It was discovered that the dS marker genes were enriched for LOP downregulated genes and the dNK marker genes were enriched for upregulated EOP genes. 
Article III presented a gene regulatory network analysis for the subpopulations of dS and dNK cells. This study identified novel subpopulation-specific transcription factors that promote decidualization of stromal cells and dNK-mediated maternal immunotolerance. In Article IV, different strategies and methodological frameworks for data integration in single-cell multi-omics data analysis were reviewed in detail. Data integration methods were grouped into early, late and intermediate data integration strategies. The specific stage and order of data integration can have a substantial effect on the results of the integrative analysis. The central details of the approaches were presented, and potential future directions were discussed.
Computational methods for the analysis of single-cell sequencing and multi-omics results. Single-cell sequencing technologies enable the study of cellular heterogeneity at unprecedented resolution and the discovery of new cell types. Grouping, i.e. cluster analysis, plays a central role in identifying cell types. Combining gene regulatory networks and different molecular data layers is also central to the analysis. The dissertation compares Bayesian clustering methods and combines data collected with different methods in a cell-type-specific gene regulation analysis of the uterus. In addition, integration methods for single-cell data are examined comprehensively. Publication I focuses on investigating analytical methods, in particular models based on latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), in the cluster analysis of single-cell data. A comprehensive comparison of these two models with existing methods revealed that topic-modeling-based methods can be useful in the cluster analysis of single-cell data. The performance of the methods also depended on the characteristics of each dataset analyzed. Publications II and III focus on cell-type-specific analysis of uterine stromal cells and NK immune cells, which are important for female reproductive health. Article II combined existing results on preeclampsia with the most recent single-cell sequencing results and found cell-type-specific effects of early onset preeclampsia (EOP) and late onset preeclampsia (LOP). The expression of marker genes of differentiated stroma was found to decrease in LOP, and the expression of NK marker genes to increase in EOP. Publication III analyzes the subpopulation-specific gene regulatory networks of stromal and NK cells and their transcription factors. The study identified novel subpopulation-specific regulators that promote stromal differentiation and NK-cell-mediated immunotolerance. Publication IV examines in detail strategies and methods for integrating different single-cell data layers (multi-omics). Integration methods were grouped into early, late and intermediate strategies, and the methods of each approach were presented in more detail. In addition, possible future directions were discussed.

    Context-awareness for mobile sensing: a survey and future directions

    The evolution of smartphones, together with increasing computational power, has empowered developers to create innovative context-aware applications for recognizing user-related social and cognitive activities in any situation and at any location. The existence and awareness of the context provides the capability of being conscious of physical environments or situations around mobile device users. This allows network services to respond proactively and intelligently based on such awareness. The key idea behind context-aware applications is to encourage users to collect, analyze and share local sensory knowledge for large-scale community use by creating a smart network. The desired network is capable of making autonomous logical decisions to actuate environmental objects and to assist individuals. However, many open challenges remain, most of which arise because the middleware services provided on mobile devices have limited resources in terms of power, memory and bandwidth. Thus, it becomes critically important to study how these drawbacks can be characterized and resolved, and at the same time to better understand the opportunities for the research community to contribute to context-awareness. To this end, this paper surveys the literature over the period 1991-2014, from the emerging concepts to the applications of context-awareness on mobile platforms, providing up-to-date research and future research directions. Moreover, it points out the challenges faced in this regard and sheds light on them by proposing possible solutions.

    Identification of Emerging Scientific Topics in Bibliometric Databases

    Keywords: bibliometrics, machine learning, LDA, clustering, emerging topics. Abstract: Early detection of emerging topic areas in science supports decisions at both the individual and the public level. Many existing methods are limited to a retrospective (citation) analysis of publication data. The aim of the present work was therefore to develop a method that selects so-called "emerging topic candidates" from a set of scientific publications in a timely and neutral manner.

    Topic Modeling and Classification of Cyberspace Papers Using Text Mining

    The global cyberspace networks provide individuals with platforms on which they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspace is an umbrella term that covers all issues arising from the interaction of information systems and humans over these networks. In-depth evaluation of the scientific articles on the cyberspace domain provides concentrated knowledge and insights about major trends in the field. Text mining tools and techniques enable practitioners and scholars to discover significant trends in a large set of internationally validated papers. This study utilizes text mining algorithms to extract, validate, and analyze 1860 scientific articles on the cyberspace domain and provides insight into the future scientific directions of cyberspace studies.