4,654 research outputs found

    Conditional t-SNE: Complementary t-SNE embeddings through factoring out prior information

    Get PDF
    Dimensionality reduction and manifold learning methods such as t-Distributed Stochastic Neighbor Embedding (t-SNE) are routinely used to map high-dimensional data into a 2-dimensional space to visualize and explore the data. However, two dimensions are typically insufficient to capture all structure in the data, the salient structure is often already known, and it is not obvious how to extract the remaining information in a similarly effective manner. To fill this gap, we introduce \emph{conditional t-SNE} (ct-SNE), a generalization of t-SNE that discounts prior information from the embedding in the form of labels. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining a single, integrated, and elegant method. ct-SNE has one extra parameter over t-SNE; we investigate its effects and show how to efficiently optimize the objective. Factoring out prior knowledge allows complementary structure to be captured in the embedding, providing new insights. Qualitative and quantitative empirical results on synthetic and (large) real data show ct-SNE is effective and achieves its goal

    Software tools for conducting bibliometric analysis in science: An up-to-date review

    Get PDF
    Bibliometrics has become an essential tool for assessing and analyzing the output of scientists, cooperation between universities, the effect of state-owned science funding on national research and development performance and educational efficiency, among other applications. Therefore, professionals and scientists need a range of theoretical and practical tools to measure experimental data. This review aims to provide an up-to-date review of the various tools available for conducting bibliometric and scientometric analyses, including the sources of data acquisition, performance analysis and visualization tools. The included tools were divided into three categories: general bibliometric and performance analysis, science mapping analysis, and libraries; a description of all of them is provided. A comparative analysis of the database sources support, pre-processing capabilities, analysis and visualization options were also provided in order to facilitate its understanding. Although there are numerous bibliometric databases to obtain data for bibliometric and scientometric analysis, they have been developed for a different purpose. The number of exportable records is between 500 and 50,000 and the coverage of the different science fields is unequal in each database. Concerning the analyzed tools, Bibliometrix contains the more extensive set of techniques and suitable for practitioners through Biblioshiny. VOSviewer has a fantastic visualization and is capable of loading and exporting information from many sources. SciMAT is the tool with a powerful pre-processing and export capability. In views of the variability of features, the users need to decide the desired analysis output and chose the option that better fits into their aims

    Patent Analytics Based on Feature Vector Space Model: A Case of IoT

    Full text link
    The number of approved patents worldwide increases rapidly each year, which requires new patent analytics to efficiently mine the valuable information attached to these patents. Vector space model (VSM) represents documents as high-dimensional vectors, where each dimension corresponds to a unique term. While originally proposed for information retrieval systems, VSM has also seen wide applications in patent analytics, and used as a fundamental tool to map patent documents to structured data. However, VSM method suffers from several limitations when applied to patent analysis tasks, such as loss of sentence-level semantics and curse-of-dimensionality problems. In order to address the above limitations, we propose a patent analytics based on feature vector space model (FVSM), where the FVSM is constructed by mapping patent documents to feature vectors extracted by convolutional neural networks (CNN). The applications of FVSM for three typical patent analysis tasks, i.e., patents similarity comparison, patent clustering, and patent map generation are discussed. A case study using patents related to Internet of Things (IoT) technology is illustrated to demonstrate the performance and effectiveness of FVSM. The proposed FVSM can be adopted by other patent analysis studies to replace VSM, based on which various big data learning tasks can be performed

    Text Analytics for Android Project

    Get PDF
    Most advanced text analytics and text mining tasks include text classification, text clustering, building ontology, concept/entity extraction, summarization, deriving patterns within the structured data, production of granular taxonomies, sentiment and emotion analysis, document summarization, entity relation modelling, interpretation of the output. Already existing text analytics and text mining cannot develop text material alternatives (perform a multivariant design), perform multiple criteria analysis, automatically select the most effective variant according to different aspects (citation index of papers (Scopus, ScienceDirect, Google Scholar) and authors (Scopus, ScienceDirect, Google Scholar), Top 25 papers, impact factor of journals, supporting phrases, document name and contents, density of keywords), calculate utility degree and market value. However, the Text Analytics for Android Project can perform the aforementioned functions. To the best of the knowledge herein, these functions have not been previously implemented; thus this is the first attempt to do so. The Text Analytics for Android Project is briefly described in this article
    • …
    corecore