
    Mapping Science Based on Research Content Similarity

    Maps of science representing the structure of science help us understand science and technology development. Research in scientometrics has therefore developed techniques for analyzing research activities and for measuring their relationships; however, navigating the recent scientific landscape remains challenging, since conventional inter-citation and co-citation analysis is difficult to apply to recently published articles and ongoing projects. To characterize what is being attempted in the current scientific landscape, this article proposes a content-based method of locating research articles/projects in a multi-dimensional space using word/paragraph embedding. Specifically, to address the problem of unclustered maps, we introduced cluster vectors based on the information entropies of technical concepts. The experimental results showed that our method formed a clustered map from approximately 300 k IEEE articles and NSF projects from 2012 to 2016. Finally, we confirmed that the formation of specific research areas can be captured as changes in the network structure.
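    The core of this approach, embedding documents in a vector space and then clustering them, can be illustrated with off-the-shelf tools. The sketch below is a minimal stand-in, not the authors' implementation: gensim's Doc2Vec takes the place of the paper's word/paragraph embedding, KMeans takes the place of the entropy-based cluster vectors, and the four abstracts are invented toy data.

```python
# Minimal sketch: locate documents in a vector space and cluster them.
# Doc2Vec and KMeans are stand-ins for the paper's word/paragraph
# embedding and entropy-based cluster vectors; the corpus is toy data.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

abstracts = [
    "deep learning for image recognition",
    "convolutional networks for object detection",
    "seismic hazard analysis of bridge networks",
    "earthquake risk in transportation systems",
]

# Tag each abstract so Doc2Vec learns one vector per document.
corpus = [TaggedDocument(words=text.split(), tags=[i])
          for i, text in enumerate(abstracts)]

model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=100)
vectors = [model.dv[i] for i in range(len(abstracts))]

# Group the documents into research areas; 2 clusters fit the toy corpus.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
for text, label in zip(abstracts, labels):
    print(label, text)
```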

    Content-based Map of Science using Cross-lingual Document Embedding - A Comparison of US-Japan Funded Projects

    Maps depicting the structure of science help us understand the development of science and technology. However, because it is difficult to apply inter-citation and co-citation analysis to recently published papers and ongoing projects that have few or no references, our previous work developed a content-based map that locates research papers and funding projects using word/document embedding. Because difficulties arise when comparing content-based maps across languages, this paper improves our content-based map by developing a method for generating multi-dimensional vectors in the same space from cross-lingual (English and Japanese) documents. Using 1,000 IEEE papers, we confirmed a similarity of 0.76 for matching bilingual contents. Finally, we constructed a map from 34,000 projects of the National Science Foundation and the Japan Society for the Promotion of Science from 2012 to 2015, and we present the findings obtained from a comparison of the US- and Japan-funded projects.
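    The bilingual-matching check (similarity of 0.76 for paired English/Japanese content) can be approximated with a pretrained multilingual encoder. The sketch below is an assumption-laden stand-in: the SentenceTransformer model replaces the paper's own cross-lingual embedding method, and the sentence pair is invented.

```python
# Sketch of the bilingual-matching check: embed an English and a
# Japanese version of the same text into one shared space and compare
# them with cosine similarity. The multilingual model is an assumed
# stand-in for the paper's own cross-lingual embedding method.
from sentence_transformers import SentenceTransformer
from numpy import dot
from numpy.linalg import norm

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

en = "Maps of science help us understand research trends."
ja = "科学の地図は研究動向の理解に役立つ。"  # same content in Japanese

v_en, v_ja = model.encode([en, ja])
cosine = dot(v_en, v_ja) / (norm(v_en) * norm(v_ja))
print(f"cross-lingual similarity: {cosine:.2f}")
```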

    Doing pedagogical research in engineering

    This is a book.

    Technology classification with latent semantic indexing

    Many national and international governments establish organizations for funding applied science research. Several of these organizations have defined procedures for identifying relevant projects based on prioritized technologies. Yet for applied science research projects, which typically combine several technologies, it is difficult to identify all corresponding technologies across all research-funding organizations. In this paper, we present an approach that supports researchers and research-funding planners by classifying applied science research projects according to the corresponding technologies of research-funding organizations. In contrast to related work, this problem is solved by considering results from the literature on application-based technological relationships and by creating a new approach based on latent semantic indexing (LSI) as a semantic text-classification algorithm. Technologies that occur together in the process of creating an application are grouped into classes, semantic textual patterns are identified as representative of each class, and projects are assigned to one of these classes. This enables the assignment of each project to all technologies semantically grouped by use of LSI. The approach is evaluated using the example of defense- and security-related technological research, because the growing importance of this application field leads to an increasing number of research projects and to the appearance of many new technologies.
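    LSI amounts to a truncated SVD over a term-document matrix, and classification can then be done in the resulting latent space. The sketch below assumes scikit-learn's TruncatedSVD as the LSI implementation and uses invented project texts and technology classes; it is an illustration of the general technique, not the paper's pipeline.

```python
# Minimal LSI sketch: project documents into a latent semantic space
# via TF-IDF + truncated SVD, then assign a new project to the nearest
# technology-class centroid. Classes and texts are toy assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.metrics.pairwise import cosine_similarity

train_texts = [
    "radar signal processing for surveillance",
    "target tracking with radar arrays",
    "encryption protocols for secure networks",
    "cryptographic key exchange methods",
]
train_labels = np.array([0, 0, 1, 1])  # 0: sensing, 1: security

lsi = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
train_vecs = lsi.fit_transform(train_texts)

# One centroid per technology class in the latent space.
centroids = np.vstack([train_vecs[train_labels == c].mean(axis=0)
                       for c in (0, 1)])

new_project = ["secure communication with encrypted channels"]
sims = cosine_similarity(lsi.transform(new_project), centroids)
print("assigned class:", sims.argmax())  # expected: 1 (security)
```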

    Topic Modelling of Everyday Sexism Project Entries

    The Everyday Sexism Project documents everyday examples of sexism reported by volunteer contributors from all around the world. It collected 100,000 entries in 13+ languages within the first 3 years of its existence. The content of reports in various languages submitted to Everyday Sexism is a valuable source of crowdsourced information with great potential for feminist and gender studies. In this paper, we take a computational approach to analyzing the content of reports. We use topic-modelling techniques to extract emerging topics and concepts from the reports and to map the semantic relations between those topics. The resulting picture closely resembles, and adds to, that arrived at through qualitative analysis, showing that this form of topic modelling could be useful for sifting through datasets that have not previously been subject to any analysis. More precisely, we produce a map of topics for two different resolutions of our topic model and discuss the connections between the identified topics. In the low-resolution picture, for instance, we found Public space/Street, Online, Work related/Office, Transport, School, Media harassment, and Domestic abuse. Among these, the strongest connection is between Public space/Street harassment and Domestic abuse and sexism in personal relationships. The strength of the relationships between topics illustrates the fluid and ubiquitous nature of sexism, with no single experience being unrelated to another.
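    The two-resolution topic extraction described above maps naturally onto running a topic model at two different topic counts. The sketch below uses gensim's LDA as a plausible choice of model (the abstract does not name one) over invented stand-in reports.

```python
# Sketch of the topic-modelling step: LDA over tokenized reports,
# run at two "resolutions" (numbers of topics), as the paper does.
# The reports below are invented stand-ins for the project entries.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

reports = [
    "man shouted at me on the street walking home",
    "colleague made comments about my clothes at the office",
    "harassed on the bus during my commute to work",
    "manager dismissed my idea then praised a man for it",
]
tokens = [r.split() for r in reports]

dictionary = Dictionary(tokens)
corpus = [dictionary.doc2bow(t) for t in tokens]

# A coarse and a finer topic model, mirroring the two resolutions.
for k in (2, 4):
    lda = LdaModel(corpus, num_topics=k, id2word=dictionary,
                   passes=50, random_state=0)
    print(f"--- {k} topics ---")
    for topic_id, words in lda.print_topics(num_words=4):
        print(topic_id, words)
```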

    Seismic Risk Analysis of Revenue Losses, Gross Regional Product, and Transportation Systems

    Natural threats like earthquakes, hurricanes, or tsunamis have shown serious impacts on communities. In the past, major earthquakes in the United States, like Loma Prieta 1989 and Northridge 1994, or recent events in Italy, like the L’Aquila 2009 or Emilia 2012 earthquakes, emphasized the importance of preparedness and awareness in reducing social impacts. Earthquakes impacted businesses and dramatically reduced the gross regional product. Seismic hazard is traditionally assessed using Probabilistic Seismic Hazard Analysis (PSHA). PSHA represents the hazard at a specific location well, but it is unsatisfactory for spatially distributed systems. Scenario earthquakes overcome this problem by representing the actual distribution of shaking over a spatially distributed system. The performance of distributed productive systems during the recovery process needs to be explored. Scenario earthquakes have been used to assess the risk in bridge networks and the social losses in terms of gross regional product reduction. The proposed method for scenario earthquakes has been applied to a real case study: Treviso, a city in the North East of Italy. The method requires three models: a representation of the sources (Italian Seismogenic Zonation 9), an attenuation relationship (Sabetta and Pugliese 1996), and a model of the occurrence rate of magnitudes (Gutenberg-Richter). A methodology has been proposed to reduce thousands of scenarios to a subset consistent with the hazard at each location. Earthquake scenarios, along with the Monte Carlo method, have been used to simulate business damage. The response of business facilities to earthquakes has been obtained from fragility curves for precast industrial buildings. Furthermore, from business damage the reduction of productivity has been simulated using economic data from the national statistical service and a proposed piecewise “loss of functionality” model. To simulate the economic process in the time domain, an innovative business recovery function has been proposed. The proposed method has been applied to generate scenario earthquakes at the locations of bridges and business areas. The proposed selection methodology has been applied to reduce 8,000 scenarios to a subset of 60. Subsequently, these scenario earthquakes have been used to calculate three system performance parameters: the risk in transportation networks, the risk in terms of business damage, and the losses of gross regional product. A novel model for the business recovery process has been tested, used to represent that process, and used to simulate the effects of government aid allocated for reconstruction. The proposed method has efficiently modeled the seismic hazard using scenario earthquakes. The scenario earthquakes presented have been used to assess possible consequences of earthquakes in seismically prone zones and to increase preparedness. Scenario earthquakes have been used to simulate the effects on the economy of the impacted area; a significant gross regional product reduction has been shown, up to 77% for an earthquake with a 0.0003 probability of occurrence. The results showed that the limited funds available after a disaster can be distributed in a more efficient way.
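    One ingredient of this scenario generation, Monte Carlo sampling of magnitudes from a Gutenberg-Richter occurrence model, is compact enough to sketch. The b-value and magnitude bounds below are illustrative assumptions, not the values used for the Treviso case study, and the truncated-exponential form follows from the standard Gutenberg-Richter relation log10 N = a - bM.

```python
# Sketch of one ingredient of the scenario generation: Monte Carlo
# sampling of magnitudes from a truncated Gutenberg-Richter law.
# The b-value and magnitude bounds are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
b, m_min, m_max = 1.0, 4.5, 7.0      # assumed G-R parameters
beta = b * np.log(10.0)

# Inverse-transform sampling of the truncated exponential distribution
# that the Gutenberg-Richter relation implies above m_min:
#   F(m) = (1 - exp(-beta*(m - m_min))) / (1 - exp(-beta*(m_max - m_min)))
u = rng.random(10_000)
m = m_min - np.log(1.0 - u * (1.0 - np.exp(-beta * (m_max - m_min)))) / beta

print(f"mean magnitude: {m.mean():.2f}, max sampled: {m.max():.2f}")
```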

    Development of a national-scale real-time Twitter data mining pipeline for social geodata on the potential impacts of flooding on communities

    Social media, particularly Twitter, is increasingly used to improve resilience during extreme weather events and emergency management situations, including floods: by communicating potential risks and their impacts, and by informing agencies and responders. In this paper, we developed a prototype national-scale Twitter data mining pipeline for improved stakeholder situational awareness during flooding events across Great Britain, retrieving relevant social geodata grounded in environmental data sources (flood warnings and river levels). With potential users, we identified and addressed three research questions to develop this application, whose components constitute a modular architecture for real-time dashboards: first, polling national flood warning and river level Web data sources to obtain at-risk locations; secondly, real-time retrieval of geotagged tweets proximate to at-risk areas; thirdly, filtering flood-relevant tweets with natural language processing and machine learning libraries, using word embeddings of tweets. We demonstrated the national-scale social geodata pipeline using over 420,000 georeferenced tweets obtained between 20 and 29 June 2016.
    Highlights:
    • Prototype real-time social geodata pipeline for flood events and a demonstration dataset
    • National-scale flood warnings/river levels set 'at-risk areas' in Twitter API queries
    • Monitoring multiple locations (without keywords) retrieved current, geotagged tweets
    • Novel application of word embeddings in a flooding context identified relevant tweets
    • Pipeline extracts tweets for visualization using open-source libraries (SciKit Learn/Gensim)
    Keywords: flood management; Twitter; volunteered geographic information; natural language processing; word embeddings; social geodata.
    Hardware required: Intel i3 or mid-performance PC with multicore processor and SSD main drive; 8 GB memory recommended.
    Software required: Python and library dependencies specified in Appendix A1.2.1, (viii) environment.yml.
    Software availability: All source code can be found at GitHub public repositories.
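    The third pipeline step, filtering flood-relevant tweets with word embeddings, commonly works by averaging a tweet's word vectors and training a classifier on the result. The sketch below mirrors the paper's stated Gensim/scikit-learn stack but is a toy reconstruction: the tweets, labels, and model choices are all assumptions.

```python
# Sketch of the tweet-filtering step: average word vectors per tweet
# (Gensim Word2Vec) and train a relevance classifier (scikit-learn).
# Tweets, labels, and model choices are toy assumptions.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

tweets = [
    "river bursting its banks near the bridge",
    "flood water rising in the high street",
    "great pizza by the river tonight",
    "sunny walk along the high street",
]
labels = [1, 1, 0, 0]  # 1 = flood-relevant

tokenized = [t.split() for t in tweets]
w2v = Word2Vec(tokenized, vector_size=50, min_count=1, epochs=200, seed=0)

def embed(tokens):
    """Mean word vector: a crude but standard tweet representation."""
    return np.mean([w2v.wv[w] for w in tokens if w in w2v.wv], axis=0)

X = np.vstack([embed(t) for t in tokenized])
clf = LogisticRegression().fit(X, labels)

test = "water rising fast by the bridge".split()
print("flood-relevant probability:", clf.predict_proba([embed(test)])[0, 1])
```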

    Knowledge graphs for COVID-19: An exploratory review of the current landscape

    Background: Searching through the COVID-19 research literature to gain actionable clinical insight is a formidable task, even for experts. The usefulness of this corpus in terms of improving patient care is tied to the ability to see the big picture that emerges when the studies are seen in conjunction rather than in isolation. When the answer to a search query requires linking together multiple pieces of information across documents, simple keyword searches are insufficient. To answer such complex information needs, an innovative artificial intelligence (AI) technology known as a knowledge graph (KG) could prove effective. Methods: We conducted an exploratory literature review of KG applications in the context of COVID-19. The search term used was "covid-19 knowledge graph". In addition to PubMed, the first five pages of search results for Google Scholar and Google were considered for inclusion. Google Scholar was used to include non-peer-reviewed or non-indexed articles such as preprints and conference proceedings. Google was used to identify companies or consortiums active in this domain that have not published any literature, peer-reviewed or otherwise. Results: Our search yielded 34 results on PubMed and 50 results each on Google and Google Scholar. We found KGs being used for facilitating literature search, drug repurposing, clinical trial mapping, and risk factor analysis. Conclusions: Our synopses of these works make a compelling case for the utility of this nascent field of research.
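    The advantage a KG offers over keyword search, linking facts scattered across documents, reduces to a path query over a graph of subject-relation-object triples. The sketch below uses networkx and wholly invented triples purely to illustrate that multi-hop idea; it is not drawn from any of the reviewed systems.

```python
# Sketch of why a knowledge graph helps: facts from separate papers
# become edges in one graph, so a multi-hop question turns into a
# path query. The triples are invented for illustration only.
import networkx as nx

G = nx.DiGraph()
triples = [
    ("drug_X", "inhibits", "protein_Y"),       # stated in paper A
    ("protein_Y", "regulates", "pathway_Z"),   # stated in paper B
    ("pathway_Z", "involved_in", "COVID-19"),  # stated in paper C
]
for subj, rel, obj in triples:
    G.add_edge(subj, obj, relation=rel)

# "Is drug_X plausibly linked to COVID-19?" A keyword search over any
# single document misses this; a path query over the graph finds it.
path = nx.shortest_path(G, "drug_X", "COVID-19")
print(" -> ".join(path))
```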