72,615 research outputs found

    Ontology-based knowledge representation of experiment metadata in biological data mining

    According to the PubMed resource from the U.S. National Library of Medicine, over 750,000 scientific articles were published in ~5,000 biomedical journals worldwide in 2007 alone. The vast majority of these publications include results from hypothesis-driven experimentation in overlapping biomedical research domains. Unfortunately, the sheer volume of information generated by the biomedical research enterprise has made it virtually impossible for investigators to stay aware of the latest findings in their domain of interest, let alone to assimilate and mine data from related investigations for meta-analysis. While computers have the potential to assist investigators in the extraction, management and analysis of these data, the information contained in traditional journal publications still consists largely of unstructured, free-text descriptions of study design, experimental application and results interpretation, making it difficult for computers to access the content being conveyed without significant manual intervention. To circumvent these roadblocks and make the most of the output from the biomedical research enterprise, a variety of related standards in knowledge representation are being developed, proposed and adopted in the biomedical community. In this chapter, we explore the current status of efforts to develop minimum information standards for the representation of a biomedical experiment, ontologies composed of shared vocabularies assembled into subsumption hierarchical structures, and extensible relational data models that link the information components together in a machine-readable and human-usable framework for data mining purposes.

    Structuring visual exploratory analysis of skill demand

    The analysis of increasingly large and diverse data for meaningful interpretation and question answering is handicapped by human cognitive limitations. Consequently, semi-automatic abstraction of complex data within structured information spaces becomes increasingly important if its knowledge content is to support intuitive, exploratory discovery. Exploration of skill demand is an area where regularly updated, multi-dimensional data may be exploited to assess the workforce's capability to manage the demands of the modern, technology- and data-driven economy. The knowledge derived may be employed by skilled practitioners in defining career pathways, to identify where, when and how to update their skillsets in line with advancing technology and changing work demands. This same knowledge may also be used to identify the combination of skills essential in recruiting for new roles. To address the challenges inherent in exploring the complex, heterogeneous, dynamic data that feeds into such applications, we investigate the use of an ontology to guide structuring of the information space, allowing individuals and institutions to interactively explore and interpret the dynamic skill demand landscape for their specific needs. As a test case we consider the relatively new and highly dynamic field of Data Science, where insightful, exploratory data analysis and knowledge discovery are critical. We employ context-driven and task-centred scenarios to explore our research questions and guide iterative design, development and formative evaluation of our ontology-driven, visual exploratory discovery and analysis approach, and to measure where it adds value to users’ analytical activity. Our findings reinforce the potential of our approach and point to future directions to build on.

    A Triclustering Approach for Time Evolving Graphs

    This paper introduces a novel technique to track structures in time-evolving graphs. The method is based on a parameter-free approach for three-dimensional co-clustering of the source vertices, the target vertices and the time. All these features are simultaneously segmented in order to build time segments and clusters of vertices whose edge distributions are similar and evolve in the same way over the time segments. The main novelty of this approach is that the time segments are directly inferred from the evolution of the edge distribution between the vertices, so the user does not need to make an a priori discretization. Experiments conducted on a synthetic dataset illustrate the good behaviour of the technique, and a study of a real-life dataset shows the potential of the proposed approach for exploratory data analysis.

    Learning functional object categories from a relational spatio-temporal representation

    We propose a framework that learns functional object categories from spatio-temporal data sets such as those abstracted from video. The data is represented as one activity graph that encodes qualitative spatio-temporal patterns of interaction between objects. Event classes are induced by statistical generalization, the instances of which encode similar patterns of spatio-temporal relationships between objects. Equivalence classes of objects are discovered on the basis of their similar role in multiple event instantiations. Objects are represented in a multidimensional space that captures their role in all the events. Unsupervised learning in this space results in functional object categories. Experiments in the domain of food preparation suggest that our techniques represent a significant step in unsupervised learning of functional object categories from spatio-temporal patterns of object interaction.

    An Integrated Approach for Characterizing Aerosol Climate Impacts and Environmental Interactions

    Aerosols exert myriad influences on the earth's environment and climate, and on human health. The complexity of aerosol-related processes requires that information gathered to improve our understanding of climate change must originate from multiple sources, and that effective strategies for data integration need to be established. While a vast array of observed and modeled data are becoming available, the aerosol research community currently lacks the necessary tools and infrastructure to reap maximum scientific benefit from these data. Spatial and temporal sampling differences among a diverse set of sensors, nonuniform data qualities, aerosol mesoscale variabilities, and difficulties in separating cloud effects are some of the challenges that need to be addressed. Maximizing the long-term benefit from these data also requires maintaining consistently well-understood accuracies as measurement approaches evolve and improve. A comprehensive understanding of how aerosol physical, chemical, and radiative processes impact the earth system can be achieved only through a multidisciplinary, inter-agency, and international initiative capable of dealing with these issues. A systematic approach, capitalizing on modern measurement and modeling techniques, geospatial statistics methodologies, and high-performance information technologies, can provide the necessary machinery to support this objective. We outline a framework for integrating and interpreting observations and models, and establishing an accurate, consistent, and cohesive long-term record, following a strategy whereby information and tools of progressively greater sophistication are incorporated as problems of increasing complexity are tackled. This concept is named the Progressive Aerosol Retrieval and Assimilation Global Observing Network (PARAGON).
    To encompass the breadth of the effort required, we present a set of recommendations dealing with data interoperability; measurement and model integration; multisensor synergy; data summarization and mining; model evaluation; calibration and validation; augmentation of surface and in situ measurements; advances in passive and active remote sensing; and design of satellite missions. Without an initiative of this nature, the scientific and policy communities will continue to struggle with understanding the quantitative impact of complex aerosol processes on regional and global climate change and air quality.

    Numeral Understanding in Financial Tweets for Fine-grained Crowd-based Forecasting

    Numerals carry much of the information in financial documents and are crucial for financial decision making. They play different roles in financial analysis processes. This paper aims to understand the meanings of numerals in financial tweets for fine-grained crowd-based forecasting. We propose a taxonomy that classifies the numerals in financial tweets into 7 categories, and further extend some of these categories into several subcategories. Neural network-based models with word and character-level encoders are proposed for 7-way classification and 17-way classification. We perform a backtest to confirm the effectiveness of the numeric opinions made by the crowd. This work is the first attempt to understand numerals in financial social media data, and we provide the first comparison of the fine-grained opinions of individual investors and analysts based on their forecast prices. The numeral corpus used in our experiments, called FinNum 1.0, is available for research purposes. Comment: Accepted by the 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2018), Santiago, Chile.

    Animal community dynamics at senescent and active vents at the 9° N East Pacific Rise after a volcanic eruption

    © The Author(s), 2020. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Gollner, S., Govenar, B., Arbizu, P. M., Mullineaux, L. S., Mills, S., Le Bris, N., Weinbauer, M., Shank, T. M., & Bright, M. Animal community dynamics at senescent and active vents at the 9° N East Pacific Rise after a volcanic eruption. Frontiers in Marine Science, 6, (2020): 832, doi:10.3389/fmars.2019.00832.
    In 2005/2006, a major volcanic eruption buried faunal communities over a large area of the 9°N East Pacific Rise (EPR) vent field. In late 2006, we initiated colonization studies at several types of post-eruption vent communities, including those that survived the eruption, re-established after the eruption, or arose at new sites. Some of these vents were active whereas others appeared senescent. Although the spatial scale of non-paved (surviving) vent communities was small (several m² compared to several km² of total paved area), the remnant individuals at surviving active and senescent vent sites may be important for recolonization. A total of 46 meio- and macrofauna species were encountered at non-paved areas, 33 of which were also present at new sites in 2006. The animals living at non-paved areas represent refuge populations that could act as source populations for new vent sites directly after disturbance. Remnants may be especially important for the meiofauna, where many taxa have limited or no larval dispersal. Meiofauna may reach new vent sites predominantly via migration from local refuge areas, where a reproductive and abundant meiofauna is thriving. These findings are important to consider in any potential future deep-sea mining scenario at deep-sea hydrothermal vents.
    Within our 4-year study period, we regularly observed vent habitats with tubeworm assemblages that became senescent and died as vent fluid emissions locally stopped at patches within active vent sites. Senescent vents harbored a species-rich mix of typical vent species as well as rare, yet undescribed species. The senescent vents contributed significantly to diversity at the 9°N EPR with 55 macrofaunal species (11 singletons) and 74 meiofaunal species (19 singletons). Of these 129 species associated with senescent vents, 60 have not been reported from active vents. Tubeworms and other vent megafauna not only act as foundation species when alive but also provide habitat when dead, sustaining abundant and diverse small-sized fauna.
    We received funding from the Austrian FWF (Grant P20190-B17; MB), the U.S. National Science Foundation (OCE-0424953; to LM, D. McGillicuddy, A. Thurnherr, J. Ledwell, and W. Lavelle; and OCE-1356738 to LM), and the European Union Seventh Framework Programme (FP7/2007-2013) under the MIDAS project, Grant Agreement No. 603418. Ifremer and CNRS (France) supported NL cruise participation and sensor developments. BG was supported by a postdoctoral fellowship from the Deep Ocean Exploration Institute at WHOI (United States). TS was supported by the U.S. National Science Foundation (OCE-0327261 to TS and OCE-0937395 to TS and BG).

    Event-based Access to Historical Italian War Memoirs

    The progressive digitization of historical archives provides new, often domain-specific, textual resources that report on facts and events which have happened in the past; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge. This is based on the semantic notions of events, participants and roles. We quantitatively evaluate each of the key steps of our approach and provide a graph-based representation of the extracted knowledge, which allows the reader to move between a Close and a Distant Reading of the collection. Comment: 23 pages, 6 figures.