
    Temporal Representation in Semantic Graphs


    Analyzing complex data using domain constraints

    Data-driven research approaches are becoming increasingly popular in a growing number of scientific disciplines. While a data-driven research approach can yield superior results, generating the required data can be very costly. This frequently leads to small and complex data sets, in which it is impossible to rely on volume alone to compensate for all shortcomings of the data. To counter this problem, other reliable sources of information must be incorporated. In this work, domain knowledge, as a particularly reliable type of additional information, is used to inform data-driven analysis methods. This domain knowledge is represented as constraints on the possible solutions, which the presented methods can use to steer their analysis. The focus is on spatial constraints as a particularly common type of constraint, but the proposed techniques are general enough to be applied to other types of constraints. This thesis discusses new methods that use domain constraints for data-driven science applications, with applications in feature evaluation, route database repair, and Gaussian mixture modeling of spatial data.

    The first application is feature evaluation. The presented method receives two representations of the same data: one as the intended target and the other for investigation. It calculates a score indicating how much the two representations agree. A presented application uses this technique to compare a reference attribute set with different subsets to determine the importance and relevance of individual attributes. A second technique analyzes route data for constraint compliance. The presented framework allows the user to specify constraints and possible actions to modify the data. The method then uses these inputs to generate a version of the data that satisfies the constraints while otherwise keeping the impact of the modifications as small as possible. Two extensions of this scheme are presented: one to continuously valued costs, which are minimized, and one to constraints involving more than one moving object. A third application area is the modeling of multivariate measurement data collected at spatially distributed locations, where the spatial information recorded with the data serves as the basis for constraints. This thesis presents multiple approaches to building a model of such data while complying with spatial constraints. The first approach is an interactive tool that allows domain scientists to generate a model of the data consistent with their knowledge about it. The second is a Monte Carlo approach that generates a large number of candidate models, tests them for compliance with the constraints, and returns the best one. The final two approaches are based on the EM algorithm and use different ways of incorporating the information into their models. At the end of the thesis, two applications of the generated models are presented: the prediction of the origin of samples and the visual representation of the extracted models on a map. These tools can be used by domain scientists to augment their tried and tested tools.

    The developed techniques are applied to a real-world data set collected in the archaeobiological research project FOR 1670 (Transalpine mobility and cultural transfer) of the German Science Foundation. The data set contains isotope ratio measurements of samples discovered at archaeological sites in the Alpine region of central Europe. Using the presented data analysis methods, the data is analyzed to answer relevant domain questions. In a first application, the attributes of the measurements are analyzed for their relative importance and their ability to predict the spatial location of samples. Another application is the reconstruction of potential migration routes between the investigated sites. Spatial models are then built using the presented modeling approaches. Univariate outliers are determined and their possible origins are predicted based on the generated models; these predictions are cross-referenced with the recorded origins. Finally, maps of the isotope distribution in the investigated regions are presented. The described methods and demonstrated analyses show that domain knowledge can be used to formulate constraints that inform the data analysis process, yield valid models from relatively small data sets, and support domain scientists in their analyses.
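    The Monte Carlo modeling approach summarized above lends itself to a compact illustration. The following is a hypothetical sketch, not the thesis implementation: it fits many candidate Gaussian mixture models under different random initializations, discards any model that violates a user-supplied spatial constraint, and returns the best-scoring survivor. The predicate in_study_region and its bounding box are invented for illustration.

```python
# Hypothetical sketch of the Monte Carlo strategy described above: fit many
# candidate Gaussian mixture models, reject those that violate a domain
# constraint, and return the best-scoring compliant model.
import numpy as np
from sklearn.mixture import GaussianMixture

def monte_carlo_constrained_gmm(X, n_components, constraint, n_trials=200):
    """X: (n_samples, n_features) data; constraint: predicate on a fitted model."""
    best_model, best_score = None, -np.inf
    for seed in range(n_trials):
        gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
        if not constraint(gmm):
            continue  # reject models that violate the domain constraint
        score = gmm.score(X)  # mean log-likelihood of the data under this model
        if score > best_score:
            best_model, best_score = gmm, score
    return best_model

# Example spatial constraint (assumed): every component mean must fall inside
# a bounding box representing the study region.
def in_study_region(gmm, lo=(0.0, 0.0), hi=(10.0, 10.0)):
    return bool(np.all((gmm.means_ >= lo) & (gmm.means_ <= hi)))
```

    Rejecting whole fitted models keeps the constraint logic fully decoupled from the fitting procedure, which is what makes such a scheme attractive for arbitrary domain constraints.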

    Advances in Data Modeling Research

    In this paper, we summarize the discussions of the panel on Advances in Data Modeling Research, held at the Americas Conference on Information Systems (AMCIS) in 2005. We focus on four primary areas where data modeling research offers rich opportunities: spatio-temporal semantics, genome research, ontological analysis, and empirical evaluation of existing models. We highlight past work in each area and also discuss open questions, with a view to promoting future research in the overall data modeling area.

    Upstream regulatory architecture of rice genes: summarizing the baseline towards genus-wide comparative analysis of regulatory networks and allele mining


    A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs

    Representing the reservoir as a network of discrete compartments with neighbor and non-neighbor connections is a fast, yet accurate method for analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale compartments with distinct static and dynamic properties is an integral part of such high-level reservoir analysis. In this work, we present a novel hybrid framework for reservoir analysis that couples a physics-based non-local multiscale modeling approach with data-driven clustering techniques, automatically detecting clusters in space from spatial and temporal field data to provide fast and accurate multiscale modeling of compartmentalized reservoirs. This research also adds to the literature by presenting a comprehensive treatment of spatio-temporal clustering for reservoir studies that carefully considers the clustering complexities, the intrinsically sparse and noisy nature of the data, and the interpretability of the outcome. Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal Clustering; Physics-Based Data-Driven Formulation; Multiscale Modeling
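    The coupling of spatio-temporal clustering with physics-based modeling can be illustrated generically. The sketch below is an assumed simplification, not the paper's formulation: locations (e.g., wells) are clustered jointly on scaled spatial coordinates and scaled production time series, with a weight controlling how strongly spatial proximity drives the grouping.

```python
# Generic spatio-temporal clustering sketch (assumed, not the paper's method):
# combine scaled spatial coordinates with scaled time-series features so that
# both spatial proximity and dynamic similarity shape the clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def spatio_temporal_clusters(coords, series, n_clusters=5, spatial_weight=1.0):
    """coords: (n, 2) locations; series: (n, t) production histories per location."""
    X_spatial = StandardScaler().fit_transform(coords) * spatial_weight
    X_temporal = StandardScaler().fit_transform(series)
    X = np.hstack([X_spatial, X_temporal])  # joint spatio-temporal feature space
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
```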

    Molecular eco-systems biology: towards an understanding of community function

    Systems-biology approaches, which are driven by genome sequencing and high-throughput functional genomics data, are revolutionizing single-cell-organism biology. With the advent of various high-throughput techniques that aim to characterize complete microbial ecosystems (metagenomics, meta-transcriptomics and meta-metabolomics), we propose that the time is ripe to consider molecular systems biology at the ecosystem level (eco-systems biology). Here, we discuss the data types that are required to unite molecular microbiology and ecology to develop an understanding of community function, and we discuss the potential shortcomings of these approaches.

    Integration of molecular functions at the ecosystemic level: breakthroughs and future goals of environmental genomics and post-genomics

    Environmental genomics and genome-wide expression approaches deal with large-scale sequence-based information obtained from environmental samples, at organismal, population or community levels. To date, environmental genomics, transcriptomics and proteomics are arguably the most powerful approaches to discover completely novel ecological functions and to link organismal capabilities, organism–environment interactions, functional diversity, ecosystem processes, evolution and Earth history. Thus, environmental genomics is not merely a toolbox of new technologies but also a source of novel ecological concepts and hypotheses. By removing previous dichotomies between ecophysiology, population ecology, community ecology and ecosystem functioning, environmental genomics enables the integration of sequence-based information into higher ecological and evolutionary levels. However, environmental genomics, along with transcriptomics and proteomics, must involve pluridisciplinary research, such as new developments in bioinformatics, in order to integrate high-throughput molecular biology techniques into ecology. In this review, the validity of environmental genomics and post-genomics for studying ecosystem functioning is discussed in terms of major advances and expectations, as well as in terms of potential hurdles and limitations. Novel avenues for improving the use of these approaches to test theory-driven ecological hypotheses are also explored.

    Structural interrogation of phosphoproteome identified by mass spectrometry reveals allowed and disallowed regions of phosphoconformation

    High-throughput mass spectrometric (HT-MS) study is the method of choice for monitoring global changes in the proteome. Data derived from these studies are meant for further validation and experimentation to discover novel biological insights. Here we evaluate the use of relative solvent accessible surface area (rSASA) and DEPTH as indices to assess experimentally determined phosphorylation events deposited in PhosphoSitePlus. Based on accessibility, we map these identifications onto allowed (accessible) or disallowed (inaccessible) regions of phosphoconformation. Surprisingly, a striking number of HT-MS/MS derived events (1461/5947 sites, or 24.6%) are present in the disallowed region of conformation. By considering protein dynamics, autophosphorylation events and/or the sequence specificity of kinases, 13.8% of these phosphosites can be moved to the allowed region of conformation. We also demonstrate that rSASA values can be used to increase the confidence of identification of phosphorylation sites within an ambiguous MS dataset. While MS is a stand-alone technique for the identification of the vast majority of phosphorylation events, identifications within the disallowed region of conformation will benefit from techniques that independently probe for phosphorylation and protein dynamics. Our studies also imply that trapping alternate protein conformations may be a viable alternative to the design of inhibitors against mutation-prone drug-resistant kinases.
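    As a rough illustration of the accessibility-based classification described above, the sketch below computes a relative solvent-accessible surface area (rSASA) for a candidate phosphosite and labels it allowed or disallowed. The 0.25 cutoff and the max-ASA reference values (theoretical maxima from Tien et al., 2013) are assumptions, and Biopython's Shrake-Rupley implementation stands in for whatever accessibility tool the authors used.

```python
# Assumed sketch: classify a phosphosite as "allowed" (surface-accessible) or
# "disallowed" (buried) from its relative solvent-accessible surface area.
from Bio.PDB import PDBParser
from Bio.PDB.SASA import ShrakeRupley

# Theoretical maximum ASA per residue type (Tien et al., 2013), in square angstroms.
MAX_ASA = {"SER": 155.8, "THR": 172.5, "TYR": 263.2}

def classify_phosphosite(pdb_path, chain_id, res_number, cutoff=0.25):
    structure = PDBParser(QUIET=True).get_structure("model", pdb_path)
    ShrakeRupley().compute(structure[0], level="R")  # per-residue SASA
    residue = structure[0][chain_id][res_number]
    rsasa = residue.sasa / MAX_ASA[residue.get_resname()]
    return "allowed" if rsasa >= cutoff else "disallowed"
```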

    Marine Data Fusion for Analyzing Spatio-Temporal Ocean Region Connectivity

    This thesis develops methods to automate and objectify the connectivity analysis between ocean regions. Existing methods for connectivity analysis often rely on manual integration of expert knowledge, which renders the processing of large amounts of data tedious. This thesis presents a new framework for Data Fusion that provides several approaches for automation and objectification of the entire analysis process. It identifies different complexities of connectivity analysis and shows how the Data Fusion framework can be applied and adapted to them. The framework is used in this thesis to analyze geo-referenced trajectories of fish larvae in the western Mediterranean Sea, to trace the spreading pathways of newly formed water in the subpolar North Atlantic based on their hydrographic properties, and to gauge their temporal change. These examples introduce a new and highly relevant field of application for the established Data Science methods that are used and innovatively combined in the framework. New directions for further development of these methods are opened up that go beyond the optimization of existing methods. Marine Science, more precisely Physical Oceanography, benefits from the new possibility of analyzing large amounts of data quickly and objectively with respect to its specific research questions. This thesis is a foray into the new field of Marine Data Science. It practically and theoretically explores the possibilities of combining Data Science and Marine Science to the advantage of both sides. The example of automating and objectifying connectivity analysis between marine regions shows the added value of this combination. This thesis also presents initial insights and ideas on how researchers from both disciplines can position themselves to thrive as Marine Data Scientists and simultaneously advance our understanding of the ocean.
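    To make region connectivity concrete, here is a minimal assumed sketch (not the thesis's Data Fusion framework): given the start and end regions of simulated larval trajectories, it tallies transitions into a matrix and row-normalizes it so that entry (i, j) estimates the probability that a particle starting in region i arrives in region j.

```python
# Minimal connectivity sketch (assumed): build a row-normalized transition
# matrix from per-trajectory start and end region indices.
import numpy as np

def connectivity_matrix(start_regions, end_regions, n_regions):
    """start_regions, end_regions: iterables of integer region indices."""
    C = np.zeros((n_regions, n_regions))
    for s, e in zip(start_regions, end_regions):
        C[s, e] += 1.0  # one trajectory travelled from region s to region e
    row_sums = C.sum(axis=1, keepdims=True)
    # Avoid division by zero for regions that released no trajectories.
    return np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)
```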