Expanding The NIF Ecosystem - Corpus Conversion, Parsing And Processing Using The NLP Interchange Format 2.0
This work presents a thorough examination and expansion of the NIF ecosystem.
Interpretation, Identification and Reuse of Models. Theory and algorithms with applications in predictive toxicology.
This thesis is concerned with developing methodologies that enable existing
models to be effectively reused. Results of this thesis are presented in
the framework of Quantitative Structural-Activity Relationship (QSAR)
models, but their application is much more general. QSAR models relate
chemical structures with their biological, chemical or environmental
activity. There are many applications that offer an environment to build
and store predictive models. Unfortunately, they do not provide advanced
functionalities that allow for efficient model selection and for interpretation
of model predictions for new data. This thesis aims to address these
issues and proposes methodologies for dealing with three research problems:
model governance (management), model identification (selection),
and interpretation of model predictions. The combination of these methodologies
can be employed to build more efficient systems for model reuse
in QSAR modelling and other areas.
The first part of this study investigates toxicity data and model formats
and reviews some of the existing toxicity systems in the context of model
development and reuse. Based on the findings of this review and the principles
of data governance, a novel concept of model governance is defined.
Model governance comprises model representation and model governance
processes. These processes are designed and presented in the context of
model management. As an application, minimum information requirements
and an XML representation for QSAR models are proposed.
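The thesis's actual XML schema is not reproduced in this abstract; as a rough illustration of what a minimal, machine-readable QSAR model record could look like, the following Python sketch builds a hypothetical document with the standard library (all element names and values are invented, not taken from the proposed representation):

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal QSAR model record; element names and values are
# illustrative only, not the schema proposed in the thesis.
model = ET.Element("QSARModel", id="m-001")
ET.SubElement(model, "Endpoint").text = "aquatic toxicity (LC50)"
ET.SubElement(model, "Algorithm").text = "random forest"
descriptors = ET.SubElement(model, "Descriptors")
for name in ("logP", "molecular_weight", "TPSA"):
    ET.SubElement(descriptors, "Descriptor").text = name
validation = ET.SubElement(model, "Validation")
ET.SubElement(validation, "Metric", name="Q2").text = "0.78"

xml_text = ET.tostring(model, encoding="unicode")
print(xml_text)
```

A record of this kind captures the minimum information needed to later select and apply a stored model: what it predicts, how it was built, which inputs it expects, and how well it validated.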
Once a collection of validated, accepted and well annotated models is
available within a model governance framework, they can be applied for
new data. It may happen that there is more than one model available for
the same endpoint. Which one to choose? The second part of this thesis
proposes a theoretical framework and algorithms that enable automated
identification of the most reliable model for new data from the collection
of existing models. The main idea is based on partitioning of the search
space into groups and assigning a single model to each group. The construction
of this partitioning is difficult because it is a bi-criteria problem.
The main contribution in this part is the application of Pareto points for
the search space partition. The proposed methodology is applied to three
endpoints in chemoinformatics and predictive toxicology.
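The Pareto-point idea can be sketched as follows. Assuming each candidate model is scored on two competing criteria, say prediction error (to be minimised) and applicability-domain coverage (to be maximised), the models not dominated on both criteria form the Pareto front; the scores below are invented for illustration and are not results from the thesis:

```python
# Sketch of selecting Pareto-optimal models under two criteria:
# prediction error (minimise) and coverage (maximise).
def pareto_front(points):
    """Return indices of (error, coverage) points not dominated by any other.

    A point dominates another if its error is <= and its coverage is >=
    the other's, with at least one strict inequality.
    """
    front = []
    for i, (err_i, cov_i) in enumerate(points):
        dominated = any(
            err_j <= err_i and cov_j >= cov_i
            and (err_j, cov_j) != (err_i, cov_i)
            for j, (err_j, cov_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front

models = [(0.10, 0.60), (0.15, 0.90), (0.12, 0.85), (0.20, 0.50)]
print(pareto_front(models))  # → [0, 1, 2]; model 3 is beaten on both criteria
```

Restricting attention to the Pareto front is what makes the bi-criteria partitioning tractable: no model off the front can be the best choice for any region of the search space.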
After having identified a model for the new data, we would like to know
how the model obtained its prediction and how trustworthy it is. An interpretation
of model predictions is straightforward for linear models thanks
to the availability of model parameters and their statistical significance.
For non-linear models this information can be hidden inside the model
structure. This thesis proposes an approach for interpretation of a random
forest classification model. This approach allows for the determination of
the influence (called feature contribution) of each variable on the model
prediction for an individual data point. Three methods for analysing
feature contributions are proposed in this part. Such analysis might
lead to the discovery of new patterns that represent a standard behaviour
of the model and allow additional assessment of the model reliability for
new data. The application of these methods to two standard benchmark
datasets from the UCI machine learning repository shows the great potential
of this methodology. The algorithm for calculating feature contributions
has been implemented and is available as an R package called rfFC.

Funded by BBSRC and Syngenta (International Research Centre at Jealott’s Hill, Bracknell, UK).
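The rfFC package itself is written in R; the underlying decomposition can be sketched in Python for a single decision tree (using scikit-learn purely for illustration, not the thesis's own tooling): a prediction is written as the class mix at the root (the bias) plus, for each split on the path to the leaf, the change in class probabilities credited to the split feature.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def feature_contributions(tree, x):
    """Decompose one prediction into a bias term plus per-feature contributions."""
    t = tree.tree_
    node = 0
    n_classes = t.value.shape[2]
    contribs = np.zeros((len(x), n_classes))
    prob = t.value[node][0] / t.value[node][0].sum()  # class mix at the root
    bias = prob.copy()
    while t.children_left[node] != -1:  # -1 marks a leaf in scikit-learn trees
        feat = t.feature[node]
        node = (t.children_left[node] if x[feat] <= t.threshold[node]
                else t.children_right[node])
        new_prob = t.value[node][0] / t.value[node][0].sum()
        contribs[feat] += new_prob - prob  # credit the change to the split feature
        prob = new_prob
    return bias, contribs

bias, contribs = feature_contributions(tree, X[0])
# bias plus the summed contributions reconstructs the leaf's class probabilities
print(np.round(bias + contribs.sum(axis=0), 3))
```

Averaging such per-tree decompositions over a forest yields per-feature contributions for the ensemble prediction, which is the quantity the analysis methods above operate on.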
MediaSync: Handbook on Multimedia Synchronization
This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, like hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is getting renewed attention to overcome remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance and the multiple disciplines it involves, the availability of a reference book on mediasync becomes necessary. This book fills that gap. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space, from different perspectives. MediaSync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners who want to acquire strong knowledge about this research area, and also to approach the challenges behind ensuring the best mediated experiences, by providing the adequate synchronization between the media elements that constitute these experiences.
Personalization platform for multimodal ubiquitous computing applications
Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática).

We currently live surrounded by a myriad of computing devices running multiple applications.
In general, the user experience in each of those scenarios is not adapted to each
user's specific needs, lacking personalization and integration across scenarios. Moreover, developers usually do not have the right tools to handle this in a standard and generic way. A personalization platform can provide those tools.
This kind of platform should be readily available to be used by any developer. Therefore, it must be developed to be available over the Internet. With the advances in IT infrastructure, it is now possible to develop reliable and scalable services running on abstract and virtualized platforms. Those are some of the advantages of cloud computing, which offers a model of utility computing where customers are able to dynamically allocate the resources they need and are charged accordingly.
This work focuses on the creation of a cloud-based personalization platform built on
a previously developed generic user modeling framework. It provides user profiling and
context-awareness tools to third-party developers.
A public display-based application was also developed. It provides useful information
to students, teachers, and others on a university campus as they are detected by Bluetooth scanning. It uses the personalization platform as the basis for selecting the most relevant information in each situation, and a mobile application was developed to serve as an input mechanism. A user study was conducted to assess the usefulness of the application and to validate some design choices; the results were mostly positive.