
    Tracking repeats using significance and transitivity.

    Keywords: transitivity; extreme value distribution

    Motivation: Internal repeats in coding sequences correspond to structural and functional units of proteins. Moreover, duplication of fragments of coding sequences is known to be a mechanism to facilitate evolution. Identification of repeats is crucial to shed light on the function and structure of proteins, and explain their evolutionary past. The task is difficult because during the course of evolution many repeats diverged beyond recognition.

    Results: We introduce a new method, TRUST, for ab initio determination of internal repeats in proteins. It provides an improvement in prediction quality as compared to alternative state-of-the-art methods. The increased sensitivity and accuracy of the method is achieved by exploiting the concept of transitivity of alignments. Starting from significant local suboptimal alignments, the application of transitivity allows us to: 1) identify distant repeat homologues for which no alignments were found; 2) gain confidence about consistently well-aligned regions; and 3) recognize and reduce the contribution of nonhomologous repeats. This reassessment step enables us to derive a virtually noise-free profile representing a generalized repeat with high fidelity. We also obtained superior specificity by employing rigid statistical testing for self-sequence and profile-sequence alignments. Assessment was done using a database of repeat annotations based on structural superpositioning. The results show that TRUST is a useful and reliable tool for mining tandem and non-tandem repeats in protein sequence databases, able to predict multiple repeat types with varying intervening segments within a single sequence.
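    For context, a minimal sketch of the transitivity idea, assuming self-alignments have been reduced to pairs of equivalent residue positions: if position i aligns with j and j with k, then i and k are inferred to be related. This is an illustration only, not the TRUST implementation, which operates on suboptimal alignments, profiles, and significance scores.

```python
# Toy transitive closure over residue-position pairs taken from
# significant local self-alignments (illustrative, not TRUST itself).

def close_under_transitivity(aligned_pairs):
    """aligned_pairs: iterable of (i, j) residue-position pairs.
    Returns the set closed under transitivity: if (i, j) and (j, k)
    are aligned, (i, k) is inferred as well."""
    pairs = {tuple(sorted(p)) for p in aligned_pairs}
    changed = True
    while changed:
        changed = False
        neighbours = {}
        for i, j in pairs:
            neighbours.setdefault(i, set()).add(j)
            neighbours.setdefault(j, set()).add(i)
        new_pairs = set(pairs)
        for i, j in pairs:
            for k in neighbours.get(j, ()):
                if k != i:
                    new_pairs.add(tuple(sorted((i, k))))
        if new_pairs != pairs:
            pairs, changed = new_pairs, True
    return pairs

example = {(5, 40), (40, 75)}              # positions 5~40 and 40~75 align
print(close_under_transitivity(example))   # also infers the distant pair (5, 75)
```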

    Automated smoother for the numerical decoupling of dynamics models

    Background: Structure identification of dynamic models for complex biological systems is the cornerstone of their reverse engineering. Biochemical Systems Theory (BST) offers a particularly convenient solution because its parameters are kinetic-order coefficients which directly identify the topology of the underlying network of processes. We have previously proposed a numerical decoupling procedure that allows the identification of multivariate dynamic models of complex biological processes. While described here within the context of BST, this procedure has general applicability to signal extraction. Our original implementation relied on artificial neural networks (ANN), which caused slight, undesirable bias during the smoothing of the time courses. As an alternative, we propose here an adaptation of the Whittaker smoother and demonstrate its role within a robust, fully automated structure identification procedure.

    Results: In this report we propose a robust, fully automated solution for signal extraction from time series, which is the prerequisite for the efficient reverse engineering of biological systems models. The Whittaker smoother is reformulated within the context of information theory and extended by the development of adaptive signal segmentation to account for heterogeneous noise structures. The resulting procedure can be used on arbitrary time series with a nonstationary noise process; it is illustrated here with metabolic profiles obtained from in-vivo NMR experiments. The smoothed solution, free of parametric bias, permits differentiation, which is crucial for the numerical decoupling of systems of differential equations.

    Conclusion: The method is applicable to signal extraction from time series with a nonstationary noise structure and can be applied to the numerical decoupling of systems of differential equations into algebraic equations, and thus constitutes a rather general tool for the reverse engineering of mechanistic model descriptions from multivariate experimental time series.
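    For context, a minimal sketch of the Whittaker smoother in its penalized least-squares form, assuming evenly sampled data: the smooth signal z minimizes ||y - z||^2 + lam * ||Dz||^2 for a finite-difference operator D. The paper selects the smoothing parameter automatically via information theory and adds adaptive segmentation; in this sketch lam is simply fixed by hand.

```python
# Minimal Whittaker smoother sketch (penalized least squares),
# assuming evenly sampled data; `lam` is chosen manually here.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker_smooth(y, lam=100.0, order=2):
    """Minimize ||y - z||^2 + lam * ||D z||^2, where D is the
    `order`-th finite-difference operator, and return the smooth z."""
    y = np.asarray(y, dtype=float)
    m = y.size
    D = sparse.eye(m, format="csr")
    for _ in range(order):
        D = D[1:] - D[:-1]                      # finite-difference operator
    A = sparse.eye(m, format="csr") + lam * (D.T @ D)
    return spsolve(A.tocsc(), y)

t = np.linspace(0, 4 * np.pi, 200)
noisy = np.sin(t) + 0.2 * np.random.randn(t.size)
smooth = whittaker_smooth(noisy, lam=50.0)
slope = np.gradient(smooth, t)                  # the smooth curve can be differentiated
```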

    Analyzing User Behavior in Collaborative Environments

    Discrete sequences are the building blocks for many real-world problems in domains including genomics, e-commerce, and social sciences. While there are machine learning methods to classify and cluster sequences, they fail to explain what makes groups of sequences distinguishable. Although in some cases having a black box model is sufficient, there is a need for increased explainability in research areas focused on human behaviors. For example, psychologists are less interested in having a model that predicts human behavior with high accuracy and more concerned with identifying differences between actions that lead to divergent human behavior. This dissertation presents techniques for understanding differences between classes of discrete sequences. We leveraged our developed approaches to study two online collaborative environments: GitHub, a software development platform, and Minecraft, a multiplayer online game. The first approach measures the differences between groups of sequences by comparing k-gram representations of sequences using the silhouette score and characterizing the differences by analyzing the distance matrix of subsequences. The second approach discovers subsequences that are significantly more similar to one set of sequences vs. other sets. This approach, which is called contrast motif discovery, first finds a set of motifs for each group of sequences and then refines them to include the motifs that distinguish that group from other groups of sequences. Compared to existing methods, our technique is scalable and capable of handling long event sequences. Our first case study is GitHub. GitHub is a social coding platform that facilitates distributed, asynchronous collaborations in open source software development. It has an open API to collect metadata about users, repositories, and the activities of users on repositories. To study the dynamics of teams on GitHub, we focused on discrete event sequences that are generated when GitHub users perform actions on this platform. Specifically, we studied the differences that automated accounts (aka bots) make on software development processes and outcomes. We trained black box supervised learning methods to classify sequences of GitHub teams and then utilized our sequence analysis techniques to measure and characterize differences between event sequences of teams with bots and teams without bots. Teams with bots have relatively distinct event sequences from teams without bots in terms of the existence and frequency of short subsequences. Moreover, teams with bots have more novel and less repetitive sequences compared to teams with no bots. In addition, we discovered contrast motifs for human-bot and human-only teams. Our analysis of contrast motifs shows that in human-bot teams, discussions are scattered throughout other activities while in human-only teams discussions tend to cluster together. For our second case study, we applied our sequence mining approaches to analyze player behavior in Minecraft, a multiplayer online game that supports many forms of player collaboration. As a sandbox game, it provides players with a large amount of flexibility in deciding how to complete tasks; this lack of goal-orientation makes the problem of analyzing Minecraft event sequences more challenging than event sequences from more structured games. 
Using our approaches, we were able to measure and characterize differences between low-level sequences of high-level actions, and despite variability in how different players accomplished the same tasks, we discovered contrast motifs for many player actions. Finally, we explored how the level of player collaboration affects the contrast motifs.
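    A toy sketch of the first approach described above, assuming each event history is already a list of discrete symbols: sequences are mapped to k-gram count vectors and the separability of labeled groups is summarized with the silhouette score. This is a simplified stand-in, not the dissertation's implementation.

```python
# Compare groups of event sequences via k-gram profiles and the
# silhouette score (simplified illustration).
from collections import Counter
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import silhouette_score

def kgram_counts(seq, k=3):
    """Count overlapping k-grams (length-k subsequences) of a sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

def group_separation(sequences, labels, k=3):
    """Silhouette score of the labeled groups when each sequence is
    represented by its k-gram count vector."""
    profiles = [{" ".join(g): c for g, c in kgram_counts(s, k).items()}
                for s in sequences]
    X = DictVectorizer(sparse=False).fit_transform(profiles)
    return silhouette_score(X, labels, metric="euclidean")

# Toy sequences: characters stand in for discrete events.
seqs = [list("ababab"), list("ababba"), list("ccddcc"), list("cdcdcd")]
print(group_separation(seqs, [0, 0, 1, 1], k=2))
```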

    Applications of high-frequency telematics for driving behavior analysis

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics.

    Processing driving data and investigating driving behavior has been receiving increasing interest in the last decades, with applications ranging from car insurance pricing to policy-making. A popular way of analyzing driving behavior is to move the focus to the maneuvers, as they give useful information about the driver who is performing them. Previous research on maneuver detection can be divided into two strategies, namely, 1) using fixed thresholds in inertial measurements to define the start and end of specific maneuvers or 2) using features extracted from rolling windows of sensor data in a supervised learning model to detect maneuvers. While the first strategy is not adaptable and requires fine-tuning, the second needs a dataset with labels (which is time-consuming) and cannot identify maneuvers with different lengths in time. To tackle these shortcomings, we investigate a new way of identifying maneuvers from vehicle telematics data, through motif detection in time series. Using a publicly available naturalistic driving dataset (the UAH-DriveSet), we conclude that motif detection algorithms are not only capable of extracting simple maneuvers such as accelerations, brakes, and turns, but also more complex maneuvers, such as lane changes and overtaking maneuvers, thus validating motif discovery as a worthwhile line for future research in driving behavior. We also propose TripMD, a system that extracts the most relevant driving patterns from sensor recordings (such as acceleration) and provides a visualization that allows for an easy investigation. We test TripMD on the same UAH-DriveSet dataset and show that (1) our system can extract a rich number of driving patterns from a single driver that are meaningful for understanding driving behaviors and (2) our system can be used to identify the driving behavior of an unknown driver from a set of drivers whose behavior we know.

    Over the last decades, the processing and analysis of driving data has received growing interest, with applications ranging from car insurance to regulation. Typically, driving analysis comprises the extraction and study of maneuvers, since these contain relevant information about the driver's performance. Previous research on this topic can be divided into two types of strategies, namely, 1) the use of fixed acceleration thresholds to define the start and end of each maneuver or 2) the use of supervised learning models over temporal windows. While the first type of strategy is inflexible and requires parameter tuning, the second needs annotated driving data (which is time-consuming) and cannot identify maneuvers of different durations. To mitigate these gaps, in this work we apply methods developed in the time-series research area to solve the maneuver detection problem. In particular, we explore the area of motif detection in time series and test whether these generic methods succeed in detecting maneuvers. We also propose TripMD, a system that extracts the most relevant driving patterns from a set of trips and provides a simple visualization. TripMD is tested on a public dataset (the UAH-DriveSet), from which we conclude that (1) our system is able to extract driving patterns/maneuvers from a single driver that are related to that driver's driving profile and (2) our system can be used to identify the driving profile of an unknown driver from a set of drivers whose behavior is known to us.
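    As a toy illustration of the motif-detection step this line of work builds on, the sketch below does a brute-force search for the closest pair of non-overlapping, z-normalized subsequences in a single acceleration channel. TripMD itself handles variable-length motifs and richer summaries; the names and data here are illustrative only.

```python
# Brute-force top-motif search on a 1-D signal (toy illustration).
import numpy as np

def znorm(x):
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def top_motif(signal, m):
    """Return (i, j, dist): start indices of the closest pair of
    non-overlapping length-m subsequences and their distance."""
    n = len(signal) - m + 1
    subs = np.array([znorm(signal[i:i + m]) for i in range(n)])
    best = (None, None, np.inf)
    for i in range(n):
        for j in range(i + m, n):               # enforce non-overlap
            d = np.linalg.norm(subs[i] - subs[j])
            if d < best[2]:
                best = (i, j, d)
    return best

# Synthetic "acceleration" signal with repeated oscillatory patterns.
accel = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.1 * np.random.randn(400)
print(top_motif(accel, m=40))
```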

    Combating User Misbehavior on Social Media

    Social media encourages user participation and facilitates users' self-expression like never before. While enriching user behavior in a spectrum of ways, many social media platforms have become breeding grounds for user misbehavior. In this dissertation we focus on understanding and combating three specific threads of user misbehavior that widely exist on social media — spamming, manipulation, and distortion. First, we address the challenge of detecting spam links. Rather than rely on traditional blacklist-based or content-based methods, we examine the behavioral factors of both who is posting the link and who is clicking on the link. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. We find that this purely behavioral approach can achieve good performance for robust behavior-based spam link detection. Next, we deal with uncovering manipulated behavior of link sharing. We propose a four-phase approach to model, identify, characterize, and classify organic and organized groups who engage in link sharing. The key motivating insight is that group-level behavioral signals can distinguish manipulated user groups. We find that levels of organized behavior vary by link type and that the proposed approach achieves good performance measured by commonly-used metrics. Finally, we investigate a particular distortion behavior: making bullshit (BS) statements on social media. We explore the factors impacting the perception of BS and what leads users to ultimately perceive and call a post BS. We begin by preparing a crowdsourced collection of real social media posts that have been called BS. We then build a classification model that can determine what posts are more likely to be called BS. Our experiments suggest our classifier has the potential of leveraging linguistic cues for detecting social media posts that are likely to be called BS. We complement these three studies with a cross-cutting investigation of learning user topical profiles, which can shed light on what subjects each user is associated with and thus benefit the understanding of the connection between users and misbehavior. Concretely, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints, and we show how these footprints can be embedded in a generalized optimization framework. Through extensive experiments on millions of real social media posts, we find our proposed models can effectively combat user misbehavior on social media.
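    A minimal, hypothetical sketch of the behavior-based detection idea: each shared link is described by posting-side and click-side behavioral features and passed to an off-the-shelf classifier. The feature names and the placeholder data below are assumptions for illustration, not the dissertation's features or datasets.

```python
# Behavior-based spam-link classification sketch with placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical per-link behavioral features:
FEATURES = ["poster_post_rate", "poster_account_age_days",
            "n_distinct_clickers", "median_seconds_to_click"]

rng = np.random.default_rng(0)
X = rng.random((200, len(FEATURES)))      # placeholder behavioral feature matrix
y = rng.integers(0, 2, size=200)          # placeholder spam / benign labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())   # ~chance on random data
```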

    Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle

    The potent immunomodulatory, anti-inflammatory and procoagulant properties of the protein no. 4 secreted from the rat seminal vesicle epithelium (SV-IV) have been previously found to be modulated by a supramolecular monomer-trimer equilibrium. More structural details that integrate experimental data into a predictive framework have recently been reported. Unfortunately, homology modelling and fold-recognition strategies were not successful in creating a theoretical model of the structural organization of SV-IV. It was inferred that the global structure of SV-IV is not similar to any protein of known three-dimensional structure. Reversing the classical approach to the sequence-structure-function paradigm, in this paper we report on novel information obtained by comparing physicochemical parameters of SV-IV with two datasets made of intrinsically unfolded and ideally globular proteins. In addition, we have analysed the SV-IV sequence by several publicly available disorder-oriented predictors. Overall, disorder predictions and a re-examination of existing experimental data strongly suggest that SV-IV needs large plasticity to efficiently interact with the different targets that characterize its multifaceted biological function and should therefore be better classified as an intrinsically disordered protein.
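    One widely used physicochemical discriminant of the kind referred to above is the charge-hydropathy plot, which separates intrinsically unfolded from compact globular proteins. The sketch below is an illustration under that method's commonly cited scale and boundary constants, not the paper's actual analysis.

```python
# Charge-hydropathy (Uversky-style) disorder check, for illustration only.
KYTE_DOOLITTLE = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
    'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
    'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
    'Y': -1.3, 'V': 4.2,
}

def mean_hydropathy(seq):
    """Kyte-Doolittle hydropathy rescaled to [0, 1] and averaged."""
    return sum((KYTE_DOOLITTLE[a] + 4.5) / 9.0 for a in seq) / len(seq)

def mean_net_charge(seq):
    """Absolute mean net charge near pH 7 (K, R positive; D, E negative)."""
    return abs(sum(seq.count(a) for a in "KR") -
               sum(seq.count(a) for a in "DE")) / len(seq)

def predicted_disordered(seq):
    """Charge-hydropathy rule: points above <R> = 2.785*<H> - 1.151
    are predicted natively unfolded / intrinsically disordered."""
    return mean_net_charge(seq) > 2.785 * mean_hydropathy(seq) - 1.151

print(predicted_disordered("MKKDEEDKKRRSSEEDD" * 3))   # charge-rich toy sequence -> True
```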

    Input variable selection in time-critical knowledge integration applications: A review, analysis, and recommendation paper

    This is the post-print version of the final paper published in Advanced Engineering Informatics. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2013 Elsevier B.V.

    The purpose of this research is twofold: first, to undertake a thorough appraisal of existing Input Variable Selection (IVS) methods within the context of time-critical and computation resource-limited dimensionality reduction problems; second, to demonstrate improvements to, and the application of, a recently proposed time-critical sensitivity analysis method called EventTracker to an environmental science industrial use-case, i.e., sub-surface drilling. Producing time-critical accurate knowledge about the state of a system (effect) under computational and data acquisition (cause) constraints is a major challenge, especially if the knowledge required is critical to the system operation where the safety of operators or integrity of costly equipment is at stake. Understanding and interpreting a chain of interrelated events, predicted or unpredicted, that may or may not result in a specific state of the system, is the core challenge of this research. The main objective is then to identify which set of input data signals has a significant impact on the set of system state information (i.e., output). Through a cause-effect analysis technique, the proposed technique supports the filtering of unsolicited data that can otherwise clog up the communication and computational capabilities of a standard supervisory control and data acquisition system. The paper analyzes the performance of input variable selection techniques from a series of perspectives. It then expands the categorization and assessment of sensitivity analysis methods in a structured framework that takes into account the relationship between inputs and outputs, the nature of their time series, and the computational effort required. The outcome of this analysis is that established methods have limited suitability for use in time-critical variable selection applications. By way of a geological drilling monitoring scenario, the suitability of the proposed EventTracker sensitivity analysis method for use in high-volume and time-critical input variable selection problems is demonstrated.
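    As a generic illustration of the input variable selection task (not the EventTracker method itself), the sketch below ranks candidate input signals by their estimated mutual information with an output signal; the signal names and data are invented for the example.

```python
# Rank candidate input signals against an output signal (generic stand-in).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def rank_inputs(X, y, names):
    """X: (n_samples, n_inputs) input signals sampled over time;
    y: (n_samples,) output / system-state signal.
    Returns input names sorted by estimated mutual information with y."""
    mi = mutual_info_regression(X, y, random_state=0)
    return sorted(zip(names, mi), key=lambda p: -p[1])

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
x1 = np.sin(t) + 0.1 * rng.standard_normal(t.size)    # relevant input signal
x2 = rng.standard_normal(t.size)                       # irrelevant input signal
y = np.sin(t) ** 2 + 0.05 * rng.standard_normal(t.size)
print(rank_inputs(np.column_stack([x1, x2]), y, ["x1", "x2"]))
```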

    Record Linkage and Matching Problems in Forensics

    In forensics, evidence such as DNA, fingerprints, bullets and cartridge cases, shoeprints, or digital evidence is often compared to infer whether items come from the same or different sources. This helps to generate leads through database searches, where information from different investigations can be combined if pieces of evidence are judged to have come from the same source. For specific pairs of comparisons, such as whether a particular cartridge case comes from a suspect's gun, an inference of a match can also be used as testimony in court. We demonstrate how such matching problems fit into the record linkage framework commonly used in statistics and computer science, illustrating this with examples from DNA and firearms identification. We propose some ways that record linkage can inform forensic matching. Finally, we develop methodology to match accounts on anonymous marketplaces. In forensic matching, the stakes are high and the consequences of false arrests or wrongful convictions are severe. The field would benefit from a more principled way of developing matching methods.
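    A minimal sketch of how such comparisons map onto a Fellegi-Sunter-style record linkage score: each compared field contributes a log-likelihood-ratio weight for agreement or disagreement. The field names and m/u probabilities below are invented for illustration, not estimates from forensic data.

```python
# Fellegi-Sunter-style match scoring sketch with hypothetical fields.
import math

# Hypothetical per-field agreement probabilities:
#   m = P(fields agree | same source), u = P(fields agree | different sources)
FIELDS = {
    "caliber":       {"m": 0.98, "u": 0.30},
    "rifling_class": {"m": 0.95, "u": 0.10},
    "breech_marks":  {"m": 0.90, "u": 0.01},
}

def match_weight(record_a, record_b):
    """Sum of log2 likelihood-ratio weights over compared fields;
    large positive totals favour a common source."""
    total = 0.0
    for field, p in FIELDS.items():
        if record_a.get(field) == record_b.get(field):
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total

a = {"caliber": "9mm", "rifling_class": "6R", "breech_marks": "pattern_17"}
b = {"caliber": "9mm", "rifling_class": "6R", "breech_marks": "pattern_17"}
print(match_weight(a, b))   # high positive score -> evidence of same source
```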
