449 research outputs found

    Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

    Spambot detection in online social networks is a long-standing challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambot detection can be achieved via an in-depth analysis of their collective behaviors, exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, the digital DNA representation encodes the behavioral lifetime of a digital account in a sequence of characters. We then define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging this characterization, we design the Social Fingerprinting technique, which discriminates between spambots and genuine accounts in both a supervised and an unsupervised fashion. Finally, we evaluate the effectiveness of Social Fingerprinting and compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach are the possibility to apply off-the-shelf DNA analysis techniques to study online users' behaviors and the ability to rely on a limited number of lightweight account characteristics.
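The collective-behavior idea above can be sketched in a few lines. This is a hedged illustration: the three-letter alphabet, the encoding of actions, and the use of longest-common-substring length as the similarity measure are assumptions for this sketch, not the paper's exact definitions. Each account's timeline becomes a string, and near-duplicate automated behavior shows up as a long shared substring:

```python
# Hypothetical digital-DNA sketch: encode each account's behavioral lifetime
# as a string over a small alphabet (here A = tweet, C = reply, T = retweet;
# the alphabet is an assumption) and compare accounts by their longest
# common substring.

def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest substring shared by two DNA-like sequences."""
    # Classic dynamic-programming solution, O(len(a) * len(b)) time,
    # O(len(b)) memory.
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

# Two spambots with near-identical timelines share a long substring;
# a genuine account, with heterogeneous behavior, does not.
bot1 = "ACTACTACTACT"   # repetitive tweet/reply/retweet pattern
bot2 = "ACTACTACTTAC"
human = "ATCCATAACTTC"

print(longest_common_substring(bot1, bot2))   # long shared behavior
print(longest_common_substring(bot1, human))  # short shared behavior
```

Groups of accounts whose pairwise similarity is unusually high are candidate spambot groups; genuine users share only short substrings by chance.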

    Classification of time series patterns from complex dynamic systems

    Algumas aplicações da Inteligência Artificial em Biotecnologia (Some applications of Artificial Intelligence in Biotechnology)

    The present work is a review of neural networks. It begins with a short introduction to neural networks and fuzzy logic, a brief history, and then surveys applications of neural networks in biotechnology. The chosen sub-areas are solid-state fermentation optimization, DNA sequencing, molecular sequence analysis, quantitative structure-activity relationships, soft sensing, spectra interpretation, and data mining, each of which uses a particular kind of neural network, such as feedforward, recurrent, Siamese, or ART networks. Applications of neural networks to spectra interpretation and quantitative structure-activity relationships are direct applications to chemistry and, consequently, also to biochemistry and biotechnology. Soft sensing is a notable example for biotechnology: it is a method for estimating variables that normally cannot be measured directly. Solid-state fermentation was optimized, resulting in a strong increase in production efficiency.

    AP: Artificial Programming

    The ability to automatically discover a program consistent with a given user intent (specification) is the holy grail of Computer Science. While significant progress has been made on the so-called problem of Program Synthesis, a number of challenges remain, particularly for synthesizing richer and larger programs. This is in large part due to the difficulty of searching over the space of programs. In this paper, we argue that this challenge can be tackled by learning synthesizers automatically from a large amount of training data. We present a first step in this direction by describing a novel synthesis approach based on two neural architectures that tackle the two key challenges: learning to understand partial input-output specifications and learning to search over the space of programs. The first architecture, called the Spec Encoder, computes a continuous representation of the specification, whereas the second, called the Program Generator, incrementally constructs programs in a hypothesis space conditioned on the specification vector. The key idea of the approach is to train these architectures using a large set of (spec, P) pairs, where P denotes a program sampled from the DSL L and spec denotes the corresponding specification satisfied by P. We demonstrate the effectiveness of our approach on two preliminary instantiations. The first, called Neural FlashFill, corresponds to the domain of string manipulation programs similar to that of FlashFill. The second considers string transformation programs consisting of compositions of API functions. We show that a neural system is able to learn a large majority of programs from few input-output examples. We believe this new approach will not only dramatically expand the applicability and effectiveness of Program Synthesis, but also bring the Program Synthesis and Machine Learning research disciplines closer together.
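The (spec, P) training-data idea can be illustrated with a toy DSL. This is a hypothetical sketch: the four operations below are invented for illustration and are far simpler than a FlashFill-style DSL; the point is only how sampling a program and running it on random inputs yields an input-output specification to pair with it:

```python
# Sample (spec, P) training pairs from a tiny, invented string DSL:
# pick a program P at random, run it on random inputs, and record the
# resulting input/output examples as the specification spec.
import random

DSL = {
    "upper":   str.upper,
    "lower":   str.lower,
    "reverse": lambda s: s[::-1],
    "first3":  lambda s: s[:3],
}

def sample_pair(rng: random.Random, n_examples: int = 3):
    """Sample a program name from the DSL and build its I/O specification."""
    name = rng.choice(sorted(DSL))
    prog = DSL[name]
    inputs = ["".join(rng.choice("abcdeABCDE") for _ in range(5))
              for _ in range(n_examples)]
    spec = [(x, prog(x)) for x in inputs]
    return spec, name

rng = random.Random(0)
spec, name = sample_pair(rng)
print(name, spec)
```

A synthesizer is then trained to map spec back to a program consistent with it; at scale this replaces hand-crafted search with learned search.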

    Building a finite state automaton for physical processes using queries and counterexamples on long short-term memory models

    Most neural networks (NNs) are used as black-box functions: a network takes an input and produces an output, without the user knowing what rules and system dynamics produced that specific output. In some situations, such as safety-critical applications, being able to understand and validate models before applying them can be crucial. In this regard, some approaches for representing NNs in more understandable ways attempt to extract symbolic knowledge from the networks as interpretable and simple systems consisting of a finite set of states and transitions, known as deterministic finite-state automata (DFA). In this thesis, we consider a rule extraction approach developed by Weiss et al. that employs the exact learning method L* to extract DFA from recurrent neural networks (RNNs) trained to classify symbolic data sequences. Our aim has been to study the practicality of applying their rule extraction approach to more complex data based on physical processes consisting of continuous values. Specifically, we experimented with datasets of varying complexity, considering both the inherent complexity of the dataset itself and the complexity introduced by the different discretization intervals used to represent the continuous data values. The datasets in this thesis encompass sine wave prediction datasets, sequence value prediction datasets, and a safety-critical well-drilling pressure scenario generated with the well-drilling simulator OpenLab and the sparse identification of nonlinear dynamical systems (SINDy) algorithm. We observe that the rule extraction algorithm is able to extract simple and small DFA representations of LSTM models. On the considered datasets, the extracted DFA generally perform worse than the LSTM models they were extracted from, and performance decreases both with increasing problem complexity and with more discretization intervals. However, DFA extracted from datasets discretized using few intervals yield better results, and in some cases the algorithm extracts DFA that outperform their respective LSTM models. Master's thesis in informatics.
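The discretization step described above can be sketched as follows. The equal-width binning and the letter alphabet are assumptions for this illustration, not necessarily the thesis's exact scheme; the point is how a real-valued time series becomes a symbolic sequence that an RNN, and an extracted DFA, can consume:

```python
# Map continuous sensor values into n equal-width intervals over [lo, hi],
# turning a real-valued series into a symbolic sequence. More intervals
# mean a larger alphabet and, per the thesis, harder DFA extraction.
import math

def discretize(values, n_intervals: int, lo: float, hi: float) -> str:
    """Map each value in [lo, hi] to one of n_intervals symbols 'a', 'b', ..."""
    width = (hi - lo) / n_intervals
    symbols = []
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_intervals - 1)  # clamp boundary values
        symbols.append(chr(ord("a") + idx))
    return "".join(symbols)

# A sine-like series discretized with 4 intervals over [-1, 1]:
series = [math.sin(t / 5) for t in range(20)]
print(discretize(series, 4, -1.0, 1.0))
```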

    딥러닝 기반의 분자 특성 예측 연구 (A study on deep learning-based molecular property prediction)

    Doctoral dissertation, Seoul National University Graduate School, Interdisciplinary Program in Bioinformatics, College of Natural Sciences, August 2021 (advisor: Sungroh Yoon). Deep learning (DL) has advanced various fields, such as vision tasks, language processing, and the natural sciences. Recently, several remarkable results in computational chemistry were achieved by DL-based methods. However, a chemical system consists of diverse elements and their interactions, so it is not trivial to predict chemical properties, which are determined by intrinsically complicated factors. Consequently, conventional approaches usually depend on tremendous amounts of calculation for chemical simulations or predictions, which is cost-intensive and time-consuming. To address these issues, we studied deep learning for computational chemistry, focusing on chemical property prediction from molecular structure representations. A molecular structure is a complex of atoms and their arrangements, and a molecular property is determined by the interactions among all these components. Therefore, molecular structural representations are the key factor in chemical property prediction tasks. In particular, we explored public property prediction tasks in pharmacology, organic chemistry, and quantum chemistry. Molecular structures can be described as categorical sequences or geometric graphs. We utilized both representational formats for prediction tasks and achieved competitive model performance. Our studies verified that the molecular representation is essential for various tasks in chemistry, and that using an appropriate type of neural network for the representation type is significant for model predictability.
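The "molecular structure as categorical sequence" representation can be sketched with SMILES strings. This is a simplified illustration: real pipelines use proper SMILES tokenizers that handle multi-character tokens such as Cl or Br, whereas this sketch maps single characters to integer indices for a sequence model:

```python
# Turn a SMILES string into a sequence of integer token ids, building the
# vocabulary on the fly. A sequence model (RNN, Transformer, ...) would
# consume these ids; character-level tokenization is a simplification.

def encode_smiles(smiles: str, vocab: dict[str, int]) -> list[int]:
    """Map each SMILES character to its index, growing the vocab as needed."""
    ids = []
    for ch in smiles:
        if ch not in vocab:
            vocab[ch] = len(vocab)
        ids.append(vocab[ch])
    return ids

vocab: dict[str, int] = {}
aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"  # SMILES for aspirin
print(encode_smiles(aspirin, vocab))
print(vocab)
```

The graph alternative mentioned in the abstract would instead parse the same string into atoms (nodes) and bonds (edges) for a graph neural network.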

    Mining protein loops using a structural alphabet and statistical exceptionality

    Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding sites and catalytic pockets. However, describing protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied, whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs, without restriction on loop length. This method is based on the structural alphabet HMM-SA, which simplifies a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment called a structural letter. The difficult task of structurally grouping huge data sets is thus easily accomplished by handling structural-letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments from a bank of 93,000 protein loops and grouped them according to their structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes, since we consider structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words (observed more than 30 times). Our study reveals that 73% of loop lengths are covered by only 3,310 highly recurrent structural words out of 28,274 observed words. These structural words have low structural variability (mean RMSD of 0.85 Å). As expected, half of these motifs display a flanking-region preference but, interestingly, two thirds are shared by short (fewer than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA rather than on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has helped to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and may decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
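The word-mining step can be sketched as a sliding-window count over structural-letter strings. The alphabet and parameters below are illustrative assumptions (HMM-SA defines 27 structural letters, and a seven-residue fragment corresponds to four overlapping letters, hence the four-letter word length):

```python
# Slide a fixed-length window over each loop's structural-letter string,
# collect the resulting "structural words", and keep those observed more
# than a threshold number of times (the paper uses > 30; smaller here).
from collections import Counter

def mine_words(letter_strings, word_len: int = 4, min_count: int = 3):
    """Count fixed-length substrings across all strings; keep recurrent ones."""
    counts = Counter()
    for s in letter_strings:
        for i in range(len(s) - word_len + 1):
            counts[s[i:i + word_len]] += 1
    return {w: c for w, c in counts.items() if c > min_count}

# Toy loop bank: the word "abcd" recurs across loops and is reported.
loops = ["abcdabcda", "xabcdz", "abcdabcd"]
print(mine_words(loops, word_len=4, min_count=3))
```

Grouping fragments by exact word identity is what lets conventional sequence-analysis machinery replace costly structural alignment.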