Search CORE

Coarse-grained Classification of Web Sites by Their Structural Properties

Authors: Lindemann Christoph; Littig Lars
Publication date: 28/01/2019

In this paper, we identify and analyze structural properties that reflect the functionality of a Web site. These structural properties cover the size, the organization, the composition of URLs, and the link structure of Web sites. In contrast to previous work, we perform a comprehensive measurement study to examine the relation between the structure and the functionality of Web sites. Our study focuses on five of the most relevant functional classes, namely Academic, Blog, Corporate, Personal, and Shop. It is based on more than 1,400 Web sites comprising 7 million crawled and 47 million known Web pages. We present a detailed statistical analysis that provides insight into how structural properties can be used to distinguish between Web sites from different functional classes. Building on these results, we introduce a content-independent approach for the automated coarse-grained classification of Web sites. A naïve Bayesian classifier with advanced density estimation yields a precision of 82% and a recall of 80% for the classification of Web sites into the considered classes.

Qucosa - Publikationsserver der Universität Leipzig
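The abstract of "Coarse-grained Classification of Web Sites by Their Structural Properties" names a naïve Bayesian classifier with "advanced density estimation" over structural features but does not specify the estimator. As a minimal sketch under that gap, the Python below swaps in a plain Gaussian kernel density estimate per class and feature, with a normal-reference (rule-of-thumb) bandwidth. The class name `KDENaiveBayes`, the two features, and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def reference_bandwidth(values):
    """Normal-reference rule-of-thumb bandwidth (1.06 * std * n^(-1/5))."""
    n = len(values)
    mu = sum(values) / n
    std = math.sqrt(sum((v - mu) ** 2 for v in values) / max(n - 1, 1))
    # Fall back to 1.0 when all values are identical, avoiding a zero bandwidth.
    return 1.06 * std * n ** (-1 / 5) if std > 0 else 1.0


class KDENaiveBayes:
    """Hypothetical naive Bayes classifier: each class-conditional feature
    density is a 1-D Gaussian KDE (an assumed stand-in for the paper's
    unspecified density estimator)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.samples = defaultdict(list)
        for row, label in zip(X, y):
            self.samples[label].append(row)
        n = len(y)
        self.log_prior = {c: math.log(len(self.samples[c]) / n) for c in self.classes}
        # One bandwidth per (class, feature) pair.
        self.bandwidth = {
            c: [reference_bandwidth([r[j] for r in self.samples[c]])
                for j in range(len(X[0]))]
            for c in self.classes
        }
        return self

    def _log_density(self, c, j, x):
        h = self.bandwidth[c][j]
        pts = [r[j] for r in self.samples[c]]
        total = sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in pts)
        density = total / (len(pts) * h * math.sqrt(2 * math.pi))
        return math.log(max(density, 1e-300))  # guard against log(0) on underflow

    def predict(self, row):
        # Naive Bayes: log prior plus the sum of per-feature log densities.
        scores = {
            c: self.log_prior[c]
            + sum(self._log_density(c, j, x) for j, x in enumerate(row))
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical structural features: [log10(page count), mean URL path depth]
X = [[1.2, 1.5], [1.0, 1.3], [1.1, 1.4],
     [3.5, 3.2], [3.8, 3.6], [3.6, 3.4]]
y = ["Personal"] * 3 + ["Shop"] * 3
clf = KDENaiveBayes().fit(X, y)
print(clf.predict([1.05, 1.4]))  # → Personal
print(clf.predict([3.7, 3.5]))   # → Shop
```

Working in log space keeps the product of per-feature densities from underflowing; with only a handful of samples per class, the KDE bandwidth dominates the result, so a real system would need far more data than this toy example.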

Ab initio RNA folding

Authors: Cragnolini Tristan; Derreumaux Philippe; Pasquali Samuela
Publication venue: IOP Publishing
Publication date: 30/12/2014

RNA molecules are essential cellular machines performing a wide variety of functions for which a specific three-dimensional structure is required. Over the last several years, experimental determination of RNA structures through X-ray crystallography and NMR seems to have reached a plateau in the number of structures resolved each year, but as more and more RNA sequences are discovered, the need for structure prediction tools to complement experimental data is strong. Theoretical approaches to RNA folding have been developed since the late nineties, when the first algorithms for secondary structure prediction appeared. Over the last 10 years a number of prediction methods for 3D structures have been developed, first based on bioinformatics and data mining, and more recently based on a coarse-grained physical representation of the systems. In this review we present the challenges of RNA structure prediction and the main ideas behind bioinformatic and physics-based approaches. We focus on the description of the more recent physics-based phenomenological models and on how they are built to include the specificity of the interactions of RNA bases, whose role is critical in folding. Through examples from different models, we point out the strengths of physics-based approaches, which are able not only to predict equilibrium structures but also to investigate dynamical and thermodynamical behavior, and the open challenges in including more of the key interactions that govern RNA folding. Comment: 28 pages, 18 figures

arXiv.org e-Print Archive

Hal-Diderot

A literature survey of methods for analysis of subjective language

Author: Täckström Oscar
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/2009

Subjective language is used to express attitudes and opinions towards things, ideas and people. While content- and topic-centred natural language processing is now part of everyday life, analysis of subjective aspects of natural language has until recently been largely neglected by the research community. The explosive growth of personal blogs, consumer opinion sites and social network applications in recent years has, however, created increased interest in subjective language analysis. This paper provides an overview of recent research conducted in the area.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Eliciting Expertise

Author: Shadbolt N R
Publication venue: Taylor & Francis Ltd
Publication date: 04/04/2005

Since the last edition of this book there have been rapid developments in the use and exploitation of formally elicited knowledge. Previously (Shadbolt and Burton, 1995), the emphasis was on eliciting knowledge for the purpose of building expert or knowledge-based systems. These systems are computer programs intended to solve real-world problems, achieving the same level of accuracy as human experts. Knowledge engineering is the discipline that has evolved to support the whole process of specifying, developing and deploying knowledge-based systems (Schreiber et al., 2000). This chapter discusses the problem of knowledge elicitation for knowledge-intensive systems in general.

Southampton (e-Prints Soton)

A Survey of Location Prediction on Twitter

Authors: Han Jialong; Sun Aixin; Zheng Xin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the worldwide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts have been devoted to the new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim to offer an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)
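The abstract of "A Survey of Location Prediction on Twitter" mentions reviewing evaluation metrics without naming them. For home and tweet location prediction, distance-based metrics such as mean and median error distance, and the fraction of predictions within a fixed radius, are common in this literature; a 161 km (100 mile) radius, often written Acc@161, is one widely used choice. The sketch below assumes those metric definitions and great-circle (haversine) distance; it does not reproduce the survey's own definitions, and the function names and coordinates are hypothetical.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_metrics(predicted, actual, threshold_km=161.0):
    """Hypothetical helper: mean/median error distance and accuracy within threshold_km."""
    errors = [haversine_km(*p, *a) for p, a in zip(predicted, actual)]
    return {
        "mean_km": mean(errors),
        "median_km": median(errors),
        "acc_at_threshold": sum(e <= threshold_km for e in errors) / len(errors),
    }


# Hypothetical predictions vs. ground truth: one exact hit (London),
# one miss (New York predicted, Los Angeles actual).
predicted = [(51.5074, -0.1278), (40.7128, -74.0060)]
actual = [(51.5074, -0.1278), (34.0522, -118.2437)]
metrics = distance_metrics(predicted, actual)
print(metrics["acc_at_threshold"])  # → 0.5
```

Reporting the median alongside the mean matters here because a few wildly wrong predictions (e.g., the wrong continent) can dominate the mean error distance.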

The MemProtMD database : a resource for membrane-embedded protein structures and their lipid interactions

Authors: Newport Thomas D.; Sansom Mark S. P.; Stansfeld Phillip J.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

Integral membrane proteins fulfil important roles in many crucial biological processes, including cell signalling, molecular transport and bioenergetic processes. Advances in experimental techniques are revealing high-resolution structures for an increasing number of membrane proteins. Yet these structures are rarely resolved in complex with membrane lipids. In 2015, the MemProtMD pipeline was developed to automate lipid bilayer assembly around new membrane protein structures released in the Protein Data Bank (PDB). To make these data available to the scientific community, a web database (http://memprotmd.bioch.ox.ac.uk) has been developed. Simulations and the results of subsequent analysis can be viewed using a web browser, including interactive 3D visualizations of the assembled bilayer and 2D visualizations of lipid contact data and membrane protein topology. In addition, ensemble analyses are performed to detail conserved lipid interaction information across proteins, families and the entire database of 3506 PDB entries. Proteins may be searched using keywords, PDB or UniProt identifier, or browsed using classification systems such as Pfam, Gene Ontology annotation, mpstruc or the Transporter Classification Database. All files required to run further molecular simulations of proteins in the database are provided.

Warwick Research Archives Portal Repository

Oxford University Research Archive

Unstable Slope Management Program

Authors: Calvin Peter; Darrow Margaret M.; Huang Scott L.
Publication venue: Alaska University Transportation Center, Alaska Department of Transportation and Public Facilities
Publication date: 01/01/2009

INE/AUTC 11.1

ScholarWorks@UA