608 research outputs found

    Web Page Prediction for Web Personalization: A Review

    This paper presents a survey of Web Page Ranking for web personalization. Web page prefetching has been widely used to reduce the access latency problem of the Internet. However, if most prefetched web pages are not visited by the users in their subsequent accesses, the limited network bandwidth and server resources will not be used efficiently, which may worsen the access delay problem. Therefore, it is critical to have an accurate prediction method during prefetching. Techniques such as Markov models have been widely used to represent and analyze users' navigational behavior (usage data) in the Web graph, using the transition probabilities between web pages, as recorded in the web logs. The recorded users' navigation is used to extract popular web paths and predict current users' next steps
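
    The Markov-model idea described in this abstract can be sketched as follows. This is a minimal illustration, not the method of the surveyed papers: it assumes a first-order model, and the session data and the names `fit_transitions` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Estimate first-order transition probabilities P(next | current)
    by counting consecutive page pairs in each logged session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[current] = {page: n / total for page, n in nxt_counts.items()}
    return probs


def predict_next(probs, current, k=1):
    """Return the k most probable next pages (ties broken alphabetically).
    Pages never seen as a source yield an empty list."""
    candidates = probs.get(current, {})
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:k]]


# Hypothetical sessions, standing in for paths extracted from web logs.
SESSIONS = [
    ["home", "products", "cart"],
    ["home", "products", "reviews"],
    ["home", "about"],
    ["products", "cart", "checkout"],
]
PROBS = fit_transitions(SESSIONS)

if __name__ == "__main__":
    print(predict_next(PROBS, "home"))
    print(predict_next(PROBS, "products", k=2))
```

    A prefetcher built on such a model would typically fetch a page only when its estimated probability exceeds some threshold, which is how prediction accuracy bears on wasted bandwidth.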

    Machine Learning on Large Databases: Transforming Hidden Markov Models to SQL Statements

    Machine Learning is a research field with substantial relevance for many applications in different areas. Because of technical improvements in sensor technology, its value for real-life applications has increased further in recent years. Nowadays, it is possible to gather massive amounts of data at any time at comparatively little cost. While this availability of data could be used to develop complex models, their implementation is often constrained by limitations in computing power. In order to overcome performance problems, developers have several options, such as improving their hardware, optimizing their code, or using parallelization techniques like the MapReduce framework. However, these options might be too cost-intensive, not suitable, or too time-consuming to learn and realize. Following the premise that developers usually are not SQL experts, we would like to discuss another approach in this paper: using transparent database support for Big Data Analytics. Our aim is to automatically transform Machine Learning algorithms to parallel SQL database systems. In this paper, we show in particular how a Hidden Markov Model, given in the analytics language R, can be transformed to a sequence of SQL statements. These SQL statements will be the basis for an (inter-operator and intra-operator) parallel execution on parallel DBMS as a second step of our research, which is not part of this paper

    In silico local structure approach: A case study on Outer Membrane Proteins.

    The detection of Outer Membrane Proteins (OMP) in whole genomes is a current question; their sequence characteristics have thus been intensively studied. This class of protein displays a common beta-barrel architecture, formed by adjacent antiparallel strands. However, due to the lack of available structures, few structural studies have been made on this class of proteins. Here we propose a novel OMP local structure investigation, based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using a hidden Markov model results in a specific structural alphabet of 20 fragments, six of them dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in detail, in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. The comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than in globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMP than in globular structures. Moreover, SA20-OMP also captures sequential information. This can be integrated in a scoring function for structural model ranking with very promising results. Proteins 2007. (c) 2007 Wiley-Liss, Inc

    From Sensing to Predictions and Database Technique: A Review of TV White Space Information Acquisition in Cognitive Radio Networks

    Acquiring white space information is the single most significant functionality in cognitive radio networks (CRNs), and as such it has undergone some evolution to enhance information accuracy. The evolution trends are spectrum sensing, prediction algorithms and, recently, the geo-location database technique. Previously, spectrum sensing was the main technique for detecting the presence/absence of a primary user (PU) signal in a given radio frequency (RF) spectrum. However, this expectation could not materialize as a result of numerous technical challenges ranging from hardware imperfections to RF signal impairments. To convey the evolutionary trends in the development of white space information, we present a survey of the contemporary advancements in PU detection with emphasis on the practical deployment of CRNs, i.e., television white space (TVWS) networks. It is found that the geo-location database is the most reliable technique to acquire TVWS information, although it is financially driven. Finally, using a financially driven database model, this study compares the data rate and spectral efficiency of FCC and Ofcom TV channelization. It was discovered that Ofcom TV channelization outperforms FCC TV channelization as a result of having higher spectrum bandwidth. We propose the adoption of an all-inclusive TVWS information acquisition model as the future research direction for TVWS information acquisition techniques

    A Critical Review on Population Synthesis for Activity- and Agent-Based Transportation Models

    Traditional four-step transportation planning models fail to capture novel transportation modes such as car/ridesharing. Hence, agent-based models are replacing those traditional models for their scalability, robustness, and capability of simulating nontraditional transportation modes. A crucial step in developing agent-based models is the definition of agents, e.g., households and persons. While model developers wish to capture typical workday travel patterns of the entire study population of travelers, such detailed data are unavailable due to privacy concerns and technical and financial feasibility issues. Hence, modelers opt for population syntheses based on travel diary surveys, land use data, and census data. The most prominent techniques are iterative proportional fitting (IPF), iterative proportional updating (IPU), combinatorial optimization, Markov-based and fitness-based syntheses, and other emerging approaches. Yet, at present, there is no clear guideline on using any of the available techniques. To bridge this gap, this chapter presents a comprehensive synthesis of practice and documents available successful studies
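
    Of the techniques listed in this abstract, iterative proportional fitting (IPF) is the simplest to illustrate. The sketch below is a generic two-dimensional IPF, not code from the reviewed chapter: the seed table, the marginal targets, and the function name `ipf` are hypothetical, and the row and column targets are assumed to share the same total.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Two-dimensional iterative proportional fitting.

    Alternately rescales the rows and columns of a seed table (e.g. a
    survey cross-tabulation) until its margins match the target totals
    (e.g. census marginals). Returns a new table; the seed is not modified.
    """
    table = [list(row) for row in seed]
    for _ in range(iterations):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                factor = target / row_sum
                table[i] = [value * factor for value in table[i]]
        # Column step: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows also match.
        max_row_error = max(
            abs(sum(row) - target) for row, target in zip(table, row_targets)
        )
        if max_row_error < tol:
            break
    return table


# Hypothetical seed (sample counts) and target margins (known totals).
SEED = [[1.0, 2.0], [3.0, 4.0]]
ROW_TARGETS = [40.0, 60.0]
COL_TARGETS = [50.0, 50.0]
FITTED = ipf(SEED, ROW_TARGETS, COL_TARGETS)

if __name__ == "__main__":
    for row in FITTED:
        print([round(value, 3) for value in row])
```

    The fitted table keeps the association structure of the seed (its odds ratio) while matching the target margins, which is why IPF is often used to scale a small survey sample up to census totals.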

    Efficient Process Data Warehousing

    This dissertation presents a data processing architecture for efficient data warehousing from historical data sources. The present work has three primary contributions. The first contribution is the development of a generalized process data warehousing (PDW) architecture that includes multilayer data processing steps to transform raw data streams into useful information that facilitates data-driven decision making. The second contribution is exploring the applicability of the proposed architecture to the case of sparse process data. We have tested the proposed approach in a medical monitoring system, which takes physiological data and predicts the clinical setting in which the data is most likely to be seen. We have performed a set of experiments with real clinical data (from Children’s Hospital of Pittsburgh) that demonstrate the high utility of the present approach. The third contribution is exploring the applicability of the proposed PDW architecture to the case of redundant process data. We have designed and developed a conflict-aware data fusion strategy for the efficient aggregation of historical data. We have elaborated a simulation-based study of the tradeoffs between the data fusion solutions and data accuracy, and have also evaluated the solutions to a large-scale integrated framework (Tycho data) that includes historical data from heterogeneous sources in different subject areas. Finally, we propose and have evaluated a state sequence recovery (SSR) framework, which integrates work from two previous studies, which are both sparse and redundant studies. Our experimental results are based on several algorithms that have been developed and tested in different simulation set-up scenarios under both normal and exponential data distributions