
    Using online linear classifiers to filter spam emails

    The performance of two online linear classifiers, the Perceptron and Littlestone's Winnow, is explored on two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study performance for varying numbers of features, along with three feature selection methods: Information Gain (IG), Document Frequency (DF), and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better with IG or DF than with Odds Ratio. When using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow slightly outperforms the Perceptron, and both online classifiers perform much better than a standard Naïve Bayes method. The theoretical and practical computational complexity of the two classifiers is very low, and both are easily updated adaptively. They outperform most published results while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
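The two update rules the abstract compares can be sketched in a few lines: the Perceptron makes additive corrections on mistakes, while Winnow makes multiplicative promotions and demotions on the active features. The toy binary bag-of-words data, learning rate, and promotion factor below are illustrative assumptions, not the paper's setup.

```python
# Perceptron: additive updates over binary feature vectors.
def perceptron_fit(X, y, epochs=10, lr=1.0):
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != t:                 # mistake-driven: move toward the example
                sign = 1 if t == 1 else -1
                w = [wi + lr * sign * xi for wi, xi in zip(w, x)]
                b += lr * sign
    return w, b

# Winnow: multiplicative updates; weights start at 1, threshold n/2.
def winnow_fit(X, y, epochs=10, alpha=2.0):
    n = len(X[0])
    w, theta = [1.0] * n, n / 2.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if pred == 0 and t == 1:      # false negative: promote active weights
                w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
            elif pred == 1 and t == 0:    # false positive: demote active weights
                w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return w, theta

# Toy corpus: feature 0 ("free") marks spam; features 2-3 are ham vocabulary.
X = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 1], [0, 1, 1, 0]]
y = [1, 1, 0, 0]
```

Both learners are mistake-driven, which is why they adapt so cheaply online: each incoming message triggers at most one constant-time weight update.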

    The other side of the social web: A taxonomy for social information access

    The power of the modern Web, which is frequently called the Social Web or Web 2.0, is frequently traced to the power of users as contributors of various kinds of content through wikis, blogs, and resource-sharing sites. However, the community's power affects not only the production of Web content but also access to all kinds of Web content. A number of research groups worldwide explore what we call social information access techniques, which help users get to the right information using "collective wisdom" distilled from the actions of those who worked with this information earlier. This invited talk offers a brief introduction to this important research stream and reviews recent work on social information access performed at the University of Pittsburgh's PAWS Lab led by the author. Copyright © 2012 by the Association for Computing Machinery, Inc. (ACM)

    Hybrid methods based on empirical mode decomposition for non-invasive fetal heart rate monitoring

    This study focuses on fetal electrocardiogram (fECG) processing using hybrid methods that combine two or more individual methods. Combinations of independent component analysis (ICA), wavelet transform (WT), recursive least squares (RLS), and empirical mode decomposition (EMD) were used to create the individual hybrid methods. The following four hybrid methods were compared and evaluated: ICA-EMD, ICA-EMD-WT, EMD-WT, and ICA-RLS-EMD. The methods were tested on two databases, the ADFECGDB database and the PhysioNet Challenge 2013 database. Extraction evaluation is based on fetal heart rate (fHR) determination. Statistical evaluation is based on determination of accuracy (ACC), sensitivity (Se), positive predictive value (PPV), and the harmonic mean of Se and PPV (F1). The best results were achieved by the ICA-RLS-EMD hybrid method, which reached ACC > 80% on 9 of 12 recordings in the ADFECGDB database, with average values of ACC > 84%, Se > 87%, PPV > 92%, and F1 > 90%. On the PhysioNet Challenge 2013 database, ACC > 80% was achieved on 12 of 25 recordings, with average values of ACC > 64%, Se > 69%, PPV > 79%, and F1 > 72%.
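The four metrics named above follow directly from true-positive, false-positive, and false-negative beat-detection counts. A minimal sketch, assuming the TP/(TP+FP+FN) form of ACC commonly used in QRS-detection evaluation (the paper's exact definition may differ), with made-up counts:

```python
# Detection metrics from confusion counts; tp/fp/fn values are illustrative.
def detection_metrics(tp, fp, fn):
    acc = tp / (tp + fp + fn)          # correct-detection rate (assumed form)
    se = tp / (tp + fn)                # sensitivity
    ppv = tp / (tp + fp)               # positive predictive value
    f1 = 2 * se * ppv / (se + ppv)     # harmonic mean of Se and PPV
    return acc, se, ppv, f1

acc, se, ppv, f1 = detection_metrics(tp=450, fp=20, fn=30)
```

Note that F1 simplifies to 2·TP / (2·TP + FP + FN), so it never requires computing Se and PPV separately.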

    The extraction of the new components from electrogastrogram (EGG), using both adaptive filtering and electrocardiographic (ECG) derived respiration signal

    Electrogastrographic examination (EGG) is a noninvasive method for investigating stomach slow-wave propagation. The typical frequency range of the EGG signal is 0.015–0.15 Hz (or 0.015–0.3 Hz), and the signal is usually captured with a sampling frequency not exceeding 4 Hz. In this paper, a new method for recording EGG signals with a high sampling frequency (200 Hz) is proposed. The high sampling frequency allows collection of a signal that includes not only the EGG component but also signals from other organs of the digestive system, such as the duodenum and colon, as well as components connected with respiratory movements and the electrocardiographic (ECG) signal. The presented method improves the quality of EGG analysis by better suppressing respiratory disturbances and by extracting new components from the high-sampling-rate electrogastrographic (HSEGG) signal obtained from the abdominal surface. The source of the required new signal components can be inner organs such as the duodenum and colon. One of the main problems in analysing EGG signals and extracting components from inner organs is how to suppress the respiratory components; in this work, an adaptive filtering method that requires a reference signal is proposed.

    Particle-filtering approaches for nonlinear Bayesian decoding of neuronal spike trains

    The number of neurons that can be simultaneously recorded doubles every seven years. This ever-increasing number of recorded neurons opens up the possibility of addressing new questions and extracting higher-dimensional stimuli from the recordings. Modeling neural spike trains as point processes, this task of extracting dynamical signals from spike trains is commonly set in the context of nonlinear filtering theory. Particle filter methods relying on importance weights are generic algorithms that solve the filtering task numerically, but they exhibit a serious drawback when the problem dimensionality is high: they are known to suffer from the 'curse of dimensionality' (COD), i.e. the number of particles required for a given performance scales exponentially with the number of observable dimensions. Here, we first briefly review the theory of filtering with point-process observations in continuous time. Based on this theory, we investigate both analytically and numerically the reason for the COD of weighted particle filtering approaches: similarly to particle filtering with continuous-time observations, the COD with point-process observations is due to the decay of the effective number of particles, an effect that grows stronger as the number of observable dimensions increases. Given the success of unweighted particle filtering approaches in overcoming the COD for continuous-time observations, we introduce an unweighted particle filter for point-process observations, the spike-based Neural Particle Filter (sNPF), and show that it exhibits a similarly favorable scaling as the number of dimensions grows. Further, we derive rules for the parameters of the sNPF from a maximum likelihood learning approach. We finally employ a simple decoding task to illustrate the capabilities of the sNPF and to highlight one possible future application of our inference and learning algorithm.
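The weight-decay effect behind the COD is easy to demonstrate numerically: draw particles from a prior, weight them by a d-dimensional Gaussian likelihood, and watch the effective sample size N_eff = (Σw)² / Σw² collapse as d grows. All numbers below are illustrative assumptions, not the paper's experiment.

```python
# Effective-sample-size collapse of a weighted particle ensemble as the
# observation dimension grows (toy Gaussian model, not the sNPF itself).
import math
import random

random.seed(0)

def effective_sample_size(n_particles, dim):
    # Particles from a standard normal prior; observation fixed at the origin,
    # so each particle's weight is the unnormalized Gaussian likelihood.
    weights = []
    for _ in range(n_particles):
        x = [random.gauss(0.0, 1.0) for _ in range(dim)]
        log_w = -0.5 * sum(xi * xi for xi in x)
        weights.append(math.exp(log_w))
    s = sum(weights)
    return s * s / sum(w * w for w in weights)

ess_low = effective_sample_size(1000, dim=1)    # most particles carry weight
ess_high = effective_sample_size(1000, dim=50)  # a handful dominate
```

In this toy model the expected N_eff shrinks geometrically with dimension, which is exactly the exponential particle requirement the abstract describes; unweighted approaches such as the sNPF avoid the weights altogether.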

    FARS: Fuzzy Ant based Recommender System for Web Users

    Recommender systems are useful tools that provide an adaptive web environment for web users. Nowadays, having a user-friendly website is a big challenge in e-commerce technology. In this paper, a fuzzy recommender system based on the collaborative behavior of ants (FARS) is presented, combining the benefits of both collaborative and content-based filtering techniques. FARS works in two phases: modeling and recommendation. First, users' behaviors are modeled offline, and the results are used in the second phase for online recommendation. Fuzzy techniques capture the uncertainty in user interests, while ant-based algorithms provide optimal solutions. The performance of FARS is evaluated using log files of the Information and Communication Technology Center of Isfahan municipality in Iran and compared with an ant-based recommender system (ARS). The results are promising and show that integrating the fuzzy ant approach yields more functional and robust recommendations.
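The two ingredients named above can be sketched generically: a fuzzy membership grading the strength of a user's interest in a page, and an ant-style pheromone trail that evaporates over time while reinforcing visited pages. The membership shape, evaporation rate, and deposit amount are purely illustrative assumptions, not the FARS design.

```python
# Fuzzy interest membership + ant-style pheromone update (illustrative only).
def interest_membership(visit_duration, short=10.0, long=60.0):
    # Degree in [0, 1]: 0 up to `short` seconds of viewing, 1 from `long`
    # seconds upward, linear in between (a triangular-shoulder membership).
    if visit_duration <= short:
        return 0.0
    if visit_duration >= long:
        return 1.0
    return (visit_duration - short) / (long - short)

def update_pheromone(pheromone, page, deposit=1.0, evaporation=0.1):
    # All trails evaporate a little; the page just visited is reinforced.
    decayed = {p: (1 - evaporation) * v for p, v in pheromone.items()}
    decayed[page] = decayed.get(page, 0.0) + deposit
    return decayed
```

Combining the two, a page's recommendation score could weight its pheromone level by the fuzzy interest of similar users, which mirrors the offline-modeling/online-recommendation split described above.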

    Hybrid Profiling in Information Retrieval

    One of the main challenges in search engine quality of service is how to satisfy the needs and interests of individual users. This raises the fundamental issue of how to identify and select the information that is relevant to a specific user. This concern over generic provision and the lack of search precision have provided the impetus for research into Web search personalisation. In this paper, a hybrid user profiling system is proposed: a combination of explicit and implicit user profiles for improving web search effectiveness in terms of precision and recall. The proposed system is content-based and implements the Vector Space Model. Experimental results, supported by significance tests, indicate that the system offers better precision and recall than traditional search engines.
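A minimal Vector Space Model sketch of the kind the abstract says the system implements: documents and a profile-augmented query become term-frequency vectors, and ranking uses cosine similarity. The vocabulary, documents, and profile terms are made-up illustrative assumptions, not the paper's data.

```python
# Vector Space Model ranking with a query expanded by profile terms.
import math
from collections import Counter

def tf_vector(tokens):
    # Raw term-frequency vector (a Counter keyed by term).
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "d1": tf_vector("python web search engine".split()),
    "d2": tf_vector("snake python biology habitat".split()),
}
# Hybrid profiling, in the simplest possible form: explicit interests and
# implicitly observed terms are appended to the raw query before ranking.
query_terms = "python".split() + ["web", "search"]   # raw query + profile terms
ranked = sorted(docs, key=lambda d: cosine(tf_vector(query_terms), docs[d]),
                reverse=True)
```

With the profile terms added, the ambiguous query "python" resolves toward the web-search document rather than the biology one, which is the disambiguation effect personalised profiling aims for.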