785 research outputs found

    Fuzzy document classification using ontology based approach for term weighting

    With the surge in web corpora, document classification has become a vital issue in information retrieval. Term weighting increases the accuracy of classification for documents represented in the vector space model. This paper proposes an ontoTf-idf term weighting method based on the assessment of semantic similarity between the group label and the term. A comparative analysis of the performance of the traditional Term Frequency-Inverse Document Frequency (Tf-idf) method and the ontoTf-idf method is carried out on the WebKB and Reuters-21578 benchmark datasets. The efficiency of the ontoTf-idf method is validated with the kNN (k nearest neighbor) and Fuzzy kNN classifiers on both datasets. In the experiments, the proposed ontoTf-idf method outperforms the Tf-idf method. In the proposed work, distance metrics such as Euclidean distance, Cosine similarity, Manhattan distance, and the Jaccard coefficient are applied with the Fuzzy kNN classifier on the WebKB and Reuters-21578 datasets.
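
The four measures named in the abstract above can be sketched for two term-weight vectors as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and treating the Jaccard coefficient as set overlap of the terms with nonzero weight is an assumption.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two weight vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    # Assumption: compare the sets of term positions with nonzero weight.
    terms_a = {i for i, x in enumerate(a) if x}
    terms_b = {i for i, y in enumerate(b) if y}
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 1.0

doc_a = [1.0, 0.0, 2.0]
doc_b = [0.0, 0.0, 2.0]
print(euclidean(doc_a, doc_b), manhattan(doc_a, doc_b),
      cosine_similarity(doc_a, doc_b), jaccard(doc_a, doc_b))
```

Euclidean and Manhattan are distances (smaller means closer), whereas cosine and Jaccard are similarities (larger means closer), so a kNN variant would need to rank neighbors accordingly.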

    Towards Plasmon-Band Engineering in Ordered Plasmonic Nanostructures

    In this thesis, the hybridization of localized surface plasmons to generate continuous plasmonic excitation bands was investigated. Localized surface plasmons are quasi-particles corresponding to collective oscillations of charge carriers (for example, conduction electrons in metals). They arise at interfaces between nanoparticles and their surroundings if the signs of the dielectric functions of the facing materials are opposite. If the spatial extension of the plasmon is confined to the nanoparticle, standing plasmon waves (so-called plasmon modes) with localized and amplified electromagnetic fields emerge. The field enhancement is confined to the boundaries of the nanoparticle, with material-dependent evanescent damping (decrease to 1/e within several tens of nanometers). The penetration of the fields into the environment allows interaction of plasmon modes of different individual nanoparticles arranged in assemblies with interparticle distances of several nanometers. This interaction can lead to hybridization, i.e., spectral splitting of the coupled plasmons into bonding and anti-bonding modes, which is the basis for the emergence of continuous plasmonic excitation bands. In analogy to electronic band structures, which arise from multiple hybridization of atomic orbitals in periodic lattices, plasmonic band structures can be created in large periodic arrays of plasmonic nanoparticles. In principle, the bands can also be tuned to obtain desired band properties by periodically varying either the individual building blocks of the arrangement (shape, size, material) or the coupling strength (distance, dielectric spacer). In the present dissertation, the spectral and spatial characteristics (excitation energy, localization, etc.) of different arrangements of plasmonic nanostructures and their plasmonic response were investigated.
Using a focused electron beam (probe) in a conventional transmission electron microscope, the plasmons were excited by the evanescent electromagnetic fields of the fast beam electrons (around half the speed of light). By analyzing the energy loss of the beam electrons (which is caused by plasmon excitation) at different probe positions on the sample, the plasmons were characterized in terms of excitation energies and spatial localization. In addition, the measurements were verified against numerical simulations. To gain a theoretical understanding, appropriate models were adapted to the present experiments. For example, the classical Mie theory (which describes the plasmonic response of a sphere to transverse electromagnetic waves) was generalized to the inhomogeneous case corresponding to plasmon excitation by the evanescent field of the beam electrons. Furthermore, surface effects (the so-called axion mixing of magnetic and electric field components), which are present, for example, in topological insulators, were taken into account in the generalization of the Mie theory. As a first step towards plasmon band engineering, the plasmonic response of gold nanoparticles of different shapes was studied to gain a comprehensive understanding of the plasmonic behavior of isolated single nanoparticles, which can later be arranged into coupled plasmonic nanostructures. In the next step, gold nanospheres were arranged into chains of different lengths to observe the formation of plasmonic band structures. By examining the hybridization as a function of chain length, the formation of a quasi-continuous plasmonic band with strong dispersion was observed. To create more complex band structures with band gaps or crossings, gold and silver nanospheres were assembled into heterogeneous chains. Focusing on plasmon hybridization in coupled nanoparticles of different kinds, all possible permutations of four coupled gold and silver nanospheres were analyzed.
Considering first the pure gold and pure silver four-particle chains, similar hybridized plasmon modes were observed, differing only by a spectral red shift in the case of gold. The mixed chains also show similar hybridized modes with intermediate spectral positions depending on the number of gold and silver spheres in the chains, which proves hybridization in heterogeneous arrangements. In addition, it was found that silver nanoparticles in particular degrade in air, resulting in a poor and ill-defined plasmonic response. This hampers the use of silver for plasmonic band engineering, although it has relatively low dielectric loss. To address the degradation and to deliberately tune the distance between the coupled nanoparticles, the use of a silicon dioxide shell as a dielectric spacer and protection layer was explored. Silver nanocubes were encapsulated in silica shells of various controllable thicknesses and investigated in terms of their plasmonic properties. It was found that the coating significantly reduces both degradation and the influence of the substrate, resulting in a highly predictable and reproducible plasmonic response. The dielectric silica shell can additionally sustain Mie-type resonances, which may couple to plasmons and thus mediate effective plasmonic coupling over relatively large distances (about a factor of two compared to the coupling of uncoated nanoparticles). In contrast to the delocalized quasi-continuous plasmon bands in periodic nanostructures, spectrally and spatially localized plasmonic modes can occur in disordered geometries. This effect can hamper the formation of plasmonic bands if the plasmons localize at imperfections (shape, size, or distance deviation) of the coupled nanoparticles. Related to this, the effect of plasmon localization in randomly disordered 2-dimensional gold webs was studied. Stronger localization with increasing plasmon excitation energy was found here.
Finally, a geometry-dependent spectral threshold of vanishing localized plasmon modes was observed. In summary, several fundamental aspects of plasmonic band engineering were investigated, providing a basis for the specific design of plasmonic nanostructures with desired properties.

Table of contents:
Abstract; Acronyms; List of Symbols; List of Figures; List of Tables; Contents
1 Introduction: 1.1 Synthesis of Plasmonic Systems; 1.2 State of the Art; 1.3 Outline
2 Theory: 2.1 Surface Plasmons at Planar Interfaces; 2.2 Modeling Dielectric Functions - the Drude Model; 2.3 Axion Electrodynamics of Topological Insulators; 2.4 Surface Plasmons at Spherical Geometries - Mie Theory and Generalization to Topological Insulators (2.4.1 Vector Spherical Harmonic Expansion; 2.4.2 Axion Boundary Conditions; 2.4.3 Homogeneous Case; 2.4.4 Inhomogeneous Case); 2.5 Complex Geometries and Coupled Nanoparticles (2.5.1 Plasmon Mode Hybridization; 2.5.2 Numerical Solvers; 2.5.3 Discrete Dipole Model); 2.6 Plasmon Mode Classification; 2.7 Plasmonics in the Transmission Electron Microscope (2.7.1 Electron Energy-Loss Probability)
3 Methods: 3.1 Experimental Setup (3.1.1 Energy Filter; 3.1.2 Spectroscopy Mode - Direct Imaging of the Energy-Dispersive Plane; 3.1.3 Imaging Mode - Energy-filtered Imaging of the Filter Entrance Plane; 3.1.4 Alternative Modes; 3.1.5 High-Angle Annular Dark Field-Detector); 3.2 Data Post-Processing (3.2.1 Zero-Loss Peak Subtraction and Deconvolution; 3.2.2 Correction of the Scattering Absorption; 3.2.3 Enhancement of the Signal-to-Noise Ratio); 3.3 Uncertainties of the Measurement; 3.4 Plasma Cleaning of the Sample
4 Results: 4.1 Interplay of the Nanoparticle’s Shape and Plasmonic Response; 4.2 Self-Assembly of Spherical Nanoparticles to Homogeneous Chains; 4.3 Self-Assembly of Spherical Nanoparticles to Heterogeneous Chains; 4.4 Silica Encapsulation of Air Sensitive Nanoparticles; 4.5 Localization of Surface Plasmon Modes in Disordered 2-Dimensional Webs
5 Summary and Outlook: 5.1 Summary; 5.2 Outlook (5.2.1 Measurement of the Plasmon Band Dispersion; 5.2.2 Generalization of Anderson Localization to Plasmons; 5.2.3 Measurement of the Axion Contribution in TIs; 5.2.4 Non-Local Measurements)
Bibliography; List of Publications; Acknowledgments; Declaration

    Predicting the Unknown: Machine Learning Techniques for Video Fingerprinting Attacks over Tor

    In recent years, anonymization services such as Tor have become a popular resource for terrorist organizations and violent extremist groups. These adversaries use Tor to access the Dark Web to distribute video media as a way to recruit, train, and incite violence and acts of terrorism worldwide. This research strives to address this issue by examining and analyzing the use and development of video fingerprinting attacks using deep learning models. These high-performing deep learning models, called Deep Fingerprinting, are used to predict video patterns with high accuracy in a closed-world setting. We pose as the adversary by passively observing raw network traffic as a user downloads a short video from YouTube. Based on traffic patterns, we can deduce which video the user was streaming with higher accuracy than previously obtained. In addition, our results include identifying the genre of the video. Our results suggest that an adversary may predict the video a user downloads over Tor with up to 83% accuracy, even when the user applies additional defenses to protect online privacy. By comparing different Deep Fingerprinting models with one another, we can better understand which models perform better from both the attacker's and the user's perspective. Lieutenant, United States Navy. Approved for public release. Distribution is unlimited.

    Semantic feature reduction and hybrid feature selection for clustering of Arabic Web pages

    According to the literature, high-dimensional data reduces the efficiency of clustering algorithms. Clustering Arabic text is challenging because capturing the semantics of the text requires deep semantic processing. To overcome these problems, feature selection and reduction methods have become essential for identifying appropriate features and reducing the high-dimensional space. There is a need for a suitable design of feature selection and reduction methods that results in a more relevant, meaningful and reduced representation of Arabic texts to ease the clustering process. The research developed three different methods for analyzing the features of Arabic Web text. The first method is based on hybrid feature selection that selects the informative term representation within the Arabic Web pages. It combines three feature selection methods, namely Chi-square, Mutual Information and Term Frequency–Inverse Document Frequency, to build a hybrid model. The second method is a latent document vectorization method used to represent the documents as probability distributions in the vector space. It overcomes the problem of high dimensionality by reducing the dimensional space. To extract the best features, two document vectorizer methods have been implemented, known as the Bayesian vectorizer and the semantic vectorizer. The third method is an Arabic semantic feature analysis used to improve the capability of Arabic Web analysis. It ensures a good design for the clustering method to optimize clustering ability when analysing these Web pages. This is done by overcoming the problems of term representation, semantic modeling and dimensional reduction. Different experiments were carried out with k-means clustering on two different data sets. The methods provided solutions to reduce high-dimensional data and identify the semantic features shared between similar Arabic Web pages that are grouped together in one cluster.
These pages were clustered according to the semantic similarities between them, yielding a small Davies–Bouldin index and high accuracy. This study contributed to research on clustering algorithms by developing three methods to identify the most relevant features of Arabic Web pages.
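
The Davies–Bouldin index mentioned above rewards compact, well-separated clusters, so lower values are better. The following is a minimal sketch of the standard definition, assuming Euclidean distance, at least two clusters, and distinct centroids; the function names are illustrative and the data are made up, not taken from the study.

```python
import math

def centroid(points):
    # Coordinate-wise mean of the points in one cluster.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def davies_bouldin(clusters):
    # clusters: list of clusters, each a list of equal-length coordinate tuples.
    centroids = [centroid(points) for points in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, c) for p in points) / len(points)
        for points, c in zip(clusters, centroids)
    ]
    total = 0.0
    for i in range(len(clusters)):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(len(clusters))
            if j != i
        )
    return total / len(clusters)

tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(davies_bouldin(tight))
```

Moving the two toy clusters closer together, or spreading their points out, raises the index.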

    Development and Improvement of Tools and Algorithms for the Problem of Atom Type Perception and for the Assessment of Protein-Ligand-Complex Geometries

    In the context of the present work, a scoring function for protein-ligand complexes has been developed that is aimed not at affinity prediction but at a good recognition rate of near-native geometries. The developed program DSX makes use of the same formalism as the knowledge-based scoring function DrugScore, hence using the knowledge from crystallographic databases and atom-type specific distance-dependent distribution functions. It is based on newly defined atom types. Additionally, the program is augmented by two novel potentials which evaluate the torsion angles and (de-)solvation effects. Validation of DSX is based on a comprehensive data set known from the literature that allows for comparison with other popular scoring functions. DSX is intended for the recognition of near-native binding modes. In this important task, DSX outperforms the competitors, and it is also among the best scoring functions regarding the ranking of different compounds. Another essential step in the development of DSX was the automatic assignment of the new atom types. A powerful programming framework was implemented to fulfill this task. Validation was done on a data set known from the literature and showed superior efficiency and quality compared to similar programs where this data was available. The front-end fconv was developed to share this functionality with the scientific community. Multiple features useful in computational drug-design workflows are also included, and fconv was made freely available as an open-source project. Based on the potentials developed for DSX, a number of further applications were created and implemented: The program HotspotsX calculates favorable interaction fields in protein binding pockets that can be used as a starting point for pharmacophoric models and that indicate possible directions for the optimization of lead structures. The program DSFP calculates scores based on fingerprints for given binding geometries.
These fingerprints are compared with reference fingerprints that are derived from DSX interactions in known crystal structures of the particular target. Finally, the program DSX_wat was developed to predict stable water networks within a binding pocket. DSX interaction fields are used to calculate the putative water positions.

    Text Classification Using Novel Term Weighting Scheme-Based Improved TF-IDF for Internet Media Reports

    With the rapid development of internet technology, a large amount of internet text data can be obtained. Text classification (TC) technology plays a very important role in processing massive text data, but the accuracy of classification is directly affected by the performance of term weighting in TC. Because it was originally designed for information retrieval (IR), term frequency-inverse document frequency (TF-IDF) is not effective enough for TC, especially for processing text data with unbalanced distributions in internet media reports. Therefore, the variance between the DF value of a particular term and the average of all DFs, namely, the document frequency variance (ADF), is proposed to enhance the ability to process text data with unbalanced distributions. Then, the normal TF-IDF is modified by the proposed ADF for processing unbalanced text collections in four different ways, namely, TF-IADF, TF-IADF+, TF-IADFnorm, and TF-IADF+norm. As a result, an effective model can be established for the TC task of internet media reports. A series of simulations has been carried out to evaluate the performance of the proposed methods. Compared with TF-IDF on state-of-the-art classification algorithms, the effectiveness and feasibility of the proposed methods are confirmed by the simulation results.
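
For orientation, the baseline TF-IDF weight and a per-term deviation of document frequency (DF) from the mean DF can be sketched as below. This is an illustration only: the abstract does not give the ADF or TF-IADF formulas, so the squared deviation used here is an assumed reading of "variance between the DF value of a particular term and the average of all DFs", the function names are hypothetical, and the two quantities are deliberately not combined.

```python
import math
from collections import Counter

def document_frequencies(docs):
    # Number of documents in which each term occurs at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def tf_idf(docs):
    # Baseline weighting: raw term count times log(N / DF).
    n_docs = len(docs)
    df = document_frequencies(docs)
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

def df_squared_deviation(docs):
    # Assumed reading of the abstract: squared distance of each term's DF
    # from the mean DF over the vocabulary. Not the paper's exact formula.
    df = document_frequencies(docs)
    mean_df = sum(df.values()) / len(df)
    return {t: (d - mean_df) ** 2 for t, d in df.items()}

docs = [["a", "b"], ["a", "c"], ["a", "b", "b"]]
print(tf_idf(docs))
print(df_squared_deviation(docs))
```

In this toy corpus the term "a" occurs in every document, so its TF-IDF weight is zero, while its DF lies one unit above the mean DF.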

    Machine learning algorithms and techniques for sentiment analysis in scientific paper reviews: a systematic literature review

    Sentiment analysis, also referred to as opinion mining, is an automated process for identifying and classifying subjective information such as sentiments from a piece of text, usually comments and reviews. With the support of machine learning algorithms, it is possible to identify positive, neutral or negative opinions and to rank or classify them in order to reach a conclusion or obtain information. This paper therefore performs a systematic literature review in order to report the state of the art of machine learning techniques for sentiment analysis applied to texts of reviews, comments and evaluations of scientific papers. This work has been supported by IViSSEM: POCI-01-0145-FEDER-28284, COMPETE: POCI-01-0145-FEDER-007043 and FCT - Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013.

    Social and content hybrid image recommender system for mobile social networks

    One of the advantages of social networks is the possibility to socialize and personalize the content created or shared by the users. In mobile social networks, where the devices have limited capabilities in terms of screen size and computing power, Multimedia Recommender Systems help to present the most relevant content to the users, depending on their tastes, relationships and profile. Previous recommender systems are not able to cope with the uncertainty of automated tagging and are knowledge-domain dependent. In addition, the instantiation of a recommender in this domain should cope with problems arising from the inherent nature of collaborative filtering (cold start, banana problem, large number of users to run, etc.). The solution presented in this paper addresses the abovementioned problems by proposing a hybrid image recommender system, which combines collaborative filtering (social techniques) with content-based techniques, leaving the user free to give these processes a personal weight. It takes into account aesthetics and the formal characteristics of the images to overcome the problems of current techniques, improving the performance of existing systems to create a recommender for mobile social networks with a high degree of adaptation to any kind of user.

    Algorithmic Techniques in Gene Expression Processing. From Imputation to Visualization

    The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data is growing rapidly, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially data obtained from gene expression microarray experiments. First, we studied ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values for these entries. Missing value imputation is a method commonly used to make the original incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. It was observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but some cases called for more advanced imputation methods, such as the Bayesian Principal Component Algorithm (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques.
Such networks are typically very large and highly connected, so fast algorithms are needed to produce visually pleasing layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force-directed graph layout algorithm.
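
The k-NN imputation mentioned in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation: rows stand for genes, columns for samples, missing entries are `None`, distance is the root-mean-square difference over the columns observed in both rows, and the function name `knn_impute` is hypothetical.

```python
import math

def knn_impute(rows, k=2):
    # Fill each missing entry with the mean of that column over the k
    # nearest rows that have a value there. Entries with no usable
    # neighbor are left as None.
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed columns
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows with an observed value in column j.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed

expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
print(knn_impute(expression, k=2)[0])
```

Distances are computed on the original rows rather than on partially filled ones, so the result does not depend on the order in which entries are imputed.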