The Google Similarity Distance
Words and phrases acquire meaning from the way they are used in society, from
their relative semantics to other words and phrases. For computers the
equivalent of `society' is `database,' and the equivalent of `use' is `way to
search the database.' We present a new theory of similarity between words and
phrases based on information distance and Kolmogorov complexity. To fix
thoughts we use the world-wide-web as database, and Google as search engine.
The method is also applicable to other search engines and databases. This
theory is then applied to construct a method to automatically extract
similarity, the Google similarity distance, of words and phrases from the
world-wide-web using Google page counts. The world-wide-web is the largest
database on earth, and the context information entered by millions of
independent users averages out to provide automatic semantics of useful
quality. We give applications in hierarchical clustering, classification, and
language translation. We give examples to distinguish between colors and
numbers, cluster names of paintings by 17th century Dutch masters and names of
books by English novelists, the ability to understand emergencies, and primes,
and we demonstrate the ability to do a simple automatic English-Spanish
translation. Finally, we use the WordNet database as an objective baseline
against which to judge the performance of our method. We conduct a massive
randomized trial in binary classification using support vector machines to
learn categories based on our Google distance, resulting in a mean agreement
of 87% with the expert-crafted WordNet categories.
Comment: 15 pages, 10 figures; changed some text/figures/notation/part of
theorem. Incorporated referees' comments. This is the final published version
up to some minor changes in the galley proof.
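The abstract above does not spell out the distance itself. As a minimal sketch, assuming the commonly cited form of the normalized Google distance, NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), it can be computed from page counts as follows. The function name `ngd` and its parameter names are hypothetical, and the counts in any call are supplied by the caller rather than fetched from a search engine.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from raw page counts (illustrative sketch).

    fx, fy -- page counts for the terms x and y on their own
    fxy    -- page count for pages containing both x and y
    n      -- assumed total number of indexed pages (normalizing constant)
    """
    # Terms that never occur, or never co-occur, are treated as infinitely far apart.
    if min(fx, fy, fxy) <= 0:
        return math.inf
    log_x, log_y, log_xy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_x, log_y) - log_xy) / (math.log(n) - min(log_x, log_y))
```

With this form, two terms that always appear together get distance 0, and the value grows as the terms co-occur less often than their individual frequencies would suggest.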
Quantum-inspired algorithm for direct multi-class classification
Over the last few decades, quantum machine learning has emerged as a groundbreaking discipline. Harnessing the peculiarities of quantum computation for machine learning tasks offers promising advantages. Quantum-inspired machine learning has revealed how relevant benefits for machine learning problems can be obtained using the quantum information theory even without employing quantum computers. In the recent past, experiments have demonstrated how to design an algorithm for binary classification inspired by the method of quantum state discrimination, which exhibits high performance with respect to several standard classifiers. However, a generalization of this quantum-inspired binary classifier to a multi-class scenario remains nontrivial. Typically, a simple solution in machine learning decomposes multi-class classification into a combinatorial number of binary classifications, with a concomitant increase in computational resources. In this study, we introduce a quantum-inspired classifier that avoids this problem. Inspired by quantum state discrimination, our classifier performs multi-class classification directly without using binary classifiers. We first compared the performance of the quantum-inspired multi-class classifier with eleven standard classifiers. The comparison revealed an excellent performance of the quantum-inspired classifier. Comparing these results with those obtained using the decomposition in binary classifiers shows that our method improves the accuracy and reduces the time complexity. Therefore, the quantum-inspired machine learning algorithm proposed in this work is an effective and efficient framework for multi-class classification. Finally, although these advantages can be attained without employing any quantum component in the hardware, we discuss how it is possible to implement the model in quantum hardware.
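To make the "combinatorial number of binary classifications" concrete, here is a small sketch that assumes a one-vs-one decomposition, in which one binary classifier is trained per unordered pair of classes. The abstract does not say which decomposition the authors compared against, so this choice and the helper name `one_vs_one_pairs` are assumptions for illustration only.

```python
from itertools import combinations


def one_vs_one_pairs(labels):
    """List the class pairs that a one-vs-one decomposition would train on."""
    classes = sorted(set(labels))
    # One binary problem per unordered pair: k * (k - 1) / 2 problems for k classes.
    return list(combinations(classes, 2))
```

For k classes this gives k(k - 1)/2 binary problems, so ten classes already need 45 separate classifiers, which is the growth in resources that a direct multi-class method avoids.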
Arts, Computers and Artificial Intelligence
Science and art seem to belong to different cultures. Science and technology, mainly the products of the intellect, use terminology and vocabulary that are concise and well defined. In contrast, in artistic expression, ambiguity is a powerful component. Still the relationship between these two different categories of human activity is interesting and fascinating. In this paper, a general comparison of these two disciplines will be introduced. Then the possibility of mechanical creation of art using computers and artificial intelligence will be discussed. This will be followed by two techniques which are used to create poetry and music. First, a statistical approach for mechanical composition of music will be presented. This method uses parameters of existing music to create similar music. Second, a method of mechanical composition of poetry will be presented which combines linguistic models, a classification dictionary and semantic information
High performance latent dirichlet allocation for text mining
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Latent Dirichlet Allocation (LDA), a total probability generative model, is a three-tier Bayesian model. LDA computes the latent topic structure of the data and obtains the significant information of documents. However, traditional LDA has several limitations in practical applications. LDA cannot be used directly for classification because it is an unsupervised learning model; it needs to be embedded into an appropriate classification algorithm. As a generative model, LDA normally generates latent topics in categories to which the target documents do not belong, which introduces deviation in computation and reduces classification accuracy. The number of topics in LDA greatly influences the learning of the model parameters. Noise samples in the training data also affect the final text classification result. Moreover, the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to deal with huge amounts of data, balancing computing loads in a computer cluster poses another challenge. This thesis presents a text classification method which combines the LDA model and the Support Vector Machine (SVM) classification algorithm to improve classification accuracy while reducing the dimensionality of the datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a noise data reduction scheme to process noise data. When the noise ratio is large in the training data set, the noise reduction scheme can always produce a high level of accuracy in classification. Finally, the thesis parallelizes LDA using the MapReduce model, which is the de facto computing standard in supporting data-intensive applications. A genetic-algorithm-based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster where the computers have a variety of computing resources in terms of CPU speed, memory space and hard disk space.
A NEW MODEL ON BENTHIC FORAMINIFER IMAGE CLASSIFICATION AND DEFINITIONS BASED ON CONVENTIONAL NEURAL NETWORK (CNN)
Fossil studies are of great importance in order to observe the change of living species over the years, to make inferences by using the information provided by the observed species, and to understand the developing and changing structure of the world we live in over the years. However, the examination and interpretation of fossil specimens is a complex and long process. Artificial intelligence studies have begun to be applied to this field in order to facilitate the working methods of paleontologists. The detection and classification of fossil specimens with the aid of computers simplifies this process as much as possible compared to manual classification processes and reduces foreign dependency for fossil assemblages for which paleontologists are not experts. To achieve this, 9 benthic foraminiferal species and non-foraminiferal sample photographs from a selected dataset were used. In this study, a new method developed for the classification of benthic foraminifera using deep convolutional neural networks, reaching higher accuracy than the results in the literature, is presented. With this method, at least 70% accuracy rates were achieved in the test results of the trained system. This study, which reached high accuracy rates with a new method, has created a successful development for the branch of paleontology in the use of artificial intelligence in microfossil identification
ANALISIS POSTUR KERJA KARYAWAN KANTOR PADA PT XZ (WORK POSTURE ANALYSIS OF OFFICE EMPLOYEES AT PT XZ)
Ergonomics is a systematic branch of science that uses information about human nature, capabilities and limitations to design effective, safe and comfortable work systems. Ergonomics covers many aspects of employee work, one of which is office ergonomics, which includes the entire work environment and the work tools related to computers, chairs and other equipment. High demands on office employees at PT. XZ require them to work for long periods; existing surveys indicate that office workers spend more than 75% of their working time sitting in front of a computer. Jobs like this are associated with several ergonomic risks felt by employees, so it is necessary to measure the level of ergonomic risk among office employees at PT. XZ. Rapid Office Strain Assessment (ROSA) is a rapid analysis method for measuring work risks associated with computer use; it is designed to measure the risk of worker injury and to determine the level of corrective action based on reports of worker discomfort. The study found that body complaints reported through the CMDQ questionnaire by the five employees were most frequent in the lower back (28.5%), followed by the neck (21%), the upper back (18%) and the hips/buttocks (12.8%). In the work posture analysis using the ROSA method, all five employees had the same final score of 5, which falls in the warning level classification, so work posture needs to be improved in line with the setup procedure for computer workstations, namely by paying attention to chair height, elbow position, monitor surface distance, monitor height, computer surface position, forward and backward backrest position, telephone distance, wrist angle, and mouse position.
Recognition of prokaryotic promoters based on a novel variable-window Z-curve method
Transcription is the first step in gene expression, and it is the step at which most of the regulation of expression occurs. Although sequenced prokaryotic genomes provide a wealth of information, transcriptional regulatory networks are still poorly understood using the available genomic information, largely because accurate prediction of promoters is difficult. To improve promoter recognition performance, a novel variable-window Z-curve method is developed to extract general features of prokaryotic promoters. The features are used for further classification by the partial least squares technique. To verify the prediction performance, the proposed method is applied to predict promoter fragments of two representative prokaryotic model organisms (Escherichia coli and Bacillus subtilis). Depending on the feature extraction and selection power of the proposed method, the promoter prediction accuracies are improved markedly over most existing approaches: for E. coli, the accuracies are 96.05% (σ70 promoters, coding negative samples), 90.44% (σ70 promoters, non-coding negative samples), 92.13% (known sigma-factor promoters, coding negative samples), 92.50% (known sigma-factor promoters, non-coding negative samples), respectively; for B. subtilis, the accuracies are 95.83% (known sigma-factor promoters, coding negative samples) and 99.09% (known sigma-factor promoters, non-coding negative samples). Additionally, being a linear technique, the computational simplicity of the proposed method makes it easy to run in a matter of minutes on ordinary personal computers or even laptops. More importantly, there is no need to optimize parameters, so it is very practical for predicting other species promoters without any prior knowledge or prior information of the statistical properties of the samples
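For readers unfamiliar with the underlying representation, the following is a minimal sketch of the standard cumulative Z-curve transform of a DNA sequence, in which the three coordinates separate purine/pyrimidine, amino/keto, and weak/strong hydrogen-bond bases. It does not implement the variable-window scheme or the partial least squares step described in the abstract, whose details are not given here; the function name `z_curve` is hypothetical.

```python
def z_curve(seq):
    """Return the cumulative Z-curve points (x, y, z) for a DNA sequence.

    After each base, with running counts A, C, G, T:
      x = (A + G) - (C + T)   purine vs. pyrimidine
      y = (A + C) - (G + T)   amino vs. keto
      z = (A + T) - (C + G)   weak vs. strong hydrogen bonding
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            raise ValueError(f"unexpected base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points
```

Each point depends only on running base counts, which is consistent with the abstract's remark that the approach is computationally light; features for a classifier would then be derived from these coordinates over chosen windows.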
A descriptive review and classification of organizational information security awareness research
Information security awareness (ISA) is a vital component of information security in organizations. The purpose of this research is to descriptively review and classify the current body of knowledge on ISA. A sample of 59 peer-reviewed academic journal articles, which were published over the last decade from 2008 to 2018, were analyzed. Articles were classified using coding techniques from the grounded theory literature-review method. The results show that ISA research is evolving with behavioral research studies still being explored. Quantitative empirical research is the dominant methodology and the top three theories used are general deterrence theory, theory of planned behavior, and protection motivation theory. Future research could focus on qualitative approaches to provide greater depth of ISA understanding