Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded on automatic textual analysis of subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve the accuracy of classification performance
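The two-stage idea in this abstract, choosing a broad topic first and then searching only that topic's children, can be illustrated with a minimal Python sketch. It does not reproduce the authors' Bayesian network learner: keyword-overlap scoring is swapped in for brevity, and the `HIERARCHY` contents, `score`, and `classify` are hypothetical names invented for illustration.

```python
from collections import Counter

# Hypothetical two-level topic hierarchy with illustrative keyword profiles.
HIERARCHY = {
    "science": {
        "physics": {"quantum", "particle", "energy"},
        "biology": {"cell", "gene", "protein"},
    },
    "business": {
        "finance": {"stock", "bond", "market"},
        "marketing": {"brand", "campaign", "consumer"},
    },
}


def score(tokens, keywords):
    """Count how many tokens of a database description hit the keyword set."""
    counts = Counter(tokens)
    return sum(counts[k] for k in keywords)


def classify(text):
    """Pick a top-level category first, then search only its children."""
    tokens = text.lower().split()
    # Stage 1: score each top-level category by the union of its children's keywords.
    top = max(
        HIERARCHY,
        key=lambda c: score(tokens, set().union(*HIERARCHY[c].values())),
    )
    # Stage 2: only the children of the chosen category are considered.
    sub = max(HIERARCHY[top], key=lambda s: score(tokens, HIERARCHY[top][s]))
    return top, sub
```

With this layout the second stage compares two subcategories instead of all four, which is the search-space reduction the abstract refers to.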
Predicting 'Attention Deficit Hyperactive Disorder' using large scale child data set
Attention deficit hyperactivity disorder (ADHD) is a childhood disorder affecting about 9.5% of American children aged 13 years or more. The number of children diagnosed with ADHD increases every year. No single test can diagnose ADHD; instead, a health practitioner has to analyze the child's behavior, gathering information about the child and his or her behavior and environment. Because of these difficulties in diagnosis, I propose to use machine learning techniques to predict ADHD from a large-scale child data set. Machine learning offers a principled approach for developing sophisticated, automatic, and objective algorithms for the analysis of disease. Many new approaches have emerged that deepen understanding and provide opportunities for advanced analysis. Classification models have had a significant impact on the detection and diagnosis of diseases. I propose to use binary classification techniques for the detection and diagnosis of ADHD
Machine learning on Web documents
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (leaves 111-115). The Web is a tremendous source of information: so tremendous that it becomes difficult for human beings to select meaningful information without support. We discuss tools that help people deal with web information, by, for example, blocking advertisements, recommending interesting news, and automatically sorting and compiling documents. We adapt and create machine learning algorithms for use with the Web's distinctive structures: large-scale, noisy, varied data with potentially rich, human-oriented features. We adapt two standard classification algorithms, the slow but powerful support vector machine and the fast but inaccurate Naive Bayes, to make them more effective for the Web. The support vector machine, which cannot currently handle the large amount of Web data potentially available, is sped up by "bundling" the classifier inputs to reduce the input size. The Naive Bayes classifier is improved through a series of three techniques aimed at fixing some of the severe, inaccurate assumptions Naive Bayes makes. Classification can also be improved by exploiting the Web's rich, human-oriented structure, including the visual layout of links on a page and the URL of a document. These "tree-shaped features" are placed in a Bayesian mutation model and learning is accomplished with a fast, online learning algorithm for the model. These new methods are applied to a personalized news recommendation tool, "the Daily You." The results of a 176-person user study of news preferences indicate that the new Web-centric techniques outperform classifiers that use traditional text algorithms and features. We also show that our methods produce an automated ad-blocker that performs as well as a hand-coded commercial ad-blocker. By Lawrence Kai Shih. Ph.D
ServeNet: A Deep Neural Network for Web Services Classification
Automated service classification plays a crucial role in service discovery,
selection, and composition. Machine learning has been widely used for service
classification in recent years. However, the performance of conventional
machine learning methods highly depends on the quality of manual feature
engineering. In this paper, we present a novel deep neural network to
automatically abstract low-level representation of both service name and
service description to high-level merged features without feature engineering
and the length limitation, and then predict service classification on 50
service categories. To demonstrate the effectiveness of our approach, we
conduct a comprehensive experimental study by comparing 10 machine learning
methods on 10,000 real-world web services. The result shows that the proposed
deep neural network can achieve higher accuracy in classification and is more
robust than other machine learning methods. Comment: Accepted by ICWS'2
Cache Hierarchy Inspired Compression: a Novel Architecture for Data Streams
We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta level analyser from a number of levels constructed over time from a data stream. We present the general architecture for such a system and an application to classification. This architecture is an instance of the general wrapper idea allowing us to reuse standard batch learning algorithms in an inherently incremental learning environment. By artificially generating data sources we demonstrate that a hierarchy containing a mixture of models is able to adapt over time to the source of the data. In these experiments the hierarchies use an elementary performance based replacement policy and unweighted voting for making classification decisions
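The wrapper idea in this abstract, reusing a batch learner on successive chunks of a stream while replacing poorly performing models and voting without weights, can be sketched as follows. This is a simplified single-level pool under stated assumptions, not the multi-level hierarchy the authors describe: `NearestMeanModel`, `ModelPool`, and the `capacity` parameter are hypothetical, the data is one-dimensional, and each chunk is assumed to contain both classes.

```python
from collections import Counter


class NearestMeanModel:
    """Toy batch learner for 1-D data: predict the class with the nearest mean."""

    def __init__(self, xs, ys):
        zeros = [x for x, y in zip(xs, ys) if y == 0]
        ones = [x for x, y in zip(xs, ys) if y == 1]
        # Assumes both classes occur in the chunk.
        self.mean0 = sum(zeros) / len(zeros)
        self.mean1 = sum(ones) / len(ones)

    def predict(self, x):
        return 0 if abs(x - self.mean0) <= abs(x - self.mean1) else 1


def accuracy(model, xs, ys):
    """Fraction of the chunk that the model labels correctly."""
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)


class ModelPool:
    """Fixed-size pool of batch models trained on successive stream chunks."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.models = []

    def update(self, xs, ys):
        """Train on the newest chunk; evict the worst model if over capacity."""
        self.models.append(NearestMeanModel(xs, ys))
        if len(self.models) > self.capacity:
            # Performance-based replacement: drop the model that is least
            # accurate on the most recent chunk.
            worst = min(self.models, key=lambda m: accuracy(m, xs, ys))
            self.models.remove(worst)

    def predict(self, x):
        """Unweighted majority vote over all models in the pool."""
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]
```

When the labelling rule of the stream flips, models trained before the change score poorly on new chunks and are evicted first, which is one way a pool can adapt to the source of the data.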
Abusive Language Detection in Online Conversations by Combining Content- and Graph-based Features
In recent years, online social networks have allowed worldwide users to meet
and discuss. As guarantors of these communities, the administrators of these
platforms must prevent users from adopting inappropriate behaviors. This
verification task, mainly done by humans, is more and more difficult due to the
ever growing amount of messages to check. Methods have been proposed to
automatize this moderation process, mainly by providing approaches based on the
textual content of the exchanged messages. Recent work has also shown that
characteristics derived from the structure of conversations, in the form of
conversational graphs, can help detect these abusive messages. In this
paper, we take advantage of both sources of information by proposing
fusion methods integrating content- and graph-based features. Our experiments on
raw chat logs show that the content of the messages and their dynamics
within a conversation contain partially complementary information, allowing
performance improvements on an abusive message classification task with a final
F-measure of 93.26%
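As a rough illustration of combining the two feature sources and of the reported metric, the Python sketch below concatenates a content-based and a graph-based feature vector and computes an F-measure. Both parts are assumptions rather than the authors' method: the abstract does not specify which fusion strategies are used, so simple concatenation stands in as one possibility, and the F-measure is assumed to be the balanced F1 variant (harmonic mean of precision and recall). The names `early_fusion` and `f_measure` are hypothetical.

```python
def early_fusion(content_features, graph_features):
    """Concatenate the two feature vectors of one message into a single vector."""
    return list(content_features) + list(graph_features)


def f_measure(y_true, y_pred, positive=1):
    """Balanced F-measure (F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A fused vector produced this way can be passed to any standard classifier, and `f_measure` can then score its predictions against the annotated labels.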