Search CORE


Web news classification using neural networks based on PCA

Author: Omatu Sigeru
Selamat Ali
Yanagimoto Hidekazu
Publication date: 01/01/2002

In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network whose inputs are obtained from both the principal components and class profile-based features (CPBF). A fixed number of regular words from each class is used as a feature vector, together with the features reduced by PCA. These feature vectors are then used as the input to the neural network for classification. The experimental evaluation demonstrates that the WPCM provides acceptable classification accuracy on the sports news datasets.
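The abstract does not say how the regular words are chosen, how many principal components are kept, or what the network looks like. The NumPy sketch below only illustrates the general idea under stated assumptions: PCA is computed with an SVD of the centered term-count matrix, the class profile-based features are assumed to be the counts of the `n_words` most frequent terms of each class, the concatenated features are standardized (our assumption), and the classifier is a hypothetical one-hidden-layer tanh/softmax network trained by full-batch gradient descent. All function names and the toy data are illustrative, not the WPCM implementation.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return column means and the top principal directions of X (rows = documents)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions sorted by decreasing singular value
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def class_profile_terms(X, y, n_words):
    """Hypothetical CPBF selection: indices of the n_words most frequent terms per class."""
    selected = []
    for c in np.unique(y):
        freq = X[y == c].sum(axis=0)
        for idx in np.argsort(-freq, kind="stable")[:n_words]:
            if int(idx) not in selected:
                selected.append(int(idx))
    return np.array(selected)

def build_features(X, mean, components, cpbf):
    """Concatenate PCA scores with the raw counts of the class-profile terms."""
    return np.hstack([(X - mean) @ components.T, X[:, cpbf]])

def train_mlp(F, y, n_classes, hidden=8, lr=0.1, epochs=2000, seed=0):
    """One tanh hidden layer + softmax output, full-batch gradient descent on cross-entropy."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0.0, 0.1, (F.shape[1], hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.1, (hidden, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        H = np.tanh(F @ W1 + b1)
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        dZ = (P - Y) / len(F)              # gradient of mean cross-entropy w.r.t. logits
        dH = (dZ @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh (uses W2 before update)
        W2 -= lr * (H.T @ dZ); b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * (F.T @ dH); b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, F):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(F @ W1 + b1) @ W2 + b2, axis=1)

# Toy term-count matrix: 6 documents x 6 terms, two classes
X = np.array([[3, 2, 1, 0, 0, 0], [2, 3, 1, 0, 1, 0], [3, 1, 2, 0, 0, 1],
              [0, 0, 1, 3, 2, 2], [1, 0, 0, 2, 3, 1], [0, 1, 0, 2, 2, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

mean, comps = pca_fit(X, n_components=2)
cpbf = class_profile_terms(X, y, n_words=2)
F = build_features(X, mean, comps, cpbf)
mu, sd = F.mean(axis=0), F.std(axis=0)
sd[sd == 0] = 1.0
Fs = (F - mu) / sd                         # standardized inputs (an assumption)
params = train_mlp(Fs, y, n_classes=2)
print(predict(params, Fs))
```

For a new document, the same `mean`, `comps`, `cpbf`, `mu` and `sd` learned on the training set must be reused, so that test vectors live in the same feature space as the training vectors.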

Universiti Teknologi Malaysia Institutional Repository

Machine Learning in Automated Text Categorization

Author: ANDROUTSOPOULOS I.
ATTARDI G.
BAKER L.D.
BIEBRICHER P.
CAROPRESO M.F.
CAVNAR W.B.
CHAKRABARTI S.
CLACK C.
CLEVERDON C.
COHEN W.W.
DAGAN I.
DEERWESTER S.
DENOYER L.
DIAZ ESTEBAN A.
DRUCKER H.
DUMAIS S.T.
ESCUDERO G.
Fabrizio Sebastiani
FIELD B.
FORSYTH R.S.
FUHR N.
FURNKRANZ J.
GALAVOTTI L.
GALE W.A.
GOVERT N.
GRAY W.A.
GUTHRIE L.
HAYES P.J.
HEAPS H.
HERSH W.
HULL D.A.
ITTNER D.J.
IWAYAMA M.
IYER R.D.
JOACHIMS T.
JOHN G.H.
JUNKER M.
KESSLER B.
KIM Y.-H.
KLINKENBERG R.
KNORZ G.
KOLLER D.
LAM S.L.
LAM W.
LANG K.
LARKEY L.S.
LEWIS D.D.
LI H.
LI Y.H.
LIERE R.
LIM J.H.
MASAND B.
MCCALLUM A.K.
MLADENIC D.
MOULINIER I.
MYERS K.
NG H.T.
OH H.-J.
PAZIENZA M.T.
RILOFF E.
ROBERTSON S.E.
ROTH D.
RUIZ M.E.
SABLE C.L.
SARACEVIC T.
SCHAPIRE R.E.
SCHUTZE H.
SCOTT S.
SINGHAL A.
SLONIM N.
TAIRA H.
TUMER K.
TZERAS K.
VAN RIJSBERGEN C.J.
WIENER E.D.
YANG Y.
YU K.L.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2001

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys.
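Of the three problems the survey names, classifier evaluation is the easiest to make concrete. The Python sketch below computes per-category precision, recall and F1 in a multi-label setting, plus micro-averaged scores (contingency tables pooled over all categories) and a macro-averaged F1 (here taken as the unweighted mean of per-category F1, one of several conventions in use). The function names and the toy labels are illustrative assumptions, not code from the survey.

```python
from collections import Counter

def prf(tp, fp, fn):
    """Precision, recall and F1 from one contingency table (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def evaluate(gold, pred, categories):
    """gold and pred hold one set of category labels per document (multi-label setting)."""
    counts = {c: Counter() for c in categories}
    for g, p in zip(gold, pred):
        for c in categories:
            if c in p and c in g:
                counts[c]["tp"] += 1
            elif c in p:
                counts[c]["fp"] += 1
            elif c in g:
                counts[c]["fn"] += 1
    per_category = {c: prf(n["tp"], n["fp"], n["fn"]) for c, n in counts.items()}
    # Micro-averaging: pool the tables of all categories, then compute P/R/F1 once
    micro = prf(*(sum(n[key] for n in counts.values()) for key in ("tp", "fp", "fn")))
    # Macro-averaging (one common convention): unweighted mean of per-category F1
    macro_f1 = sum(f1 for _, _, f1 in per_category.values()) / len(categories)
    return per_category, micro, macro_f1

gold = [{"sport"}, {"sport", "politics"}, {"politics"}, {"economy"}]
pred = [{"sport"}, {"sport"}, {"politics", "economy"}, {"economy"}]
per_category, micro, macro_f1 = evaluate(gold, pred, ["sport", "politics", "economy"])
print(per_category, micro, macro_f1)
```

In this toy example the pooled table has 4 true positives, 1 false positive and 1 false negative, so the micro-averaged F1 is 0.8, while the macro-averaged F1 is (1 + 2/3 + 2/3) / 3 = 7/9 ≈ 0.778. The gap shows how macro-averaging gives the smaller, harder categories as much weight as the easy ones.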

arXiv.org e-Print Archive

CiteSeerX

Crossref

KACST Arabic Text Classification Project: Overview and Preliminary Results

Author: Al-Rajeh A.
Alharbi S.
Almuhareb A.
Althubaity A.
Khorsheed M.
Publication date: 01/01/2008

Electronically formatted Arabic free text can be found in abundance on the World Wide Web today, often linked to commercial enterprises and/or government organizations. Vast amounts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the right intelligent tools have been identified and applied. For example, text mining can support text classification and categorization. Text classification aims to automatically assign a text to a predefined category based on identifiable linguistic features. This process has many useful applications, including but not limited to e-mail spam detection, web page content filtering, and automatic message routing. This paper presents an overview of the King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project, along with some preliminary results. The project will contribute to a better understanding and elaboration of Arabic text classification techniques.
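The overview does not describe KACST's preprocessing. As a hedged illustration of how Arabic text is typically turned into countable features before classification, the Python sketch below applies orthographic normalizations that are common in Arabic text processing (removing short-vowel diacritics and the tatweel elongation character, unifying alef variants, mapping alef maqsura to yeh and teh marbuta to heh) and then builds a bag of words. The exact character set is our assumption, and the teh marbuta mapping in particular is lossy; none of this is claimed to be the project's method.

```python
import re
from collections import Counter

# Short-vowel diacritics (fathatan..sukun, U+064B-U+0652) and dagger alef (U+0670)
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character used only for justification

def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef with madda/hamza -> bare alef
    text = text.replace("\u0649", "\u064A")                # alef maqsura -> yeh
    text = text.replace("\u0629", "\u0647")                # teh marbuta -> heh (lossy)
    return text

def tokenize(text: str) -> list[str]:
    # After normalization, keep maximal runs of Arabic letters (U+0621-U+064A);
    # punctuation such as the Arabic comma (U+060C) acts as a separator
    return re.findall("[\u0621-\u064A]+", normalize_arabic(text))

def bag_of_words(text: str) -> Counter:
    return Counter(tokenize(text))

sample = "\u0643\u062A\u0640\u0640\u0627\u0628 \u0645\u0641\u064A\u062F\u060C \u0643\u062A\u0627\u0628"
print(bag_of_words(sample))
```

Normalization of this kind lets differently written surface forms of the same word share one feature; prefixes such as the conjunction wa- are still attached, which is what the stemmers discussed in Arabic categorization work address.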

Southampton (e-Prints Soton)

An Intelligent System For Arabic Text Categorization

Author: Fayed Z.T.
Habib M.B.
Syiam M.M.
Publication venue: Faculty of Computers and Information Sciences, Ain Shams University
Publication date: 01/01/2006

Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. In this paper, an intelligent Arabic text categorization system based on machine learning algorithms is presented. Several stemming and feature selection algorithms are evaluated, the documents are represented using several term weighting schemes, and finally the k-nearest neighbor and Rocchio classifiers are used for classification. Experiments are performed on a self-collected corpus, and the results show that the proposed hybrid of statistical and light stemmers is the most suitable stemming algorithm for Arabic. The results also show that a hybrid of document frequency and information gain is the preferable feature selection criterion, and that normalized tf-idf is the best weighting scheme. Finally, the Rocchio classifier outperforms the k-nearest neighbor classifier. The experimental results show that the proposed model is efficient and achieves a generalization accuracy of about 98%.
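The weighting and classification steps of this abstract can be sketched in a few lines of NumPy. The example below is a minimal sketch, not the paper's system: tf-idf uses a plain `log(N / df)` idf followed by L2 (cosine) normalization, Rocchio prototypes are `beta * mean(positives) - gamma * mean(negatives)` with illustrative values `beta=1.0, gamma=0.25`, and k-NN uses similarity-weighted voting. The stemming and DF/IG feature selection stages are omitted, and the English toy corpus is invented.

```python
import numpy as np

def term_counts(docs, vocab):
    index = {t: i for i, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for d, tokens in enumerate(docs):
        for t in tokens:
            if t in index:                 # out-of-vocabulary terms are ignored
                tf[d, index[t]] += 1
    return tf

def normalized_tfidf(tf, idf):
    w = tf * idf
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    return w / np.where(norms == 0, 1.0, norms)  # cosine (L2) normalization

def fit(train_docs):
    vocab = sorted({t for d in train_docs for t in d})
    tf = term_counts(train_docs, vocab)
    idf = np.log(len(train_docs) / (tf > 0).sum(axis=0))  # df >= 1 for every vocab term
    return vocab, idf, normalized_tfidf(tf, idf)

def vectorize(tokens, vocab, idf):
    return normalized_tfidf(term_counts([tokens], vocab), idf)[0]

def rocchio_prototypes(X, labels, beta=1.0, gamma=0.25):
    labels = np.array(labels)              # assumes at least two classes
    protos = {}
    for c in np.unique(labels):
        proto = beta * X[labels == c].mean(axis=0) - gamma * X[labels != c].mean(axis=0)
        protos[str(c)] = proto / (np.linalg.norm(proto) or 1.0)
    return protos

def rocchio_predict(protos, x):
    return max(protos, key=lambda c: float(protos[c] @ x))

def knn_predict(X, labels, x, k=3):
    sims = X @ x                           # unit-length rows, so dot product = cosine
    scores = {}
    for i in np.argsort(-sims)[:k]:
        scores[labels[i]] = scores.get(labels[i], 0.0) + sims[i]
    return max(scores, key=scores.get)

train_docs = [["match", "goal", "team"], ["team", "coach", "goal"],
              ["market", "price", "bank"], ["bank", "stock", "price"]]
labels = ["sport", "sport", "economy", "economy"]
vocab, idf, X = fit(train_docs)
protos = rocchio_prototypes(X, labels)
query = vectorize(["goal", "team", "price"], vocab, idf)
print(rocchio_predict(protos, query), knn_predict(X, labels, query))
```

The design difference is visible in the code: Rocchio compresses each category into one prototype at training time, so classification costs one comparison per category, whereas k-NN keeps every training document and compares the query against all of them.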

Maastricht University Research Portal

University of Twente Research Information

Recommended from our members

Functional Effects of let-7g Expression in Colon Cancer Metastasis.

Author: Chang Che-Mai
Chang Wei-Chiao
Chen Ben-Kuen
Chiu Siou-Jin
Hsu Wen-Li
Huang Chien-Yu
Maio Zhi-Feng
Tsai Yao-Ting
Wan Yu-Jui Yvonne
Wang Jaw-Yuan
Wong Henry Sung-Ching
Publication venue: eScholarship, University of California
Publication date: 01/04/2019

MicroRNA regulation is crucial for gene expression and cell functions. It has been linked to tumorigenesis, development and metastasis in colorectal cancer (CRC). Recently, the let-7 family has been identified as a tumor suppressor in different types of cancers. However, the function of the let-7 family in CRC metastasis has not been fully investigated. Here, we focused on analyzing the role of let-7g in CRC. The Cancer Genome Atlas (TCGA) genomic datasets of CRC and detailed data from a Taiwanese CRC cohort were used to study the expression pattern of let-7g. In addition, in vitro as well as in vivo studies were performed to uncover the effects of let-7g on CRC. We found that the expression of let-7g was significantly lower in CRC specimens. Our results further supported the inhibitory effects of let-7g on CRC cell migration, invasion and extracellular calcium influx through store-operated calcium channels. We report a critical role for let-7g in the pathogenesis of CRC and suggest let-7g as a potential therapeutic target for CRC treatment.

eScholarship - University of California

Role based behavior analysis

Author: Ramalho Ricardo Gonçalves
Publication date: 01/01/2009

Master's thesis, Computer Security (Segurança Informática), Universidade de Lisboa, Faculdade de Ciências, 2009. Nowadays, the success of a corporation hinges on its agility and its ability to adapt to fast-changing conditions. Proactive workers and an agile IT/IS infrastructure that can support them are requirements for this success. Unfortunately, this is not always the case. Users' network requirements may not be fully understood, which slows down relocation and reorganization. Also, without a grasp of the real requirements, the IT/IS infrastructure may be used inefficiently, with waste in some areas and deficiencies in others. Finally, enabling proactivity does not mean granting full, unrestricted access, since this may leave the systems vulnerable to outsider and insider threats. The purpose of the work described in this thesis is to develop a system that can characterize user network behavior. We propose a modular system architecture to extract information from tagged network flows. The process begins by creating user profiles from their network flow information. Then, similar profiles are automatically grouped into clusters, creating role profiles. Finally, the individual profiles are compared against the roles, and those that differ significantly are flagged as anomalies for further inspection. Considering this architecture, we propose a model to describe user and role network behavior. We also propose visualization methods to quickly inspect all the information contained in the model. The system and model were evaluated using a real dataset from a large telecommunications operator.
The results confirm that the roles accurately map similar behavior. The anomaly results were also as expected, considering the underlying population. With the knowledge the system can extract from the raw data, users' network needs can be better fulfilled and anomalous users flagged for inspection, giving an edge in agility to any company that uses it.
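The profile, role and anomaly pipeline described in this abstract can be illustrated with a small NumPy sketch. The abstract does not name the clustering algorithm, the profile features or the anomaly criterion, so everything here is an assumption: profiles are hypothetical shares of each user's traffic per service, roles come from k-means with deterministic farthest-first seeding, and a user is flagged when their distance to the role centroid exceeds `factor` times the median distance within that role.

```python
import numpy as np

def sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def farthest_first_init(X, k):
    """Deterministic seeding: row 0, then repeatedly the row farthest from all seeds so far."""
    seeds = [0]
    for _ in range(1, k):
        seeds.append(int(np.argmax(sq_dists(X, X[seeds]).min(axis=1))))
    return X[seeds].copy()

def kmeans(X, k, iters=100):
    """Group user profiles into k role profiles (centroids)."""
    centroids = farthest_first_init(X, k)
    for _ in range(iters):
        assign = sq_dists(X, centroids).argmin(axis=1)
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return sq_dists(X, centroids).argmin(axis=1), centroids

def flag_anomalies(X, assign, centroids, factor=2.0):
    """Hypothetical rule: flag users farther than factor x their role's median distance."""
    dist = np.linalg.norm(X - centroids[assign], axis=1)
    flagged = []
    for j in np.unique(assign):
        members = np.flatnonzero(assign == j)
        limit = factor * np.median(dist[members])
        flagged.extend(int(i) for i in members if dist[i] > limit)
    return sorted(flagged)

# Hypothetical profiles: share of each user's traffic per service (web, mail, ssh, p2p)
profiles = np.array([
    [0.20, 0.10, 0.70, 0.00],
    [0.25, 0.10, 0.65, 0.00],
    [0.20, 0.15, 0.65, 0.00],
    [0.70, 0.30, 0.00, 0.00],
    [0.65, 0.30, 0.05, 0.00],
    [0.70, 0.25, 0.05, 0.00],
    [0.50, 0.10, 0.00, 0.40],   # office-like user with unusual p2p traffic
])
roles, centroids = kmeans(profiles, k=2)
print(roles, flag_anomalies(profiles, roles, centroids))
```

Using a per-role median rather than one global threshold lets tight roles and loose roles each define what "differs significantly" means. A user who ends up alone in a cluster would have distance zero under this rule, so a practical system would also need a minimum role size.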

Universidade de Lisboa: Repositório.UL