17 research outputs found

    A Novel Approach in Feature Selection Method for Text Document Classification

    In this paper, a novel approach is proposed for extracting eminent features for a classifier, in place of the traditional feature selection techniques used for text document classification. We introduce a new model based on the probability and overall class frequency of a term. We apply this technique to extract features from training text documents and generate a training set for machine learning. This training set is then used to automatically classify documents into their corresponding class labels and to improve classification accuracy. The results illustrate that the proposed feature selection method performs much better than traditional methods. DOI: 10.17762/ijritcc2321-8169.15075

    Chunking with Max-Margin Markov Networks

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    Bayesian networks : a better than frequentist approach for parametrization, and a more accurate structural complexity measure than the number of parameters

    We propose and justify a better-than-frequentist approach for Bayesian network (BN) parametrization, and propose a structural entropy term that quantifies the complexity of a BN more precisely than the number of parameters. Algorithms for BN learning are deduced

    Active Learning of Continuous-time Bayesian Networks through Interventions

    We consider the problem of learning structures and parameters of Continuous-time Bayesian Networks (CTBNs) from time-course data under minimal experimental resources. In practice, the cost of generating experimental data poses a bottleneck, especially in the natural and social sciences. A popular approach to overcome this is Bayesian optimal experimental design (BOED). However, BOED becomes infeasible in high-dimensional settings, as it involves integration over all possible experimental outcomes. We propose a novel criterion for experimental design based on a variational approximation of the expected information gain. We show that for CTBNs, a semi-analytical expression for this criterion can be calculated for structure and parameter learning. By doing so, we can replace sampling over experimental outcomes by solving the CTBN's master equation, for which scalable approximations exist. This alleviates the computational burden of sampling possible experimental outcomes in high dimensions. We employ this framework to recommend interventional sequences. In this context, we extend the CTBN model to conditional CTBNs in order to incorporate interventions. We demonstrate the performance of our criterion on synthetic and real-world data. Comment: Accepted at ICML202

    B-splines in EMD and Graph Theory in Pattern Recognition

    With the development of science and technology, a large amount of data is waiting for further scientific exploration. Good mathematical models built on the given data can be used to analyze and solve real-life problems. In this work, we propose mathematical models for several different applications. In chapter 1, we use B-spline-based EMD to analyze nonlinear and non-stationary signal data. A new idea for boundary extension is introduced and applied to the Empirical Mode Decomposition (EMD) algorithm: instead of the traditional mirror extension on the boundary, we propose a ratio extension. In chapter 2, we propose a weighted directed multigraph for text pattern recognition. We set up a weighted directed multigraph model using the distances between keywords as the weights of arcs. We then develop a keyword-frequency-distance-based algorithm which utilizes not only the frequency information of keywords but also their ordering information. In chapter 3, we propose a centrality-guided clustering method. Unlike traditional methods, which choose the center of a cluster randomly, we start clustering from a LEADER, a vertex with the highest centrality score; a new vertex is added to an existing community if it meets some criteria and the community including the new vertex maintains a certain density. In chapter 4, we define a new graph optimization problem called the postman tour with minimum route-pair cost, and we model the DNA sequence assembly problem as an instance of it
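    The weighted directed multigraph described for chapter 2 can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the thesis's algorithm: the function name `build_keyword_multigraph`, the optional `max_gap` cutoff, and the choice of token-position difference as the "distance" are all hypothetical, since the abstract does not specify them.

```python
from collections import defaultdict


def build_keyword_multigraph(tokens, keywords, max_gap=None):
    """Return {(u, v): [distance, ...]} for keyword u occurring before keyword v.

    Each list entry is one parallel arc, so repeated co-occurrences of the
    same ordered pair are kept separately (a multigraph). The weight is the
    difference in token positions, which is an assumption made here.
    """
    # Positions of keyword occurrences, in reading order.
    positions = [(i, tok) for i, tok in enumerate(tokens) if tok in keywords]
    arcs = defaultdict(list)
    for a, (i, u) in enumerate(positions):
        for j, v in positions[a + 1:]:
            gap = j - i
            # Optional cutoff (hypothetical): ignore keywords that are far apart.
            if max_gap is not None and gap > max_gap:
                break
            arcs[(u, v)].append(gap)
    return dict(arcs)


tokens = "graph models use graph distance between keyword pairs".split()
keywords = {"graph", "distance", "keyword"}
graph = build_keyword_multigraph(tokens, keywords)
near = build_keyword_multigraph(tokens, keywords, max_gap=2)
```

    Here the arc direction records keyword order and the number of parallel arcs records how often a pair occurs, which is one way to carry both the ordering and the frequency information the abstract mentions.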