177 research outputs found

    Extracting Interests of Users from Web Log Data

    The amount of information on the Web is growing explosively. Without a recommendation system, users may spend a great deal of time searching the network for the information they are interested in. Today, many web recommendation systems cannot give users sufficiently personalized help and instead present them with large amounts of irrelevant information. One of the main reasons is that they cannot accurately extract users' interests. Therefore, analyzing users' Web Log Data and extracting the domains users are potentially interested in have become important and challenging research topics in web usage mining. If users' interests can be automatically detected from their Web Log Data, they can be used for information recommendation and marketing, which are useful for both users and Web site developers. In this paper, some novel algorithms are proposed to mine users' interests. The algorithms are based on visit time and visit density, which can be obtained from an analysis of users' Web Log Data. Experimental results show that the proposed methods succeed in finding users' interested domains.
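    The algorithms themselves are not reproduced in this listing; the following minimal Python sketch only illustrates the general idea of scoring a user's interest in a domain from visit time and visit density. The log schema (LogEntry), the normalisation and the equal weighting are illustrative assumptions, not the authors' method.

        from collections import defaultdict
        from dataclasses import dataclass

        @dataclass
        class LogEntry:
            user: str
            domain: str        # topical domain/category of the visited page
            dwell_time: float  # seconds the user spent on the page

        def interest_scores(entries, w_time=0.5, w_density=0.5):
            # Accumulate per-user and per-(user, domain) totals from the web log.
            total_time = defaultdict(float)   # seconds spent per (user, domain)
            visit_count = defaultdict(int)    # visits per (user, domain)
            user_time = defaultdict(float)    # total seconds per user
            user_visits = defaultdict(int)    # total visits per user

            for e in entries:
                key = (e.user, e.domain)
                total_time[key] += e.dwell_time
                visit_count[key] += 1
                user_time[e.user] += e.dwell_time
                user_visits[e.user] += 1

            # Combine the two normalised signals into a single interest score.
            scores = {}
            for (user, domain), t in total_time.items():
                time_share = t / user_time[user]                            # visit-time signal
                density = visit_count[(user, domain)] / user_visits[user]   # visit-density signal
                scores[(user, domain)] = w_time * time_share + w_density * density
            return scores

    Domains with the highest scores for a given user would then be treated as that user's interested domains.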

    Mining a Small Medical Data Set by Integrating the Decision Tree and t-test

    Although several researchers have used statistical methods to prove that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could produce different recovery rates for the patients. Therefore, this study adopts statistical methods and decision tree techniques together to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our collected data set is small, containing only 212 records, we use all of these data as the training data. Therefore, instead of using the resulting tree to generate rules directly, we first use the value of each node as a cut point to generate all possible rules from the tree. Then, using the t-test, we verify the rules to discover useful descriptive rules after all possible rules have been generated from the tree. Experimental results show that our approach can find some new and interesting knowledge about recurrent ovarian endometriomas under different conditions.
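    As a rough illustration of the procedure described (fitting a tree on the full small data set, taking every internal-node value as a cut point, and verifying candidate rules with a t-test), here is a hedged Python sketch using scikit-learn and SciPy. The function name, the significance level and the use of Welch's t-test are assumptions rather than the paper's exact settings.

        import numpy as np
        from scipy import stats
        from sklearn.tree import DecisionTreeClassifier

        def candidate_rules_with_ttest(X, y, feature_names, alpha=0.05):
            """X: (n_samples, n_features) array, y: numeric outcome per patient.
            Fit a tree on the full data set, take every internal-node split as a
            candidate cut point, and keep the cut points for which a t-test finds
            a significant difference in outcome between the two sides."""
            tree = DecisionTreeClassifier(random_state=0).fit(X, y)
            t = tree.tree_

            rules = []
            for node in range(t.node_count):
                if t.children_left[node] == -1:      # leaf node, no split here
                    continue
                f, thr = t.feature[node], t.threshold[node]
                left, right = y[X[:, f] <= thr], y[X[:, f] > thr]
                if len(left) < 2 or len(right) < 2:
                    continue
                t_stat, p = stats.ttest_ind(left, right, equal_var=False)
                if p < alpha:
                    rules.append((feature_names[f], thr, t_stat, p))
            return rules

    Each retained tuple corresponds to a rule of the form "feature <= threshold" whose two patient groups differ significantly in outcome.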

    Medical Informatics

    Information technology has been revolutionizing the everyday life of the common man, while medical science has been making rapid strides in understanding disease mechanisms, developing diagnostic techniques and effecting successful treatment regimens, even for those cases which would have been classified as having a poor prognosis a decade earlier. The confluence of information technology and biomedicine has brought into its ambit the additional dimension of computerized databases of patient conditions, revolutionizing the way health care and patient information are recorded, processed, interpreted and utilized for improving the quality of life. This book consists of seven chapters dealing with three primary issues: medical information acquisition from a patient's and a health care professional's perspective, translational approaches from a researcher's point of view, and finally the application potential as required by clinicians and physicians. The book covers modern issues in information technology, bioinformatics methods and clinical applications. The chapters describe the basic process of information acquisition in a health system, recent technological developments in biomedicine and the realistic evaluation of medical informatics.

    Technology Forecasting Using Data Mining and Semantics: First Annual Report

    The planning and management of research and development is a challenging process, compounded by the large amounts of information that are available. The goal of this project is to mine science and technology databases for patterns and trends which facilitate the formulation of research strategies. The information sources we exploit are diverse and include academic journals, patents, blogs and news stories. The intended outputs of the project include growth forecasts for various technological sectors (with an emphasis on sustainable energy), an improved understanding of the underlying research landscape, as well as the identification of influential researchers or research groups. This paper focuses on the development of techniques to both organize and visualize the data in a way which reflects the semantic relationships between keywords. We studied the use of the joint term frequencies of pairs of keywords as a means of characterizing this semantic relationship; this is based on the intuition that terms which frequently appear together are more likely to be closely related. The results reported herein show that: (1) with appropriate tools and methods, exploitable patterns and information can certainly be extracted from publicly available databases; (2) an adaptation of the Normalized Google Distance (NGD) formalism can provide measures of keyword distance that facilitate keyword clustering and hierarchical visualization; (3) a further adaptation of the NGD formalism can provide an asymmetric measure of keyword distance that allows the automatic creation of a keyword taxonomy; and (4) an adaptation of Latent Semantic Analysis (LSA) can be used to identify concepts underlying collections of keywords.
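    For reference, the standard symmetric form of the Normalized Google Distance that the report adapts is NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) counts the documents containing keyword x, f(x, y) the documents containing both keywords, and N the total number of documents. The Python sketch below is a direct transcription of this standard formula and does not include the report's asymmetric adaptation.

        import math

        def ngd(f_x, f_y, f_xy, n_docs):
            """Normalized Google Distance from raw co-occurrence counts:
            f_x, f_y -- number of documents containing keyword x (resp. y)
            f_xy     -- number of documents containing both keywords
            n_docs   -- total number of documents in the corpus
            Smaller values mean the keywords are used in more similar contexts."""
            if f_xy == 0:
                return float("inf")   # keywords never co-occur: maximally distant
            lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
            return (max(lx, ly) - lxy) / (math.log(n_docs) - min(lx, ly))

    A matrix of pairwise NGD values over a keyword set can then be fed to standard clustering or hierarchical visualization tools.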

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant developments in the field of data mining.

    Information management and multivariate analysis techniques for metabolomics data

    Among the so-called "omics" disciplines, metabolomics has been receiving considerable attention over the last few years. Metabolomics is the large-scale study of metabolites, the small molecules within cells, biofluids and tissues produced as a result of metabolism. The growing interest in metabolomics has been encouraged by rapid advances in metabolic profiling techniques and by technological developments of the diverse analytical platforms, including proton Nuclear Magnetic Resonance (1H NMR), Gas Chromatography-Mass Spectrometry (GC-MS) and Liquid Chromatography-Mass Spectrometry (LC-MS), used for extracting metabolic profiles. The output generated by these experimental techniques is a huge amount of data and information. This thesis attempts to provide an overview of the analytical technologies, resources and databases employed in this emerging discipline, and is mainly focused on the following two aspects: (i) the challenges of handling the large amounts of data generated and managing the complex experimental processes needed to produce them; (ii) the techniques for the multivariate analysis of metabolomics data, with a special emphasis on methods based on the random forest algorithm. To this aim, a detailed description and explanation of QTREDS, a software platform designed for managing, monitoring and tracking the experimental processes and activities of "omics" laboratories, is provided. In addition, a thorough elucidation of the software package RFmarkerDetector, available through the Comprehensive R Archive Network (CRAN), and a description of the multivariate analysis techniques it implements, are also given.
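    RFmarkerDetector itself is an R package and its exact workflow is not reproduced here; as a purely generic illustration of random-forest-based marker ranking for a metabolomics matrix (samples by metabolite intensities), a short Python sketch with scikit-learn is given below. The function name, forest size and use of impurity-based importances are assumptions, not a port of the package.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def rank_candidate_markers(X, y, feature_names, n_trees=500, seed=0):
            """Rank metabolite features (columns of X) by mean decrease in impurity
            from a random forest trained to separate the sample classes y.
            Returns the ranked (feature, importance) list and the out-of-bag accuracy
            as a rough sanity check of the model."""
            rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed,
                                        oob_score=True)
            rf.fit(X, y)
            order = np.argsort(rf.feature_importances_)[::-1]
            ranking = [(feature_names[i], rf.feature_importances_[i]) for i in order]
            return ranking, rf.oob_score_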

    Algorithmic aspects of sparse approximations

    Typical tasks in signal processing may be done in simpler ways or more efficiently if the signals to be analyzed are represented in a proper way. This thesis deals with some algorithmic problems related to signal approximation, more precisely in the novel field of sparse approximation using redundant dictionaries of functions. Orthogonal bases permit signals to be approximated by simply taking the N waveforms whose associated projections have maximal amplitudes. This convenient property no longer holds if the basis used is redundant; in fact, finding the best decomposition becomes an NP-hard problem in the general case. Thus, suboptimal heuristics have been developed, the best known of which are Matching Pursuit and Basis Pursuit. Both remain highly complex, which prevents them from being used in practice in many situations.

    The first part of the thesis is concerned with this computational bottleneck. We propose to create a tree structure endowing the dictionary and grouping similar atoms in the same branches. An approximation algorithm, called Tree-Based Pursuit, exploiting this structure is presented. It considerably lowers the cost of finding good approximations with redundant dictionaries.

    The quality of the representation does not only depend on the approximation algorithm but also on the dictionary used. One of the main advantages of these techniques is that the atoms can be tailored to match the features present in the signal. It may happen that some knowledge about the class of signals to approximate directly leads to the dictionary. For most natural signals, however, the underlying structures are not clearly known and may be obfuscated. Learning dictionaries from examples is an alternative to manual design and is gaining interest. Most natural signals exhibit behaviors that are invariant to translations in space or time. Thus, we propose an algorithm to learn redundant dictionaries under a translation-invariance constraint. In the case of images, the proposed solution is able to recover atoms similar to Gabor functions, line edge detectors and curved edge detectors. The first two categories had already been observed, and the third completes the range of natural features and is a major contribution of this algorithm.

    Sparsity is used to define the efficiency of approximation algorithms as well as to characterize good dictionaries. It follows directly from the fact that these techniques aim at approximating signals with few significant terms. This property has been successfully exploited as a dimension reduction method for different signal processing tasks such as analysis, de-noising and compression. In the last chapter, we tackle the problem of finding the nearest neighbor to a query signal in a set of signals that have a sparse representation. We take advantage of sparsity to quickly approximate the distance between the query and all elements of the database. In this way, we are able to recursively prune all elements that do not match the query, while providing bounds on the true distance. Validation of this technique on synthetic and real data sets confirms that it could be very well suited to processing queries over large databases of compressed signals, avoiding most of the burden of decoding.
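    For reference, the baseline greedy heuristic mentioned above, plain Matching Pursuit over a redundant dictionary with unit-norm columns, can be sketched in a few lines of Python; this is the standard algorithm whose cost Tree-Based Pursuit reduces, not the thesis's contribution itself.

        import numpy as np

        def matching_pursuit(signal, dictionary, n_atoms, tol=1e-6):
            """Greedy Matching Pursuit: at each step pick the dictionary atom
            (column, assumed unit-norm) most correlated with the residual,
            subtract its contribution, and repeat. Returns the sparse
            coefficient vector and the final residual."""
            residual = signal.astype(float).copy()
            coeffs = np.zeros(dictionary.shape[1])
            for _ in range(n_atoms):
                correlations = dictionary.T @ residual
                best = np.argmax(np.abs(correlations))
                coeffs[best] += correlations[best]
                residual -= correlations[best] * dictionary[:, best]
                if np.linalg.norm(residual) < tol:
                    break
            return coeffs, residual

        # Example: approximate a random signal with 10 atoms drawn from a
        # random redundant dictionary whose columns are normalised to unit norm.
        rng = np.random.default_rng(0)
        D = rng.standard_normal((64, 256))
        D /= np.linalg.norm(D, axis=0)
        x = rng.standard_normal(64)
        c, r = matching_pursuit(x, D, n_atoms=10)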