23 research outputs found

    Effective and efficient user and content modeling for intelligent tutoring systems

    Most effective teaching methods actively engage students in the process of learning. Active engagement requires teachers to give individual attention to each student or student group, which is highly time- and resource-intensive and almost impossible to implement in most schools in most disciplines. In the last four decades, many intelligent tutoring systems (ITS) have been developed in several domains to provide individualized guidance without a need for one teacher for each student, and have been shown to produce the same improvements as effective teaching methods. In order to be effective, intelligent tutoring systems require i) sufficient and high-quality educational content, ii) detailed, domain-specific content models to assess the difficulty levels of educational materials, iii) fine-grained, domain-specific student models and additional equipment such as microphones, cameras, and sensors, and iv) an intelligent recommendation module that automatically provides students with interesting content of appropriate difficulty. However, it is very time-consuming and costly to prepare sufficient and high-quality educational content as well as to build domain-specific student and content models. These tasks require intensive labor from domain experts and have therefore long been recognized as major bottlenecks in the development of ITS. Furthermore, additional equipment such as microphones and cameras that is used for modeling students' behaviors outside the tutoring system is not available in most public schools. This dissertation studies novel probabilistic approaches for effective and efficient user and content modeling for intelligent tutoring systems.
In particular, we propose methods that i) analyze and model (i.e., detect the types of as well as identify the relevant and irrelevant information in) educational content without requiring domain experts' help, ii) model students' off-task behaviors with the equipment available in most schools, iii) model students' performance without a need for domain expert knowledge, and iv) jointly model students and educational content for more effective student and content modeling. A fully-functioning prototype system has been developed and evaluated in local schools. Empirical studies conducted on real-world datasets from the prototype system as well as on external large-scale datasets demonstrate the effectiveness of the proposed models.

    Generalizations with Probability Distributions for Data Anonymization

    Effective query generation and postprocessing strategies for prior art patent search

    The rapid increase in global competition demands increased protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying related patents for a given patent file, and is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent and then reranks the retrieved patents by utilizing the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents along with the retrieval score. An extensive set of experiments carried out on a large-scale, real-world dataset shows that utilizing 20 or 30 query terms extracted from all fields of an original query patent according to their log(tf)idf values helps form a representative search query out of the query patent and is more effective than using any number of query terms from any single field. It is shown that combining terms extracted from different fields of the query patent by giving higher importance to terms extracted from the abstract, claims, and description fields than to terms extracted from the title field is more effective than treating all extracted terms equally while forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and retrieved patents is shown to be beneficial for improving the effectiveness of the prior art search.
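
The two stages described in this abstract (field-weighted term selection by log(tf)idf, then reranking with IPC similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the field weights, the `log(1 + tf) * log(N / (1 + df))` scoring, the shared-prefix IPC similarity, the linear interpolation weight `alpha`, and all function names are assumptions made for illustration. Retrieval scores are assumed to be normalized to [0, 1].

```python
import math
from collections import Counter

# Hypothetical weights: the abstract says only that title terms
# receive lower importance than abstract, claims, and description terms.
FIELD_WEIGHTS = {"title": 0.5, "abstract": 1.0, "claims": 1.0, "description": 1.0}


def select_query_terms(patent_fields, doc_freq, num_docs, k=20):
    """Return the k highest-scoring (term, score) pairs across all fields."""
    scores = {}
    for field, text in patent_fields.items():
        term_counts = Counter(text.lower().split())
        for term, count in term_counts.items():
            # Assumed smoothing: add 1 to the document frequency.
            idf = math.log(num_docs / (1 + doc_freq.get(term, 0)))
            score = FIELD_WEIGHTS.get(field, 1.0) * math.log(1 + count) * idf
            # Keep the best score a term achieves in any field.
            scores[term] = max(scores.get(term, 0.0), score)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def ipc_similarity(code_a, code_b):
    """Fraction of leading characters two IPC codes share (assumed measure)."""
    shared = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        shared += 1
    return shared / max(len(code_a), len(code_b), 1)


def rerank(results, query_ipc, alpha=0.7):
    """Rerank (patent_id, retrieval_score, ipc_code) tuples.

    The final score linearly mixes the retrieval score with IPC similarity.
    """
    rescored = [
        (pid, alpha * score + (1 - alpha) * ipc_similarity(query_ipc, ipc))
        for pid, score, ipc in results
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

For example, `rerank([("p1", 0.5, "A01B"), ("p2", 0.45, "H04L")], "H04L")` places `p2` first, because its matching IPC code outweighs its slightly lower retrieval score.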

    Forecasting user visits for online display advertising

    Online display advertising is a multi-billion dollar industry where advertisers promote their products to users by having publishers display their advertisements on popular Web pages. An important problem in online advertising is how to forecast the number of user visits for a Web page during a particular period of time. Prior research addressed the problem by using traditional time-series forecasting techniques on historical data of user visits (e.g., via a single regression model built for forecasting based on historical data for all Web pages) and did not fully explore the fact that different types of Web pages and different time stamps have different patterns of user visits. In this paper, we propose a series of probabilistic latent class models to automatically learn the underlying user visit patterns among multiple Web pages and multiple time stamps. The last (and the most effective) proposed model identifies latent groups/classes of (i) Web pages and (ii) time stamps with similar user visit patterns, and learns a specialized forecast model for each latent Web page and time stamp class. Compared with a single regression model as well as several other baselines, the proposed latent class model approach has the capability of differentiating the importance of different types of information across different classes of Web pages and time stamps, and therefore has much better modeling flexibility. An extensive set of experiments along with detailed analysis carried out on real-world data from Yahoo! demonstrates the advantage of the proposed latent class models in forecasting online user visits in online display advertising.

    Delivering Guaranteed Display Ads under Reach and Frequency Requirements

    We propose a novel idea in the allocation and serving of online advertising. We show that by using predetermined fixed-length streams of ads (which we call patterns) to serve advertising, we can incorporate a variety of interesting features into the ad allocation optimization problem. In particular, our formulation optimizes for representativeness as well as user-level diversity and pacing of ads, under reach and frequency requirements. We show how the problem can be solved efficiently using a column generation scheme in which only a small set of the best patterns is kept in the optimization problem. Our numerical tests suggest that with parallelization of the pattern generation process, the algorithm has a promising run time and memory usage.