59 research outputs found

    AAPOR Report on Big Data

    In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so-called Big Data. The term Big Data is used for a variety of data, as explained in the report, many of them characterized not just by their large volume but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inferences from them. The changes in the nature of the new types of data, their availability, and the way in which they are collected and disseminated are fundamental. These changes constitute a paradigm shift for survey research. There is great potential in Big Data, but some fundamental challenges have to be resolved before its full potential can be realized. In this report we give examples of different types of Big Data and their potential for survey research. We also describe the Big Data process and discuss its main challenges.

    Visualisation and dynamic querying of large multivariate data sets

    The legitimacy and effectiveness of current methods and theories that guide the construction of visualisations are in question, and many of these methods lack any scientific support. A review of existing visualisation techniques demonstrates some of the innate strengths and weaknesses of the approaches used. Focusing on the more specific task of developing visualisations for large sets of multivariate data, this work acknowledges the lack of any kind of guidance in this development process. A prototype visualisation tool based on the well-documented techniques of Parallel Coordinates and Dynamic Queries has been developed taking these findings into account. Incorporating novel ideas that address identified weaknesses in current visualisations, this prototype also provides the basis for demonstrating, testing and evaluating these concepts.

    Knowledge discovery for moderating collaborative projects

    In today's global market environment, enterprises are increasingly turning towards collaboration in projects to leverage their resources, skills and expertise, and simultaneously address the challenges posed in diverse and competitive markets. Moderators, which are knowledge-based systems, have successfully been used to support collaborative teams by raising awareness of problems or conflicts. However, the functioning of a Moderator is limited to the knowledge it has about the team members. Knowledge acquisition, learning and updating of knowledge are the major challenges for a Moderator's implementation. To address these challenges, a Knowledge discOvery And daTa minINg inteGrated (KOATING) framework is presented for Moderators to enable them to continuously learn from the operational databases of the company and semi-automatically update the corresponding expert module. The architecture for the Universal Knowledge Moderator (UKM) shows how the existing Moderators can be extended to support global manufacturing. A method for designing and developing the knowledge acquisition module of the Moderator for manual and semi-automatic update of knowledge is documented using the Unified Modelling Language (UML). UML has been used to explore the static structure and dynamic behaviour, and to describe the system analysis, system design and system development aspects of the proposed KOATING framework. The proof of design is presented using a case study of a collaborative project in the form of a construction project supply chain. It is shown that Moderators can "learn" by extracting various kinds of knowledge from Post Project Reports (PPRs) using different types of text mining techniques. Furthermore, it is proposed that knowledge-discovery-integrated Moderators can be used to support and enhance collaboration by identifying appropriate business opportunities and corresponding partners for the creation of a virtual organization. A case study is presented in the context of a UK-based SME. Finally, the thesis concludes by summarizing its content, outlining its novelties and contributions, and recommending future research.

    Probability models for information retrieval based on divergence from randomness

    This thesis devises a novel methodology based on probability theory, suitable for the construction of term-weighting models of Information Retrieval. Our term-weighting functions are created within a general framework made up of three components. Each of the three components is built independently of the others. We obtain the term-weighting functions from the general model in a purely theoretical way, instantiating each component with different probability distribution forms. The thesis begins by investigating the nature of the statistical inference involved in Information Retrieval. We explore the estimation problem underlying the process of sampling. De Finetti's theorem is used to show how to convert the frequentist approach into Bayesian inference, and we display and employ the derived estimation techniques in the context of Information Retrieval. We initially pay great attention to the construction of the basic sample spaces of Information Retrieval. The notion of single or multiple sampling from different populations in the context of Information Retrieval is extensively discussed and used throughout the thesis. The language modelling approach and the standard probabilistic model are studied under the same foundational view and are experimentally compared to the divergence-from-randomness approach. In revisiting the main information retrieval models in the literature, we show that even the language modelling approach can be exploited to assign term-frequency normalization to the models of divergence from randomness. We finally introduce a novel framework for query expansion. This framework is based on the models of divergence from randomness and can be applied to arbitrary models of IR, including divergence-based, language modelling and probabilistic models. We have carried out a very large number of experiments, and the results show that the framework generates highly effective Information Retrieval models.

    Development of genre and function types for web page classification and rating

    Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (leaves 58-59). By Beethoven Cheng. S.B. and M.Eng.

    Analysis and prediction of time series using "Case-Based Reasoning" technology

    This doctoral thesis describes a promising approach in which a problem of time series analysis and prediction is solved using Case-Based Reasoning (CBR) technology. The foundations and main concepts of this technology are described in detail. A comparative study of different approaches to time series analysis is also given, with particular attention to prediction. The main contribution of the thesis is the system CuBaGe (Curve Base Generator), a robust and general architecture for curve representation and for indexing time series databases, based on CBR technology, together with an original similarity measure modelled for this kind of curve representation. The robustness and generality of the system are illustrated by a real application in the domain of financial forecasting, which shows that the architecture works equally well on conventional time series (where all values are known) and on some non-standard time series (sparse, vague, non-equidistant). Handling non-standard time series is the main advantage of the presented architecture.

    Speech enhancement by perceptual adaptive wavelet de-noising

    This thesis summarizes and compares the existing wavelet de-noising methods. The most popular methods of wavelet transform, adaptive thresholding, and musical noise suppression have been analyzed theoretically and evaluated through Matlab simulation. Based on this work, a new speech enhancement system using adaptive wavelet de-noising is proposed. Each step of the standard wavelet thresholding is improved by optimized adaptive algorithms. The quantile-based adaptive noise estimate and the a posteriori SNR-based threshold adjuster complement each other: combining them integrates the advantages of both approaches and balances noise removal against speech preservation. To improve the final perceptual quality, an innovative musical noise analysis and smoothing algorithm and a Teager Energy Operator based silent segment smoothing module are also introduced into the system. The experimental results demonstrate the capability of the proposed system in both stationary and non-stationary noise environments.
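
For readers unfamiliar with the "standard wavelet thresholding" that the abstract above takes as its baseline, the following is a minimal sketch of that generic technique only. It is not the thesis's adaptive system: it assumes a single-level Haar transform, a fixed threshold supplied by the caller, and soft thresholding of the detail coefficients. All function names are hypothetical and chosen for illustration.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform (assumes an even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient towards zero by `threshold`, zeroing the small ones."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Fixed-threshold de-noising (assumption: not the adaptive scheme of the thesis)."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


if __name__ == "__main__":
    noisy = [1.0, 1.2, 5.0, 4.8]
    print(denoise(noisy, threshold=0.5))
```

In this sketch the small sample-to-sample differences fall below the threshold and are removed, so each pair of samples is replaced by its mean. An adaptive scheme such as the one the abstract describes would instead choose the threshold from an estimate of the noise level rather than take it as a fixed argument.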