
    Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams

    Neogeography is the combination of user-generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user-generated text in wikis, forums, or SMSes. The extracted information should be integrated to form collective knowledge about a certain domain. This structured information can then be used to help users from the same domain who want to get information through a simple question-answering system. The project intends to help worker communities in developing countries share their knowledge, providing a simple and cheap way to contribute and benefit using the available communication technology.

    Named Entity Extraction and Disambiguation: The Reinforcement Effect.

    Named entity extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and the semantic web. Although these topics are highly dependent on each other, almost no existing work examines this dependency. The aim of this paper is to examine the dependency and show how one affects the other, and vice versa. We conducted experiments with a set of descriptions of holiday homes with the aim of extracting and disambiguating toponyms as a representative example of named entities. We experimented with three approaches to disambiguation in order to infer the country of the holiday home. We examined how the effectiveness of extraction influences the effectiveness of disambiguation and, reciprocally, how filtering out ambiguous names (an activity that depends on the disambiguation process) improves the effectiveness of extraction. Since this, in turn, may improve the effectiveness of disambiguation again, it shows that extraction and disambiguation may reinforce each other.

    Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    Information extraction, data integration, and uncertain data management are different areas of research that have received much attention in the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is important for enhancing the quality of the data in such integrated systems. This article presents the state of the art in these areas of research and shows their common ground and how to integrate information extraction and data integration under the umbrella of uncertainty management.

    Concept Extraction Challenge: University of Twente at #MSM2013

    Twitter messages are a potentially rich source of continuously and instantly updated information. The shortness and informality of such messages pose challenges for Natural Language Processing tasks. In this paper we present a hybrid approach to Named Entity Extraction (NEE) and Classification (NEC) for tweets. The system combines Conditional Random Fields (CRF) and Support Vector Machines (SVM) in a hybrid way to achieve better results. For named entity type classification we used the AIDA \cite{YosefHBSW11} disambiguation system to disambiguate the extracted named entities and hence find their type.

    The Enhanced Definition and Control of Downstream Processing Operations

    Monitoring product and contaminants is critically important at all stages of bioprocess operation, development and control. The availability of rapid measurements on product and key contaminants will yield a higher resolution of data points and will allow for more intelligent operation of a process, thereby enhancing the definition and characterisation of a bioprocess. The need to control a bioseparation process arises from the variable nature of upstream conditions, process additives and sub-optimal performance of processing equipment, which may lead to different requirements for the operating conditions either within batches or on a batch-to-batch basis. Potential operations for downstream processing of intracellular proteins include selective flocculation and packed bed and expanded bed chromatographic operations. These processes involve the removal of a large number of contaminants in a single dynamic step and hence are difficult unit operations to characterise and operate in an efficient and reproducible manner. In order to achieve rapid characterisation and control of these processes, some form of rapid monitoring was required. A sampling and monitoring system has been constructed for the analysis of alcohol dehydrogenase (ADH), an enzyme produced intracellularly in S. cerevisiae, together with cell debris, protein and RNA contaminants, with a measurement cycle time of 135 s. Both an extended Kalman filter and the Levenberg-Marquardt nonlinear least squares model parameter identification technique have been implemented for rapid process characterisation. Estimation of model parameters from at-line data enabled process performance predictions to be represented in an optimum graphical manner and the subsequent determination of ideal operating conditions in a feedback, model-based control configuration. The application of such a control strategy to the batch flocculation process yielded on average 92% accuracy in achieving optimum operating conditions. A structured and intelligent use of the at-line data would improve process characterisation in terms of speed and stability. It was demonstrated that rapid monitoring of the packed and expanded bed chromatographic operations yielded improved characterisation in terms of higher resolution data points and enabled real-time process analysis and control of the load cycle. For the control of the expanded bed operation, a predictive technique was applied to compensate for the large dead volume associated with this unit operation. The feedback control resulted in approximately 80% accurate breakthrough setpoint regulation.