
    Text messaging and retrieval techniques for a mobile health information system

    Mobile phones have been identified as one of the technologies that can help overcome the challenges of disseminating information about serious diseases. The short message service (SMS), a much-used mobile phone function, can for example be turned into a major tool for accessing databases. This paper focuses on the design and development of an SMS-based information access algorithm that screens information on human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS) within a frequently asked questions (FAQ) system. Automating SMS-based information search and retrieval is challenging, however, because of the noise inherent in SMS communication. The developed algorithm was used to retrieve the best-ranked question–answer pair. Results were evaluated using three metrics: average precision, recall and computational time. The evaluation confirmed a significant improvement in retrieval efficacy for the proposed algorithm compared with similar retrieval algorithms.
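The abstract does not specify the ranking function. As a minimal sketch only, assuming a small hypothetical shorthand lexicon (`SMS_LEXICON`) and plain term-frequency cosine similarity as a stand-in for the authors' algorithm, an SMS query can be normalized and matched against FAQ question–answer pairs, and a ranking can be scored with average precision:

```python
import math
import re
from collections import Counter

# Hypothetical SMS shorthand lexicon; the paper's actual resources are not described.
SMS_LEXICON = {"hw": "how", "cn": "can", "u": "you", "gt": "get",
               "wat": "what", "r": "are", "2": "to"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(tokens):
    """Replace known SMS shorthand with its full form; keep other tokens."""
    return [SMS_LEXICON.get(t, t) for t in tokens]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_faq(sms, faq):
    """Return (score, question, answer) triples, best match first."""
    query = Counter(normalize(tokenize(sms)))
    scored = [(cosine(query, Counter(tokenize(q))), q, a) for q, a in faq]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def average_precision(ranked_ids, relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


# Toy FAQ with placeholder answers (illustrative only).
faq = [("How can you get HIV?", "answer 1"),
       ("What are the symptoms of AIDS?", "answer 2"),
       ("Where can I get tested?", "answer 3")]
best_score, best_question, _ = rank_faq("hw cn u gt hiv", faq)[0]
print(best_question)  # How can you get HIV?
```

A real system would add IDF weighting or a learned similarity; raw term frequency is used here only to keep the example short.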

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges.
    Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Normalization of noisy texts in Malaysian online reviews

    The gathering of useful information from online messages has grown as more and more people use the Internet and online applications such as Facebook and Twitter to communicate with each other. One of the problems in processing online messages is the large amount of noisy text they contain. A few studies have shown that noisy text degrades the results of text mining activities, yet very few works have investigated the patterns of noisy text produced by Malaysians. In this study, a common noisy terms list and an artificial abbreviations list were created using specific rules and were used to select candidate correct words for a noisy term. The correct term was then selected using a bi-gram word index. The experiments used online messages written by Malaysians. The results show that normalizing noisy text with the artificial abbreviations list complements the use of the common noisy terms list.
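The abstract names the ingredients (a common noisy terms list, an artificial abbreviations list generated by rules, and a bi-gram word index) but not the rules themselves. The sketch below assumes one plausible abbreviation rule (keep the first letter, drop later vowels), a hypothetical one-entry noisy terms list and a toy English corpus; it illustrates the candidate-then-bigram pipeline and is not the authors' method.

```python
from collections import Counter, defaultdict

VOWELS = set("aeiou")


def abbreviate(word):
    """Assumed rule: keep the first letter and drop all later vowels."""
    return word[0] + "".join(c for c in word[1:] if c not in VOWELS)


def build_resources(corpus):
    """Build a vocabulary, a bigram index and an artificial abbreviations list."""
    vocab, bigrams = set(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        vocab.update(words)
        bigrams.update(zip(words, words[1:]))
    abbrev = defaultdict(list)  # abbreviation -> candidate full words
    for word in sorted(vocab):
        abbrev[abbreviate(word)].append(word)
    return vocab, bigrams, abbrev


def normalize_message(message, vocab, bigrams, abbrev, common_noisy):
    """Replace noisy tokens with the candidate that best fits the previous word."""
    out = []
    for token in message.lower().split():
        if token in vocab:
            out.append(token)
            continue
        candidates = common_noisy.get(token, []) + abbrev.get(token, [])
        if not candidates:
            out.append(token)  # no candidate: leave the token unchanged
            continue
        prev = out[-1] if out else None
        # Bigram count with the preceding word decides between candidates.
        out.append(max(candidates, key=lambda c: bigrams[(prev, c)]))
    return " ".join(out)


corpus = ["the food is good", "good movie tonight",
          "we move on", "many people like the movie"]
vocab, bigrams, abbrev = build_resources(corpus)
common_noisy = {"2nite": ["tonight"]}  # hypothetical common noisy terms list
print(normalize_message("gd mv 2nite", vocab, bigrams, abbrev, common_noisy))
# good movie tonight
print(normalize_message("we mv on", vocab, bigrams, abbrev, common_noisy))
# we move on
```

The ambiguous abbreviation "mv" (move or movie) shows why the bigram step is needed: the artificial list alone proposes both words, and the preceding word picks one.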

    Short message service normalization for communication with a health information system

    Philosophiae Doctor - PhD
    Short Message Service (SMS) is one of the most widely used services for communication between mobile phone users. It has recently also been proposed as a means of information access. However, several challenges must be overcome to process an SMS, especially when it is used as a query in an information retrieval system. SMS users often deliberately write in a compressed and ungrammatical style that makes messages difficult to process with conventional information retrieval systems. To overcome this, a pre-processing step known as normalization is required. This thesis investigates SMS normalization algorithms. To this end, studies were conducted into the design of algorithms for translating and normalizing SMS text, and character-based, unsupervised and rule-based techniques are presented. The design and development of a system for information access via SMS was also investigated: using a case study, a specific system was designed to access a Frequently Asked Questions (FAQ) database in healthcare. The study also addresses securing SMS communication, especially for healthcare information systems; the proposed technique is to encipher the messages using the secure shell (SSH) protocol.
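The abstract mentions character-based techniques without describing them. A minimal sketch of one common character-based technique, not necessarily the thesis's: map each out-of-vocabulary token to the nearest word in a lexicon by Levenshtein edit distance, breaking ties by frequency. The lexicon, its frequencies and the `max_dist` cut-off are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalize_token(token, lexicon, max_dist=2):
    """Map an out-of-vocabulary token to the nearest lexicon word.

    lexicon maps word -> frequency (hypothetical); ties on distance go to
    the more frequent word. Tokens farther than max_dist are left as-is.
    """
    if token in lexicon:
        return token
    best = min(lexicon, key=lambda w: (levenshtein(token, w), -lexicon[w], w))
    return best if levenshtein(token, best) <= max_dist else token


# Hypothetical frequency lexicon for a health FAQ domain.
lexicon = {"the": 100, "are": 60, "how": 50, "what": 40,
           "symptoms": 10, "treatment": 8, "tomorrow": 5}
print([normalize_token(t, lexicon) for t in ["wat", "r", "symptms", "2moro"]])
```

Pure edit distance leaves heavily compressed forms such as "2moro" (distance 4 from "tomorrow") unchanged, which is one reason to complement it with the rule-based and unsupervised techniques the abstract also lists.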

    Speech recognition, machine translation, and corpus analysis for identifying farmer demands and targeting digital extension

    The growing capabilities of data analytics augmented with artificial intelligence (AI) present significant opportunities for agricultural extension organizations operating in the Global South. In this project, we supported Farm Radio International (FRI) in investigating whether the translation and analysis of farmers' voice message data could be automated. This report reviews several approaches to overcoming technical constraints and then presents an approach that uses recent innovations in unsupervised learning to deliver highly accurate speech recognition and machine translation across a diverse set of languages.

    Detecting deceptive behaviour in the wild: text mining for online child protection in the presence of noisy and adversarial social media communications

    A real-life application of text mining research “in the wild”, i.e. in online social media, differs from more general applications in that its defining characteristics are both domain- and process-dependent. This gives rise to a number of challenges of which contemporary research has only scratched the surface. More specifically, a text mining approach applied in the wild typically has no control over the dataset size. Hence, the system has to be robust to limited data availability, a variable number of samples across users and a highly skewed dataset. Additionally, the quality of the data cannot be guaranteed, so the approach needs to tolerate a certain degree of linguistic noise. Finally, it has to be robust to deceptive behaviour and adversaries. This thesis examines the viability of a text mining approach for supporting cybercrime investigations pertaining to online child protection. The main contributions of this dissertation are as follows. A systematic study of different aspects of the methodological design of a state-of-the-art text mining approach is presented to assess its scalability to a large, imbalanced and linguistically noisy social media dataset. In this framework, three key automatic text categorisation tasks are examined, namely the feasibility of (i) identifying a social network user’s age group and gender from the textual information in a single message; (ii) aggregating message-level predictions to the user level without neglecting potential clues of deception, and detecting false user profiles on social networks; and (iii) identifying child sexual abuse media among thousands of other, legal media, including adult pornography, based on their filenames. Finally, a novel approach is presented that combines age group predictions with advanced text clustering techniques and unsupervised learning to identify online child sex offenders’ grooming behaviour.
The methodology presented in this thesis was discussed extensively with law enforcement to assess its forensic readiness. Additionally, each component was evaluated on actual child sex offender data. Despite the challenging characteristics of these text types, the results show high accuracy for false profile detection, grooming behaviour identification and child sexual abuse media identification.
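Task (ii) aggregates message-level predictions to the user level without discarding clues of deception. The abstract does not give the aggregation scheme, so the following is only a sketch under stated assumptions: a hypothetical message-level classifier outputs the probability that a message was written by an adult, age groups are reduced to "minor" and "adult", and a user claiming to be a minor is flagged if the mean probability exceeds one threshold or any single message exceeds a stricter one. Both thresholds are arbitrary illustrative values.

```python
from collections import defaultdict
from statistics import mean


def aggregate_user_predictions(message_preds):
    """Group (user_id, p_adult) pairs from a hypothetical message classifier by user.

    Keeps the mean (overall tendency) and the max (strongest single clue),
    so one highly adult-like message is not averaged away.
    """
    per_user = defaultdict(list)
    for user, p_adult in message_preds:
        per_user[user].append(p_adult)
    return {user: {"mean_p_adult": mean(ps),
                   "max_p_adult": max(ps),
                   "n_messages": len(ps)}
            for user, ps in per_user.items()}


def flag_false_profiles(aggregates, claimed_age_group, threshold=0.5, strong=0.9):
    """Flag users who claim to be minors but whose messages look adult-written."""
    flagged = []
    for user, stats in aggregates.items():
        if claimed_age_group.get(user) != "minor":
            continue  # only check profiles that claim a minor age group
        if stats["mean_p_adult"] > threshold or stats["max_p_adult"] >= strong:
            flagged.append(user)
    return sorted(flagged)


# Illustrative inputs, not real data.
preds = [("u1", 0.2), ("u1", 0.3),
         ("u2", 0.4), ("u2", 0.95), ("u2", 0.1),
         ("u3", 0.8), ("u3", 0.7)]
claimed = {"u1": "minor", "u2": "minor", "u3": "adult"}
print(flag_false_profiles(aggregate_user_predictions(preds), claimed))  # ['u2']
```

User u2 has a mean below 0.5, so averaging alone would miss it; keeping the maximum preserves the single strong clue.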

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)
