14 research outputs found

    Protecting Whistleblowing (and Not Just Whistleblowers)

    When the government contracts with private parties, the risk of fraud runs high. Fraud against the government hurts everyone: taxpayer money is wasted on inferior or nonexistent products and services, and the public bears the burdens attendant to those inadequate goods. To combat fraud, Congress has developed several statutory frameworks that encourage whistleblowers to come forward and report wrongdoing in exchange for a monetary reward. The federal False Claims Act allows whistleblowers to file an action in federal court on behalf of the United States and to share in any recovery. Under the Dodd-Frank Act, the SEC Office of the Whistleblower investigates tips provided by whistleblowers and, in the event of a successful prosecution, pays an award to the tipster. The False Claims Act and the SEC program both protect whistleblowers from retaliatory action by their employer. But the SEC program goes a step further: SEC Rule 21F-17 also prevents an employer from taking any action to interfere with the reporting of fraud. In this way, the SEC program protects not only whistleblowers, but also whistleblowing itself. It is time for the False Claims Act to catch up. Congress should look to SEC Rule 21F-17 as a model for how it could amend the False Claims Act to establish a cause of action against contractors who take steps to chill or restrict their employees from bringing forward claims of fraud. Doing so would vindicate the original intent and purpose of the False Claims Act and encourage whistleblowers to come forward and put an end to corporate wrongdoing. Protecting whistleblowing benefits the government, taxpayers, and whistleblowers, and ensures that the False Claims Act remains an effective instrument in the fight against fraud.

    Winona Daily News


    Named entity extraction for speech

    Named entity extraction is a field that has generated much interest in recent years with the explosion of the World Wide Web and the need for accurate information retrieval. Named entity extraction, the task of finding specific entities within documents, has proven of great benefit for numerous information extraction and information retrieval tasks. As well as multiple language evaluations, named entity extraction has been investigated on a variety of media forms with varying success. In general, these media forms have all been based upon standard text, and any variation from standard text has been assumed to constitute noise. We investigate how it is possible to find named entities in speech data. Where others have focussed on applying named entity extraction techniques to transcriptions of speech, we investigate a method for finding the named entities directly from the word lattices associated with the speech signal. The results show that it is possible to improve named entity recognition at the expense of word error rate (WER), in contrast to the general view that F-score is directly proportional to WER. We use a Hidden Markov Model (HMM) style approach to the task of named entity extraction and show how it is possible to utilise an HMM to find named entities within speech lattices. We further investigate how it is possible to improve results by considering an alternative derivation of the joint probability of words and entities to the one traditionally used. This new derivation is particularly appropriate to speech lattices, as no assumptions are made about the sequence of words. The HMM-style approach that we use requires running a number of language models in parallel. We have developed a system for discriminatively retraining these language models based upon the results of the output, and we show how it is possible to improve named entity recognition by iterating over both training data and development data. We also consider how part-of-speech (POS) tags can be used within word lattices. We devise a method of labelling a word lattice with POS tags and adapt the model to make use of these POS tags when producing the best path through the lattice. The resulting path provides the most likely sequence of words, entities and POS tags, and we show how this new path is better than the previous path, which ignored the POS tags.
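    As a rough illustration of the HMM-style tagging idea described in this abstract (not the thesis's lattice-based model; the tag set, probabilities and sentence below are invented for the example), the following sketch runs Viterbi decoding over a flat word sequence to recover the most likely entity-tag sequence:

```python
import math

# Toy HMM for named-entity tagging; all probabilities are made up for illustration.
TAGS = ["O", "PER", "LOC"]
trans = {  # P(tag_t | tag_{t-1}); "<s>" is the start state
    "<s>": {"O": 0.8, "PER": 0.1, "LOC": 0.1},
    "O":   {"O": 0.7, "PER": 0.15, "LOC": 0.15},
    "PER": {"O": 0.6, "PER": 0.3, "LOC": 0.1},
    "LOC": {"O": 0.6, "PER": 0.1, "LOC": 0.3},
}
emit = {  # P(word | tag); unseen words fall back to a small constant
    "O":   {"visited": 0.2, "the": 0.3},
    "PER": {"john": 0.5, "smith": 0.4},
    "LOC": {"paris": 0.5, "france": 0.4},
}
UNK = 1e-6

def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[t][tag] = (log prob of best path ending in `tag` at position t, back pointer)
    best = [{}]
    for tag in TAGS:
        p = math.log(trans["<s>"][tag]) + math.log(emit[tag].get(words[0], UNK))
        best[0][tag] = (p, None)
    for t in range(1, len(words)):
        best.append({})
        for tag in TAGS:
            e = math.log(emit[tag].get(words[t], UNK))
            p, prev = max(
                (best[t - 1][pt][0] + math.log(trans[pt][tag]) + e, pt) for pt in TAGS
            )
            best[t][tag] = (p, prev)
    # Trace the best path back from the final position.
    tag = max(TAGS, key=lambda tg: best[-1][tg][0])
    path = [tag]
    for t in range(len(words) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["john", "smith", "visited", "paris"]))
# -> ['PER', 'PER', 'O', 'LOC']
```

    Decoding a full lattice rather than a single word sequence would apply the same dynamic programme over lattice arcs instead of sentence positions.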

    Financial information extraction using pre-defined and user-definable templates in the Lolita system

    Financial operators today have access to an extremely large amount of data, both quantitative and qualitative, real-time or historical, and can use this information to support their decision-making process. Quantitative data are largely processed by automatic computer programs, often based on artificial intelligence techniques, that produce quantitative analyses such as historical price analysis or technical analysis of price behaviour. By contrast, little progress has been made in the processing of qualitative data, which mainly consist of financial news articles from financial newspapers or on-line news providers. As a result, financial market players are overloaded with qualitative information which is potentially extremely useful but, due to lack of time, is often ignored. The goal of this work is to reduce the qualitative data-overload of financial operators. The research involves identifying the information in the source financial articles which is relevant to the financial operators' investment decision-making process and implementing the associated templates in the LOLITA system. The system should process a large number of source articles and extract specific templates according to the relevant information located in the source articles. The project also involves the design and implementation in LOLITA of a user-definable template interface which allows users to easily design new templates using sentences in natural language. This allows user-defined information extraction from source texts, and differs from most existing information extraction systems, which require the developers to code the templates directly in the system. The results of the research have shown that the system performed well in the extraction of financial templates from source articles, which would allow the financial operator to reduce his qualitative data-overload. The results have also shown that the user-definable template interface is a viable approach to user-defined information extraction. A trade-off has been identified between the ease of use of the user-definable template interface and the loss of performance compared to hand-coded templates.
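    The LOLITA system itself is not reproduced here; purely as a hedged illustration of what filling a pre-defined financial template might look like, the sketch below uses a single hand-written surface pattern (the template name, slots, pattern and example sentence are all invented for the example) to populate a simple "takeover" template from one news sentence:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class TakeoverTemplate:
    """A simplified financial template with slots for the companies involved."""
    acquirer: Optional[str] = None
    target: Optional[str] = None
    value: Optional[str] = None

# One illustrative surface pattern; a real system would use many patterns or,
# as in LOLITA, a full semantic analysis of the source article.
PATTERN = re.compile(
    r"(?P<acquirer>[A-Z][\w&. ]+?) (?:has agreed to acquire|acquires|is buying) "
    r"(?P<target>[A-Z][\w&. ]+?)(?: for (?P<value>[\$£€][\d.,]+ ?(?:million|billion)?))?[.,]"
)

def extract_takeover(sentence: str) -> Optional[TakeoverTemplate]:
    """Fill the takeover template from one sentence, if the pattern matches."""
    m = PATTERN.search(sentence)
    if not m:
        return None
    return TakeoverTemplate(
        acquirer=m.group("acquirer").strip(),
        target=m.group("target").strip(),
        value=m.group("value"),
    )

print(extract_takeover("Acme Corp has agreed to acquire Widget Ltd for $2.5 billion."))
# -> TakeoverTemplate(acquirer='Acme Corp', target='Widget Ltd', value='$2.5 billion')
```

    A user-definable template interface of the kind described above would generate such slot-filling rules from example sentences instead of requiring them to be hand-coded.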

    Machine Learning in Natural Language Processing (Μηχανική Μάθηση στην Επεξεργασία Φυσικής Γλώσσας)

    This thesis examines the use of machine learning techniques in various tasks of natural language processing, mainly for the purpose of information extraction from texts. The objectives are to improve the adaptability of information extraction systems to new thematic domains (or even languages) and to improve their performance using as few resources (whether linguistic or human) as possible. The thesis follows two main axes: a) the study and evaluation of existing machine learning algorithms, mainly in the stages of linguistic pre-processing (such as part-of-speech tagging) and named-entity recognition, and b) the creation of a new machine learning algorithm and its evaluation on synthetic data as well as on real-world data from the task of relation extraction between named entities. The new algorithm belongs to the category of inductive grammar inference and can infer context-free grammars from positive examples only.
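    The thesis's context-free grammar induction algorithm is not reproduced here; purely as a minimal illustration of learning from positive examples only, the sketch below builds a prefix-tree acceptor, a common starting point in grammatical inference that accepts exactly the positive sample before any generalisation step:

```python
# Build a prefix-tree acceptor (PTA) over a set of positive example sentences.
# The PTA accepts exactly the sample; inference algorithms then generalise it,
# e.g. by merging states. This is only an illustration, not the thesis's method.

def build_pta(samples):
    """Return (transitions, accepting) for a prefix-tree acceptor over `samples`."""
    transitions = {}   # (state, symbol) -> next state
    accepting = set()  # states where a positive example ends
    next_state = 1     # state 0 is the root
    for sample in samples:
        state = 0
        for symbol in sample.split():
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting

def accepts(transitions, accepting, sentence):
    """Check whether the PTA accepts a sentence."""
    state = 0
    for symbol in sentence.split():
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting

positives = ["the dog barks", "the dog sleeps", "a cat sleeps"]
trans, accept = build_pta(positives)
print(accepts(trans, accept, "the dog sleeps"))  # True  (seen example)
print(accepts(trans, accept, "the cat sleeps"))  # False (no generalisation yet)
```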

    The Trinity Reporter, Winter 2005


    The 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies

    This publication comprises the papers presented at the 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies, held at the NASA/Goddard Space Flight Center, Greenbelt, Maryland, on May 9-11, 1995. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed.

    Semantic analysis for improved multi-document summarization of text

    An excess of unstructured data is easily accessible in digital format. This information overload places too heavy a burden on society's capacity to analyse and act on the information. Focused (i.e. topic, query, question, or category driven) multi-document summarization is an information reduction solution which has reached a state of the art that now demands the exploration of further techniques to model human summarization activity. Such techniques have been mainly extractive and rely on distributional statistics and complex machine learning over corpora in order to perform closely to human summaries. Overall, these techniques are still being used, and the field now needs to move toward more abstractive approaches to model the human way of summarizing. A simple, inexpensive and domain-independent system architecture is created for adding semantic analysis to the summarization process. The proposed system is novel in its use of a new semantic analysis metric to better score sentences for selection into a summary. It also simplifies the semantic processing of sentences to better capture the most likely semantically related information, reduce redundancy and reduce complexity. The system is evaluated against participants in the Document Understanding Conference and the later Text Analysis Conference using ROUGE measures of n-gram recall between automated systems, human summaries and the gold standard baseline summaries. The goal was to show that semantic analysis used for summarization can perform well while remaining simple and inexpensive, without significant loss of recall compared to the foundational baseline system. Current results show improvement over the gold standard baseline when all factors of this work's semantic analysis technique are used in combination. These factors are the semantic cue words feature and semantic class weighting used to determine which sentences carry important information, together with the semantic triples clustering used to decompose natural language sentences to their most basic meaning and select the most important sentences. In competition against the gold standard baseline system on the standardized summarization evaluation metric ROUGE, this work outperforms the baseline system by more than ten position rankings. This work shows that semantic analysis and light-weight, open-domain techniques have potential.

    Ph.D., Information Studies -- Drexel University, 201
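    The evaluation described above relies on ROUGE n-gram recall between system and reference summaries. As a simplified illustration (ignoring the stemming, stopword handling and multi-reference averaging used by the official ROUGE toolkit; the example sentences are invented), a ROUGE-N recall score can be computed as follows:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap / reference n-gram count.

    `candidate` and `reference` are plain strings; real ROUGE also applies
    stemming and averages over multiple reference summaries.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5/6 ≈ 0.833
print(rouge_n_recall(candidate, reference, n=2))  # 3/5 = 0.6
```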