    Using machine learning tools for protein database biocuration assistance

    Biocuration in the omics sciences has become paramount, as research in these fields rapidly evolves towards increasingly data-dependent models. As a result, the management of web-accessible publicly-available databases becomes a central task in biological knowledge dissemination. One relevant challenge for biocurators is the unambiguous identification of biological entities. In this study, we illustrate the adequacy of machine learning methods as biocuration assistance tools using a publicly available protein database as an example. This database contains information on G Protein-Coupled Receptors (GPCRs), which are part of eukaryotic cell membranes and relevant in cell communication as well as major drug targets in pharmacology. These receptors are characterized according to subtype labels. Previous analysis of this database provided evidence that some of the receptor sequences could be affected by a case of label noise, as they appeared to be too consistently misclassified by machine learning methods. Here, we extend our analysis to recent and quite substantially modified new versions of the database and reveal their now extremely accurate labeling using several machine learning models and different transformations of the unaligned sequences. These findings support the adequacy of our proposed method to identify problematic labeling cases as a tool for database biocuration. Peer Reviewed. Postprint (published version)

    New probabilistic interest measures for association rules

    Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start by presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
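As a rough illustration of the two standard measures the abstract discusses, the sketch below computes confidence and lift for a rule on a toy transaction list. The data and the function names (`support_count`, `confidence`, `lift`) are made up for this example and are not taken from the paper; hyper-lift and hyper-confidence are not shown because the abstract does not define them.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions, lhs, rhs):
    """conf(lhs => rhs) = count(lhs and rhs) / count(lhs)."""
    c_lhs = support_count(transactions, lhs)
    if c_lhs == 0:
        raise ValueError("left-hand side never occurs")
    return support_count(transactions, set(lhs) | set(rhs)) / c_lhs


def lift(transactions, lhs, rhs):
    """lift(lhs => rhs) = conf(lhs => rhs) / relative support of rhs."""
    rhs_support = support_count(transactions, rhs) / len(transactions)
    if rhs_support == 0:
        raise ValueError("right-hand side never occurs")
    return confidence(transactions, lhs, rhs) / rhs_support


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]

conf_bm = confidence(transactions, {"bread"}, {"milk"})  # 2 / 4
lift_bm = lift(transactions, {"bread"}, {"milk"})        # (2 / 4) / (3 / 5)
```

A lift below 1, as here, means the two sides co-occur less often than independence would predict. With counts this small, such a value can easily arise by chance, which is the kind of noise the abstract says lift filters poorly.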

    An intelligent linked data quality dashboard

    This paper describes a new intelligent, data-driven dashboard for linked data quality assessment. The development goal was to assist data quality engineers in interpreting data quality problems found when evaluating a dataset using a metrics-based data quality assessment. This required construction of a graph linking the problematic things identified in the data, the assessment metrics and the source data. This context and supporting user interfaces help the user to understand data quality problems. An analysis widget also helped the user identify the root cause of multiple problems. This supported the user in identifying and prioritizing the problems that need to be fixed to improve data quality. The dashboard was shown to be useful for users to clean data. A user evaluation was performed with both expert and novice data quality engineers.

    Implications of probabilistic data modeling for rule mining

    Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine associations are discussed in great detail. In this paper we investigate properties of transaction data sets from a probabilistic point of view. We present a simple probabilistic framework for transaction data and its implementation using the R statistical computing environment. The framework can be used to simulate transaction data when no associations are present. We use such data to explore how well confidence and lift, two popular interest measures used for rule mining, filter noise. Based on the framework we develop the measure hyperlift and we compare this new measure to lift using simulated data and a real-world grocery database. Series: Research Report Series / Department of Statistics and Mathematic
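The idea of simulating transaction data with no associations can be sketched as follows. This is not the paper's framework (which the abstract says is implemented in R and does not describe in detail); it simply assumes that each item enters each transaction independently with a fixed probability. The names `simulate_independent` and `pair_lift` and the item probabilities are hypothetical.

```python
import random


def simulate_independent(n_transactions, item_probs, seed=0):
    """Draw transactions in which every item occurs independently.

    Assumption for this sketch: item `i` is present in a transaction
    with probability item_probs[i], regardless of the other items.
    """
    rng = random.Random(seed)
    return [
        {item for item, p in item_probs.items() if rng.random() < p}
        for _ in range(n_transactions)
    ]


def pair_lift(transactions, a, b):
    """Lift of the rule {a} => {b}: n * count(a, b) / (count(a) * count(b))."""
    n = len(transactions)
    c_a = sum(1 for t in transactions if a in t)
    c_b = sum(1 for t in transactions if b in t)
    c_ab = sum(1 for t in transactions if a in t and b in t)
    if c_a == 0 or c_b == 0:
        return float("nan")  # lift is undefined if either item never occurs
    return n * c_ab / (c_a * c_b)


# Hypothetical item frequencies: one common item, two rare ones.
item_probs = {"common": 0.5, "rare_1": 0.01, "rare_2": 0.01}
simulated = simulate_independent(1000, item_probs, seed=42)
noise_lift = pair_lift(simulated, "rare_1", "rare_2")
```

Because the items are independent by construction, any lift far from 1 in such data is pure sampling noise. Rare item pairs have small counts and are therefore the most likely to produce such values.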

    The Case for Legal Regulation of Physicians’ Off-Label Prescribing

    Deafness has been associated with poor abilities to deal with digits in the context of arithmetic and memory, and language modality-specific differences in the phonological similarity of digits have been shown to influence short-term memory (STM). Therefore, the overall aim of the present thesis was to find out whether language modality-specific differences in phonological processing between sign and speech can explain why deaf signers perform at lower levels than hearing peers when dealing with digits. To explore this aim, the role of phonological processing in digit-based arithmetic and memory tasks was investigated, using both behavioural and neuroimaging methods, in adult deaf signers and hearing non-signers, carefully matched on age, sex, education and non-verbal intelligence. To make task demands as equal as possible for both groups, and to control for material effects, arithmetic, phonological processing, STM and working memory (WM) were all assessed using the same presentation and response mode for both groups. The results suggested that in digit-based STM, phonological similarity of manual numerals causes deaf signers to perform more poorly than hearing non-signers. However, for digit-based WM there was no difference between the groups, possibly due to differences in allocation of resources during WM. This indicates that similar WM for the two groups can be generalized from lexical items to digits. Further, we found that in the present work deaf signers performed better than expected and on a par with hearing peers on all arithmetic tasks, except for multiplication, possibly because the groups studied here were very carefully matched. However, the neural networks recruited for arithmetic and phonology differed between groups. During multiplication tasks, deaf signers showed an increased reliance on cortex of the right parietal lobe complemented by the left inferior frontal gyrus. In contrast, hearing non-signers relied on cortex of the left frontal and parietal lobes during multiplication. This suggests that while hearing non-signers recruit phonology-dependent arithmetic fact retrieval processes for multiplication, deaf signers recruit non-verbal magnitude manipulation processes. For phonology, the hearing non-signers engaged left lateralized frontal and parietal areas within the classical perisylvian language network. In deaf signers, however, phonological processing was limited to cortex of the left occipital lobe, suggesting that sign-based phonological processing does not necessarily activate the classical language network. In conclusion, the findings of the present thesis suggest that language modality-specific differences between sign and speech can, in different ways, explain why deaf signers perform at lower levels than hearing non-signers on tasks that include dealing with digits.

    Swedish summary (translated): Deafness has been linked to poor ability to handle digits in arithmetic and memory. In particular, language modality-specific differences in the phonological similarity of digits have been shown to affect short-term memory. The overall aim of this thesis was therefore to investigate whether language modality-specific differences in phonological processing between signed and spoken language can explain why deaf people perform worse than hearing people on digit tasks. To explore this, phonological processing in digit-based memory tasks and arithmetic was examined using both behavioural methods and brain imaging in groups of deaf signers and hearing speakers carefully matched on age, sex, education and non-verbal intelligence. To make the test conditions as similar as possible for the two groups, and to prevent material effects, the same presentation and response mode was used for both groups. The results showed that in digit-based short-term memory, the performance of deaf signers is affected by the phonological similarity of the signed digits. In contrast, there was no difference between the groups in digit-based working memory, which may be because the two groups allocate their cognitive resources differently. In addition, we found that the deaf signers who took part in the study performed better on arithmetic than previous research has shown, and differed from hearing participants only on multiplication tasks, which may be because the groups were so carefully matched. However, the groups differed in which neurobiological networks were activated during arithmetic and phonology. During multiplication tasks, cortex in the right parietal lobe and left frontal lobe was activated in the deaf signers, whereas cortex in the left frontal and parietal lobes was activated in the hearing speakers. This indicates that the hearing speakers rely on phonology-dependent memory strategies while the deaf signers rely on non-verbal magnitude manipulation and articulatory processes. During the phonological task, the hearing speakers activated left-lateralized frontal and parietal areas within the classical language network. In the deaf signers, phonological processing was limited to cortex in the left occipital lobe, which suggests that sign-based phonology need not activate the classical language network. In summary, the findings of this thesis show that language modality-specific differences between signed and spoken language can, in different ways, explain why deaf people perform worse than hearing people on certain digit-based tasks.

    Flowfield-dependent variant method for moving-boundary problems

    A novel numerical scheme using the combination of flowfield-dependent variation method and arbitrary Lagrangian–Eulerian method is developed. This method is a mixed explicit–implicit numerical scheme, and its implicitness is dependent on the physical properties of the flowfield. The scheme is discretized using the finite-volume method to give flexibility in dealing with complicated geometries. The formulation itself yields a sparse matrix, which can be solved by using any iterative algorithm. Several benchmark problems in two-dimensional inviscid and viscous flow have been selected to validate the method. Good agreement with available experimental and numerical data in the literature has been obtained, thus showing its promising application in complex fluid–structure interaction problems.

    A robust digital image watermarking using repetition codes against common attacks

    Digital watermarking hides information inside digital media to protect documents against malicious attempts to alter them or to claim rights over them. Currently, the capability of repetition codes against various attacks is not sufficiently studied. In this project, a robust frequency-domain watermarking scheme has been implemented using the Discrete Cosine Transform (DCT). The idea of this scheme is to embed a watermark encoded with a (3, 1) repetition code inside the cover image pixels using a DCT-based embedding technique. The proposed methods have undergone several simulated attack tests to check and compare their robustness against various attacks, such as salt and pepper, speckle, compression, Gaussian, image contrast, resizing and cropping attacks. The robustness of the watermarking scheme has been calculated using Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE) and Normalized Correlation (NC). In our experiments, the results show that the robustness of a watermark with repetition codes is much better than without repetition codes.
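A minimal sketch of two building blocks named in the abstract: a (3, 1) repetition code with majority-vote decoding, and the MSE and PSNR metrics. The DCT embedding step and the attack simulations are not shown, and the function names and the flat-list pixel representation are assumptions made for this example rather than the authors' implementation.

```python
import math


def encode_repetition(bits, n=3):
    """Repeat every watermark bit n times: (3, 1) code when n == 3."""
    return [b for b in bits for _ in range(n)]


def decode_repetition(coded, n=3):
    """Majority vote over each block of n received bits."""
    if len(coded) % n != 0:
        raise ValueError("coded length must be a multiple of n")
    return [
        1 if sum(coded[i:i + n]) > n // 2 else 0
        for i in range(0, len(coded), n)
    ]


def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10 * math.log10(peak ** 2 / err)


watermark = [1, 0, 1]
sent = encode_repetition(watermark)       # each bit appears three times
received = [1, 0, 1, 0, 1, 0, 1, 1, 0]    # one flipped bit in every block
recovered = decode_repetition(received)   # majority vote restores the bits
```

With three copies per bit, majority voting corrects any single flipped bit within a block. This is one plausible reason a repetition-coded watermark survives noise-like attacks better than an uncoded one.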