Nucleotide Frequencies in Human Genome and Fibonacci Numbers
This work presents a mathematical model that establishes an interesting
connection between nucleotide frequencies in human single-stranded DNA and the
famous Fibonacci numbers. The model relies on two assumptions. First,
Chargaff's second parity rule should be valid, and, second, the nucleotide
frequencies should approach limit values when the number of bases is
sufficiently large. Under these two hypotheses, it is possible to predict the
human nucleotide frequencies with accuracy. It is noteworthy that the
predicted values are solutions of an optimization problem, which is commonplace
in many natural phenomena.
Comment: 12 pages, 2 figure
Combining support vector machines and segmentation algorithms for efficient anomaly detection: a petroleum industry application
Proceedings of: International Joint Conference SOCO’14-CISIS’14-ICEUTE’14, Bilbao, Spain, June 25th–27th, 2014, Proceedings. Anomaly detection is the problem of finding patterns in data that do not conform to expected behavior. Similarly, when patterns are numerically distant from the rest of the sample, anomalies are indicated as outliers. Anomaly detection has recently attracted the attention of the research community for real-world applications. The petroleum industry is one of the application contexts where these problems are present. The correct detection of such types of unusual information empowers the decision maker with the capacity to act on the system in order to avoid, correct, or react to the situations associated with them. In that sense, heavy extraction machines for pumping and generation operations, like turbomachines, are intensively monitored by hundreds of sensors each, which send measurements at a high frequency for damage prevention. To deal with this and with the lack of labeled data, we propose a combination of a fast, high-quality segmentation algorithm with a one-class support vector machine approach for efficient anomaly detection in turbomachines. As a result, we perform empirical studies comparing our approach to other methods applied to benchmark problems and a real-life application related to oil platform turbomachinery anomaly detection. This work was partially funded by CNPq BJT Project 407851/2012-7 and CNPq PVE Project 314017/2013-
Overcoming Incomplete User Models in Recommendation Systems Via an Ontology
To make accurate recommendations, recommendation systems currently require more data about a customer than is usually available. We conjecture that the weaknesses are due to a lack of inductive bias in the learning methods used to build the prediction models. We propose a new method that extends the utility model and assumes that the structure of user preferences follows an ontology of product attributes. Using the data of the MovieLens system, we show experimentally that real user preferences indeed closely follow an ontology based on movie attributes. Furthermore, a recommender based just on a single individual’s preferences and this ontology performs better than collaborative filtering, with the greatest differences when little data about the user is available. This points the way to how proper inductive bias can be used for significantly more powerful recommender systems in the future
Guillain-Barré Syndrome Outbreak in Peru 2019 Associated With Campylobacter jejuni Infection
OBJECTIVE: To identify the clinical phenotypes and infectious triggers in the 2019 Peruvian Guillain-Barré syndrome (GBS) outbreak. METHODS: We prospectively collected clinical and neurophysiologic data of patients with GBS admitted to a tertiary hospital in Lima, Peru, between May and August 2019. Molecular, immunologic, and microbiological methods were used to identify causative infectious agents. Sera from 41 controls were compared with cases for antibodies to Campylobacter jejuni and gangliosides. Genomic analysis was performed on 4 C jejuni isolates. RESULTS: The 49 included patients had a median age of 44 years (interquartile range [IQR] 30-54 years), and 28 (57%) were male. Thirty-two (65%) had symptoms of a preceding infection: 24 (49%) diarrhea and 13 (27%) upper respiratory tract infection. The median time between infectious and neurologic symptoms was 3 days (IQR 2-9 days). Eighty percent had a pure motor form of GBS, 21 (43%) had the axonal electrophysiologic subtype, and 18% the demyelinating subtype. Evidence of recent C jejuni infection was found in 28/43 (65%). No evidence of recent arbovirus infection was found. Twenty-three cases vs 11 controls (OR 3.3, confidence interval [CI] 95% 1.2-9.2, p < 0.01) had IgM and/or IgA antibodies against C jejuni. Anti-GM1:phosphatidylserine and/or anti-GT1a:GM1 heteromeric complex antibodies were strongly positive in cases (92.9% sensitivity and 68.3% specificity). Genomic analysis showed that the C jejuni strains were closely related and had the Asn51 polymorphism at cstII gene. CONCLUSIONS: Our study indicates that the 2019 Peruvian GBS outbreak was associated with C jejuni infection and that the C jejuni strains linked to GBS circulate widely in different parts of the world
Semi-automated assembly of high-quality diploid human reference genomes
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements
Microbial genome sequencing.
Complete genome sequences of 30 microbial species have been determined during the past five years, and work in progress indicates that the complete sequences of more than 100 further microbial species will be available in the next two to four years. These results have revealed a tremendous amount of information on the physiology and evolution of microbial species, and should provide novel approaches to the diagnosis and treatment of infectious disease
kACTUS 2: Privacy Preserving in Classification Tasks Using k-Anonymity
k-anonymity is a method for masking sensitive data that addresses the problem of re-linking data with an external source and makes it difficult to re-identify an individual. k-anonymity works on a set of quasi-identifiers (public sensitive attributes), whose possible availability and linking is anticipated from an external dataset, and demands that the released dataset contain at least k records for every possible quasi-identifier value. Another aspect of k-anonymity is its capability of maintaining the truthfulness of the released data (unlike other existing methods). This is achieved by generalization, a primary technique in k-anonymity. Generalization consists of generalizing attribute values and substituting them with semantically consistent but less precise values. When the substituted value doesn’t preserve semantic validity, the technique is called suppression, which is a special case of generalization. We present a hybrid approach called compensation, which is based on suppression and swapping for achieving privacy. Since swapping decreases the truthfulness of attribute values, there is a tradeoff between the level of swapping (information truthfulness) and suppression (information loss) incorporated in our algorithm. We use k-anonymity to explore the issue of anonymity preservation. Since we do not use generalization, we do not need a priori knowledge of attribute semantics. We investigate data anonymization in the context of classification and use tree properties to satisfy k-anonymization. Our work improves previous approaches by increasing classification accuracy
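The k-anonymity requirement stated in this abstract (every quasi-identifier combination present in the released data is shared by at least k records) can be illustrated with a short check. This is a minimal sketch of the condition only, not the kACTUS 2 algorithm; the function names, field names, and toy records are hypothetical.

```python
from collections import Counter


def quasi_identifier_groups(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination present occurs in at least k records."""
    groups = quasi_identifier_groups(records, quasi_identifiers)
    return all(count >= k for count in groups.values())


# Toy released table (hypothetical); "zip" and "age" act as quasi-identifiers,
# "diagnosis" is the sensitive attribute and is not used for grouping.
released = [
    {"zip": "130**", "age": "<30", "diagnosis": "flu"},
    {"zip": "130**", "age": "<30", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
]

print(is_k_anonymous(released, ["zip", "age"], k=2))
print(is_k_anonymous(released, ["zip", "age"], k=3))
```

In this toy table the smallest group has two records, so the check passes for k=2 and fails for k=3. A method such as the one described would still have to decide how to transform the failing groups, which this sketch does not attempt.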