69 research outputs found

    Constellation Queries over Big Data

    A geometrical pattern is a set of points with all pairwise distances (or, more generally, relative distances) specified. Finding matches to such patterns has applications to spatial data in seismic, astronomical, and transportation contexts. For example, a particularly interesting geometric pattern in astronomy is the Einstein cross, which is an astronomical phenomenon in which a single quasar is observed as four distinct sky objects (due to gravitational lensing) when captured by earth telescopes. Finding such crosses, as well as other geometric patterns, is a challenging problem as the potential number of sets of elements that compose shapes is exponentially large in the size of the dataset and the pattern. In this paper, we denote geometric patterns as constellation queries and propose algorithms to find them in large data applications. Our methods combine quadtrees, matrix multiplication, and unindexed join processing to discover sets of points that match a geometric pattern within some additive factor on the pairwise distances. Our distributed experiments show that the choice of composition algorithm (matrix multiplication or nested loops) depends on the freedom introduced in the query geometry through the distance additive factor. Three clearly identified blocks of threshold values guide the choice of the best composition algorithm. Finally, solving the problem for relative distances requires a novel continuous-to-discrete transformation. To the best of our knowledge this paper is the first to investigate constellation queries at scale
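To make the matching criterion concrete, here is a minimal brute-force sketch in Python. It is not the paper's quadtree or matrix-multiplication method. It only illustrates what it means for a set of points to match a pattern "within some additive factor on the pairwise distances". The function names and the tolerance parameter `eps` are assumptions introduced for illustration.

```python
from itertools import permutations
from math import dist


def matches_pattern(candidate, pattern_dists, eps):
    """True if every pairwise distance in `candidate` is within the
    additive tolerance `eps` of the corresponding pattern distance."""
    k = len(candidate)
    return all(
        abs(dist(candidate[i], candidate[j]) - pattern_dists[i][j]) <= eps
        for i in range(k)
        for j in range(i + 1, k)
    )


def constellation_query(points, pattern, eps):
    """Hypothetical brute-force search: try every ordered k-tuple of points.
    The cost grows exponentially with the pattern size, which is the
    blow-up the abstract describes."""
    k = len(pattern)
    pattern_dists = [[dist(p, q) for q in pattern] for p in pattern]
    return [
        c for c in permutations(points, k)
        if matches_pattern(c, pattern_dists, eps)
    ]


# Illustrative data: a 3-4-5 right triangle as the query pattern.
pattern = [(0, 0), (3, 0), (0, 4)]
points = [(10, 10), (13, 10), (10, 14), (50, 50)]
result = constellation_query(points, pattern, eps=0.01)
```

In this toy dataset the only match is the translated copy of the triangle. The exhaustive enumeration is what indexing and composition algorithms such as those named in the abstract aim to avoid.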

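    MAM: Método para Agrupamentos Múltiplos em Redes Sociais Online Baseado em Emoções, Personalidades e Textos (MAM: A Method for Multiple Clusterings in Online Social Networks Based on Emotions, Personalities, and Texts)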

    An important problem in social network analysis is partitioning users in order to discover groups that share interests or characteristics. Given a collection of objects, there is typically more than one way to form the partitions. Moreover, when the objects are users of a social network, each object can be represented by different datasets. These datasets offer opportunities to explore user behavior from different perspectives. This work describes a multi-view clustering method for grouping objects that have such properties. The method produces alternative, non-redundant clusterings. Because of these differences, the clusterings can reveal new ways of interpreting the data. The experiments conducted in this work used a Brazilian online social network called MQD. In MQD, users are represented by three datasets, each corresponding to a particular perspective: emotion, personality, and posts. The experimental results indicate that our method is able to produce different clusterings that take the three user perspectives into account

    Adaptation of the Moodle for Application in Distance Education Course at the State University of Campinas

    This paper presents the Pedagogical Platform for Interactive Communications (PPIC), a platform developed to support Distance Learning Courses (DLC). PPIC is based on Moodle and was developed by LANTEC/UNICAMP to support the training of teachers in using technological tools in classrooms. The customization of PPIC was based on a survey of 2,100 users, who indicated the most frequent problems they had faced as students in previous DLC. As many of these problems are addressed by the platform, we expect PPIC to succeed in supporting DLC

    Discovering Tight Space-Time Sequences

    The problem of discovering spatiotemporal sequential patterns affects a broad range of applications. Many initiatives find sequences constrained by space and time. This paper addresses an appealing new challenge for this domain: finding tight space-time sequences, i.e., finding within the same process: i) frequent sequences constrained in space and time that may not be frequent in the entire dataset and ii) the time interval and space range where these sequences are frequent. The discovery of such patterns along with their constraints may lead to the extraction of valuable knowledge that can remain hidden using traditional methods, since their support is extremely low over the entire dataset. We introduce a new Spatio-Temporal Sequence Miner (STSM) algorithm to discover tight space-time sequences. We evaluate STSM using a proof-of-concept use case. When compared with general spatial-time sequence mining algorithms (GSTSM), STSM allows for new insights by detecting maximal space-time areas where each pattern is frequent. To the best of our knowledge, this is the first solution to tackle the problem of identifying tight space-time sequences
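The idea that a sequence can be frequent inside a space-time window while having low support over the whole dataset can be sketched as follows. This is not the miner from the paper. The data layout (one symbol string per spatial position), the `support` function, and the sample data are all assumptions made for illustration.

```python
def support(series_by_pos, seq, positions, t_start, t_end):
    """Fraction of the given positions whose series contains `seq`
    entirely inside the time slice [t_start, t_end)."""
    positions = list(positions)
    hits = sum(seq in series_by_pos[p][t_start:t_end] for p in positions)
    return hits / len(positions)


# Hypothetical dataset: 10 spatial positions, 8 time steps each.
# The sequence "ab" occurs only at positions 3 and 4, at time steps 2-3.
series = {p: "cccccccc" for p in range(10)}
series[3] = "ccabcccc"
series[4] = "ccabcccc"

global_support = support(series, "ab", range(10), 0, 8)  # whole dataset
local_support = support(series, "ab", [3, 4], 2, 4)      # tight window
```

Here the sequence appears at only two of the ten positions overall, yet at every position inside the window. A threshold-based miner applied to the full dataset could therefore miss it, which is the motivation the abstract gives for tight space-time sequences.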

    Autoantibodies against MHC class I polypeptide-related sequence A are associated with increased risk of concomitant autoimmune diseases in celiac patients

    Background: Overexpression of autologous proteins can lead to the formation of autoantibodies and autoimmune diseases. MHC class I polypeptide-related sequence A (MICA) is highly expressed in the enterocytes of patients with celiac disease, which arises in response to gluten. The aim of this study was to investigate anti-MICA antibody formation in patients with celiac disease and its association with other autoimmune processes. Methods: We tested serum samples from 383 patients with celiac disease, obtained before they took up a gluten-free diet, 428 patients with diverse autoimmune diseases, and 200 controls for anti-MICA antibodies. All samples were also tested for anti-endomysium and anti-transglutaminase antibodies. Results: Antibodies against MICA were detected in samples from 41.7% of patients with celiac disease but in only 3.5% of those from controls (P < 0.0001) and 8.2% from patients with autoimmune disease (P < 0.0001). These antibodies disappeared after the introduction of a gluten-free diet. Anti-MICA antibodies were significantly more prevalent in younger patients (P < 0.01). Fifty-eight patients with celiac disease (15.1%) presented a concomitant autoimmune disease. Anti-MICA-positive patients had a higher risk of autoimmune disease than MICA antibody-negative patients (P < 0.0001; odds ratio = 6.11). The risk was even higher when we also controlled for age (odds ratio = 11.69). Finally, we found that the associated risk of developing additional autoimmune diseases was 16 and 10 times as high in pediatric patients and adults with anti-MICA, respectively, as in those without. Conclusions: The development of anti-MICA antibodies could be related to a gluten-containing diet, and seems to be involved in the development of autoimmune diseases in patients with celiac disease, especially younger ones
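For reference, an unadjusted odds ratio of the kind reported above is conventionally computed from a 2×2 table. The symbols a, b, c, and d are notation assumed here; the abstract does not report the underlying counts, so no values are filled in.

```latex
% a: anti-MICA-positive patients with a concomitant autoimmune disease
% b: anti-MICA-positive patients without one
% c: anti-MICA-negative patients with a concomitant autoimmune disease
% d: anti-MICA-negative patients without one
\mathrm{OR} = \frac{a / b}{c / d} = \frac{a\,d}{b\,c}
```

An age-adjusted ratio, such as the second figure in the abstract, is typically estimated from a regression model rather than from this closed form.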

    Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology
