33 research outputs found

    Mobility Schemes for future networks based on the IMS

    Managing tail latency in large scale information retrieval systems

    As both the availability of internet access and the prominence of smart devices continue to increase, data is being generated at a rate faster than ever before. This massive increase in data production comes with many challenges, including efficiency concerns for the storage and retrieval of such large-scale data. However, users have grown to expect the sub-second response times that are common in most modern search engines, creating a problem - how can such large amounts of data continue to be served efficiently enough to satisfy end users? This dissertation investigates several issues regarding tail latency in large-scale information retrieval systems. Tail latency corresponds to the high percentile latency that is observed from a system - in the case of search, this latency typically corresponds to how long it takes for a query to be processed. In particular, keeping tail latency as low as possible translates to a good experience for all users, as tail latency is directly related to the worst-case latency and hence, the worst possible user experience. The key idea in targeting tail latency is to move from questions such as "what is the median latency of our search engine?" to questions which more accurately capture user experience such as "how many queries take more than 200ms to return answers?" or "what is the worst case latency that a user may be subject to, and how often might it occur?" While various strategies exist for efficiently processing queries over large textual corpora, prior research has focused almost entirely on improvements to the average processing time or cost of search systems. As a first contribution, we examine some state-of-the-art retrieval algorithms for two popular index organizations, and discuss the trade-offs between them, paying special attention to the notion of tail latency. This research uncovers a number of observations that are subsequently leveraged for improved search efficiency and effectiveness. 
We then propose and solve a new problem, which involves processing a number of related queries together, known as multi-queries, to yield higher quality search results. We experiment with a number of algorithmic approaches to efficiently process these multi-queries, and report on the cost, efficiency, and effectiveness trade-offs present with each. Ultimately, we find that some solutions yield a low tail latency, and are hence suitable for use in real-time search environments. Finally, we examine how predictive models can be used to improve the tail latency and end-to-end cost of a commonly used multi-stage retrieval architecture without impacting result effectiveness. By combining ideas from numerous areas of information retrieval, we propose a prediction framework which can be used for training and evaluating several efficiency/effectiveness trade-off parameters, resulting in improved trade-offs between cost, result quality, and tail latency
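The shift from median to tail latency described in the abstract above can be illustrated with a small sketch. It is not taken from the dissertation: the nearest-rank percentile definition, the function names, and the sample latencies are all assumptions chosen for illustration, and the 200 ms budget simply echoes the example question quoted in the abstract.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def fraction_over(latencies_ms, budget_ms):
    """Share of queries whose latency exceeds the given budget."""
    return sum(1 for x in latencies_ms if x > budget_ms) / len(latencies_ms)


# Hypothetical per-query latencies in milliseconds (illustrative only).
sample = [12, 15, 18, 20, 22, 25, 30, 45, 210, 950]

median = percentile(sample, 50)          # what a "typical" query sees
p99 = percentile(sample, 99)             # tail latency
over_budget = fraction_over(sample, 200) # "how many queries take more than 200ms?"
```

On this made-up sample the median looks healthy while the 99th percentile is dominated by a single slow query, which is the gap that tail-latency analysis is meant to expose.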

    Phylogenetics of Archerfishes (Toxotidae) and Evolution of the Toxotid Shooting Apparatus

    A grant from the One-University Open Access Fund at the University of Kansas was used to defray the author's publication fees in this Open Access journal. The Open Access Fund, administered by librarians from the KU, KU Law, and KUMC libraries, is made possible by contributions from the offices of KU Provost, KU Vice Chancellor for Research & Graduate Studies, and KUMC Vice Chancellor for Research. For more information about the Open Access Fund, please see http://library.kumc.edu/authors-fund.xml.
    Archerfishes (Toxotidae) are variously found in the fresh- and brackish-water environments of Asia Pacific and are well known for their ability to shoot water at terrestrial prey. These shots of water are intended to strike their prey and cause it to fall into the water for capture and consumption. While this behavior is well known, there are competing hypotheses (blowpipe vs. pressure tank hypothesis) of how archerfishes shoot and which oral structures are involved. Current understanding of archerfish shooting structures is largely based on two species, Toxotes chatareus and T. jaculatrix. We do not know if all archerfishes possess the same oral structures to shoot water, if anatomical variation is present within these oral structures, or how these features have evolved. Additionally, there is little information on the evolution of the Toxotidae as a whole, with all previous systematic works focusing on the interrelationships of the family. We first investigate the limits of archerfish species using new and previously published genetic data. Our analyses highlight that the current taxonomy of archerfishes does not conform to the relationships we recover. Toxotes mekongensis and T. siamensis are placed in the synonymy of T. chatareus, Toxotes carpentariensis is recognized as a species and removed from the synonymy of T. chatareus, and the genus Protoxotes is recognized for T. lorentzi based on the results of our analyses. 
We then take an integrative approach, using a combined analysis of discrete hard- and soft-tissue morphological characters with genetic data, to construct a phylogeny of the Toxotidae. Using the resulting phylogenetic hypothesis, we then characterize the evolutionary history and anatomical variation within the archerfishes. We discuss variation in the oral structures and the evolution of the mechanism with respect to the interrelationships of archerfishes, and find that the oral structures of archerfishes support the blowpipe hypothesis but soft-tissue oral structures may also play a role in shooting. Finally, by comparing the morphology of archerfishes to their sister group, we find that the Leptobramidae has relevant shooting features in the oral cavity, suggesting that some components of the archerfish shooting mechanism are examples of co-opted or exapted traits

    Effective Math-Aware Ad-Hoc Retrieval based on Structure Search and Semantic Similarities

    Despite the prevalence of digital scientific and educational content on the Internet, only a few search engines are capable of retrieving it efficiently and effectively. The main challenge in freely searching scientific literature arises from the presence of structured math formulas and their heterogeneous and contextually important surrounding words. This thesis introduces an effective math-aware, ad-hoc retrieval model that incorporates structure search and semantic similarities. Transformer-based neural retrievers have been adopted to capture additional semantics using domain-adapted supervised retrieval. To enable structure search, I suggest an unsupervised retrieval model that can filter potential mathematical formulas based on structure similarity. This similarity is determined by measuring the largest common substructure(s) in a formula tree representation, known as the Operator Tree (OPT). The structure matching is approximated by employing maximum matching of path-based structure features. The proposed structure similarity measurement can be tailored based on the desired effectiveness and efficiency trade-offs. It may consider various node types, such as operators and operands, and accommodate different numbers of common subtrees with varying weights. In addition to structure similarity, this unsupervised model also captures symbol substitutions through a greedy matching algorithm applied to the matched substructure(s). To achieve efficient structure search, I introduce a dynamic pruning algorithm to the problem of structure retrieval. The proposed retrieval algorithm efficiently identifies the maximum common subtree among formula candidates and safely eliminates potential structure matches that exceed a dynamic threshold. To accomplish this, three rank-safe pruning strategies are suggested and compared against exhaustive search baselines. 
Additionally, more aggressive thresholding policies are proposed to balance effectiveness with further speed improvements. A novel hierarchical inverted index has been implemented. This index is designed to be compatible with traditional information retrieval (IR) infrastructure and optimization techniques. To capture other semantic similarities, I have incorporated neural retrievers into a hybrid setting with structure search. This approach has achieved state-of-the-art effectiveness in recent math information retrieval tasks. In comparison to strict and unsupervised matching, I have found that supervised neural retrievers are able to capture additional semantic similarities in a highly complementary manner. In order to learn effective representations of heterogeneous math content, I have proposed a novel pretraining architecture that can improve the contextual awareness between math and its surrounding text. This pretraining scheme generates effective downstream single-vector representations, eliminating the efficiency bottleneck of multi-vector dense representations. Finally, the thesis examines future directions, specifically the integration of recent advances in language modeling, including ongoing developments in large language models for improved math information retrieval. A preliminary evaluation has been conducted to assess the impact of these advances.
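The idea of rank-safe dynamic pruning mentioned in the abstract above can be sketched generically. This is not the thesis's algorithm: the function `top_k_pruned`, the set-overlap score, and the size-based upper bound are hypothetical stand-ins, assumed here only to show how a dynamic threshold (the current k-th best score) lets a search skip candidates without changing the final top-k.

```python
import heapq


def top_k_pruned(candidates, k, upper_bound, exact_score):
    """Return the top-k (score, id) pairs and the number of exact evaluations performed."""
    heap = []       # min-heap; heap[0][0] is the dynamic threshold once k items are held
    evaluated = 0
    for cand_id, cand in candidates:
        # Rank-safe skip: a candidate whose upper bound cannot beat the
        # current k-th best score can never enter the top-k.
        if len(heap) == k and upper_bound(cand) <= heap[0][0]:
            continue
        evaluated += 1
        score = exact_score(cand)
        if len(heap) < k:
            heapq.heappush(heap, (score, cand_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, cand_id))
    return sorted(heap, reverse=True), evaluated


# Toy data (assumption): formulas reduced to sets of path tokens.
query = {"a", "b", "c", "d"}
formulas = [
    ("f1", {"a", "b", "c"}),
    ("f2", {"a"}),
    ("f3", {"a", "b", "c", "d"}),
    ("f4", {"x", "y"}),
    ("f5", {"b", "z"}),
]


def overlap(cand):
    # Toy similarity: number of shared path tokens.
    return len(cand & query)


def size_bound(cand):
    # The overlap can never exceed the smaller of the two set sizes.
    return min(len(cand), len(query))


pruned, n_pruned = top_k_pruned(formulas, 2, size_bound, overlap)
full, n_full = top_k_pruned(formulas, 2, lambda c: float("inf"), overlap)
```

Passing an infinite upper bound disables pruning, which gives an exhaustive baseline to compare against; because a skipped candidate's true score is at most its bound, both runs return the same ranking while the pruned run scores fewer candidates.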

    The doctoral research abstracts. Vol:11 2017 / Institute of Graduate Studies, UiTM

    Foreword: Congratulations to IGS on the continuous effort to publish the 11th issue of the Doctoral Research Abstracts, which highlights research in various disciplines, from science and technology, business and administration to social science and humanities. This issue features the abstracts of 91 PhD graduates who will receive their scrolls at this momentous 86th UiTM convocation ceremony. This is a special year for the Institute of Graduate Studies, as we are celebrating our 20th anniversary. The 20th anniversary is celebrated with pride with an increase in the number of PhD graduates. In this 86th convocation, the number of PhD graduates has increased by 30% compared to the previous convocation. Each research project produces an innovation, and this year 91 research innovations have been successfully recognized as contributions to the body of knowledge. This is in line with this year's UiTM theme, “Inovasi Melonjak Persaingan Global (Innovation Soars Global Competition)”. Embarking on PhD research may not have been an easy decision for many of you. It often comes at a point in life when the decision to further one’s studies is challenged by the comfort of the status quo. I would like it to be known that you have most certainly done UiTM proud by journeying through the scholarly world with its endless challenges and obstacles, and by persevering right till the very end. Again, congratulations to all PhD graduates. As you leave the university as alumni, we hope a new relationship will be fostered between you and UiTM to ensure UiTM soars to greater heights. I wish you all the best in your future endeavors. Keep UiTM close to your heart and be our ambassadors wherever you go. / Prof Emeritus Dato’ Dr Hassan Said Vice Chancellor Universiti Teknologi MAR

    Synthesis of new pyrazolium based tunable aryl alkyl ionic liquids and their use in removal of methylene blue from aqueous solution

    In this study, two new pyrazolium-based tunable aryl alkyl ionic liquids, 2-ethyl-1-(4-methylphenyl)-3,5-dimethylpyrazolium tetrafluoroborate (3a) and 1-(4-methylphenyl)-2-pentyl-3,5-dimethylpyrazolium tetrafluoroborate (3b), were synthesized via a three-step reaction and characterized. The removal of methylene blue (MB) from aqueous solution was investigated using the synthesized salts as extractants and methylene chloride as the solvent. The results show that MB was extracted from aqueous solution with an extraction efficiency of up to 87% at room temperature and at the natural pH of the MB solution. The influence of the alkyl chain length on the properties of the salts and on their MB extraction efficiency was also investigated.

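    Genomweite Genexpressionsanalyse einer Nichtmodell-Pflanze im Hochdurchsatz : das Transkriptom der Wurzel und Wurzelknöllchen der Kichererbsen-Pflanze unter Salz- und Trockenstress [Genome-wide high-throughput gene expression analysis of a non-model plant: the transcriptome of the roots and root nodules of the chickpea plant under salt and drought stress]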

    Drought and salt stress are the major constraints to increasing yield in chickpea (Cicer arietinum). Improving drought and high-salinity tolerance is therefore of utmost importance for breeding. However, the complexity of these traits has allowed only marginal progress. A solution to the current stagnation is expected from innovative molecular tools such as transcriptome analyses, which provide insight into stress-related gene activity and which, combined with molecular markers and expression (e)QTL mapping, may accelerate knowledge-based breeding. SuperSAGE, an improved version of the serial analysis of gene expression (SAGE) technique that generates genome-wide, high-quality transcription profiles from any eukaryote, has been employed in the present study. The method produces 26 bp long fragments (26 bp tags) from defined positions in cDNAs, providing sufficient sequence information to unambiguously characterize the mRNAs. Further, SuperSAGE tags may be used immediately to produce microarrays and probes for real-time PCR, thereby overcoming the lack of genomic tools in non-model organisms.
    This dissertation presents the first high-throughput transcriptome analysis of chickpea (Cicer arietinum L.), a crop that has been largely neglected by research. To this end, more than 270,000 cDNA sequences, each 26 base pairs (bp) long (referred to as "tags") and together representing more than 30,000 unique transcripts (so-called UniTags), were sequenced and their responses to salt and drought stress examined. The most important results are briefly listed here: (1) SuperSAGE as a technique for characterizing the transcriptome. In the course of this dissertation, the SuperSAGE technique was considerably improved. 
In addition to simplifying the original protocol, SuperSAGE was combined with a second-generation sequencing technology, pyrosequencing by 454 Life Sciences (USA), which increased the information content of the results 10-fold (relative to the original SAGE and LongSAGE protocols). (2) The root transcriptome under salt stress. In roots of the salt-tolerant cultivar INRAT-93, a total of 86,919 tags were identified, which could be grouped into 17,918 UniTags. Of these UniTags, salt stress induced 2,055 (11%) and repressed 346 (1.93%), each at least 8-fold. A transcript with sequence similarity to an Enod 40 protein was induced most strongly (>250-fold), while transcripts for superoxide dismutase, trypsin inhibitor, and extensin were still upregulated about 30-fold. RNA biosynthesis, post-translational protein modification, cellular organization, and protein folding were identified as the Gene Ontology (GO) categories predominantly supplied with transcripts under salt stress. (3) The root nodule transcriptome under salt stress. In root nodules of the same plants, 57,281 26-bp tags were sequenced, derived from a total of 13,115 UniTags. Here too, the transcript for the Enod4 protein was the most strongly induced (60-fold). Nevertheless, roots and root nodules responded very differently to the same salt stress. For example, of the 2,207 and 2,162 UniTags induced more than 3.0-fold in roots and nodules, respectively, only 363 were common to both organs. (4) The root transcriptome under drought stress. In roots of the drought-tolerant cultivar ICC588, of 80,012 sequenced transcripts (corresponding to 17,498 UniTags), 388 (2.22%) were induced at least 8-fold and 589 (3.37%) were repressed six hours after the onset of drought stress. A transcript encoding a 14-3-3 protein was the most strongly induced (45-fold). 
In addition, the number of transcripts for an extensin and an NADP-dependent isocitrate dehydrogenase increased more than 30-fold. The GO categories translation, response to stimulus, generation of precursor metabolites and energy, and response to biotic stress were clearly over-represented. (5) Transcript isoforms. These investigations uncovered a wide variety of transcript isoforms of genes that are activated after stress. For example, gene families such as the family of receptor-like kinases (RLKs) were represented by more than 36 UniTags, which moreover showed differential organ- and stress-specific regulation. (6) Transferability of transcriptome data. The results obtained with SuperSAGE were compatible with various other platforms such as quantitative real-time PCR (qRT-PCR) or microarrays, which implies further applications such as functional gene analysis with small interfering RNAs (siRNAs) or expression mapping (eQTL mapping). (7) An in silico analysis of the data showed that i) under salt and drought stress, chickpea plants are exposed to strong osmotic and ionic stress and, in addition, to an overproduction of oxygen radicals (reactive oxygen radicals, ROSs); ii) in chickpea root nodules, transcripts for proteins of ROS control are already very strongly induced before the onset of stress, which suggests a preformed ROS detoxification; iii) the transcription profiles observed in this work after the onset of both forms of stress do not suggest active de novo synthesis of the stress hormone abscisic acid (ABA). However, some ABA-activated genes were induced, which in turn points to a role for alternative ABA sources in the affected plants (such as the release of ABA from conjugates).

    Pretrained Transformers for Text Ranking: BERT and Beyond

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading

    Ecological restoration of European flat oysters in the German Bight

    Several marine ecosystems currently face severe degradation in the form of habitat loss. As a consequence, humans are undertaking initiatives to restore species and habitats, and thereby to restore and preserve ecosystem services and functions. Although there have been many initiatives to restock commercial marine species for fisheries and aquaculture, the restoration of marine habitats is a relatively new discipline. To recover ecosystem conditions that maintain their structure and function, ecological restoration was conducted and implemented by the Alfred-Wegener-Institut Helmholtz Zentrum für Polar- und Meeresforschung (AWI) and the Bundesamt für Naturschutz (BfN) to re-establish lost and ecologically relevant biogenic oyster reefs as part of marine conservation measures in the German North Sea. From 2016 to 2019, the AWI-led and BfN-funded RESTORE project, from which this thesis originates, actively investigated the technical and biological feasibility of restoration. In this context, three key topics (and their associated subtopics) relevant to the development of a successful restoration programme are addressed in this thesis: I) Oyster supply - How can we provide ecological restoration efforts with substantial amounts of appropriate Ostrea edulis seeds (i.e. gametes, larvae and spat)? Which production techniques and knowledge exist? Which are appropriate for restoration? II) Supply of essential settlement substrate for the oyster life cycle - Which types of substrate should be used in accordance with the biological traits of O. edulis? Which types of substrate should be used in accordance with legislative restrictions? III) Biosecurity aspects of oyster restoration - How can the transfer of pathogens or invasive species be avoided during ecological restoration projects (focusing on seed production and substrate transfer)?