3,650 research outputs found

    Modeling of Induced Hydraulically Fractured Wells in Shale Reservoirs Using Branched Fractals


    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmarking initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we survey the impact and legal consequences of these technical advances and point out future directions for research.
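The content-based indexing mentioned in the report is commonly built on an inverted index that maps terms (or extracted media features) to the documents containing them. The sketch below is a minimal, hypothetical text-only illustration; real multimedia engines index visual words or audio fingerprints in the same way, and the `docs` mapping here is an assumed toy structure:

```python
def build_inverted_index(docs):
    """Map each term to the set of document ids containing it.

    `docs` is a hypothetical {doc_id: text} mapping used for
    illustration; multimedia engines index extracted features
    (visual words, audio fingerprints) with the same structure.
    """
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, *terms):
    """Conjunctive query: ids of documents containing all terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()
```

For example, indexing `{1: "mobile video search", 2: "peer to peer video"}` and querying for both "video" and "search" returns only document 1.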

    An investigation of dead-zone pattern matching algorithms

    Thesis (MA)--Stellenbosch University, 2016.
    ENGLISH ABSTRACT: Pattern matching allows us to search text for a word or a sequence of characters, a popular feature of computer programs such as text editors. Traditionally, three distinct families of pattern matching algorithms exist: the Boyer-Moore (BM) family, the Knuth-Morris-Pratt (KMP) family, and the Rabin-Karp (RK) family. The basic algorithms in these families were developed in the 1970s and 1980s. Recently, however, a new family of pattern matching algorithms, known as the Dead-Zone (DZ) family, has been developed. A previous study proved theoretically that DZ can match a pattern against a text with fewer match attempts than the well-known Horspool algorithm, a derivative of the BM algorithm. The main aim of this study was to provide empirical evidence to determine whether DZ is faster in practice. A benchmark platform was developed to compare variants of the DZ algorithm with existing pattern matching algorithms. Initial experiments were performed with four C implementations of the DZ algorithm (two recursive and two iterative). Subsequently, DZ variants that use different shift functions, as well as two parallel variants of DZ (implemented with Pthreads and CUDA), were developed. Additionally, the underlying skeleton of the DZ algorithm was tweaked to determine whether the DZ code was optimal. The benchmark results showed that the iterative C implementations of DZ performed favourably. Both iterative algorithms beat traditional pattern matching algorithms when searching natural language and genome texts, particularly for short patterns. When different shift functions were used, the only case in which a DZ implementation outperformed an implementation of the traditional algorithm was for a pattern length of 65536 characters. Contrary to our expectations, the parallel implementations of DZ did not always provide a speedup. In fact, the Pthreaded variants of DZ were slower than the non-threaded DZ implementations, although the CUDA DZ variants were consistently five times faster than a CPU implementation of Horspool. By using a cache-friendly DZ algorithm, which reduces cache misses by about 20%, the original DZ can be improved by approximately 5% for relatively short patterns (up to 128 characters with a natural language text). Moreover, a cost of recursion and the impact of information sharing were observed for all DZ variants and have thus been identified as intrinsic DZ characteristics. Further research is recommended to determine whether the cache-friendly DZ algorithm should become the standard implementation of the DZ algorithm. In addition, we hope that the development of our benchmark platform has produced a technique that researchers can use in future studies to conduct benchmark tests.
    AFRIKAANSE OPSOMMING: Pattern matching is used to search for a sequence of consecutive characters in a block of text. It is used extensively in computer programs, for example in text editors. Traditionally there are three distinct families of pattern matching algorithms: the Boyer-Moore (BM) family, the Knuth-Morris-Pratt (KMP) family, and the Rabin-Karp (RK) family. The basic algorithms in these families were developed in the 1970s and 1980s. Recently, however, a new family of pattern matching algorithms was developed, known as the Dead-Zone (DZ) family. A previous study proved that DZ algorithms can perform pattern matching with fewer match attempts than the well-known Horspool algorithm, which is a derivative of the BM algorithm. The main goal of this study was to investigate the DZ family of algorithms empirically. A benchmark platform was developed to compare variants of the DZ algorithm with existing pattern matching algorithms. Initial experiments were performed with four C implementations of the DZ algorithm, two recursive and two iterative. Thereafter, DZ variants were developed that use different shift functions, as well as two parallel variants of DZ, one using Pthreads and the other implemented in CUDA. Furthermore, the C version of the basic DZ algorithm was fine-tuned to determine whether the code was optimal. The benchmark results indicate that the iterative C implementations of DZ performed favourably compared with the traditional pattern matching algorithms. Both iterative algorithms beat the traditional algorithms when tested with relatively short patterns. The performance of various shift functions was also investigated; the only case in which the DZ algorithms performed better than the traditional algorithm was for pattern lengths of 65536 characters. Contrary to our expectations, the parallel implementations did not always provide a speedup. In fact, the Pthread variants of DZ were slower than the non-threaded DZ implementations, although the CUDA DZ variants were consistently five times faster than the conventional CPU implementation of Horspool. The benchmarks also indicated that the original DZ code was close to optimal; however, by using a cache-friendly version that reduces cache misses by about 20%, performance could be improved by approximately 5% for relatively short patterns (up to 128 characters with natural language text). Furthermore, a cost of recursion and an impact of information sharing were observed for all DZ variants and have been identified as intrinsic DZ characteristics. Further research is recommended to determine whether the cache-friendly DZ algorithm should become the standard implementation of the DZ algorithm. In addition, we hope that the development of our benchmark platform has produced a technique that researchers can use in future studies to conduct benchmark tests.
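The dead-zone idea described in the abstract can be sketched compactly. The following is a simplified illustration, not the thesis's C implementation: a match is attempted in the middle of a "live" zone of candidate positions, a Horspool-style shift then kills a dead zone to the right of the attempt, and the matcher recurses into the remaining live zones.

```python
def dz_match(text, pat):
    """Simplified dead-zone matcher (illustrative sketch only).

    Attempts a match in the middle of each live zone of candidate
    start positions, then uses a Horspool-style shift to mark a
    dead zone to the right of the attempt before recursing into
    the left and right live zones.
    """
    m = len(pat)
    out = []

    def live(lo, hi):          # candidate starts in [lo, hi)
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        if text[mid:mid + m] == pat:
            out.append(mid)
        # Horspool shift derived from the text character aligned
        # with the pattern's last position: no occurrence can start
        # in (mid, mid + shift), so that zone is dead.
        c = text[mid + m - 1]
        last = pat.rfind(c, 0, m - 1)
        shift = (m - 1 - last) if last != -1 else m
        live(lo, mid)          # left zone is still live
        live(mid + shift, hi)  # skip the dead zone on the right

    live(0, len(text) - m + 1)
    return sorted(out)
```

Note that the full DZ family also kills a zone to the *left* of each attempt using a second shift function; for brevity this sketch only kills to the right.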

    Use of Machine Learning and Natural Language Processing to Enhance Traffic Safety Analysis

    Despite significant advances in vehicle technology, safety data collection and analysis, and engineering, tens of thousands of Americans die every year in motor vehicle crashes. Alarmingly, the trend in fatal and serious injury crashes appears to be heading in the wrong direction: in 2021, the actual fatality rate exceeded the predicted rate. This worrisome trend necessitates advanced and holistic approaches to determining the causes of a crash (particularly of fatal and major-injury crashes), ranging from analyzing problems from multiple perspectives, to utilizing available data sources, to employing the most suitable tools and technologies within and outside the traffic safety domain. The primary source for traffic safety analysis is the structured (also called tabular) data collected from crash reports. However, structured data may be insufficient because of missing information, incomplete sequences of events, and misclassified crash types, among other issues. Crash narratives, free text recorded by police officers to describe the unique aspects and circumstances of a crash, are commonly used by safety professionals to supplement the structured data fields. Because narratives are unstructured, engineers have to review every one manually. Thanks to rapid developments in natural language processing (NLP) and machine learning (ML), text mining and analytics have become popular tools for accelerating information extraction and analysis of unstructured text data. The primary objective of this dissertation is to discover and develop the tools, techniques, and algorithms needed to facilitate traffic safety analysis using crash narratives.
The objectives are accomplished in three areas: enhancing data quality by recovering missed crashes through text classification, uncovering complex characteristics of collision generation through information extraction and pattern recognition, and facilitating crash narrative analysis by developing a web-based tool. First, a variety of NoisyOR classifiers were developed to identify and investigate work zone (WZ), distracted (DD), and inattentive (ID) crashes. In addition, various ML models, including multinomial naive Bayes (MNB), logistic regression (LGR), support vector machine (SVM), k-nearest neighbor (K-NN), random forest (RF), and gated recurrent unit (GRU), were developed and compared with NoisyOR. The comparison shows that NoisyOR is simple, computationally efficient, theoretically sound, and among the best-performing models. Furthermore, a novel neural network architecture named the Sentence-based Hierarchical Attention Network (SHAN) was developed to classify crashes; its performance exceeds that of NoisyOR, GRU, the Hierarchical Attention Network (HAN), and the other ML models. SHAN handles noisy or irrelevant parts of narratives effectively, and its results can be visualized through attention weights. Because a crash often comprises a series of actions and events, breaking the chain of events could prevent a crash from reaching its most dangerous stage. With the objectives of creating crash sequences, discovering patterns of crash events, and finding missing events, the Part-of-Speech Tagging (PT), Pattern Matching with POS Tagging (PMPT), Dependency Parser (DP), and Hybrid Generalized (HGEN) algorithms were developed and thoroughly tested on crash narratives. The top performer, HGEN, uses predefined events and event-related action words from crash narratives to find new events not captured in the data fields. In addition, association analysis unravels the complex interrelations between events within a crash.
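As an illustration of the Noisy-OR idea used here for narrative classification, each keyword matched in a narrative can be treated as an independent "cause" of a label; the label is absent only if every matched keyword fails to cause it. The keywords and probabilities below are hypothetical placeholders, not the dissertation's fitted values:

```python
def noisy_or_score(tokens, keyword_probs):
    """P(label | narrative) under a Noisy-OR model.

    Each matched keyword w independently 'causes' the label with
    probability p_w, so P(label) = 1 - prod(1 - p_w) over the
    keywords present in the narrative.
    """
    p_absent = 1.0
    for word in set(tokens):
        p_absent *= 1.0 - keyword_probs.get(word, 0.0)
    return 1.0 - p_absent

# Hypothetical keyword probabilities for a work-zone (WZ) classifier.
wz_keywords = {"workzone": 0.9, "cones": 0.5, "flagger": 0.8}
```

A narrative would be tokenized and scored with `noisy_or_score(narrative.split(), wz_keywords)`, then thresholded to decide the class.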
Finally, the crash information extraction, analysis, and classification tool (CIEACT), a simple and flexible online web tool, was developed to analyze crash narratives using text mining techniques. The tool is built with the Python-based Django web framework, HTML, and a relational database (PostgreSQL), which enables concurrent model development and analysis. It provides built-in classifiers by default and can also train a model in real time on supplied data. The interface is user-friendly, and results can be displayed in tabular format or on an interactive map. Users can also download the words with their probability scores, and the results, as CSV files. The advantages and limitations of each proposed methodology are discussed, and several future research directions are outlined. In summary, the methodologies and tools developed as part of this dissertation can assist transportation engineers and safety professionals in extracting valuable information from narratives, recovering missed crashes, classifying new crashes, and expediting their review process at scale. This research can thus be used by transportation agencies to analyze crash records, identify appropriate safety solutions, and inform policy making to improve the highway safety of our transportation system.
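The association analysis of within-crash events mentioned above can be illustrated with simple support and confidence counts over the event sets of each crash; the event names below are hypothetical, not the dissertation's predefined event vocabulary:

```python
from itertools import combinations

def event_associations(crashes, min_support=0.2):
    """Support and confidence for event pairs across crash sequences.

    `crashes` is a list of event lists, one per crash narrative.
    Returns (antecedent, consequent, support, confidence) tuples for
    pairs whose support meets `min_support`.
    """
    n = len(crashes)
    item_count, pair_count = {}, {}
    for events in crashes:
        present = set(events)
        for e in present:
            item_count[e] = item_count.get(e, 0) + 1
        for a, b in combinations(sorted(present), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), c in pair_count.items():
        if c / n >= min_support:
            # confidence of the rule a -> b
            rules.append((a, b, c / n, c / item_count[a]))
    return rules
```

On a toy corpus where "run_off_road" always co-occurs with "speeding", the rule (run_off_road, speeding) comes out with confidence 1.0, which is the kind of interrelation the analysis surfaces.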

    Application of Streamline Simulation for Gas Displacement Processes

    Performance evaluation of miscible and near-miscible gas injection processes is available through conventional finite difference (FD) compositional simulation, which is widely used for solving large-scale multiphase displacement problems but typically requires long computation times. Computation time can be reduced by using low-resolution compositional simulation, but such models can be adversely affected by numerical dispersion and may fail to represent geological heterogeneities adequately. The number of fluid components can also be reduced, at the price of a less accurate representation of phase behaviour. Streamline methods have been developed in which fluid is transported along streamlines instead of across the finite difference grid. In streamline-based simulation, a 3D flow problem is decoupled into a set of 1D problems solved along streamlines, reducing simulation time and suppressing numerical dispersion. Larger time steps and higher spatial resolution can be achieved in these simulations, which is particularly valuable when sensitivity runs are needed to reduce study uncertainties. Streamline-based reservoir simulation, being orders of magnitude faster than conventional finite difference methods, may mitigate many of the challenges noted above. For gas injection, however, the streamline approach has so far not provided high resolution or an adequate representation of the multiphase displacement. In this work, streamline simulations of both compositional and miscible gas injection were tested. The conventional gas injection scheme is illustrated, and a detailed comparison is given between the results of conventional FD simulation and the streamline approach for gas displacement processes. Finally, some guidelines are given on how the streamline method can potentially be used to enhance the simulation of gas displacement processes.
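The decoupling of a 3D flow problem into 1D solves can be illustrated with a first-order upwind scheme for linear advection along a single streamline. This is a generic numerical sketch, not the simulator used in the study; the profile, CFL number, and boundary treatment are arbitrary illustrative choices:

```python
def advect_along_streamline(s, cfl, steps):
    """First-order upwind update for 1D linear advection of a
    saturation-like profile `s` along one streamline.

    `cfl` = v * dt / dx must satisfy 0 < cfl <= 1 for stability.
    The inlet cell s[0] is held fixed as a boundary condition.
    In streamline simulation, one such 1D solve per streamline
    replaces transport on the full 3D finite-difference grid.
    """
    for _ in range(steps):
        s = [s[0]] + [s[i] - cfl * (s[i] - s[i - 1])
                      for i in range(1, len(s))]
    return s
```

With `cfl = 1` the displacement front moves exactly one cell per step with no smearing, which hints at why transport along streamlines suppresses the numerical dispersion seen on coarse 3D grids.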

    Spatial Analysis for Landscape Changes

    The recent increasing occurrence of natural and anthropogenic processes has a strong impact on landscape modification, and there is a growing need for effective instruments, tools, and approaches to understand and manage landscape changes. Greatly improved availability of high-resolution digital elevation models (DEMs), GIS tools, and algorithms for automatic extraction of landform features and change detection has favored an increase in the analysis of landscape changes, which has become an essential instrument for the quantitative evaluation of landscape change in many research fields. One of the most effective ways of investigating natural landscape changes is the geomorphological approach, which benefits from recent advances in DEM comparison software and algorithms, image change detection, and landscape evolution models. This Special Issue collects six papers on the application of traditional and innovative multidisciplinary methods in several fields, such as geomorphology, urban and territorial systems, vegetation restoration, and soil science. The papers include multidisciplinary studies that highlight the usefulness of quantitative analyses of satellite images and UAV-based DEMs, and the application of Landscape Evolution Models (LEMs) and automatic landform classification algorithms, to issues of landscape change. A review article is also presented, offering a bibliometric analysis of the research topic.
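DEM-based change detection of the kind described above often reduces to a DEM of difference (DoD): subtracting an earlier elevation grid from a later one and summing the implied volume change. The grids and cell size below are small illustrative values, and real workflows would additionally threshold the DoD against DEM uncertainty:

```python
def dem_of_difference(dem_before, dem_after):
    """Cell-by-cell elevation change between two co-registered DEMs
    (same grid, same cell size); positive values mean deposition,
    negative values erosion."""
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(dem_before, dem_after)]

def volume_change(dod, cell_area):
    """Net volume change implied by a DEM of difference (m^3 when
    elevations are in metres and cell_area in m^2)."""
    return sum(dz * cell_area for row in dod for dz in row)
```

A balanced cut-and-fill pattern yields a net volume change of zero even though individual cells changed, which is why erosion and deposition are usually also reported separately.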