4,812 research outputs found

    Automatic mapping of free texts to bioinformatics ontology terms

    In the field of bioinformatics, the number of tools and services is ever-increasing. In order to make information about these resources available in a useful way, we annotate them with ontology terms. This is currently done manually, which is time-consuming and error-prone. In this thesis, we set out to build a tool that helps the annotator by providing annotation suggestions. We developed a program that reads in free-text descriptions of tools and services, adds the content of web pages and publications related to each tool, and based on this outputs the best-matching ontology terms. We then optimised the parameters of the program on manually created annotation sets. Initial results look promising: when comparing performance against these manual annotations, many suggestions agree with them. Moreover, according to experienced annotators, many of the other suggestions are also correct.
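    The matching step described above can be sketched as a simple token-overlap scorer between a free-text description and ontology term labels. The mini-ontology, identifiers, and scoring heuristic below are illustrative assumptions, not the thesis's actual optimised matcher:

```python
# Hypothetical mini-ontology of term label -> identifier (illustrative only).
ONTOLOGY = {
    "multiple sequence alignment": "term:0001",
    "sequence alignment": "term:0002",
    "phylogenetic tree construction": "term:0003",
}

def suggest_terms(description, ontology, top_k=2):
    """Rank ontology terms by the fraction of their label words that
    appear in the free-text description; ties prefer longer (more
    specific) labels. A crude stand-in for a tuned matcher."""
    doc = set(description.lower().split())
    scored = []
    for label, term_id in ontology.items():
        words = set(label.split())
        coverage = len(words & doc) / len(words)
        if coverage > 0:
            scored.append((coverage, label, term_id))
    scored.sort(key=lambda t: (t[0], len(t[1])), reverse=True)
    return [(label, term_id) for _, label, term_id in scored[:top_k]]
```

    For the description "A tool for multiple sequence alignment of proteins", both alignment terms are suggested, with the more specific label ranked first.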

    Improving search engines with open Web-based SKOS vocabularies

    Dissertation submitted for the degree of Master in Informatics Engineering. The volume of digital information is increasingly large, and even though organizations are making more of this information available, without the proper tools users have great difficulty retrieving documents about subjects of interest. Good information retrieval mechanisms are crucial for answering user information needs. Nowadays, search engines are unavoidable: they are an essential feature in document management systems. However, achieving good relevancy is a difficult problem, particularly when dealing with specific technical domains where vocabulary mismatch problems can be prejudicial. Numerous research works have found that exploiting the lexical or semantic relations of terms in a collection attenuates this problem. In this dissertation, we aim to improve search results and user experience by investigating the use of potentially connected Web vocabularies in information retrieval engines. In the context of open Web-based SKOS vocabularies, we propose a query expansion framework implemented in a widely used IR system (Lucene/Solr) and evaluated using standard IR evaluation datasets. The components described in this thesis were applied in the development of a new search system that was integrated with a rapid application development tool in the context of an internship at Quidgest S.A. Funding: Fundação para a Ciência e Tecnologia - ImTV research project, in the context of the UTAustin-Portugal collaboration (UTA-Est/MAI/0010/2009); QSearch project (FCT/Quidgest
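    As a rough sketch of the query-expansion idea, each query term can be OR-expanded with its SKOS altLabels and narrower terms before being handed to the retrieval engine. The in-memory vocabulary below is a hypothetical stand-in for a real Web-based SKOS source, and the output uses Lucene-style boolean syntax:

```python
# Hypothetical SKOS fragment: preferred label -> alternative/narrower labels.
VOCAB = {
    "car": {"altLabels": ["automobile"], "narrower": ["hatchback", "sedan"]},
    "engine": {"altLabels": ["motor"], "narrower": []},
}

def expand_query(query, vocab):
    """Expand each query term with its SKOS synonyms and narrower terms,
    producing a Lucene-style boolean query string."""
    parts = []
    for term in query.lower().split():
        entry = vocab.get(term)
        if entry:
            variants = [term] + entry["altLabels"] + entry["narrower"]
            parts.append("(" + " OR ".join(variants) + ")")
        else:
            parts.append(term)
    return " ".join(parts)
```

    Expanding with narrower terms trades precision for recall; a production system would likely weight the original term higher than its expansions.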

    JobHam-place with smart recommend job options and candidate filtering options

    Due to the increasing number of graduates, many applicants struggle to find a job, and employers struggle to filter job applicants, which can negatively impact their effectiveness. However, most job-hunting websites lack integrated job recommendation and CV filtering or ranking functionality. This project therefore builds a smart job-hunting platform that combines job recommendation, CV ranking, and a job dashboard for skills and applicants. Job recommendation and CV ranking start with automatic keyword extraction and end with the job/CV ranking algorithm. Automatic keyword extraction is implemented by the Job2Skill and CV2Skill models, both based on BERT. Job2Skill consists of two components, a text encoder and GRU-based layers, while CV2Skill is mainly based on BERT and fine-tunes the pre-trained model on the Resume-Entity dataset. To match skills between CVs and job descriptions and rank lists of jobs and candidates, job/CV ranking algorithms compute the occurrence ratio of skill words based on TF-IDF scores and the match ratio over the total number of skills. In addition, some advanced features, such as a calendar and the sweetalert2 plugin, have been integrated into the website to improve the user experience, along with basic features covering the job application process, such as application tracking and interview arrangement.
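    The match ratio mentioned above can be sketched as the fraction of a job's required skills found in the CV. The skill names are toy examples, and the sketch omits the TF-IDF occurrence weighting the project layers on top:

```python
def skill_match_ratio(cv_skills, job_skills):
    """Fraction of the job's required skills that appear in the CV."""
    cv = {s.lower() for s in cv_skills}
    job = {s.lower() for s in job_skills}
    return len(cv & job) / len(job) if job else 0.0

def rank_jobs(cv_skills, jobs):
    """Rank job postings (title -> required skills) by match ratio,
    highest first."""
    scored = [(skill_match_ratio(cv_skills, required), title)
              for title, required in jobs.items()]
    return sorted(scored, reverse=True)
```

    The same ratio, computed in the other direction over a pool of CVs, gives the candidate-ranking view for employers.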

    Semantic concept extraction from electronic medical records for enhancing information retrieval performance

    With the healthcare industry increasingly using EMRs, there emerges an opportunity for knowledge discovery within the healthcare domain that was not possible with paper-based medical records. One such opportunity is to discover UMLS concepts from EMRs. However, with opportunities come challenges that need to be addressed. Medical verbiage is very different from common English verbiage, and it is reasonable to assume that extracting information from medical text requires different protocols than those currently used for common English text. This thesis proposes two new semantic matching models: Term-Based Matching and CUI-Based Matching. These two models use specialized biomedical text mining tools to extract medical concepts from EMRs. Extensive experiments to rank the extracted concepts are conducted on the University of Pittsburgh BLULab NLP Repository for the TREC 2011 Medical Records track dataset, which consists of 101,711 EMRs containing concepts in 34 predefined topics. This thesis compares the proposed semantic matching models against the traditional weighting equations and information retrieval tools used in the academic world today.
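    A term-based matching step of this kind can be sketched as TF-IDF scoring over extracted concept terms. The toy documents and weighting below are a generic illustration, not the thesis's exact models; a real pipeline would first map text to UMLS concepts with a dedicated extraction tool:

```python
import math
from collections import Counter

def tfidf_rank(query_terms, docs):
    """Rank documents (given as lists of extracted concept terms) by the
    summed TF-IDF weight of the query's terms, highest score first.
    Returns (score, doc_index) pairs."""
    n = len(docs)
    df = Counter()  # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)  # term frequency within this document
        score = sum(tf[t] * math.log((n + 1) / (df[t] + 1))
                    for t in query_terms if t in tf)
        scored.append((score, i))
    return sorted(scored, reverse=True)
```

    Matching on normalized concept terms rather than raw tokens is what lets "heart attack" and "myocardial infarction" contribute to the same score.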

    A Picture Is Worth a Thousand Words: Code Clone Detection Based on Image Similarity

    This paper introduces a new code clone detection technique based on image similarity. The technique captures the visual perception of code as seen by humans in an IDE by applying syntax highlighting and image conversion to raw source code text. We compared two similarity measures, Jaccard and earth mover's distance (EMD), for our image-based code clone detection technique. Jaccard similarity offered better detection performance than EMD. The F1 score of our technique on detecting Java clones with pervasive code modifications is comparable to five well-known code clone detectors: CCFinderX, Deckard, iClones, NiCad, and Simian. A Gaussian blur filter was chosen as a normalisation technique for type-2 and type-3 clones. We found that blurring code images before similarity computation resulted in higher precision and recall; the detection performance after including the blur filter increased by 1 to 6 percent. A manual investigation of clone pairs in three software systems revealed that our technique, while it missed some of the true clones, could also detect additional true clone pairs missed by NiCad.
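    The effect of blurring before the Jaccard comparison can be sketched with binary images represented as sets of "ink" pixel coordinates. The dilation below is a crude stand-in for the paper's Gaussian blur, not its actual implementation:

```python
def jaccard(img_a, img_b):
    """Jaccard similarity of two binary images, each given as a set of
    (row, col) ink-pixel coordinates."""
    if not img_a and not img_b:
        return 1.0
    return len(img_a & img_b) / len(img_a | img_b)

def dilate(img, radius=1):
    """Grow each ink pixel into its square neighbourhood, so slightly
    shifted or edited code regions still overlap (blur-like
    normalisation)."""
    return {(r + dr, c + dc)
            for r, c in img
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)}
```

    On two one-pixel-shifted strokes, the raw Jaccard score is low but rises after dilation, mirroring the observation that blurring helps with type-2 and type-3 clones.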

    Exploiting Context in Dealing with Programming Errors and Exceptions in the IDE

    Studies show that software developers spend about 19% of their development time on web searches. While collecting necessary information using traditional web search, they face several practical challenges. First, it does not consider the context (i.e., surroundings, circumstances) of a programming problem unless the developers encode it into the search query themselves, and it forces the developers to frequently switch between their working environment (e.g., the IDE) and the web browser. Second, the technical details (e.g., stack trace) of an encountered exception often contain a lot of information and cannot be used directly as a search query, given that traditional search engines do not support long queries. Third, traditional search generally returns hundreds of results, and the developers need to manually analyze the result pages one by one in order to extract a working solution; both analyzing a page for content relevant to the encountered exception (and its context) and working out an appropriate solution are non-trivial tasks. Fourth, traditional code search engines share the same limitations as web search engines, and they also do not help much in collecting code examples that can be used for handling the encountered exceptions. In this thesis, we present a context-aware and IDE-based approach that helps one overcome these four challenges. In our first study, we propose and evaluate a context-aware meta search engine for programming errors and exceptions. The meta search collects results for any exception encountered in the IDE from three popular search engines (Google, Bing, and Yahoo) and one programming Q&A site (StackOverflow), refines and ranks the results against the detailed context of the encountered exception, and then recommends them within the IDE.
From this study, we not only explore the potential of the context-aware, meta-search-based approach but also realize the significance of appropriate search queries in searching for programming solutions. In the second study, we propose and evaluate an automated query recommendation approach that exploits the technical details of an encountered exception and recommends a ranked list of search queries. We found the recommended queries quite promising and comparable to queries suggested by experts. We also note that support for the developers can be further complemented by post-search content analysis. In the third study, we propose and evaluate an IDE-based, context-aware content recommendation approach that identifies and recommends sections of a web page that are relevant to the exception encountered in the IDE. The idea is to reduce the cognitive effort of the developers in searching for content of interest (i.e., relevance) in the page, and we found the approach quite effective through extensive experiments and a limited user study. In our fourth study, we propose and evaluate a context-aware code search engine that collects code examples from a number of GitHub code repositories, where the examples contain high-quality handlers for the exception of interest. We validate the performance of each of our proposed approaches against existing relevant literature and also through several mini user studies. Finally, in order to further validate the applicability of our approaches, we integrate them into an Eclipse plug-in prototype, ExcClipse. We then conduct a task-oriented user study with six participants and report the findings, which are significantly promising.
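    The context-aware re-ranking of the first study can be sketched as scoring each search result by its lexical overlap with tokens drawn from the exception's stack trace. This is a deliberately simplified stand-in; the thesis's ranking uses a much richer notion of context:

```python
def rank_by_context(stack_trace, results):
    """Re-rank search-result titles by token overlap with the stack
    trace, highest overlap first."""
    def tokens(text):
        # Split dotted names (e.g. com.example.Parser) into components.
        return set(text.lower().replace(".", " ").split())

    context = tokens(stack_trace)
    return sorted(results, key=lambda r: len(tokens(r) & context),
                  reverse=True)
```

    Because the overlap is computed against the trace rather than a user-typed query, results mentioning the specific exception and classes involved float to the top without any query reformulation.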

    FORENSIC ANALYSIS OF THE GARMIN CONNECT ANDROID APPLICATION

    Wearable smart devices are becoming more prevalent in our lives. These tiny devices read various health signals, such as heart rate and pulse, and also serve as companion devices that store sports activities and even their coordinates. This data is typically sent to the smartphone via an installed companion application. These applications hold high forensic value because of the users' private information they store. They can be crucial in a criminal investigation to understand what happened or where a person was during a given period. They also need to guarantee that the data is secure and that the application is not vulnerable to any attack that could lead to data leaks. The present work aims to perform a complete forensic analysis of the companion application Garmin Connect for Android devices. We used a Garmin smartband to generate data and tested the application with a rooted Android device. This analysis is split into two parts. The first part is a traditional post-mortem analysis, in which we present the application, the data generation process, the acquisition process, and the tools and methodologies; we then analyze the extracted data and study what can be considered a forensic artifact. In the second part, we performed a dynamic analysis, using various offensive security techniques and methods to find vulnerabilities in the application code and network protocol and to obtain data in transit. Besides completing the analysis of the Garmin Connect application, we contributed various modules and new features to the tool Android Logs Events And Protobuf Parser (ALEAPP) to help forensic practitioners analyze the application and to improve the open-source digital forensics landscape. We also used this analysis as a blueprint to explore six other fitness applications that can receive data from Garmin Connect.
With this work, we concluded that Garmin Connect stores a large quantity of private data on the device, making it of great importance in a forensic investigation. We also studied its robustness and concluded that the application is not vulnerable to the tested scenarios. Nevertheless, we found a weakness in its communication methods that let us obtain any of the user's data even when it was not stored on the device. This fact increases its forensic importance even more.
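    Artifact extraction of the kind performed in the post-mortem phase can be sketched as querying a SQLite database pulled from an acquired image. The table and column names below are hypothetical placeholders and would need to be confirmed against the application's real schema:

```python
import sqlite3

def extract_activities(db_path):
    """Return (name, start_time, distance_m) rows from a companion-app
    SQLite database. Schema names here are assumptions for illustration,
    not Garmin Connect's actual schema."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            "SELECT name, start_time, distance_m FROM activities "
            "ORDER BY start_time"
        ).fetchall()
    finally:
        con.close()
```

    In practice such extraction logic is packaged as a parser module (as done here for ALEAPP) so the artifacts appear alongside other evidence in the examiner's report.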