Automatic mapping of free texts to bioinformatics ontology terms
In the field of bioinformatics, the number of tools and services is ever-increasing. To make information about these resources available in a useful way, we annotate them with ontology terms. This is currently done manually, which is time-consuming and error-prone. In this thesis, we set out to build a tool that helps the annotator by providing annotation suggestions. We developed a program that reads in free-text descriptions of tools and services, adds the content of web pages and publications related to each tool, and based on this outputs the best-matching ontology terms. We then optimised the program's parameters on manually produced annotation sets. Initial results look promising: when comparing performance against these manual annotations, many suggestions agree with them. Moreover, according to experienced annotators, many of the other suggestions are also correct
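The core idea of matching free text to ontology terms can be sketched as a scoring problem. The snippet below is a minimal, illustrative stand-in, not the thesis's actual model: it ranks candidate terms by the fraction of each term's tokens that occur in the description, and the example terms and description are invented.

```python
# Hypothetical sketch: score ontology terms against a free-text tool
# description by token overlap. Terms and weights are illustrative only.
from collections import Counter
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def suggest_terms(description, ontology_terms, top_n=3):
    """Rank ontology terms by how many of their tokens occur in the text."""
    doc_counts = Counter(tokenize(description))
    scored = []
    for term in ontology_terms:
        term_tokens = tokenize(term)
        # Fraction of the term's tokens that appear in the description.
        hits = sum(1 for t in term_tokens if doc_counts[t] > 0)
        score = hits / len(term_tokens) if term_tokens else 0.0
        scored.append((score, term))
    scored.sort(reverse=True)
    return [term for score, term in scored[:top_n] if score > 0]

terms = ["Sequence alignment", "Protein structure prediction",
         "Gene expression analysis"]
desc = "A web service for pairwise and multiple sequence alignment of DNA."
print(suggest_terms(desc, terms))  # ['Sequence alignment']
```

A real system would also weight rare tokens more heavily and fold in text from linked web pages and publications, as the abstract describes.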
Improving search engines with open Web-based SKOS vocabularies
Dissertation for the degree of Master in Computer Engineering. The volume of digital information is increasingly large, and even though organizations are making more of this information available, without the proper tools users have great difficulty retrieving documents about subjects of interest. Good information retrieval mechanisms are crucial for answering user information needs.
Nowadays, search engines are unavoidable - they are an essential feature in document management systems. However, achieving good relevancy is a difficult problem, particularly when dealing with specific technical domains where vocabulary mismatch problems can be detrimental. Numerous research works have found that exploiting the lexical or semantic relations of terms in a collection attenuates this problem.
In this dissertation, we aim to improve search results and user experience by investigating the use of potentially connected Web vocabularies in information retrieval engines. In the context of open Web-based SKOS vocabularies, we propose a query expansion framework implemented in a widely used IR system (Lucene/Solr) and evaluated using standard IR evaluation datasets.
The components described in this thesis were applied in the development of a new search system that was integrated with a rapid application development tool in the context of an internship at Quidgest S.A. Funding: Fundação para a Ciência e Tecnologia - ImTV research project, in the context of the UTAustin-Portugal collaboration (UTA-Est/MAI/0010/2009); QSearch project (FCT/Quidgest
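SKOS-based query expansion of the kind the dissertation proposes can be illustrated with a small sketch. The vocabulary below is hard-coded and invented for the example; a real system would resolve `skos:altLabel` relations from a Web vocabulary before handing the expanded query to Lucene/Solr.

```python
# Illustrative sketch of SKOS-driven query expansion: each query term is
# expanded with altLabel synonyms from a toy vocabulary before search.
# The vocabulary contents are made up for this example.
skos_alt_labels = {
    "myocardial infarction": ["heart attack", "MI"],
    "hypertension": ["high blood pressure"],
}

def expand_query(query):
    """Return a Solr-style OR query with SKOS synonyms added."""
    terms = [query.lower()]
    terms += skos_alt_labels.get(query.lower(), [])
    # Quote multi-word terms so the engine treats them as phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)

print(expand_query("hypertension"))
# hypertension OR "high blood pressure"
```

Expanding with controlled-vocabulary synonyms is one standard way to attenuate the vocabulary mismatch problem mentioned above.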
JobHam-place with smart recommend job options and candidate filtering options
Due to the increasing number of graduates, many applicants struggle to find a
job, and employers have difficulty filtering job applicants, which can hurt
their effectiveness. However, most job-hunting websites lack integrated job
recommendation and CV filtering or ranking functionality. This project
therefore builds a smart job hunter that combines job recommendation, CV
ranking, and a dashboard of skills and job applicants. Job recommendation and
CV ranking start with automatic keyword extraction and end with a Job/CV
ranking algorithm. Automatic keyword extraction is implemented by the
Job2Skill and CV2Skill models based on BERT. Job2Skill consists of two
components, a text encoder and GRU-based layers, while CV2Skill is mainly
based on BERT and fine-tunes the pre-trained model on the Resume-Entity
dataset. To match skills between CVs and job descriptions and to rank lists
of jobs and candidates, Job/CV ranking algorithms compute the occurrence
ratio of skill words based on TF-IDF scores and the match ratio over the
total number of skills. In addition, advanced features such as a calendar and
the SweetAlert2 plugin have been integrated into the website to improve the
user experience, along with basic features for the job application process,
such as application tracking and interview arrangement
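The skill-matching step described above can be sketched as a weighted match ratio. The snippet below is a hedged illustration, not the project's actual algorithm: the skill lists and IDF weights are invented, and it simply scores each job by the IDF-weighted fraction of its required skills found in the CV.

```python
# Minimal sketch of the Job/CV ranking idea: score each job by the
# fraction of its required skills found in the CV, weighted by a
# TF-IDF-like rarity score. All data below is invented for illustration.
def skill_match_score(cv_skills, job_skills, idf):
    """Weighted match ratio: IDF weight of matched skills over all skills."""
    cv = {s.lower() for s in cv_skills}
    total = sum(idf.get(s.lower(), 1.0) for s in job_skills)
    matched = sum(idf.get(s.lower(), 1.0) for s in job_skills if s.lower() in cv)
    return matched / total if total else 0.0

idf = {"python": 1.0, "sql": 0.8, "kubernetes": 2.5}
cv = ["Python", "SQL"]
jobs = {
    "Data Analyst": ["Python", "SQL"],
    "Platform Engineer": ["Python", "Kubernetes"],
}
ranked = sorted(jobs, key=lambda j: skill_match_score(cv, jobs[j], idf),
                reverse=True)
print(ranked)  # ['Data Analyst', 'Platform Engineer']
```

In the full system the skill sets would come from the BERT-based Job2Skill and CV2Skill extractors rather than hand-written lists.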
Semantic concept extraction from electronic medical records for enhancing information retrieval performance
With the healthcare industry increasingly using EMRs, an opportunity emerges for knowledge discovery within the healthcare domain that was not possible with paper-based medical records. One such opportunity is to discover UMLS concepts in EMRs. However, with opportunities come challenges that need to be addressed. Medical verbiage differs greatly from common English, and it is reasonable to assume that extracting information from medical text requires different protocols than those currently used for common English text. This thesis proposes two new semantic matching models: Term-Based Matching and CUI-Based Matching. Both models use specialized biomedical text-mining tools to extract medical concepts from EMRs. Extensive experiments to rank the extracted concepts are conducted on the University of Pittsburgh BLULab NLP Repository for the TREC 2011 Medical Records track, a dataset of 101,711 EMRs containing concepts in 34 predefined topics. The thesis compares the proposed semantic matching models against the traditional weighting equations and information retrieval tools in academic use today
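The distinction between the two matching models can be sketched simply: term-based matching compares surface strings, while CUI-based matching first maps terms to concept identifiers so that synonyms align. The CUI table below is a toy stand-in for what a biomedical concept extractor would return from UMLS; it is not the thesis's data.

```python
# Hedged sketch: Term-Based vs CUI-Based matching. The term-to-CUI table
# is invented; a real pipeline would get CUIs from a UMLS-aware extractor.
term_to_cui = {
    "heart attack": "C0027051",
    "myocardial infarction": "C0027051",
    "diabetes": "C0011849",
}

def term_match(doc_terms, query_terms):
    """Count exact surface-string overlaps."""
    return len(set(doc_terms) & set(query_terms))

def cui_match(doc_terms, query_terms):
    """Count overlaps after mapping terms to concept identifiers."""
    doc = {term_to_cui.get(t) for t in doc_terms} - {None}
    qry = {term_to_cui.get(t) for t in query_terms} - {None}
    return len(doc & qry)

doc = ["myocardial infarction", "diabetes"]
qry = ["heart attack"]
print(term_match(doc, qry), cui_match(doc, qry))  # 0 1
```

The example shows why CUI-based matching can succeed where string matching fails: "heart attack" and "myocardial infarction" share a concept identifier but no surface tokens.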
A Picture Is Worth a Thousand Words: Code Clone Detection Based on Image Similarity
This paper introduces a new code clone detection
technique based on image similarity. The technique captures
the visual perception of code as seen by humans in an IDE by applying
syntax highlighting and image conversion to raw source code
text. We compared two similarity measures, Jaccard and earth
mover’s distance (EMD) for our image-based code clone detection
technique. Jaccard similarity offered better detection performance
than EMD. The F1 score of our technique on detecting
Java clones with pervasive code modifications is comparable
to five well-known code clone detectors: CCFinderX, Deckard,
iClones, NiCad, and Simian. A Gaussian blur filter is chosen as a
normalisation technique for type-2 and type-3 clones. We found
that blurring code images before similarity computation resulted
in higher precision and recall. The detection performance after
including the blur filter increased by 1 to 6 percent. The manual
investigation of clone pairs in three software systems revealed that
our technique, while it missed some of the true clones, could also
detect additional true clone pairs missed by NiCad
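The pipeline above can be sketched end to end on toy data. The snippet below is an illustration under simplifying assumptions, not the paper's implementation: the "rendered" code images are tiny hand-made binary grids, a 3x3 box blur stands in for the Gaussian filter, and Jaccard similarity is computed over thresholded "ink" pixels.

```python
# Simplified sketch of image-based clone detection: blur two binary
# "code images", threshold them, and compare with Jaccard similarity.
# Grids and the box blur are stand-ins for real rendering and Gaussian blur.
def box_blur(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

def jaccard(a, b, thresh=0.2):
    """Jaccard similarity over pixels brighter than the threshold."""
    pa = {(y, x) for y, row in enumerate(a) for x, v in enumerate(row) if v > thresh}
    pb = {(y, x) for y, row in enumerate(b) for x, v in enumerate(row) if v > thresh}
    return len(pa & pb) / len(pa | pb) if pa | pb else 1.0

img1 = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
img2 = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]  # one "edited" pixel
raw = jaccard(img1, img2)
blurred = jaccard(box_blur(img1), box_blur(img2))
print(round(raw, 2))  # 0.75
```

In the paper's setting the blur smears small, localized edits across neighbouring pixels, which is what normalises type-2 and type-3 clones before the similarity computation.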
Exploiting Context in Dealing with Programming Errors and Exceptions in the IDE
Studies show that software developers spend about 19% of their development time on web surfing. While collecting necessary information through traditional web search, they face several practical challenges. First, web search does not consider the context (i.e., surroundings, circumstances) of a programming problem unless the developers encode it in the search query, and it forces them to frequently switch between their working environment (e.g., the IDE) and the web browser. Second, the technical details (e.g., the stack trace) of an encountered exception often contain a lot of information, but they cannot be used directly as a search query, given that traditional search engines do not support long queries. Third, traditional search generally returns hundreds of results, and the developers must manually analyze the result pages one by one to extract a working solution. Both manually analyzing a page for content relevant to the encountered exception (and its context) and working out an appropriate solution are non-trivial tasks. Traditional code search engines share the same limitations as web search engines, and they also do not help much in collecting code examples that can be used for handling the encountered exceptions.
In this thesis, we present a context-aware and IDE-based approach that helps one overcome the four challenges above. In our first study, we propose and evaluate a context-aware meta search engine for programming errors and exceptions. The meta search collects results for any exception encountered in the IDE from three popular search engines (Google, Bing and Yahoo) and one programming Q&A site (StackOverflow), refines and ranks the results against the detailed context of the encountered exception, and then recommends them within the IDE. From this study, we not only explore the potential of the context-aware, meta-search-based approach but also realize the significance of appropriate search queries in searching for programming solutions. In the second study, we propose and evaluate an automated query recommendation approach that exploits the technical details of an encountered exception and recommends a ranked list of search queries. We found the recommended queries quite promising and comparable to queries suggested by experts. We also note that support for developers can be further complemented by post-search content analysis. In the third study, we propose and evaluate an IDE-based, context-aware content recommendation approach that identifies and recommends the sections of a web page that are relevant to the exception encountered in the IDE. The idea is to reduce the cognitive effort developers spend searching for content of interest (i.e., relevance) in the page, and we found the approach quite effective through extensive experiments and a limited user study. In our fourth study, we propose and evaluate a context-aware code search engine that collects code examples from a number of GitHub code repositories, where the examples contain high-quality handlers for the exception of interest. We validate the performance of each of our proposed approaches against the existing relevant literature and also through several mini user studies.
Finally, to further validate the applicability of our approaches, we integrate them into an Eclipse plug-in prototype, ExcClipse. We then conduct a task-oriented user study with six participants and report the findings, which are significantly promising
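The query recommendation idea from the second study, distilling a long stack trace into a short search query, can be sketched with simple heuristics. The trace, package name, and extraction rules below are illustrative assumptions, not the thesis's exact algorithm: keep the exception type, its message, and the method from the topmost application frame.

```python
# Hedged sketch: turn an exception's stack trace into a short search query.
# Heuristics and the example trace are invented for illustration.
import re

def trace_to_query(stack_trace, app_package="com.example"):
    lines = [l.strip() for l in stack_trace.strip().splitlines()]
    # First line has the form "package.ExceptionType: message".
    exc_type, _, message = lines[0].partition(":")
    exc_type = exc_type.split(".")[-1]  # drop the package prefix
    # First frame inside the application's own code, if any.
    app_frame = next((l for l in lines[1:] if app_package in l), "")
    method = re.search(r"at\s+([\w.$]+)\(", app_frame)
    parts = [exc_type, message.strip()]
    if method:
        parts.append(method.group(1).split(".")[-1])
    return " ".join(p for p in parts if p)

trace = """java.lang.NullPointerException: name is null
    at com.example.app.UserService.load(UserService.java:42)
    at java.util.Optional.map(Optional.java:215)"""
print(trace_to_query(trace))
# NullPointerException name is null load
```

Short, content-bearing queries like this fit within the length limits of traditional search engines, which is the problem the second study targets.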
Forensic analysis of the Garmin Connect Android application
Wearable smart devices are becoming more prevalent in our lives. These tiny
devices read various health signals, such as heart rate and pulse, and also
serve as companion devices that store sports activities and even their
coordinates. This data is typically sent to the smartphone via an installed
companion application. These applications hold high forensic value because
of the users' private information they store. They can be crucial in a
criminal investigation to understand what happened or where a person was
during a given period. They also need to guarantee that the data is secure
and that the application is not vulnerable to any attack that could lead to
data leaks.
The present work aims to perform a complete forensic analysis of the Garmin
Connect companion application for Android devices. We used a Garmin
smartband to generate data and tested the application on a rooted Android
device. The analysis is split into two parts. The first part is a
traditional post mortem analysis in which we present the application, the
data generation process, the acquisition process, tools, and methodologies,
and then analyse the extracted data and study what can be considered a
forensic artifact. In the second part, we performed a dynamic analysis,
using various offensive security techniques and methods to find
vulnerabilities in the application code and network protocol and to obtain
data in transit.
Besides completing the Garmin Connect application analysis, we contributed
various modules and new features to the tool Android Logs Events And Protobuf
Parser (ALEAPP) to help forensic practitioners analyze the application and to
improve the open-source digital forensics landscape. We also used this
analysis as a blueprint to explore six other fitness applications that can
receive data from Garmin Connect.
With this work, we conclude that Garmin Connect stores a large quantity of
private data on the device, making it of great importance in a forensic
investigation. We also studied its robustness and conclude that the
application is not vulnerable to the tested scenarios. Nevertheless, we
found a weakness in its communication methods that let us obtain any of the
user's data even when it was not stored on the device, which increases its
forensic importance even further
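The post mortem side of such an analysis often reduces to querying artifacts out of an app's SQLite databases. The sketch below is purely illustrative: the database path, table, and column names are invented, and a real Garmin Connect extraction would use the schema actually found on the device.

```python
# Illustrative sketch of extracting artifacts from an app's SQLite
# database. Schema and file name are invented; real ALEAPP modules parse
# the schemas found in the acquired Android file system image.
import sqlite3

def extract_activities(db_path):
    """Return (name, start_time, lat, lon) rows from a toy schema."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name, start_time, lat, lon FROM activities "
            "ORDER BY start_time"
        ).fetchall()
    finally:
        con.close()
    return rows

# Build a throwaway database to demonstrate the query.
con = sqlite3.connect("demo_connect.db")
con.execute("CREATE TABLE IF NOT EXISTS activities "
            "(name TEXT, start_time TEXT, lat REAL, lon REAL)")
con.execute("DELETE FROM activities")
con.execute("INSERT INTO activities VALUES "
            "('Morning Run', '2023-05-01T07:00', 38.71, -9.14)")
con.commit()
con.close()

print(extract_activities("demo_connect.db"))
```

Rows like these, activity names, timestamps, and coordinates, are exactly the private data that gives such applications their forensic value.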