XKMis: Effective and efficient keyword search in XML databases
We present XKMis, a system for keyword search in XML documents. Unlike previous work, our method is not based on the lowest common ancestor (LCA) or its variants; instead, we divide the nodes into meaningful, self-contained information segments, called minimal information segments (MISs), and return MIS-subtrees, which consist of MISs that are logically connected by the keywords. The MIS-subtrees are closer to what the user wants. They also enable us to use the region code of XML trees to develop a search algorithm that is more efficient, especially for large XML trees. We report experimental results that verify the improved effectiveness and efficiency of our system. Copyright ©2009 ACM
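The region-coding idea the abstract relies on can be sketched in a few lines (the toy tree, names, and interval scheme below are illustrative assumptions, not the paper's implementation):

```python
# Illustrative sketch of region coding for XML trees: a depth-first pass
# assigns each node a (start, end) interval, after which ancestor/descendant
# tests need no tree traversal at all.
from itertools import count

def assign_regions(children, name, clock, out):
    """Assign (start, end) region codes in document order."""
    start = next(clock)
    for child_name, grandchildren in children.items():
        assign_regions(grandchildren, child_name, clock, out)
    out[name] = (start, next(clock))
    return out

def is_ancestor(regions, a, b):
    """a is an ancestor of b iff a's interval strictly contains b's."""
    (s1, e1), (s2, e2) = regions[a], regions[b]
    return s1 < s2 and e2 < e1

# toy XML tree: bib -> book -> (title, author)
tree = {"bib": {"book": {"title": {}, "author": {}}}}
regions = assign_regions(tree["bib"], "bib", count(), {})
print(is_ancestor(regions, "bib", "title"))    # True: bib contains title
print(is_ancestor(regions, "title", "author"))  # False: they are siblings
```

Because containment reduces to two integer comparisons, such codes make subtree checks cheap on large documents, which is the efficiency property the abstract appeals to.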
Adaptive intelligent personalised learning (AIPL) environment
As individuals, the ideal learning scenario would be a learning environment tailored to how we like to learn, personalised to our requirements. This has previously been almost inconceivable given the complexities of learning, the constraints within the environments in which we teach, and the need for global repositories of knowledge to facilitate this process. Whilst it is still not fully achievable, this research project represents a path towards this ideal. In this thesis, findings from research into the development of a model (Adaptive Intelligent Personalised Learning (AIPL)), the creation of a prototype implementation of a system designed around this model (the AIPL environment), and the construction of a suite of intelligent algorithms (Personalised Adaptive Filtering System (PAFS)) for personalised learning are presented and evaluated. A mixed-methods approach is used in the evaluation of the AIPL environment. The AIPL model is built on the premise that an ideal system is one which considers not just the individual but also groupings of like-minded individuals and their power to influence learner choice. The results show that: (1) there is a positive correlation for using group-learning paradigms; (2) using personalisation as a learning aid can help to facilitate individual learning and encourage learning online; (3) using learning styles as a way of identifying and categorising individuals can improve their online learning experience; and (4) using Adaptive Information Retrieval techniques linked to group-learning paradigms can reduce the problem of mis-matching. A number of approaches for further work to extend and expand upon the work presented are highlighted at the end of the thesis.
Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining
Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of a clinical trial protocol is important because it specifies the necessary conditions that participants have to satisfy.
Since clinical trial eligibility criteria are usually written in free text, they are not computer-interpretable. To automate the analysis of the eligibility criteria, it is therefore necessary to transform those criteria into a computer-interpretable format. The unstructured format of eligibility criteria additionally creates search efficiency issues. Thus, searching for and selecting appropriate clinical trials for a patient from a relatively large number of available trials is a complex task.
A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have not fully integrated the entire matching process and have not exploited the state-of-the-art Natural Language Processing (NLP) techniques that may improve the matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and, thereby, improve the efficiency and effectiveness of the recruitment process.
This dissertation research, which comprises three essays, investigates the issues of clinical trial subject recruitment using state-of-the-art NLP and text mining techniques.
Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis
Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion
Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment
In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. The domain-specific dictionary is used for the selection and reduction of n-gram features in the clustering in essay 2. The dictionary was evaluated by comparing it with the Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT). The results showed that it adds a significant number of new terms, which is very useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion using synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify the features with the domain-specific dictionary matching process. To resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarization of clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process between clinical trial clusters and patient medical records. The patient records collected from a prior study were used to test our approach. The patient records were pre-processed by tokenization and lemmatization. The pre-processed patient information was then further enhanced by matching it against the breast cancer custom dictionary described in essay 1 and by semantic feature expansion using the UMLS Metathesaurus. Finally, I matched the patient records with the clinical trial clusters to select the best-matched cluster(s), and then with the trials within those clusters. The matching results were evaluated by an internal expert as well as an external medical expert.
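The dictionary-based n-gram matching step that links the lexicon of essay 1 to the feature selection of essay 2 can be sketched roughly as follows (the lexicon entries and the greedy longest-match strategy are illustrative assumptions, not the dissertation's actual code):

```python
# Illustrative sketch of dictionary-based n-gram term matching: scan a
# tokenized sentence left to right and greedily take the longest n-gram
# that appears in a small toy domain lexicon.

LEXICON = {"breast cancer", "estrogen receptor", "tamoxifen"}  # toy entries
MAX_N = 3  # longest n-gram to try

def match_terms(tokens):
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_N, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n]).lower()
            if gram in LEXICON:            # longest match wins
                matches.append(gram)
                i += n
                break
        else:                              # no n-gram matched here
            i += 1
    return matches

print(match_terms("Patients with breast cancer receiving tamoxifen".split()))
# -> ['breast cancer', 'tamoxifen']
```

Terms recognized this way could then serve as the reduced n-gram feature set that the clustering step consumes, which is the role the abstract assigns to the custom dictionary.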
Trust based Privacy Policy Enforcement in Cloud Computing
Cloud computing offers opportunities for organizations to reduce IT costs by using the computation and storage of a remote provider. Despite the benefits offered by the cloud computing paradigm, organizations are still wary of delegating their computation and storage to a cloud service provider due to trust concerns. The trust issues with the cloud can be addressed by a combination of regulatory frameworks and supporting technologies. Privacy Enhancing Technologies (PET) and remote attestation provide the technologies for addressing these trust concerns. PET provides proactive measures through cryptography and selective dissemination of data to the client. Remote attestation mechanisms provide reactive measures by enabling the client to remotely verify whether a provider is compromised. The contributions of this work are threefold. This thesis explores the PET landscape by studying in detail the implications of using PET in cloud architectures. The practicality of remote attestation in Software as a Service (SaaS) and Infrastructure as a Service (IaaS) scenarios is also analyzed, and improvements to the state of the art have been proposed. This thesis also proposes a fresh look at trust relationships in cloud computing, where a single provider changes its configuration for each client based on the subjective and dynamic trust assessments of clients. We conclude by proposing a plan for expanding on the completed work.
Benchmarking VisualStudio.NET for the development and implementation of a manufacturing execution system
The focus of this thesis is to show the utility of Microsoft's .NET framework in developing and implementing a manufacturing execution system (MES). The manufacturing environment today, more than ever, is working towards achieving better yields, productivity, quality, and customer satisfaction. Companies such as DELL are rapidly outgrowing their competition due to better management of their product lifecycles. The time between receiving a new order and shipping the final product is getting shorter. Historically, business management applications such as Enterprise Resource Planning (ERP) systems and Customer Relationship Management (CRM) systems have been implemented without much importance given to operational and shop-floor needs. The fact is that these business systems can be successful only when they are properly integrated with real-time data from the shop floor, which is the core of any manufacturing set-up. A Manufacturing Execution System, or MES, is this link between the shop floor and the top floor. MESA International defines an MES as "systems that deliver information enabling the optimization of production activities from order launch to finished goods". Thus, an MES provides the right information to the right people at the right time in the right format, to help them make well-informed decisions. A necessity for an efficient MES is therefore a high capability of integration with the existing systems on the operational level. This is where Microsoft's VS.NET fits in. Microsoft defines .NET as "a set of software technologies for connecting information, people, systems and devices". The vision of .NET is to enable the end user to connect to information from any place at any time, using any device, in a manner that is independent of the platform on which the service is based.
The building block of the .NET framework is the Common Language Runtime (CLR), which is capable of converting data from its original format into a format understandable to .NET and then using that format to interface with its client. This feature holds the key in the context of MES development and implementation. The aim of this applied research is to design an MES using VS.NET to control the working of a Flexible Manufacturing System (FMS), namely CAMCELL. The architecture used for the MES will then be gauged against an MES implementation done previously using Siemens' PC-based automation technology and Visual FoxPro. This study will integrate the Siemens technology with the .NET framework to enhance the resulting MES efficiency. Real-time data collection on the shop floor will be done using the databases from WinCC, and data aggregation and manipulation will be done within the .NET framework. The software architecture used for this study will achieve vertical integration between the CAMCELL ERP layer, the MES layer, and the control layer. The study will demonstrate how the data stored in a high-level ERP database can be converted into useful information for the control layer for process control, and also how real-time information gathered from the control layer can be filtered into useful information up to the ERP layer to facilitate decision making. VS.NET user interface screens will be proposed to support these activities. The performance of the proposed architecture will be compared with that from previous studies, thus benchmarking VS.NET for the implementation of the MES.
Keyword-Based Querying for the Social Semantic Web
Enabling non-experts to publish data on the web is an important
achievement of the social web and one of the primary goals of the social
semantic web. Making the data easily accessible has, in turn, received
little attention, which is problematic from the point of view of
incentives: users are likely to be less motivated to participate in the
creation of content if the use of this content is mostly reserved to
experts.
Querying in semantic wikis, for example, is typically realized in terms of
full text search over the textual content and a web query language such as
SPARQL for the annotations. This approach has two shortcomings that limit
the extent to which data can be leveraged by users: combined queries over
content and annotations are not possible, and users either are restricted
to expressing their query intent using simple but vague keyword queries or
have to learn a complex web query language.
The work presented in this dissertation investigates a more suitable form
of querying for semantic wikis that consolidates two seemingly conflicting
characteristics of query languages, ease of use and expressiveness. This
work was carried out in the context of the semantic wiki KiWi, but the
underlying ideas apply more generally to the social semantic and social
web.
We begin by defining a simple modular conceptual model for the KiWi wiki
that enables rich and expressive knowledge representation. One component of
this model is structured tags, an annotation formalism that is simple yet
flexible and expressive and that aims to bridge the gap between atomic tags
and RDF. The viability of the approach is confirmed by a user study, which
finds that structured tags are suitable for quickly annotating evolving
knowledge and are perceived well by the users.
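As a rough illustration of how structured tags might sit between atomic tags and RDF, consider the following sketch (the representation, the `to_triples` lowering, and all names are hypothetical, not KiWi's actual data model):

```python
# Hypothetical sketch of structured tags: richer than an atomic tag,
# lighter than full RDF. A tag is either an atom ("trip") or a key:value
# characterization (("location", "rome")); a grouping is a plain list.

tag = ["trip", ("location", "rome"), ("year", "2010")]

def to_triples(subject, grouping):
    """Lower a structured-tag grouping to naive RDF-like triples."""
    triples = []
    for t in grouping:
        if isinstance(t, tuple):      # key:value characterization
            key, value = t
            triples.append((subject, key, value))
        else:                         # atomic tag, generic predicate
            triples.append((subject, "tag", t))
    return triples

print(to_triples("page:42", tag))
# -> [('page:42', 'tag', 'trip'), ('page:42', 'location', 'rome'),
#     ('page:42', 'year', '2010')]
```

The point of the sketch is the middle ground: users write lightweight key:value annotations, while the system can still lower them to triple-shaped data when formal querying is needed.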
The main contribution of this dissertation is the design and
implementation of KWQL, a query language for semantic wikis. KWQL combines
keyword search and web querying to enable querying that scales with user
experience and information need: basic queries are easy to express; as the
search criteria become more complex, more expertise is needed to formulate
the corresponding query. A novel aspect of KWQL is that it combines both
paradigms in a bottom-up fashion. It treats neither of the two as an
extension to the other, but instead integrates both in one framework. The
language allows for rich combined queries of full text, metadata, document
structure, and informal to formal semantic annotations. KWilt, the KWQL
query engine, provides the full expressive power of first-order queries,
but at the same time can evaluate basic queries at almost the speed of the
underlying search engine. KWQL is accompanied by the visual query language
visKWQL, and an editor that displays both the textual and visual form of
the current query and reflects changes to either representation in the
other. A user study shows that participants quickly learn to construct
KWQL and visKWQL queries, even when given only a short introduction.
KWQL allows users to sift the wealth of structure and annotations in an
information system for relevant data. If relevant data constitutes a
substantial fraction of all data, ranking becomes important. To this end,
we propose PEST, a novel ranking method that propagates relevance among
structurally related or similarly annotated data. Extensive experiments,
including a user study on a real-life wiki, show that PEST improves the
quality of the ranking over a range of existing ranking approaches.
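The propagation idea behind PEST resembles a personalized PageRank-style fixed point; the following is a minimal sketch of such relevance propagation on a toy link graph (the graph, seed scores, and damping value are illustrative, not the actual PEST algorithm):

```python
# Sketch of relevance propagation: pages matching the query seed the
# scores, and a damped fixed-point iteration spreads relevance along
# structural links so related pages pick up some of it.

links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}  # toy wiki link graph
seed = {"a": 1.0, "b": 0.0, "c": 0.0}              # only "a" matches query
alpha = 0.5                                        # propagation strength

score = dict(seed)
for _ in range(50):  # iterate until (practically) converged
    score = {
        page: (1 - alpha) * seed[page]
        + alpha * sum(score[p] / len(links[p]) for p in links if page in links[p])
        for page in links
    }

ranking = sorted(score, key=score.get, reverse=True)
print(ranking)  # -> ['a', 'b', 'c']: neighbors of matches outrank distant pages
```

Page "b" gets no direct keyword match yet ranks above "c" purely because it is structurally closer to the matching page, which is the behavior the abstract describes.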
Biomedical Informatics Research Network (BIRN) Data Repository
The primary goal of this project was to streamline the submission process by which researchers categorize, approve, curate, and publish their data sets. By supporting the sharing and exchange of these data types in fulfillment of the National Institutes of Health guidelines, the BIRN Data Repository will be able to grow as the need develops for these researchers to easily share information.
Ontology reuse and synthesis for modelling and simulation
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
The proliferation and ubiquity of Semantic Web technologies have transformed the way the computing
community reshapes its technology through knowledge integration, knowledge reuse and
knowledge sharing. Ontology, one of the Semantic Web components, is a way to represent
domain knowledge in a human-understandable and machine-readable format. Ontology
in simulation has been seen as a conceptual model of a system in an explicit and unambiguous
manner, where it can be applied to better capture the modeler’s perspective of the
domain. For simulation modeling, reusing existing ontologies helps to
reduce the time and effort needed to attain the domain knowledge, and at the same time assists in
domain understanding. For a semantically richer simulation ontology, it is useful to engage
with real data and existing ontologies. This research contributes a rigorous method that extracts
domain knowledge, synthesizes processes performed within the domain, and builds a
minimal and viable ontology for simulation modeling, known as a Minimal Viable Simulation
Ontology (MVSimO). The research method initially applies ontology selection techniques in
the Ontology Reuse Framework (ORF) to obtain suitable existing ontologies for reuse. ORF incorporates
a module extraction technique during the domain conceptualization phase, where
the modules will represent domain knowledge as sub-ontologies. Formal Concept Analysis
is later applied to the real-world data to reveal the process details of the domain. Finally, the
development of MVSimO is completed by the derivation of the event semantics of the processes.
The effectiveness of the ontology selection and synthesis methods is reviewed by evaluating
the selected ontology knowledge extracted, and the detailed ontological model of MVSimO.
The evaluation of MVSimO is performed to determine its agreement with the established simulation
model of the domain. The evaluation results are encouraging, providing concrete
outcomes of the new technique of ontology reuse and a new development for the research area.
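The Formal Concept Analysis step mentioned above can be illustrated with a naive sketch (the incidence context below is a toy example, and real FCA implementations use far more efficient algorithms than exhaustive enumeration):

```python
# Illustrative Formal Concept Analysis by brute force: enumerate attribute
# subsets of a toy incidence context and collect the formal concepts, i.e.
# pairs (objects, attributes) where each set exactly determines the other.
from itertools import combinations

context = {                      # object -> attributes it exhibits (toy data)
    "order":  {"created", "queued"},
    "job":    {"created", "queued", "processed"},
    "report": {"created", "processed"},
}
attrs = sorted({a for s in context.values() for a in s})

def extent(B):
    """Objects possessing every attribute in B."""
    return {o for o, s in context.items() if B <= s}

def intent(A):
    """Attributes shared by every object in A."""
    sets = [context[o] for o in A]
    return set.intersection(*sets) if sets else set(attrs)

concepts = {
    (frozenset(extent(set(B))), frozenset(intent(extent(set(B)))))
    for r in range(len(attrs) + 1)
    for B in combinations(attrs, r)
}
print(len(concepts))  # -> 4 formal concepts in this toy context
```

Each concept groups objects by a maximal shared attribute set, which is how FCA can reveal the process structure hidden in real-world event data, as the abstract suggests.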