7 research outputs found

    Research on Cloud Databases

    With the recent development of cloud computing, the importance and value of cloud databases have been widely acknowledged. Here the features, influence, and related products of cloud databases are first discussed. Then research issues of cloud databases are presented in detail, including the data model, system architecture, transaction consistency, programming model, data security, performance optimization, and benchmarks. Finally, some future research directions in this area are discussed. Supported by the National Natural Science Foundation of China (61001013, 61102136), the Natural Science Foundation of Fujian Province (2011J05156, 2011J05158), and the Fundamental Research Funds for the Central Universities, Xiamen University (2011121049, 2010121066).

    Cloud Databases

    This report covers the following: the concept and characteristics of cloud databases; cloud databases versus traditional distributed databases; the influence of cloud databases; cloud database products; and research issues in the field of cloud databases.

    Bitcoin trading system

    In this thesis an information solution was developed that enables the implementation of different trading strategies and backtesting over Bitcoin cryptocurrency trading data. Supported exchanges are Bitstamp, BTC-e, and MtGox. In the field of technical analysis there already exist various solutions for Bitcoin that help traders trade and advise them on the basis of technical indicators and patterns. However, each has its own drawbacks, which we aim to fix. A web application was developed with Node.js that, in addition to backtesting for each supported exchange, displays a chart of the cryptocurrency's value over time as Japanese candlesticks, together with a market-depth chart and the order book. Users have the option to implement their own strategies. The trading data that the web application needs to function properly are obtained through the APIs of the supported exchanges with the help of Java programs and are stored in a MongoDB database.
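    The core of such a system is a backtesting loop that replays historical prices against a user-supplied strategy. The following Node.js sketch is purely illustrative (none of these identifiers come from the thesis); it backtests a simple moving-average crossover strategy over an array of candlestick closing prices, a common technical-indicator strategy of the kind the thesis supports.

```javascript
// Minimal backtest sketch: moving-average crossover over candlestick closes.
// All identifiers are illustrative, not the thesis's actual API.
function sma(values, period, i) {
  // Simple moving average of the `period` closes ending at index i,
  // or null while there is not yet enough history.
  if (i + 1 < period) return null;
  let sum = 0;
  for (let j = i - period + 1; j <= i; j++) sum += values[j];
  return sum / period;
}

function backtest(closes, shortPeriod, longPeriod, startCash) {
  let cash = startCash;
  let coins = 0;
  for (let i = 0; i < closes.length; i++) {
    const fast = sma(closes, shortPeriod, i);
    const slow = sma(closes, longPeriod, i);
    if (fast === null || slow === null) continue;
    if (fast > slow && cash > 0) {
      // Bullish crossover: buy with all available cash.
      coins = cash / closes[i];
      cash = 0;
    } else if (fast < slow && coins > 0) {
      // Bearish crossover: sell the whole position.
      cash = coins * closes[i];
      coins = 0;
    }
  }
  // Mark any remaining position to market at the last close.
  return cash + coins * closes[closes.length - 1];
}
```

    A real implementation would stream candles from the exchange APIs (Bitstamp, BTC-e, MtGox) and persist them in MongoDB, as the abstract describes; here the price series is just an in-memory array.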

    Embedding qualitative research in randomised controlled trials to improve recruitment: findings from two recruitment optimisation studies of orthopaedic surgical trials

    Background: Recruitment of patients is one of the main challenges when designing and conducting randomised controlled trials (RCTs). Trials of rare injuries, or those that include surgical interventions, pose added challenges due to the small number of potentially eligible patients and issues with patient preferences and surgeon equipoise. We explore key issues to consider when recruiting to orthopaedic surgical trials from the perspective of staff and patients, with the aim of informing the development of strategies to improve recruitment in future research. Design: Two qualitative process evaluations of a United Kingdom-wide orthopaedic surgical RCT (ACTIVE) and a mixed-methods randomised feasibility study (PRESTO). Qualitative semi-structured interviews were conducted and the data were analysed thematically. Setting: NHS secondary care organisations throughout the United Kingdom. Interviews were undertaken via telephone. Participants: 37 health professionals, including UK-based spinal and orthopaedic surgeons and individuals involved in recruitment to the ACTIVE and PRESTO studies (e.g. research nurses, surgeons, physiotherapists). 22 patients were interviewed, including patients who agreed to participate in the ACTIVE and PRESTO studies (n=15) and patients who declined participation in the ACTIVE study (n=7). Results: We used a mixed-methods systematic review of recruiting patients to randomised controlled trials as a framework for reporting and analysing our findings. Our findings mapped onto those identified in the systematic review and highlighted the importance of equipoise, randomisation, communication, patients' circumstances, altruism, and trust in clinical and research teams. Our findings also emphasised the importance of considering how eligibility criteria are operationalised and the impact of complex patient pathways when recruiting to surgical trials.
In particular, the influence of health professionals who are not involved in trial recruitment on patients' treatment preferences, by suggesting ahead of recruitment consultations that patients would receive a certain treatment, should not be underestimated. Conclusions: A wealth of evidence exploring factors affecting recruitment to randomised controlled trials exists. A methodological shift is now required to ensure that this evidence is used by all those involved in recruitment and that existing knowledge is translated into methods for optimising recruitment to future trials. Trial registries: ACTIVE (ISRCTN98152560); PRESTO (ISRCTN12094890).

    Enterprise Data Mining & Machine Learning Framework on Cloud Computing for Investment Platforms

    Machine Learning and Data Mining are two key components of decision-making systems that can quickly provide valuable insights into huge data sets. Turning raw data into meaningful information, and converting it into actionable tasks, makes organizations profitable and able to withstand intense competition. In the past decade we saw an increase in Data Mining algorithms and tools for financial market analysis, consumer products, manufacturing, the insurance industry, social networks, scientific discovery, and warehousing. Given the vast amount of data available for analysis, traditional tools and techniques are outdated for data analysis and decision support. Organizations are investing considerable resources in Data Mining frameworks in order to emerge as market leaders. Machine Learning is a natural evolution of Data Mining: existing Machine Learning techniques rely heavily on underlying Data Mining techniques, in which pattern recognition is an essential component. Building an efficient Data Mining framework is expensive and usually culminates in a multi-year project, and an organization pays a heavy price for any delay or an inefficient Data Mining foundation. In this research, we propose to build a cost-effective and efficient Data Mining (DM) and Machine Learning (ML) framework in a cloud computing environment to solve the inherent limitations of existing design methodologies. The elasticity of the cloud architecture removes the hardware constraint on businesses. Our research is focused on refining and enhancing current Data Mining frameworks to build an enterprise data mining and machine learning framework. Our initial studies and techniques produced very promising results, reducing the existing build time considerably.
Our technique of dividing the DM and ML frameworks into several individual components (5 sub-components) that can be reused at several phases of the final enterprise build is efficient and saves operational costs for the organization. Effective aggregation using selective cuboids and parallel computation using Azure Cloud Services are a few of the many techniques proposed in our research. Our research produced a nimble, scalable, portable architecture for enterprise-wide implementation of DM and ML frameworks.
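    The "selective cuboid" idea mentioned above can be illustrated with a toy example: rather than materialising every group-by combination of a data cube (2^d cuboids for d dimensions), only the cuboids a workload actually needs are computed. The sketch below is an assumption-laden illustration in JavaScript, not the dissertation's implementation.

```javascript
// Toy illustration of selective cuboid aggregation. Fact rows carry
// dimensions (region, product) and a measure (sales); only the requested
// cuboids are materialised. All names here are illustrative.
function aggregate(rows, dims) {
  // Group rows by the chosen dimension subset and sum the measure.
  const out = {};
  for (const r of rows) {
    // An empty dimension list yields the apex (grand-total) cell.
    const key = dims.map(d => r[d]).join('|') || 'ALL';
    out[key] = (out[key] || 0) + r.sales;
  }
  return out;
}

function selectiveCube(rows, cuboids) {
  // cuboids is a list of dimension subsets, e.g. [['region'], ['product'], []].
  const cube = {};
  for (const dims of cuboids) {
    cube[dims.join(',') || 'apex'] = aggregate(rows, dims);
  }
  return cube;
}
```

    In a cloud setting, each requested cuboid is an independent aggregation and can be computed in parallel (e.g. as separate Azure worker tasks), which is where the elasticity argument in the abstract comes in.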

    Definition of a framework for the predictive analysis of unstructured data

    The amount of information generated every second on the Internet increases in volume and variety each day. Web 2.0, the Internet of Things, and mobile devices are just some of the elements behind this increase in data volume. In the near future, the introduction of 5G technology will lead to an exponential increase in data generation by allowing greater transfer rates (Gb/s). Research in this area must therefore establish guidelines for methodologies to analyse these data, as well as means to process them. However, the size and diversity of these data mean that different scientific disciplines have to be combined in order to analyse them and obtain relevant findings. That is, not only will traditional analysis techniques be applied, but other areas of science will have to be brought together to extract the so-called 'hidden information' behind these data. Moreover, within this growing availability of data, Web 2.0 contributes the social network paradigm and the (unstructured) data it generates, commonly free text. This free text may be associated with other elements depending on its source; for example, it may be tied to a rating scale for a product or service.
For all the above, this thesis proposes the definition of a framework that allows the analysis of unstructured social network data using machine learning, natural language processing, and big data techniques. The main features of this framework are: - The framework is divided into two phases, each consisting of a set of stages defined for the purpose of analysing a volume of data that is either small (below what is considered big data) or large (big data). - The central element of phase one is the machine learning model, which consists of two elements: (i) a series of natural language processing techniques for data preprocessing and (ii) a series of machine learning algorithms for the classification of information. - The machine learning model built in the first phase is intended to be used in the second (big data) phase to analyse the same data source, but at a much larger volume. - The machine learning model is not tied to the application of particular algorithms, which makes it versatile to adopt. As can be seen, the setting of this research is multidisciplinary, combining diverse scientific disciplines for the same purpose; solving the problem of analysing unstructured social network data requires the union of heterogeneous techniques from various areas of science and engineering. The research methodology followed for this doctoral thesis consisted of: 1. State of the Art: a selection of studies by other authors in the areas of Big Data, Machine Learning, and Natural Language Processing, as well as the union of these topics with sentiment analysis and social network rating systems; a comparison integrating these topics is also presented, to establish the state of the art of what other authors have proposed when combining the three areas covered by the framework. 2.
State of the Technique: In this phase, the various elements that make up the framework were analysed and a theoretical retrospective on them is presented; more technical issues are addressed, with an overview of the technologies being used in current research. 3. Proposed Solution: In this phase, the proposed framework is presented and analysed from two perspectives: the theoretical aspects that each phase comprises, and the implementation aspects, where topics such as the complexity of carrying out each phase in a real situation are addressed. 4. Evaluation and Validation: A series of tests is defined to verify the hypotheses established at the beginning of the research and to demonstrate the validity of the proposed model. 5. Documentation and Conclusions: This activity consisted of documenting all aspects related to this thesis and presenting the conclusions that emerged at the end of the research. Accordingly, a framework was built comprising two phases through which a set of unstructured data is analysed; a distinguishing feature of this framework is the construction of a machine learning model during the first phase, intended to serve as the basis for the second, which is characterised by the processing of large volumes of data. To validate this thesis, Yelp data was used, specifically from the hotel sector. Likewise, the framework was evaluated by running several tests with machine learning classifiers, obtaining high prediction percentages in the binary classification carried out in both the non-big-data and big-data environments.
On the other hand, interesting challenges and future lines of research arise now that the model is complete, both in extending it to the analysis of other types of information and in integrating and adapting the machine learning model from the first phase to the second. Doctoral Programme in Computer Science and Technology (Programa Oficial de Doctorado en Ciencia y Tecnología Informática). Committee: Chair: Alejandro Calderón Mateos; Secretary: Alejandro Rodríguez González; Member: Mario Graff Guerrer
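    Phase one of the framework described in this abstract pairs NLP preprocessing with a classification algorithm for binary polarity. The thesis's actual pipeline is not public; as a hedged illustration only, the sketch below implements simple tokenisation with stop-word removal feeding a multinomial Naive Bayes classifier over review text (stop list, labels, and all identifiers are invented for the example).

```javascript
// Illustrative phase-one pipeline: tokenise + drop stop words, then classify
// with Laplace-smoothed multinomial Naive Bayes. Not the thesis's code.
const STOP = new Set(['the', 'a', 'is', 'was', 'and']); // toy stop list

function preprocess(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(t => t && !STOP.has(t));
}

function train(examples) {
  // examples: [{ text, label }] with label 'pos' or 'neg'.
  const model = { counts: { pos: {}, neg: {} }, totals: { pos: 0, neg: 0 },
                  docs: { pos: 0, neg: 0 }, vocab: new Set() };
  for (const { text, label } of examples) {
    model.docs[label]++;
    for (const tok of preprocess(text)) {
      model.counts[label][tok] = (model.counts[label][tok] || 0) + 1;
      model.totals[label]++;
      model.vocab.add(tok);
    }
  }
  return model;
}

function classify(model, text) {
  const nDocs = model.docs.pos + model.docs.neg;
  let best = null, bestScore = -Infinity;
  for (const label of ['pos', 'neg']) {
    // Log prior plus Laplace-smoothed log likelihood of each token.
    let score = Math.log(model.docs[label] / nDocs);
    for (const tok of preprocess(text)) {
      const c = model.counts[label][tok] || 0;
      score += Math.log((c + 1) / (model.totals[label] + model.vocab.size));
    }
    if (score > bestScore) { bestScore = score; best = label; }
  }
  return best;
}
```

    A model trained this way on a small sample corresponds to the framework's first phase; the same trained model could then score a much larger volume of the same kind of data in the big-data phase.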