OpenML: networked science in machine learning
Many sciences have made significant breakthroughs by adopting online tools
that help organize, structure and mine information that is too detailed to be
printed in journals. In this paper, we introduce OpenML, a place for machine
learning researchers to share and organize data in fine detail, so that they
can work more effectively, be more visible, and collaborate with others to
tackle harder problems. We discuss how OpenML relates to other examples of
networked science and what benefits it brings for machine learning research,
individual scientists, as well as students and practitioners. Comment: 12 pages, 10 figures.
Searching Data: A Review of Observational Data Retrieval Practices in Selected Disciplines
A cross-disciplinary examination of the user behaviours involved in seeking
and evaluating data is surprisingly absent from the research data discussion.
This review explores the data retrieval literature to identify commonalities in
how users search for and evaluate observational research data. Two analytical
frameworks rooted in information retrieval and science and technology studies are
used to identify key similarities in practices as a first step toward
developing a model describing data retrieval.
Privacy-preserving efficient searchable encryption
Data storage and computation outsourcing to third-party managed data centers,
in environments such as Cloud Computing, is increasingly being adopted
by individuals, organizations, and governments. However, as cloud-based outsourcing
models expand to society-critical data and services, the lack of effective
and independent control over security and privacy conditions in such settings
presents significant challenges.
An interesting solution to these issues is to perform computations on encrypted
data, directly in the outsourcing servers. Such an approach benefits
from not requiring major data transfers and decryptions, increasing performance
and scalability of operations. Searching operations, an important application
case as cloud-backed repositories grow in number and size, are good examples
where security, efficiency, and precision are all relevant requirements. Yet existing
proposals for searching encrypted data are still limited from multiple perspectives,
including usability, query expressiveness, and client-side performance and
scalability.
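To make the idea of searching encrypted data concrete, the following is a minimal sketch of a keyword-token index, a common textbook approach to searchable symmetric encryption. It is not CloudCryptoSearch, IES-CBIR, or MIE, whose internal designs are not described here. The client derives a deterministic search token for each keyword with HMAC-SHA256 under a secret key, and the server stores only tokens mapped to document identifiers, so it can answer queries without seeing plaintext keywords. The names `keyword_token` and `EncryptedIndex` are hypothetical, and encrypting the documents themselves (for example with an authenticated cipher) is assumed to happen separately and is not shown.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def keyword_token(key: bytes, word: str) -> str:
    """Hypothetical helper: deterministic search token = HMAC-SHA256(key, keyword)."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


class EncryptedIndex:
    """Hypothetical server-side inverted index holding only opaque tokens and doc ids."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, tokens: set[str]) -> None:
        # The server never sees plaintext keywords, only their tokens.
        for token in tokens:
            self._postings[token].add(doc_id)

    def search(self, token: str) -> list[str]:
        # dict.get does not trigger the defaultdict factory, so lookups add no entries.
        return sorted(self._postings.get(token, set()))


# Client side: the secret key never leaves the client.
key = secrets.token_bytes(32)
server = EncryptedIndex()
docs = {
    "d1": "cloud storage privacy",
    "d2": "mobile devices and privacy",
    "d3": "image retrieval",
}
for doc_id, text in docs.items():
    server.add(doc_id, {keyword_token(key, w) for w in text.split()})

print(server.search(keyword_token(key, "privacy")))  # ['d1', 'd2']
```

Deterministic tokens are what make server-side matching possible, but they also reveal which stored documents match each query and when queries repeat; practical schemes weigh this leakage against efficiency and client-side cost.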
This thesis focuses on the design and evaluation of mechanisms for searching
encrypted data with improved efficiency, scalability, and usability. The thesis
addresses two particular concerns: on the one hand, it aims to support multiple
media formats, especially text, images, and multimodal data (i.e., data that
combines multiple media formats); on the other hand, it addresses client-side
overhead and how it can be minimized in order to support client applications
running on both high-performance desktop devices and resource-constrained
mobile devices.
From the research performed to address these issues, three core contributions
were developed and are presented in the thesis: (i) CloudCryptoSearch, a middleware
system for storing and searching text documents with privacy guarantees,
while supporting multiple modes of deployment (user device, local proxy, or
computational cloud) and exploring different tradeoffs between security, usability,
and performance; (ii) a novel framework for efficiently searching encrypted images
based on IES-CBIR, an Image Encryption Scheme with Content-Based Image
Retrieval properties that we also propose and evaluate; and (iii) MIE, a Multimodal
Indexable Encryption distributed middleware that allows storing, sharing, and
searching encrypted multimodal data while minimizing client-side overhead and
supporting both desktop and mobile devices.
Linking genes to literature: text mining, information extraction, and retrieval applications for biology
Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet.
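As an illustration of the simplest way such systems exploit regularities in text, the sketch below performs dictionary-based gene mention recognition and builds an index from genes to the abstracts that mention them, i.e. a toy version of linking genes to literature. It is not any specific system from the review: the lexicon, document ids, and function names are hypothetical, and real systems additionally handle ambiguous names, spelling variants, and normalization to database identifiers.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: lowercased surface form -> canonical gene symbol.
GENE_LEXICON = {
    "tp53": "TP53",
    "p53": "TP53",
    "brca1": "BRCA1",
    "egfr": "EGFR",
}

# Tokens made of letters, digits, and hyphens (enough for simple gene symbols).
TOKEN_RE = re.compile(r"[A-Za-z0-9-]+")


def tag_genes(text: str) -> set[str]:
    """Return the canonical symbols of all lexicon entries mentioned in text."""
    return {
        GENE_LEXICON[tok.lower()]
        for tok in TOKEN_RE.findall(text)
        if tok.lower() in GENE_LEXICON
    }


def build_gene_index(abstracts: dict[str, str]) -> dict[str, list[str]]:
    """Map each canonical gene symbol to the sorted ids of abstracts mentioning it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, text in abstracts.items():
        for gene in tag_genes(text):
            index[gene].add(doc_id)
    return {gene: sorted(ids) for gene, ids in index.items()}


# Hypothetical abstracts keyed by made-up document ids.
abstracts = {
    "pmid_a": "Loss of p53 function accelerates tumour growth.",
    "pmid_b": "TP53 and BRCA1 variants in breast cancer cohorts.",
    "pmid_c": "EGFR inhibitors in lung adenocarcinoma.",
}
index = build_gene_index(abstracts)
print(index["TP53"])  # ['pmid_a', 'pmid_b']
```

Querying `index["TP53"]` returns both the first and second abstracts, because the synonym "p53" is normalized to the same canonical symbol as "TP53".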
AI in Medical Imaging Informatics: Current Challenges and Future Directions
This paper reviews state-of-the-art research solutions across the spectrum of medical imaging informatics, discusses clinical translation, and provides future directions for advancing clinical practice. More specifically, it summarizes advances in medical imaging acquisition technologies for different modalities, highlighting the necessity for efficient medical data management strategies in the context of AI in big healthcare data analytics. It then provides a synopsis of contemporary and emerging algorithmic methods for disease classification and organ/tissue segmentation, focusing on AI and deep learning architectures that have already become the de facto approach. The clinical benefits of in-silico modelling advances linked with evolving 3D reconstruction and visualization applications are further documented. In conclusion, integrative analytics approaches driven by the associated research branches highlighted in this study promise to revolutionize imaging informatics as known today across the healthcare continuum, for both radiology and digital pathology applications. The latter is projected to enable informed, more accurate diagnosis, timely prognosis, and effective treatment planning, underpinning precision medicine.
Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations
Many groups within the broad field of nanoinformatics are already developing data repositories and analytical tools driven by their individual organizational goals. Integrating these data resources across disciplines and with non-nanotechnology resources can support multiple objectives by enabling the reuse of the same information. Integration can also serve as the impetus for novel scientific discoveries by providing the framework to support deeper data analyses. This article discusses current data integration practices in nanoinformatics and in comparable mature fields, and nanotechnology-specific challenges impacting data integration. Based on results from a nanoinformatics-community-wide survey, recommendations for achieving integration of existing operational nanotechnology resources are presented. Nanotechnology-specific data integration challenges, if effectively resolved, can foster the application and validation of nanotechnology within and across disciplines. This paper is one of a series of articles by the Nanomaterial Data Curation Initiative that address data issues such as data curation workflows, data completeness and quality, curator responsibilities, and metadata.
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, in particular the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia) and FEDER (European Union), and by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
Infrastructuring educational genomics: Associations, architectures and apparatuses
Technoscientific transformations in molecular genomics have begun to influence knowledge production in education. Interdisciplinary scientific consortia are seeking to identify "genetic influences" on "educationally relevant" traits, behaviors, and outcomes. This article examines the emerging "knowledge infrastructure" of educational genomics, attending to the assembly and choreography of organizational associations, epistemic architecture, and technoscientific apparatuses implicated in the generation of genomic understandings from masses of bioinformation. As an infrastructure of datafied knowledge production, educational genomics is embedded in data-centered epistemologies and practices which recast educational problems in terms of molecular genetic associations; insights into these associations are deemed discoverable from digital bioinformation and potentially open to genetically informed interventions in policy and practice. While scientists claim to be "opening the black box of the genome" and its association with educational outcomes, we open the black box of educational genomics itself as a source of emerging scientific authority. Data-intensive educational genomics does not straightforwardly "discover" the biological bases of educationally relevant behaviors and outcomes. Rather, this knowledge infrastructure is also an experimental "ontological infrastructure" supporting particular ways of knowing, understanding, explaining, and intervening in education, and recasting the human subjects of education as being surveyable and predictable through the algorithmic processing of bioinformation.
Technical Debt in Software Development: Examining Premises and Overcoming Implementation for Efficient Management
Software development is a unique field of engineering: all software constructs retain their modifiability (arguably, at least) until client release, no single project stakeholder has exhaustive knowledge about the project, and even this partial knowledge is generally acquired only at project completion. These characteristics imply that the field of software development is subject to design decisions that are known to be sub-optimal, either deliberately emphasizing the interests of particular stakeholders or inadvertently harming the project due to a lack of exhaustive knowledge. Technical debt is a concept that accounts for these decisions and their effects. The concept's intention is to capture, track, and manage the decisions and their products: the affected software constructs.
In view of the above, it is vital for software development projects to acknowledge technical debt both as an enabler and as a hindrance. This thesis looks into facilitating efficient technical debt management for varying software development projects. In the thesis, an examination of technical debt's role in software development produces the premises on which a management implementation approach is introduced.
The thesis begins by reviewing its motivations. Based on prior research in the fields of technical debt management and software engineering in general, five motivations establish the premises for technical debt in software development. These include notions of subjectivity in technical debt estimation, the update frequency demanded of technical debt information, and technical debt's polymorphism. Three research questions are derived from the motivations. They ask for tooling support for technical debt management, for capturing and modelling technical debt propagation, and for characterizing software development environments and their technical debt instances. The questions imply consecutive completion: the first pursued tool would benefit from (possibly automatically assessable) propagation models, and finally the tool's introduction to software development organizations could be assisted by tailoring it based on the characterizations of software development environments and technical debt instances.
The thesis includes seven publications. In introducing them, the thesis maps their backgrounds to the motivations and their outcomes to the research questions. Amongst the outcomes are the DebtFlag tool for technical debt management, the procedures for retrospectively capturing technical debt from software repositories, a procedure for technical debt propagation model creation from these retrospectives, and a multi-national survey characterizing software development environments and their technical debt instances.
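The idea of technical debt propagation can be illustrated, under a simplifying assumption, as reachability in a dependency graph: if a construct carries debt, every construct that directly or transitively depends on it may be affected. The sketch below computes that set with a breadth-first search over reversed dependency edges. It is not the DebtFlag tool or the thesis's propagation models, which are built from repository retrospectives; the module names and the function `affected_constructs` are hypothetical.

```python
from collections import deque


def affected_constructs(depends_on: dict[str, set[str]], debt_sources: set[str]) -> set[str]:
    """Hypothetical propagation: every construct that transitively depends on a debt source.

    depends_on maps a construct to the constructs it uses directly.
    """
    # Invert the graph: for each construct, record which constructs use it directly.
    used_by: dict[str, set[str]] = {}
    for user, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(user)

    # Breadth-first search from the debt sources along "used by" edges.
    affected = set(debt_sources)
    queue = deque(debt_sources)
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, set()):
            if user not in affected:  # the visited check also handles cycles
                affected.add(user)
                queue.append(user)
    return affected


# Hypothetical module graph: ui -> service -> storage, report -> service, cli -> config.
graph = {
    "ui": {"service"},
    "report": {"service"},
    "service": {"storage"},
    "cli": {"config"},
}
print(sorted(affected_constructs(graph, {"storage"})))  # ['report', 'service', 'storage', 'ui']
```

For the example graph, debt in `storage` reaches `service` and, through it, `ui` and `report`, while `cli` and `config` remain unaffected. A more realistic model might weight edges, since not every dependency necessarily propagates debt to the same degree.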
The thesis concludes that the tooling support, the technical debt propagation modelling, and the characterization of software environments and technical debt instances together describe an implementation approach for advancing efficient technical debt management. At the same time, future work is implied, as all of the efforts described above need to be continued and extended. Challenges also remain in the introduced approach. One example is the combinatorial explosion of combinations of technology and development context that technical debt propagation modelling needs to consider; all combinations have to be managed if exhaustive modelling is desired. There is, however, strong motivation to pursue these efforts when one recalls that technical debt is a permanent component of software development that, when correctly managed, is a development efficiency mechanism comparable to a financial loan investment.