38 research outputs found
Information-Statistical Data Visualization Tool (情報量統計学的データ可視化ツール)
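This software and its accompanying documentation were created by NTT DATA Mathematical Systems Inc. based on specifications prepared by the Institute of Statistical Mathematics, Research Organization of Information and Systems (Inter-University Research Institute Corporation). The copyright belongs to NTT DATA Mathematical Systems, which has agreed to distribution by the Institute of Statistical Mathematics. (Principal investigator: Makio Ishiguro (石黒真木夫), School of Statistical Thinking, Institute of Statistical Mathematics, 2015.2.15. Software produced with a Grant-in-Aid for Scientific Research: "Research on incorporating data preprocessing into statistical inference by model selection methods", FY2011 to FY2014, Challenging Exploratory Research, project number 23650148.) The correlation coefficient matrix is widely used as a tool for examining relationships among many variables in the initial stage of analyzing a phenomenon. However, it has two major drawbacks: (1) it cannot handle categorical variables, and (2) it cannot handle nonlinear functional relationships. This software is a general-purpose tool that, by considering the Akaike Information Criterion (AIC) of models that hold between variables, (1)´ can handle categorical variables and (2)´ can handle nonlinear functional relationships. In the revised version, the adoption of histogram models makes it possible to handle more complex relationships between variables, and a statistical model visualization function has been added. Co-investigators: Satoru Shimizu (清水悟), Tokyo Women's Medical University, School of Medicine; Masaharu Tanemura (種村正美), Professor Emeritus, Institute of Statistical Mathematics; Fumikazu Miwakeichi (三分一史和), Institute of Statistical Mathematics, modeling research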
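The idea of judging a relationship between two categorical variables by AIC, rather than by a correlation coefficient, can be illustrated with a contingency table. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: it compares an independence model with a fully dependent (saturated) multinomial model, and the function names `aic_pair` and `_sum_n_log_p` are hypothetical.

```python
import math


def _sum_n_log_p(counts, total):
    """Maximized multinomial log-likelihood term: sum of n * log(n / total)."""
    return sum(c * math.log(c / total) for c in counts if c > 0)


def aic_pair(table):
    """Return (AIC of independence model, AIC of dependent model).

    `table` is a list of rows of cell counts for two categorical variables.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    cells = [count for row in table for count in row]

    # Dependent (saturated) model: one free probability per cell, minus one.
    loglik_dep = _sum_n_log_p(cells, total)
    k_dep = len(row_totals) * len(col_totals) - 1

    # Independence model: cell probability = row probability * column probability.
    loglik_ind = _sum_n_log_p(row_totals, total) + _sum_n_log_p(col_totals, total)
    k_ind = (len(row_totals) - 1) + (len(col_totals) - 1)

    aic_ind = -2.0 * loglik_ind + 2 * k_ind
    aic_dep = -2.0 * loglik_dep + 2 * k_dep
    return aic_ind, aic_dep


# Illustrative (made-up) counts: a clearly associated table and a flat one.
associated = [[30, 10], [10, 30]]
flat = [[20, 20], [20, 20]]
for name, table in (("associated", associated), ("flat", flat)):
    aic_ind, aic_dep = aic_pair(table)
    preferred = "dependent" if aic_dep < aic_ind else "independent"
    print(name, round(aic_ind, 2), round(aic_dep, 2), preferred)
```

A lower AIC for the dependent model suggests that the two variables are related; because the comparison works on counts, it applies to categorical variables and, after binning, to nonlinear relationships between continuous ones.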
OleksandrZadorozhnyiML/StMaRDI: Structure learning notebooks for MaRDI TA3 v3.
<p>This new release contains the notebooks for statistical analysis in graphical modelling and causal inference (MaRDI Project, TA3).</p>
<p>Below is a short description of the available notebooks (5 notebooks are currently available).</p>
<p><strong>Notebook_01</strong> - introduction to the problem of graphical modelling and causal inference. Applies the algorithms available in the bnlearn package to the alarm dataset, collected from the Zenodo repository.</p>
<p><strong>Notebook_02</strong> - introduction to the problem of subgraph (subset) selection in a "large" graph with available ground truth, such that the independence relationships in the selected graph do not contradict those in the supergraph. Implements different heuristics based on the alarm graph and benchmarks them.</p>
<p><strong>Notebook_03</strong> - uses data collected by a telecom company to estimate which covariates cause clients to churn. A comparison with known methods that provide feature importance is also presented. This notebook can also serve as an introductory notebook in statistical analysis courses.</p>
<p><strong>Notebook_04</strong> - benchmarks datasets from the Tuebingen cause-effect repository by means of a statistical test based on the kernel-based HSIC methodology.</p>
<p><strong>Notebook_05</strong> - introduces the problem of DAG estimation as a non-convex optimization problem over Euclidean space. The implementation uses a dataset from the Bayesys collection, collected and published earlier in the Zenodo community.</p>
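The kernel-based HSIC test mentioned for Notebook_04 can be sketched as follows. This is a generic illustration under stated assumptions, not the notebook's code: it uses the biased HSIC estimator with Gaussian kernels of fixed bandwidth and a permutation p-value, and all function names, the bandwidth, and the synthetic data are hypothetical choices.

```python
import math
import random


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar observations."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def center_gram(gram):
    """Double-center a symmetric Gram matrix (H K H with H = I - 11'/n)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_statistic(centered_k, gram_l, order):
    """Biased HSIC, trace(K H L H) / n^2, with the y-sample reordered by `order`."""
    n = len(order)
    total = 0.0
    for i in range(n):
        row_k = centered_k[i]
        row_l = gram_l[order[i]]
        for j in range(n):
            total += row_k[j] * row_l[order[j]]
    return total / (n * n)


def hsic_permutation_test(x, y, n_permutations=200, bandwidth=1.0, seed=0):
    """Return (observed HSIC, permutation p-value) for paired samples x and y."""
    rng = random.Random(seed)
    centered_k = center_gram(gaussian_gram(x, bandwidth))
    gram_l = gaussian_gram(y, bandwidth)
    identity = list(range(len(x)))
    observed = hsic_statistic(centered_k, gram_l, identity)
    exceed = 0
    for _ in range(n_permutations):
        order = identity[:]
        rng.shuffle(order)  # permuting y breaks any dependence on x
        if hsic_statistic(centered_k, gram_l, order) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_permutations)


# Synthetic example: y depends on x nonlinearly, so linear correlation is weak.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(60)]
y = [xi ** 2 + 0.1 * data_rng.gauss(0.0, 1.0) for xi in x]
statistic, p_value = hsic_permutation_test(x, y)
print(statistic, p_value)
```

A small p-value indicates that the dependence between the two samples is unlikely under independence; in cause-effect benchmarks such a test is typically applied to regression residuals, which is beyond this sketch.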
Symposium Mathematical Processing of Cartographic Data (Tallinn, December 18-19, 1979). Summaries
Effect of Correlation Strength Uncertainty on Bayesian Calibration of Gas Turbine Simulator
Documenting and Evaluating Data Science Contributions in Academic Promotion in Departments of Statistics and Biostatistics
EPSRC HEED Data Repository: Surveys
The HEED project aims at understanding the energy needs of refugees and displaced populations in order to improve access to clean energy. The focus of HEED is on the lived experiences of refugees living for protracted periods of time in three refugee camps in Rwanda (Nyabiheke, Gihembe and Kigeme) and of internally displaced persons (IDPs) forced to leave their homes as a result of the 2015 earthquake in Nepal. As part of the project, an energy assessment survey was undertaken in both countries using quantitative and qualitative research methods with households living in different parts of the camps/settlements, entrepreneurs running small businesses, and those responsible for community facilities, such as schools and health clinics. In the first phase, a questionnaire-based survey targeting displaced populations was conducted with households living in three refugee camps in Rwanda and four displaced sites in Nepal (see tables 2.1 and 2.2 respectively). The second phase of the field research involved a series of interviews and focus group discussions with various stakeholders in Nepal and Rwanda. The surveys were designed and delivered between March and April 2018 by the project partner, Practical Action. In both countries, the enumerators for the survey received two days of training on research methods, data collection and ethics. For the household survey, the sample size was derived using Cochran’s formula as described by Bartlett et al. in "Organizational Research: Determining Appropriate Sample Size in Survey Research". A minimum sample size of 119 households was derived by applying a margin of error of 0.03 and an alpha of 0.5. A breakdown of the focal group and specific sites where the surveys were delivered in Rwanda and Nepal is shown in tables 2.1 and 2.2 respectively. In Rwanda, a total of 814 surveys were conducted across the three sites, comprising 622 households, 155 enterprises and 37 community facilities. 
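The quoted minimum of 119 households can be reproduced with Cochran's formula for continuous data, n0 = t² s² / d². The sketch below is a hedged reconstruction, not the project's documented calculation: it assumes a 7-point scale with standard deviation estimated as the number of scale points divided by 6, and t = 1.96, the two-sided value for an alpha of 0.05 (an assumption, since the text above states 0.5). The function name is hypothetical.

```python
import math


def cochran_continuous_sample_size(t_value, scale_points, margin_of_error,
                                   sd_divisor=6.0):
    """Cochran's sample size for continuous data: n0 = (t^2 * s^2) / d^2.

    s is the estimated standard deviation of the scale (scale_points / sd_divisor)
    and d is the acceptable error in scale units (scale_points * margin_of_error).
    """
    s = scale_points / sd_divisor
    d = scale_points * margin_of_error
    return (t_value ** 2) * (s ** 2) / (d ** 2)


# Assumed inputs: t = 1.96, 7-point scale, margin of error 0.03.
n0 = cochran_continuous_sample_size(1.96, 7, 0.03)
minimum_households = math.ceil(n0)  # round up to a whole household
print(round(n0, 2), minimum_households)
```

Under these assumptions the number of scale points cancels out, so the result depends only on t and the margin of error.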
The sample distribution across camps is 211 for Gihembe, 202 for Kigeme and 209 for Nyabiheke. In Gihembe more than half of the respondents sampled (118, 55.9%) were female, with the remaining 93 (44.1%) male. This is in contrast with Kigeme, where almost equal numbers of males (100, 49.5%) and females (102, 50.5%) were sampled. In Nyabiheke the sample covered more females (123, 58.9%) than males (86, 41.1%). In Nepal, the sample covered 181 households, 18 enterprises and 3 community facilities (see table 2.2). The household sample in Nepal covered more males (126, 69.6%) than females (55, 30.4%).

Folder Structure:

Surveys:
Gihembe Community Facility Survey – Gihembe_CF.csv
Gihembe Enterprise Survey – Gihembe_EN.csv
Gihembe Household Survey – Gihembe_HH.csv
Kigeme Community Facility Survey – Kigeme_CF.csv
Kigeme Enterprise Survey – Kigeme_EN.csv
Kigeme Household Survey – Kigeme_HH.csv
Nepal Community Facility Survey – Nepal_CF.csv
Nepal Enterprise Survey – Nepal_EN.csv
Nepal Household Survey – Nepal_HH.csv
Nyabiheke Community Facility Survey – Nyabiheke_CF.csv
Nyabiheke Enterprise Survey – Nyabiheke_EN.csv
Nyabiheke Household Survey – Nyabiheke_HH.csv

Location Maps:
Gihembe Community Facility Survey Map – CF_GIS_gihembe.csv
Gihembe Enterprise Survey Map – EN_GIS_gihembe.csv
Gihembe Household Survey Map – HH_GIS_gihembe.csv
Kigeme Community Facility Survey Map – CF_GIS_kigeme.csv
Kigeme Enterprise Survey Map – EN_GIS_kigeme.csv
Kigeme Household Survey Map – HH_GIS_kigeme.csv
Nepal Community Facility Survey Map – CF_GIS_nepal.csv
Nepal Enterprise Survey Map – EN_GIS_nepal.csv
Nepal Household Survey Map – HH_GIS_nepal.csv
Nyabiheke Community Facility Survey Map – CF_GIS_nyabiheke.csv
Nyabiheke Enterprise Survey Map – EN_GIS_nyabiheke.csv
Nyabiheke Household Survey Map – HH_GIS_nyabiheke.csv

The following information was gathered from each of the surveys:

Households: The datasets contain information about household demographics, access to and use of 
electricity and lighting technologies, access to and use of cooking technologies and fuels, self-reported needs and priorities of the household, and ownership of energy products. Several key areas, such as solar lighting products and issues around fuel usage, are covered in more detail. Enterprises: The datasets contain information about the enterprise, its electrical and non-electrical lighting needs and supply, its usage of energy for ICT and entertainment, motive power, heating, and cooling applications, and its ownership of electrical appliances. Community facility: The datasets contain information about the community facility or institution, its electrical and non-electrical lighting needs and supply, its usage of energy for ICT and entertainment, motive power, heating, and cooling applications, and its ownership of electrical appliances. Community facilities offering healthcare services were asked additional questions about specific medical devices. The survey results, together with other methodological tools including field visits and the ‘Design for Displacement’ (D4D) and ‘Energy for End-Users’ (E4E) workshops, have provided relevant data and contextual knowledge to inform the design of the various interventions associated with HEED. The data sets and results have been compiled, organised and uploaded to the data portal for use by researchers, students and others, both within and outside the project consortium, during and beyond the project lifetime.
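The file names listed in the folder structure follow a regular pattern: `<Site>_<code>.csv` for surveys and `<code>_GIS_<site in lowercase>.csv` for location maps. The sketch below builds the expected paths from that pattern. It assumes that "Surveys" and "Location Maps" are directory names and that the files are UTF-8 encoded CSVs with a header row; neither is confirmed by the description, and the helper names are hypothetical.

```python
import csv
from pathlib import PurePosixPath

SITES = ["Gihembe", "Kigeme", "Nepal", "Nyabiheke"]
SURVEY_CODES = {"CF": "Community Facility", "EN": "Enterprise", "HH": "Household"}


def survey_file(site, code):
    """Survey file name, e.g. Gihembe_HH.csv."""
    return f"{site}_{code}.csv"


def map_file(site, code):
    """Location map file name, e.g. HH_GIS_gihembe.csv."""
    return f"{code}_GIS_{site.lower()}.csv"


def expected_paths(surveys_dir="Surveys", maps_dir="Location Maps"):
    """All 24 relative paths implied by the naming pattern (directory names assumed)."""
    paths = []
    for site in SITES:
        for code in SURVEY_CODES:
            paths.append(PurePosixPath(surveys_dir) / survey_file(site, code))
            paths.append(PurePosixPath(maps_dir) / map_file(site, code))
    return paths


def load_rows(path, encoding="utf-8"):
    """Read one CSV into a list of dicts keyed by its header row (encoding assumed)."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle))


for path in expected_paths()[:4]:
    print(path)
```

Once the archive is downloaded, `load_rows` could be called on any of these paths, for example to compare household counts per site with the figures quoted above.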
Code and Dataset for Thesis "Bayesian Analysis of Spatial Log-Gaussian Cox Processes"
These files contain the code and data needed to produce the results presented in the thesis "Bayesian Analysis of Spatial Log-Gaussian Cox Processes" by Nadeen Khaleel. They include the input data and the output results for the implementation of the models and exploratory analysis, as well as for the implementation of the Grid Mesh Optimisation method and the INLA within MCMC algorithms. Some of the input data corresponds to processed crime data for US cities, in particular incidents of homicide and motor vehicle theft in Los Angeles, New York and Portland, aggregated to census-tract level or to discretisation grids. The raw third-party data is not included; however, a document detailing how to access it is provided, together with all of the code used to clean and extract the necessary data from the raw data. The files additionally contain the code for data tidying, manipulation and simulation, as well as the code implementing the Grid Mesh Optimisation method and the INLA within MCMC algorithms.
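For readers unfamiliar with the model class, a log-Gaussian Cox process on a discretisation grid can be simulated in a few lines: a Gaussian field gives the log-intensity in each cell, and the cell counts are Poisson given that intensity. The sketch below is purely illustrative and is not taken from the thesis code; the exponential covariance, the parameter values, unit cell areas, and all function names are assumptions.

```python
import math
import random


def exponential_covariance(points, variance, length_scale, jitter=1e-9):
    """Covariance matrix variance * exp(-distance / length_scale) between cell centres."""
    matrix = [[variance * math.exp(-math.hypot(p[0] - q[0], p[1] - q[1]) / length_scale)
               for q in points] for p in points]
    for i in range(len(points)):
        matrix[i][i] += jitter  # small diagonal term for numerical stability
    return matrix


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_poisson(rng, rate):
    """Draw a Poisson count by Knuth's multiplication method (fine for modest rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_lgcp_counts(grid_size=5, mean_log_intensity=1.0, variance=0.5,
                         length_scale=2.0, seed=0):
    """Simulate cell counts of an LGCP on a grid_size x grid_size grid of unit cells."""
    rng = random.Random(seed)
    points = [(i + 0.5, j + 0.5) for i in range(grid_size) for j in range(grid_size)]
    lower = cholesky(exponential_covariance(points, variance, length_scale))
    noise = [rng.gauss(0.0, 1.0) for _ in points]
    # Gaussian field: mean + L z, so that its covariance is L L'.
    field = [mean_log_intensity + sum(lower[i][k] * noise[k] for k in range(i + 1))
             for i in range(len(points))]
    # Counts are Poisson with rate exp(field) times the (unit) cell area.
    return [sample_poisson(rng, math.exp(value)) for value in field]


counts = simulate_lgcp_counts()
print(counts)
```

Aggregating events to grid cells in this way mirrors the "discretisation grids" mentioned above; inference for such a model is the part handled in the thesis by INLA and MCMC, which this sketch does not attempt.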
