FamilySearch holds one of the largest collections of linked family history data in the world. Nearly one billion records of individuals, both deceased and living, have been recorded and placed together into a common tree ("The Family Tree"). This ancestral relationship graph is the largest family history network ever analyzed. Using common graph analysis techniques, we have found a number of interesting properties in the network. We examine the topology of the graph by calculating its connected components. The network consists of one giant component containing many millions of records, plus millions of very small components. We also describe how this topology has changed over time. The paper further describes how an analysis of the strongly connected components and the graph's diameter can be used to assess the quality of the data. Finally, we describe a heuristic algorithm to determine the "connectedness" of our patrons and find that those who have logged into the system are significantly more connected than those who have not. One third of potential users are connected to the giant component, compared with 80% of active users. We discuss how this analysis could potentially be used to partition the graph to support scaling or distributing the system.
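As an illustration of the connected-component analysis summarized above, the following is a minimal Python sketch, not the implementation used in the paper. It assumes the relationship graph can be treated as undirected for this purpose and uses a union-find structure on a toy data set; the names `component_sizes`, `giant_fraction`, `people`, and `links` are hypothetical.

```python
from collections import Counter


def find(parent, x):
    """Return the representative of x's component, halving paths as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def component_sizes(nodes, edges):
    """Map each component representative to the number of nodes it contains."""
    parent = {n: n for n in nodes}
    for a, b in edges:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components
    return Counter(find(parent, n) for n in nodes)


def giant_fraction(nodes, edges):
    """Fraction of all nodes that belong to the largest component."""
    sizes = component_sizes(nodes, edges)
    return max(sizes.values()) / len(nodes)


# Toy data (assumption): six person records and four relationship links.
people = ["a", "b", "c", "d", "e", "f"]
links = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]

sizes = component_sizes(people, links)
print(sorted(sizes.values(), reverse=True))
print(giant_fraction(people, links))
```

Restricting `nodes` to a subset of interest, such as records belonging to active users, and checking which of them fall in the largest component would give the kind of connectedness comparison the abstract reports.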