I’ve finished up my Population Biology core examinations and teaching requirements and am beginning my research. Below are some projects I’ve recently finished or am currently working on.

X Chromosome Genealogies

One simulation of an X-genealogy 10 generations back. Gray indicates genealogical ancestors that are not X ancestors, blue indicates male ancestors, red indicates female ancestors, and white indicates genealogical X ancestors that do not share genetic material with the present-day individual. The transparency of the red and blue arcs is proportional to the genetic material shared between that X ancestor and the present-day individual.

During summer 2015, Graham, Steve Mount, and I worked out a set of probability distributions that model the number and lengths of the blocks one shares with an X chromosome ancestor $k$ generations back in the past. Additionally, we modelled the number and lengths of the blocks one shares with a present-day relative on the X chromosome. As ancestry services like 23andMe and Ancestry.com use identical-by-descent blocks between individuals to infer shared ancestry, Graham and I were curious how much information about ancestry is in X chromosomes and what these block distributions look like.

X chromosome blocks shared between a present-day individual and their ancestors.

Since the heterogametic sex only has one X ancestor (e.g. in humans, males have only one X ancestor in the previous generation — their mother), one’s number of X ancestors $k$ generations back is curiously enough the $(k+2)^\text{th}$ Fibonacci number. Additionally, X chromosomes only undergo meiotic recombination in females, so the distribution of recombinational meioses varies depending on the lineage to a particular ancestor. This leads to some neat combinatorics that we use in our derivation of our distributions.
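The Fibonacci pattern can be checked with a few lines of Python. This is only an illustrative sketch, not code from the manuscript: the function names are hypothetical, the present-day individual is assumed to be female for the $(k+2)^\text{th}$ indexing, and the Fibonacci sequence is assumed to start $F_1 = F_2 = 1$.

```python
def x_ancestor_counts(sex, generations):
    """Count X ancestors 1..generations back for a present-day 'female' or 'male'.

    Illustrative sketch: counts positions in the genealogy, ignoring inbreeding.
    """
    females, males = (1, 0) if sex == "female" else (0, 1)
    counts = []
    for _ in range(generations):
        # A female's X ancestors are her mother and father; a male's is only his mother.
        # So next generation's females = mothers of everyone, males = fathers of females.
        females, males = females + males, females
        counts.append(females + males)
    return counts


def fibonacci(n):
    """Return the n-th Fibonacci number, assuming F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for k, count in enumerate(x_ancestor_counts("female", 10), start=1):
        print(k, count, fibonacci(k + 2))
```

Under the same assumptions, a present-day male's counts are shifted by one generation, since his only X ancestor one generation back is his mother.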

Graham, Steven, and I have written up this manuscript, and a preprint is now available on bioRxiv. As a scientific outreach component, I’ve also written a blog post about X genealogies and recent ancestry. See also Graham’s terrific blog posts on one’s number of genetic ancestors, how much genome one shares with a particular ancestor, and the Thanksgiving question on everyone’s mind: how much of my genome do I share with a particular cousin?

Demographic and Geographic Summaries of Pairwise Coalescent Time Distributions

Preliminary versions of this method show that low-rank approximations of older coalescent times (top figure; captured by short shared genomic regions) capture older European population history, while low-rank approximations of more recent coalescent times (bottom figure; captured by very long shared genomic regions) capture more recent history between individuals from Albania and Kosovo.

This is the project I proposed in my NSF Graduate Research Fellowship application (funded).

Most species live in spatially structured populations, connected through a complex demographic history of population movements and splits, migrations, expansions, and declines. The demography of populations both reflects and impacts processes on evolutionary time scales (e.g. speciation, local adaptation, gene flow) and ecological timescales (e.g. competition, disease spread, predation). Population genomic methods (Novembre et al., 2008; Li and Durbin, 2011) seek to infer these demographic and geographic histories.

However, I see current methods as having two limitations: (1) they are unable to reconstruct recent demographic and migratory events over the past tens to hundreds of generations, and (2) they are not well positioned to use the hundreds to thousands of individuals currently being sequenced and genotyped.

Graham and I are currently looking into how to tackle this problem by estimating the pairwise coalescent time distribution for each pair of individuals in a sample, and then summarizing these coalescent times within time slices (e.g. 0-10 generations, 11-30 generations, and so on). Over all pairs of individuals, we obtain one matrix per time slice containing the number of shared common ancestors in that period, which captures individuals’ relatedness in that slice. However, each matrix is likely to be sparse (to contain many zero entries) because of noise in coalescent time estimation. We’re tackling this with statistical approaches that smooth across adjacent time slices and across individuals.
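To make the per-slice matrices concrete, here is a minimal sketch of the binning step only. It is not the method itself: the function name, the input format (one tuple per inferred shared ancestor, with an estimated age in generations), and the toy data are all assumptions made for illustration, and no smoothing or low-rank approximation is attempted.

```python
from bisect import bisect_left


def slice_matrices(n_individuals, estimates, slice_upper_bounds):
    """Build one symmetric count matrix per time slice.

    estimates: iterable of (i, j, generations_ago) tuples, a hypothetical format
        with one entry per inferred shared ancestor of individuals i and j.
    slice_upper_bounds: sorted inclusive upper bounds, e.g. [10, 30] for the
        slices 0-10 and 11-30 generations.
    """
    matrices = [
        [[0] * n_individuals for _ in range(n_individuals)]
        for _ in slice_upper_bounds
    ]
    for i, j, generations_ago in estimates:
        # Index of the first slice whose upper bound is >= the estimated age.
        s = bisect_left(slice_upper_bounds, generations_ago)
        if s == len(slice_upper_bounds):
            continue  # older than the last slice; ignored in this sketch
        matrices[s][i][j] += 1
        matrices[s][j][i] += 1
    return matrices


if __name__ == "__main__":
    # Made-up toy data for three individuals, for illustration only.
    toy = [(0, 1, 3), (0, 1, 8), (0, 2, 25), (1, 2, 40)]
    for bound, matrix in zip([10, 30], slice_matrices(3, toy, [10, 30])):
        print("slice up to", bound, "generations:", matrix)
```

Even in this toy case most entries are zero, which is the sparsity that motivates sharing information across adjacent slices and across individuals.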

You can read my full proposal on Yaniv Brandvain’s collection of successful GRFP applications.