Variance and variability

Today, Léo presented on the mathematical principles behind Principal Component Analysis (PCA). It was a great refresher for me, and a great way to give students a clear picture of what this technique does and why we use it so much for dimensionality reduction. Léo also prompted a discussion about why variance and variability are so important in understanding the world, using these images of the Mona Lisa (La Joconde), one of which shows Mona Lisa with her cat (its name is Zarathustra). I loved it! He explained that, in terms of information theory, variation is where the information resides. Information comes from comparison (of the two paintings), and this information is what makes the “message” important (there is a cat in Mona Lisa’s arms). This whole discussion about variance, comparison and variability is really in line with the major aim of the entire field of population genetics (my field!), a field “that deals with genetic differences within and between populations, and is a part of evolutionary biology”, as formulated by Wikipedia. This is probably one reason why PCA is so widely used in the analysis of biological data. Also, please note that deep learning methods can generate tons of artwork with cats!
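For readers who want to see the "variation is where the information resides" idea in numbers, here is a minimal sketch of PCA through a singular value decomposition, using NumPy. It is not the code from Léo's presentation: the function name `pca`, the toy data and the loadings are all made up for illustration. It centers the data, finds the directions of largest variance, and reports the fraction of total variance each component carries.

```python
import numpy as np

def pca(X, n_components=2):
    """Illustrative PCA via SVD (hypothetical helper, not from the talk)."""
    # Center each feature so the components describe variance around the mean
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance carried by each component, and its share of the total
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    # Coordinates of each sample on the first n_components directions
    scores = Xc @ Vt[:n_components].T
    return scores, ratio[:n_components]

# Toy data (made up): 200 samples, 5 features, with most of the
# variation driven by a single hidden factor plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
loadings = np.array([[1.0, -0.5, 0.8, 0.3, -1.2]])
X = 3.0 * latent @ loadings + 0.5 * rng.normal(size=(200, 5))

scores, ratio = pca(X)
print(scores.shape)  # one row per sample, one column per component
print(ratio)         # share of total variance per component
```

In this toy example the first component should capture almost all of the variance, because the data were built around one hidden factor. With real genotype data the picture is usually less clear-cut, and the first few components are read as the main axes along which the samples differ.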

Professional bias

The featured picture from last week’s lab meeting is a figure from Justin’s presentation, made by Isabel. First, I like it because I am happy to see that trainees are collaborating! Second, it combines recombination hotspots with Cytochrome P450 (CYP) genes. It shows the fine-scale recombination rates in the CYP4F gene cluster, from genetic maps computed in different human populations from the 1000 Genomes data.

Recombination is the process by which every child receives a unique mosaic of parental chromosomes. In most species, recombination occurs in narrow genomic segments called recombination hotspots. Until recently, my research was mainly dedicated to the study of recombination hotspots and the fascinating gene PRDM9, which I like to say is my favorite gene in the entire genome. It evolves very rapidly under strong positive selection, is implicated in disease, and is critical for fertility. More recently, however, I have found another favorite gene family: the CYP genes. CYP enzymes are able to catalyze a considerable variety of oxidations for many structural classes of chemicals (including the majority of drugs), in all forms of life (bacteria, fungi, plants, birds, insects, reptiles and mammals). Like PRDM9, they evolve quickly: these genes exhibit an exceptionally high number of mutations. Striking inter-individual and geographic differences in CYP allele frequencies are found in humans. The main hypothesis is that their evolution was first influenced by interactions between animals and plants, and then by diet and environmental pollutants that have affected humans over thousands of years and differed between ethnic groups.

I am now fascinated by the evolution of these CYP families, but, by professional bias, let’s start by looking at the recombination landscape!


Further reading

Alves I, Houle AA, Hussin JG, Awadalla P. The impact of recombination on human mutation load and disease. Philos Trans R Soc Lond B Biol Sci. 2017 Dec 19;372(1736).

Paigen, K and Petkov P. PRDM9 and Its Role in Genetic Recombination. Trends in Genetics, 2018. 34(4): p.291-300.

Gonzalez, F.J. and D.W. Nebert, Evolution of the P450 gene superfamily: animal-plant ‘warfare’, molecular drive and human genetic differences in drug oxidation. Trends Genet, 1990. 6(6): p.182-6.

Nebert, D.W., Polymorphisms in drug-metabolizing enzymes: what is their clinical relevance and why do they exist? Am J Hum Genet, 1997. 60(2): p.265-71.

Mathematics of PCA

This week we had our first official OMICS group meeting: there were 12 of us around the table! After each group meeting, I will choose a picture I liked from the presenters’ slides and post it here. This week, it is a slide from Léo! He made a superb PCA plot that looks like a fly! Who can guess which dataset this is?