Wednesday, December 14, 2016

A genetics primer

Below I want to present the significant (at least as I have tried to understand them) genetic results for the case of Jackson Zuber, as given to me by his mother Emily. While obviously not intended to be a whole primer on the genetics, there should be enough detail so that we ourselves, and any professional geneticist, protein experimentalist or modeler, neurologist, neurobiologist, or radiologist clinician might extract the fuller picture and hopefully generate a few additional lines of inquiry.

Jackson is the person the geneticists designate as the 'proband', meaning the individual through whom the study was initiated, in this case a one-year-old boy. Exome sequencing revealed variants in four genes that were of significant clinical interest:

NEB (nebulin) c.11450G>A; p.S3817N
PLP1 (proteolipid protein 1) c.194T>G; p.I65S
ERCC6 (excision repair cross-complementation group 6) c.2924G>A; p.R975Q
PGAP1 (post-GPI attachment to proteins 1) c.2525+4C>T

The doctors logically focused on the PLP1 gene (and initially diagnosed the associated Pelizaeus-Merzbacher disease or 'PMD') because it is an X-linked gene and Jackson, being male, is hemizygous for it. This means that Jackson only has the one copy of the gene, and would be particularly susceptible to any deleterious mutations in that gene. The other three variants are heterozygous and lie on autosomes (the non-sex chromosomes), and would therefore not immediately be prime suspects by virtue of the fact that another functioning copy of each gene is present…

That is not to say the other genes can be fully discounted, particularly given the absence of a full genome sequence, which would cover any regions not analyzed in the exome sequence, including regulatory regions generally at the beginning of genes. It is also possible that one gene copy simply does not supply a required threshold level of the protein, or that the defective protein is itself causing some new pathology…

To look more closely at the specific case of Jackson's PLP1 variant we first need to decode and disambiguate the genetic notation for the variant: 'c.194T>G; p.I65S'. I will need to verify what I write here because errors are readily made.

The initial 'c.' indicates that the position is numbered along a coding DNA reference sequence (loosely, cDNA or complementary DNA), as we are dealing with exome sequencing info. It refers to an mRNA transcript's sequence expressed as DNA (GCAT) bases rather than as RNA (GCAU) bases. Having a 'genomic sequencing' reference (g.) would be a little more informative here for many reasons, namely, the presence of multiple transcription initiation sites (promoters), alternative splicing, the use of different poly-A addition signals, multiple translation initiation sites (ATG-codons), and the occurrence of length variations. Potentially, if exome sequencing draws on mRNAs after they are edited (either in nucleus-specific or cytosol-specific editing), this would be an issue too, although RNA editing (post-transcriptional modification of bases, mostly A to I substitutions in humans, which are then read as G) is quite rare.

As I understand the notation 194T>G, the 194th base position in Jackson's cDNA for PLP1 has a G while most normal cDNAs would have a T. Because G (like A) is a purine and T (like C) is a pyrimidine, this substitution is called a 'transversion' as opposed to a 'transition' (which would occur in the case of a purine to purine or pyrimidine to pyrimidine switch). Since there are natural mechanisms in the cell which more readily convert two-ring purines to other purines, or convert one-ring pyrimidines to other pyrimidines, transitions are significantly more common than transversions.
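To make the decoding concrete, here is a minimal sketch that splits a simple substitution such as 'c.194T>G' into its parts and labels it as a transition or a transversion. The function names are my own, and the pattern only handles plain exonic substitutions; an intronic variant such as PGAP1's c.2525+4C>T is deliberately rejected rather than guessed at.

```python
import re

PURINES = {"A", "G"}       # two-ring bases
PYRIMIDINES = {"C", "T"}   # one-ring bases

def parse_substitution(notation):
    """Split a simple coding substitution like 'c.194T>G' into (position, ref, alt).

    Only plain exonic substitutions are handled; anything else raises ValueError.
    """
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", notation)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {notation!r}")
    return int(match.group(1)), match.group(2), match.group(3)

def classify(ref, alt):
    """Return 'transition' if both bases are in the same chemical class, else 'transversion'."""
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

if __name__ == "__main__":
    for variant in ("c.194T>G", "c.11450G>A", "c.2924G>A"):
        position, ref, alt = parse_substitution(variant)
        print(variant, position, classify(ref, alt))
```

Run on the three missense variants listed above, this labels the PLP1 change a transversion and the NEB and ERCC6 changes (both G>A) transitions.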

Because Jackson would have inherited his X chromosome with the variant PLP1 from his mom, the grandparents were checked and it was found that gramps had the same variant. Because gramps is asymptomatic the docs more or less recanted the PMD diagnosis. One possible explanation of this situation is that gramps could be a 'mosaic'. In other words the mutation was not present at the level of the sperm but rather arose later in development (as a somatic mutation), in which case it is possible that the cells that gave rise to gramps' nervous system have a normal copy of PLP1, and he is therefore quite normal. Another possibility is that gramps himself inherited the variant but was nonetheless able to repair it in the cells of his nervous system.

Although it would be rare, it is also conceivable that Jackson has the same mutation as gramps but that it was independently gained, i.e., it arose again in the bloodline as a de novo variant in Jackson. Perhaps not totally inconceivable when you imagine that whatever genetic or metabolic background the gramps mutation originated in, a similar background would be expected to be present in Jackson. More typically the conventional thinking is that 'spontaneous' mutations arise more or less randomly during events like DNA synthesis when there is some non-negligible error rate during copying that escapes proofreading mechanisms.

It is also possible that the mutation does not have much effect in gramps' genetic background, but does have a significant effect when occurring in the context of Jackson's genetic profile, i.e., a 'facilitative' mutation necessary but not sufficient for PMD. One curious feature of PMD is that up to 70% of the patients have a duplicated PLP1 gene, that is, an extra copy. It looks like this was explicitly checked for with Jackson, as exome sequencing wouldn't see it, but he did not have a duplication…

Isoleucine is a hydrophobic amino acid and serine is a polar and uncharged amino acid. These are fairly different animals altogether and it is normally assumed that this kind of substitution should have some significant effect on protein structure or function. The question is what effect? In checking some of the common software tools and databases for this kind of thing we find that 'PolyPhen2' says the substitution is probably damaging, 'MutationTaster' isn't happy with it either, and it is not recorded in either ExAC or 1000G.
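As a sanity check on the notation, the coding position and the protein change should agree with each other under the standard genetic code. The sketch below is my own illustration, not part of the clinical report: it works out which codon a coding position falls in and which reference codons could produce the reported amino acid change. The helper names are hypothetical, and it assumes numbering starts at the first base of the start codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_location(c_pos):
    """Return (codon number, 1-based position within that codon) for a coding position."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

def compatible_codons(ref_aa, alt_aa, pos_in_codon, ref_base, alt_base):
    """List (reference codon, mutated codon) pairs consistent with the reported change."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[pos_in_codon - 1] != ref_base:
            continue
        mutated = codon[:pos_in_codon - 1] + alt_base + codon[pos_in_codon:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append((codon, mutated))
    return hits

if __name__ == "__main__":
    codon_number, pos_in_codon = codon_location(194)
    print(codon_number, pos_in_codon)
    print(compatible_codons("I", "S", pos_in_codon, "T", "G"))
```

Position 194 lands on the second base of codon 65, which matches 'p.I65S', and only the isoleucine codons ATT and ATC become serine codons (AGT, AGC) after a T>G change there; ATA would become AGA, an arginine. So the notation is at least internally consistent, although this says nothing about which of the two codons Jackson's reference sequence actually carries.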

The canonical membrane structures of some of the various splice variants of the normal PLP1 protein were determined well over a decade ago. It is a highly conserved protein that is virtually identical in several species from mouse to man. More recently, a few 3-D protein conformations, the actual crystal structures, have also been determined, sometimes in combination with other bound proteins. The presumptive membrane topology is four transmembrane helices, with the position 65 serine (or thereabouts depending on where the amino acid start count is done) lying at the extracellular apex of the first membrane helix. While serine can be phosphorylated in various proteins, this may not be likely in the observed position.

As alternative splicing of the PLP gene yields four products—the classic PLP and DM20 proteolipids, and the more recently described proteolipids, srPLP and srDM20, it is important to try to understand how much of these various products are getting made by various kinds of cells in the nervous system, and their effects on those cells…

The main question I think, at each instance, is whether there is too much of this protein or not enough, and then also what is the effect of a poorly functioning, nonfunctioning or otherwise obstructive protein in each case? To this point, it is known that while transgenic mice that overexpress the PLP gene exhibit neuronal degeneration and axonal disintegration, perhaps paradoxically, the absence of PLP/DM20 in PLP null mice also causes axonal swellings. Because this protein is normally so abundant, around 50% of the total myelin protein, small changes can have large effects.

It is not known whether the serine substitution affects splicing (but note nearby splice site in picture below), or affects any of the protein's cross-linked cysteines, or alternatively affects any critical cysteine palmitoylations; further study would be needed to find out.

1 comment:

  1. [From the link in the post. I believe this to be erroneous as applied to Jackson. T>G means that T has been changed to a G. With this transposition, and the transpositions that follow from it, the paragraph works.] To get a better idea of how this T>G could have arisen I spoke with cell biologist Carl Smythe, a professor in the Department of Biomedical Science at University of Sheffield, and also geneticist Shane McKee, clinical director of the Belfast Health & Social Care Trust. The 'T>G' doesn't necessarily mean that a G has been directly changed into a T in the gene. For example, such a transversion could arise as a consequence of a mutation from a C to A on the non-coding strand. G-A bases can pair quite well (as do some others, although normal pairing is A to T and G to C) without causing major structural issues between the coding (sense) and noncoding (antisense) strands. As a consequence of this, the A would have a T inserted in opposite strand in the next round of synthesis…