Plasmids for Genomic Integration of Psilocybin's Enzymatic Pathway in S. cerevisiae
aka: trying to deconvolute an academic paper
How I found/designed these Plasmids:
Now that I have the genetic sequences for each enzyme in the psilocybin metabolic pathway, I want to understand how to put these genes into plasmids that could be expressed in S. cerevisiae.
There are three strains from Milne et al. 2020 that I am specifically interested in: the strain with just the Minimal Pathway, the strain with the Minimal Pathway and PcCPR, and the top producing strain. I'd like to know what plasmid(s) went into each of these yeast strains:
ST9327 - just the Minimal Pathway
ST9328 - the Minimal Pathway plus PcCPR (expressed with TEF1)
ST9482 - top producing strain
Table 1 of Milne et al. details the parental strain and what DNA element was added to create each strain. However, I find this table to be extremely convoluted and difficult to follow, so I made some figures that are over-the-top legible, to me:
There are four things to be aware of: the strain, the added DNA element, the integration site, and the gene that is contained on the DNA element. Look here if you don't know what these blobby colorful things are.
The Minimal Pathway:
Just the Minimal Pathway, ST9327, produces 2.2 ± 0.7 mg/L psilocybin, and is created with the iterative addition of four genetic elements: pCfB2312 which is episomally expressed, pCfB8794 at integration site XII-5, BB3923 at XI-3, and pCfB255 at X-2.
The genetic elements titled pCfBXXX are obviously plasmids, but I'm not sure if BB3923 is a plasmid so I'll keep an eye out to solve this confusion.
The Minimal Pathway plus PcCPR:
ST9328 produces 137.1 ± 8.3 mg/L psilocybin and is created with the addition of five genetic elements: pCfB2312 which is episomally expressed, pCfB8794 at integration site XII-5, BB3923 at XI-3, BB3939 at XI-1, and pCfB255 at X-2.
The Top Producing Pathway:
The top producing pathway ST9482 produces 200.5 ± 6.5 mg/L psilocybin, and 627 ± 140 mg/L of psilocybin with fed-batch fermentation. ST9482 is created with eight genetic elements: pCfB2312 which is episomally expressed, pCfB8794 at integration site XII-5, BB3923 at XI-3, BB3939 at XI-1, PR-23852 is used to knock out ric1, BB4020 at X-4, pCfB9074 at XII-4, and pCfB8796 at Ty4.
I don't know if I am the crazy one for needing to make diagrams like this to understand Table 1 of Milne et al. or if the people who find Table 1 legible are the crazy ones. Or if such people (other than the authors) actually exist. Also, there are more strain trees that could be made from Table 1, but I am only interested in these for now.
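The same strain trees can be written down as plain data, which makes it easy to ask what changed between two strains. This is only a sketch of my reading of Table 1: the element names and sites are the ones listed above, while the names STRAINS, element_names, and only_in are made up for illustration.

```python
# Genetic elements per strain, as read from Table 1 of Milne et al. 2020.
# Each entry is (element, integration site or a short note).
STRAINS = {
    "ST9327": [
        ("pCfB2312", "episomal"),
        ("pCfB8794", "XII-5"),
        ("BB3923", "XI-3"),
        ("pCfB255", "X-2"),
    ],
    "ST9328": [
        ("pCfB2312", "episomal"),
        ("pCfB8794", "XII-5"),
        ("BB3923", "XI-3"),
        ("BB3939", "XI-1"),
        ("pCfB255", "X-2"),
    ],
    "ST9482": [
        ("pCfB2312", "episomal"),
        ("pCfB8794", "XII-5"),
        ("BB3923", "XI-3"),
        ("BB3939", "XI-1"),
        ("PR-23852", "ric1 knockout"),
        ("BB4020", "X-4"),
        ("pCfB9074", "XII-4"),
        ("pCfB8796", "Ty4"),
    ],
}


def element_names(strain):
    """Return the element names of a strain in the order listed above."""
    return [name for name, _site in STRAINS[strain]]


def only_in(strain_a, strain_b):
    """Elements present in strain_a but absent from strain_b."""
    other = set(element_names(strain_b))
    return [name for name in element_names(strain_a) if name not in other]


if __name__ == "__main__":
    print(only_in("ST9328", "ST9327"))  # ['BB3939']
    print(only_in("ST9482", "ST9328"))
    print(only_in("ST9328", "ST9482"))  # ['pCfB255']
```

The first comparison says the only difference between the Minimal Pathway strain and the PcCPR strain is BB3939, which matches the figures.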
At this point, I have a bunch of questions and confusions and I just want to make a list of plasmids that I need to find from the previously listed kits.
Pathways and Plasmids of Interest:
These are not demonstrated in Milne et al. Below are two hypothetical trees of transformations that I think would be interesting.
The tree below is the Minimal Pathway plus PcCPR and multiple copies of PcPsiM. This is interesting to me because it does not require any knockouts or overexpression, just insertions, and I think it could produce a substantial amount of psilocybin. The sequence of transformations requires:
pCfB2312
pCfB8794
BB3923
BB3939
pCfB8796
The tree of transformations shown below is the same as above, but includes the ric1 knockout and overexpression of ARO1 and ARO2. This is essentially the top producing pathway from Milne et al., except that it is missing the ARO4 and TRP2 mutants, which were shown to decrease yield in Figure 4B of Milne et al. The sequence of transformations requires the same as above, plus:
PR-23852
BB4020
Find the Plasmids:
Now it is time to find the plasmids. Ideally, the paper would just give the sequences for each plasmid used, but of course, this is not the case, so I have to be a hunting dog and sniff them out. The Materials and Methods, section 2.2, Plasmid and strain construction, looks promising:
"Single integration plasmids were constructed using the EasyClone-MarkerFree system (Jessop-Fabre et al., 2016), and multiple integration plasmids were constructed using a modified version of the EasyCloneMulti system (Maury et al., 2016) using a backbone plasmid where multiple integration into the S. cerevisiae genome was achieved using a Kluyveromyces lactis URA3 gene (KlURA3) under control of a truncated 10bp KlURA3 promoter... A list of all primers, biobricks and plasmids used in this study can be found in the supplementary data (Supplementary Tables 1, 2 and 3)."
It seems most of the plasmid backbones can be found in the EasyClone-MarkerFree kit, and the precursor to the one multi-integration plasmid (for PcPsiM) can be found in the EasyCloneMulti kit. These are general cloning toolkits and won't have the psilocybin genes in them, so I'll need to find which kit plasmids pCfB2312, pCfB8794, BB3923, BB3939, pCfB8796, PR-23852, and BB4020 originated from. A quick scan of Supplementary Table 1 reveals that it is horribly unobvious, but I can find some of the plasmids I need from the EasyClone Kit:
I know which gene corresponds to which integration site, so I can identify which plasmid used in this study originates from which plasmid in the kit. For whatever reason, the plasmids for XI-1, X-4, and Ty-4 are not in the supplement list, but fortunately they are easy to find in each kit's contents. Mapping this all out is extremely annoying and I can't help but think I must be missing something, but I don't know what, so I'll continue. This is what I have so far:
pCfB2312 — Cas9 — in EasyClone-MarkerFree Kit
pCfB8794 — PsiM and PsiK at XII-5 — derived from pCfB2909 in EasyClone-MarkerFree Kit
BB3923 — CrTdc and PsiH at XI-3 — derived from pCfB2904 in EasyClone-MarkerFree Kit
BB3939 — PcCPR at XI-1 — derived from pCfB3036 in EasyClone-MarkerFree Kit
pCfB8796 — multiple PsiM at Ty-4 - derived from pCfB8221 — probably derived from pCfB2796 in the EasyCloneMulti Kit
PR-23852 — to knock out ric1
BB4020 — ARO1 and ARO2 at X-4
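To keep track of what is resolved and what is not, the list above can be held in a small lookup table. This is just a bookkeeping sketch: the Element class and its field names are hypothetical, and the kit_plasmid for pCfB8796 carries the same "probably" as in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    name: str                   # element name as used in Milne et al.
    payload: str                # what the element carries or does
    site: Optional[str]         # integration site, if any
    kit_plasmid: Optional[str]  # kit plasmid it is (or derives from), if known


ELEMENTS = [
    Element("pCfB2312", "Cas9", None, "pCfB2312"),  # itself a kit plasmid
    Element("pCfB8794", "PsiM and PsiK", "XII-5", "pCfB2909"),
    Element("BB3923", "CrTdc and PsiH", "XI-3", "pCfB2904"),
    Element("BB3939", "PcCPR", "XI-1", "pCfB3036"),
    # Derived from pCfB8221, which is *probably* derived from pCfB2796.
    Element("pCfB8796", "multiple PsiM", "Ty-4", "pCfB2796"),
    Element("PR-23852", "ric1 knockout", None, None),
    Element("BB4020", "ARO1 and ARO2", "X-4", None),
]


def unresolved():
    """Elements whose kit origin has not been tracked down yet."""
    return [e.name for e in ELEMENTS if e.kit_plasmid is None]


def kit_plasmids_needed():
    """Sorted, de-duplicated list of kit plasmids to look up."""
    return sorted({e.kit_plasmid for e in ELEMENTS if e.kit_plasmid})


if __name__ == "__main__":
    print(unresolved())           # ['PR-23852', 'BB4020']
    print(kit_plasmids_needed())
```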
At this point, I don't care about ric1, ARO1, and ARO2, so I'm not gonna hunt for the plasmids that enable those edits right now. This means I have the backbones for all the plasmids that could go into STNESS1:
Find the Promoters, Terminators, and Orientations:
The next step is to understand the gene cassettes that get cloned into these backbones. The cassettes, and what backbone they go into are as follows:
PsiM and PsiK, into pCfB2909
CrTdc and PsiH, into pCfB2904
PcCPR, into pCfB3036
multiple PsiM, into pCfB2796
Each of these genes must have a promoter before and a terminator after. I also need to know the orientation of each promoter-gene-terminator combination. After inspecting these plasmid maps from the kit, it looks like each has one or two terminators, which may or may not be used in the final plasmid designs. Supplementary Table 1 indicates the orientation for the PsiM and PsiK cassette. It also shows the orientation for a PcCPR and PsiH cassette at X-1, but I'm not sure where that is used. This "Plasmids for integration in yeast genome" list is wildly incomplete and quite confusing.
Supplementary Table 2 indicates some of the promoters, and orientations used for the CrTdc and PsiH, and PcCPR cassettes:
This would seem to give us the promoters and orientation for:
However, on the hunt for which promoters are used for which genes, I found a breadcrumb in the main text of Milne et al:
in Figure 4's description:
"Strain expressing Crtdc, PcpsiH, PcpsiK, PcpsiM and Pccpr from the TEF1 promoter. "
I'm surprised that they used the same promoter to express all these genes, and confused because the supplementary information indicates otherwise: Supplementary Table 2 indicates PsiH is expressed off of the promoter pPGK1, and PsiK off of pTDH3, but Figure 4's description indicates they are expressed with pTEF1. Supplementary Table 1 indicates there are two sets of back-to-back fused promoters:
< pTEF1 - pPGK1 >
< pTEF1 - pTDH3 >
I think it is potentially problematic to have the same promoter back-to-back, as in < pTEF1 - pTEF1 >, and this is not listed anywhere in the Supplement. Furthermore, in the tree of plasmid transformations to create the top producing strain ST9482, BB3923 is used, which uses a pPGK1 promoter to drive expression of PsiH. Similarly, pCfB8794, which uses pTDH3 to drive expression of PsiK, is also used to create ST9482. So despite what Figure 4's description states, I will continue under the assumption that there is no < pTEF1 - pTEF1 >.
Oh, I just reread Figure 4's description, and I think they didn't mean all the genes were expressed off of TEF1, they just meant PcCPR was expressed off of pTEF1. Oops.
Anyways, after inspecting the Snapgene files for pCfB2909, pCfB2904, and pCfB3036, it appears each backbone already contains two terminators: CYC1 and ADH1. This means the "gene₁ < promoter₁ - promoter₂ > gene₂" cassette is probably just dropped in between the terminators of the backbone. Now I have a pretty confident idea of what the plasmids will generally look like:
CYC1 - PsiM < pTEF1 - pTDH3 > PsiK - ADH1, in pCfB2909
CYC1 - CrTdc < pTEF1 - pPGK1 > PsiH - ADH1, in pCfB2904
CYC1 - PcCPR < pTEF1, into pCfB3036
multiple PsiM, into pCfB2796
Below is a better representation of these schemas:
I'm still unclear on the general design for multiple copies of PsiM, but I'll figure that out later.
Now, I want to find the sequences for:
< pTEF1 - pTDH3 >
< pTEF1 - pPGK1 >
pTEF1 >
The Supplement indicates < pTEF1 - pTDH3 > originates from Partow et al. 2010. Of course, this is paywalled, so I'll go to our beloved SciHub. There also doesn't appear to be any supplementary information, which is where sequences often live. At this point, the bidirectional promoter sequence is not obviously given and no database is referenced in Partow et al. 2010, so I am frustrated. I am extremely tired of sequences being not obviously given. Argh! I go to Addgene and in a mad rampage try various searches. Eventually "pTEF1, pTDH3," results in p1977, which is a plasmid from Irina Borodina's lab (the authors of Milne et al.) and contains a < pTEF1 - pTDH3 > bidirectional promoter. This plasmid leads me back to the EasyClone-MarkerFree kit, and I realize that just reading Jessop-Fabre et al. 2016 at the beginning of this process would have saved me significant confusion. Both bidirectional promoters and pTEF1 are given in the kit.
< pTEF1 - pTDH3 > in p1977
< pTEF1 - pPGK1 > in pCfB975
pTEF1 > doesn't have a dedicated plasmid, but can be pulled out of pCfB2312
Unfortunately, the full sequence of pCfB975 is not provided on Addgene and < pTEF1 - pPGK1 > is not yet obviously pieced together.
< pTEF1 - pTDH3 >
ttgtaattaaaacttagattagattgctatgctttctttctaatgagcaagaagtaaaaaaagttgtaatagaacaagaaaaatgaaactgaaacttgagaaattgaagaccgtttattaacttaaatatcaatgggaggtcatcgaaagagaaaaaaatcaaaaaaaaaaattttcaagaaaaagaaacgtgataaaaatttttattgcctttttcgacgaagaaaaagaaacgaggcggtctcttttttcttttccaaacctttagtacgggtaattaacgacaccctagaggaagaaagaggggaaatttagtatgctgtgcttgggtgttttgaagtggtacggcgatgcgcggagtccgagaaaatctggaagagtaaaaaaggagtagaaacattttgaagctatggtgtgtgcatcagtagctataaaaaacacgctttttcagttcgagtttatcattatcaatactgccatttcaaagaatacgtaaataattaatagtagtgattttcctaactttatttagtcaaaaaattagccttttaattctgctgtaacccgtacatgcccaaaatagggggcgggttacacagaatatataacatcgtaggtgtctgggtgaacagtttattcctggcatccactaaatataatggagcccgctttttaagctggcatccagaaaaaaaaagaatcccagcaccaaaatattgttttcttcaccaaccatcagttcataggtccattctcttagcgcaactacagagaacaggggcacaaacaggcaaaaaacgggcacaacctcaatggagtgatgcaacctgcctggagtaaatgatgacacaaggcaattgacccacgcatgtatctatctcattttcttacaccttctattaccttctgctctctctgatttggaaaaagctgaaaaaaaaggttgaaaccagttccctgaaattattcccctacttgactaataagtatataaagacggtaggtattgattgtaattctgtaaatctatttcttaaacttcttaaattctacttttatagttagtcttttttttagttttaaaacaccaagaacttagtttcgaataaacacacataaacaaacaaa
< pTEF1 - pPGK1 >
TBD...
pTEF1 >
catagcttcaaaatgtttctactccttttttactcttccagattttctcggactccgcgcatcgccgtaccacttcaaaacacccaagcacagcatactaaatttcccctctttcttcctctagggtgtcgttaattacccgtactaaaggtttggaaaagaaaaaagagaccgcctcgtttctttttcttcgtcgaaaaaggcaataaaaatttttatcacgtttctttttcttgaaaattttttttttgatttttttctctttcgatgacctcccattgatatttaagttaataaacggtcttcaatttctcaagtttcagtttcatttttcttgttctattacaactttttttacttcttgctcattagaaagaaagcatagcaatctaatctaag
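A quick way to sanity-check the arrows in < pTEF1 - pTDH3 > is to ask whether a piece of the standalone pTEF1 shows up in the bidirectional promoter as-is or as its reverse complement. The sketch below does that on two short excerpts copied from the sequences above (the first bases of the bidirectional promoter and the last bases of pTEF1). The function names are my own, and the full sequences can be pasted in instead of the excerpts.

```python
# Translation table for complementing DNA in either case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orientation(part: str, construct: str) -> str:
    """Report how `part` occurs in `construct`: forward, reverse, or absent."""
    part, construct = part.lower(), construct.lower()
    if part in construct:
        return "forward"
    if reverse_complement(part) in construct:
        return "reverse"
    return "absent"


# Excerpt: start of the < pTEF1 - pTDH3 > sequence listed above.
bidirectional_start = "ttgtaattaaaacttagattagattgctatgctttctttctaatgagcaag"
# Excerpt: last 19 bases of the standalone pTEF1 > sequence listed above.
ptef1_end = "catagcaatctaatctaag"

if __name__ == "__main__":
    print(reverse_complement(ptef1_end))               # cttagattagattgctatg
    print(orientation(ptef1_end, bidirectional_start))  # reverse
```

On these excerpts the pTEF1 end appears reverse-complemented near the start of the bidirectional promoter, which is at least consistent with the left-pointing arrow on pTEF1.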
Putting all the Parts Together:
I think it is time to look at Jessop-Fabre et al. 2016, which is the paper that details the EasyClone-MarkerFree Kit. I am hoping to understand how to insert promoters and new genes into the backbones that the kit provides. Instead of reading the paper, I am going to read the manual I found on Addgene.
This manual is wonderfully clear and I love it.
The first thing I learn is that each integrative plasmid must have a complementary gRNA helper plasmid. So I track these down (I am still ignoring the multicopy PsiM plasmid):
CYC1 - PsiM < pTEF1 - pTDH3 > PsiK - ADH1, in pCfB2909 at XII-5 — pCfB3050(gRNA XII-5)
CYC1 - CrTdc < pTEF1 - pPGK1 > PsiH - ADH1, in pCfB2904 at XI-3 — pCfB3045(gRNA XI-3)
CYC1 - PcCPR < pTEF1, into pCfB3036 at XI-1 — pCfB3043(gRNA XI-1)
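Since the pairing is keyed on the integration site, it can be checked mechanically that every integrative design has a helper. A small sketch with hypothetical names (GRNA_HELPERS, INTEGRATIONS, pairings); the plasmid numbers are the ones listed above.

```python
# gRNA helper plasmid for each integration site, from the list above.
GRNA_HELPERS = {
    "XII-5": "pCfB3050",
    "XI-3": "pCfB3045",
    "XI-1": "pCfB3043",
}

# (cassette description, backbone, integration site)
INTEGRATIONS = [
    ("PsiM < pTEF1 - pTDH3 > PsiK", "pCfB2909", "XII-5"),
    ("CrTdc < pTEF1 - pPGK1 > PsiH", "pCfB2904", "XI-3"),
    ("PcCPR < pTEF1", "pCfB3036", "XI-1"),
]


def pairings():
    """Pair each backbone with its gRNA helper; fail loudly if one is missing."""
    pairs = []
    for _cassette, backbone, site in INTEGRATIONS:
        if site not in GRNA_HELPERS:
            raise KeyError(f"no gRNA helper recorded for site {site}")
        pairs.append((backbone, GRNA_HELPERS[site]))
    return pairs


if __name__ == "__main__":
    for backbone, helper in pairings():
        print(backbone, "+", helper)
```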
I make a note that they use an unfamiliar selection system, G418 and nourseothricin. I'll look into this more later; I'm just focusing on getting the plasmids together now.
Each backbone plasmid has an AsiSI (aka SfaAI) restriction enzyme site, which is used to linearize the plasmid and insert the gene cassette. However, AsiSI only leaves a two base pair overhang, so the linearized plasmid also gets nicked by Nb.BsmI, which creates longer overhangs. I think this is a strange design choice: why not just use an enzyme with better overhangs?
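Before trusting a backbone file, it is easy to check where these sites actually sit. The sketch below scans a sequence for a recognition site on either strand. The recognition sequences are, as far as I know, GCGATCGC for AsiSI and GAATGC for Nb.BsmI, but check them against the enzyme supplier before relying on this. The toy sequence is invented, and the scan treats the sequence as linear, so a site spanning the origin of a circular plasmid file would be missed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ASISI = "GCGATCGC"   # palindromic recognition site
NB_BSMI = "GAATGC"   # non-palindromic recognition site


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str):
    """Return sorted (index, strand) hits of `site` in `seq`.

    "+" means the site reads 5'->3' on the given strand, "-" means it
    reads 5'->3' on the opposite strand. Palindromic sites are only
    reported once, as "+".
    """
    seq, site = seq.upper(), site.upper()
    motifs = [(site, "+")]
    rc = reverse_complement(site)
    if rc != site:
        motifs.append((rc, "-"))
    hits = []
    for motif, strand in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    toy = "AAGCGATCGCTTGCATTCAA"  # invented sequence, not a real backbone
    print(find_sites(toy, ASISI))    # [(2, '+')]
    print(find_sites(toy, NB_BSMI))  # [(12, '-')]
```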
Each cassette is made up of "BioBricks", which are amplified with overhangs complementary to the digested backbone or to the next BioBrick.
The manual details a cloning process they call USER cloning, which involves uracil-containing primers, a uracil-compatible polymerase, and two enzymes for the backbone, one to cut it and one to nick it. All this seems annoying, so I am not particularly interested in learning it. I think I have enough information to assemble the plasmids in silico. Here are all the parts I need to do so, grouped by assembly:
In Snapgene, I copy and paste all these parts together, kinda haphazardly, and get an idea of what the plasmids will look like:
However, I'm worried that because I am not following their exact cloning schema, there might be some small detail that I am missing. So, very begrudgingly, I decide to go back and look at the USER Cloning method. Immediately, I see that this is a good choice because it seems that the Kozak sequence is added via primers:
This diagram is also useful to interpret:
Basically, I end up adding the Kozak sequence and the overhang sequence before the corresponding gene. I think the overhang sequence doesn't need to be there, but I add it anyways because I've heard that the space between the promoter and start codon could affect expression. Similarly, I don't think the overhang sequence at the end of the gene needs to be there, and my genes already have stop codons, but if these remnants from their cloning scheme matter in some small way, I want them in the designs, just to be safe. And that is it. Below is my best guess at the final plasmid designs used to get the enzymatic pathway into yeast.
Well, it's not quite it. I still need to figure out how to get multiple copies of PsiM in as well, but I will continue to save that for later.
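The copy-and-paste step can also be written as a few lines of code, which makes the orientation bookkeeping explicit: in a "gene₁ < promoter₁ - promoter₂ > gene₂" cassette, the left gene (with whatever sits in front of its start codon) has to go in as a reverse complement. This is a sketch of that logic only. Every sequence below is a made-up placeholder, including UPSTREAM, which stands in for the overhang-plus-Kozak remnant and is not copied from the manual. The terminators are assumed to come from the backbone.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def bidirectional_cassette(gene_left, promoter, gene_right, upstream):
    """Top-strand sequence for: gene_left < promoter > gene_right.

    `promoter` is the bidirectional promoter as written on the top strand.
    `upstream` is placed directly before each start codon. The left gene is
    read from the bottom strand, so it is inserted reverse-complemented.
    """
    left = reverse_complement(upstream + gene_left)
    right = upstream + gene_right
    return left + promoter + right


# Placeholder parts, NOT real sequences.
UPSTREAM = "AAAACA"       # stand-in for overhang + Kozak remnant
GENE_LEFT = "ATGGCTTAA"   # toy ORF: start codon, one codon, stop codon
GENE_RIGHT = "ATGAAATAG"  # toy ORF
PROMOTER = "CCTTCCTT"     # toy bidirectional promoter

if __name__ == "__main__":
    cassette = bidirectional_cassette(GENE_LEFT, PROMOTER, GENE_RIGHT, UPSTREAM)
    print(cassette)
    # Reading the bottom strand recovers the left gene in the usual direction.
    print(reverse_complement(cassette).endswith(UPSTREAM + GENE_LEFT))  # True
```

With real parts, the strings would be swapped for the gene sequences, the kit's promoter sequences, and the actual primer-derived bases, and the result pasted between the backbone's two terminators.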