Throughout its 150th anniversary year, GSAS is foregrounding the voices of some of its most remarkable alumni and students as they speak about their work, its impact, and their experiences at the School.
Yekaterina “Kate” Shulgina is a graduating PhD student in Harvard’s Systems, Synthetic, and Quantitative Biology program. She talks about her work to identify variations in the genetic code that cells use to translate mRNA sequences into proteins, what her discoveries could mean for biological research, and the impact that her advisor, Ellmore C. Patterson Professor of Molecular and Cellular Biology Sean Eddy, has had on her development as a scientist.
Genetic Exceptions
In the cell, most functions are carried out by proteins. The instructions for how to make these proteins come from our genes, which are nucleotide sequences. When a protein is being produced, the mRNA is read three nucleotides at a time, and each group of three is called a codon. Each codon specifies a single amino acid in the protein that’s being built. So, the cell has this “lookup table” for the genetic code, and it uses it to figure out which codon means which amino acid.
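The “lookup table” idea can be made concrete with a minimal Python sketch. This is an illustration only, not code from the research described here; the names `STANDARD_CODE` and `translate` are hypothetical. Amino acids are written as one-letter abbreviations, and `*` marks a stop codon.

```python
from itertools import product

# The standard genetic code in compact form: codons are enumerated in
# U, C, A, G order at each position and paired with one-letter amino
# acid codes ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical name: the lookup table mapping each codon to an amino acid.
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna, code=STANDARD_CODE):
    """Read an mRNA string three nucleotides at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein is finished
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCCUGGUAA"))  # AUG GCC UGG UAA -> "MAW"
```

The sketch assumes the sequence is already in the correct reading frame and contains only the letters A, C, G, and U.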
What's really interesting is that the same genetic code is used for almost all known life forms from humans to bacteria to viruses. But there are some exceptions—organisms that use these “alternative” genetic codes where one or more codons have a different meaning than is typical. The goal of my PhD research is to computationally screen a lot of genome sequences to find more examples of these alternative genetic codes, study them, and answer questions about how and why they evolved.
A Deeper Understanding
Understanding how proteins work is critical to biomedical research. A lot of what we know about protein function comes from comparing sequences of the same protein across many different organisms. For this purpose, there are huge databases of protein sequences used by researchers around the world, including for disease research.
Currently, when someone sequences the genome of a new organism, they assume that it uses the normal genetic code. That assumption carries downstream; people take these genomes and then they predict the sequence of the proteins in that organism. But if you were to assume the wrong genetic code, then you’d be predicting protein sequences incorrectly as well and potentially depositing these incorrect sequences into protein sequence databases.
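A small Python sketch shows how assuming the wrong genetic code changes the predicted protein. It is an illustration under stated assumptions, not the method used in this research: the names `STANDARD_CODE`, `ALTERNATIVE_CODE`, and `translate` are hypothetical, and the alternative table uses one well-known reassignment, in which the codon UGA is read as tryptophan instead of stop (as in Mycoplasma).

```python
from itertools import product

# Standard genetic code: codons in U, C, A, G order, "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption for illustration: an alternative code in which UGA means
# tryptophan (W) rather than stop.
ALTERNATIVE_CODE = {**STANDARD_CODE, "UGA": "W"}

def translate(mrna, code):
    """Translate an in-frame mRNA string with the given codon table."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon under this particular code
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = "AUGUGAGCCUAA"  # made-up sequence: AUG UGA GCC UAA
print(translate(mrna, STANDARD_CODE))     # "M"   (UGA read as stop)
print(translate(mrna, ALTERNATIVE_CODE))  # "MWA" (UGA read as tryptophan)
```

Under the standard code the predicted protein is cut short at the first UGA, so a sequence deposited in a database on that assumption would be truncated for an organism that actually uses the alternative code.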
My techniques of analyzing a genome and confirming the right genetic code could help ensure the accuracy of protein sequence databases. Beyond that, studying how and why the genetic code changes will lead to a deeper understanding of this fundamental cellular process of producing proteins from information in mRNAs.
Confidence That Carries On
I think the part of my GSAS experience that will have the longest-lasting impact on me will probably be the relationship I have with my advisor, Sean Eddy. He gave me an unusual level of independence and ownership on my project, from the initial idea to writing the software to planning out and doing the analyses to running experiments and writing the paper. That really helped build my confidence as a researcher, and that’s something that I’ll carry with me for the rest of my career.