Q&A with Dr. Justin Reese
Tell us about yourself – what is your background in and how did you end up in your current position?
I am a computational biologist at Lawrence Berkeley National Laboratory, and I’m interested in using biomedical data to generate actionable knowledge about human disease. I began my scientific career doing “bench science”, meaning test tubes, centrifuges, and lab coats. I studied biochemistry and immunology, and while I found this very interesting, I discovered that I like using computers to do research more than I like bench science. I went on to participate in several animal genome projects, but eventually I gravitated back to research on human biology and disease.
Tell us about your research in the Environmental Genomics and Systems Biology Division at Lawrence Berkeley National Laboratory.
My research at Berkeley Lab is focused on applying techniques such as machine learning to biomedical data, especially patient data, in order to glean new information that can help address human disease. For the past few years, I’ve focused mostly on COVID-19 research, but I’m also interested in other viral diseases, depression, and cancer.
How can electronic health records empower predictive care?
Using electronic health records (EHRs) for research is an interesting challenge, because by and large the data that we have access to is collected for the purpose of billing rather than research. This means clinical information that might be very useful to have is often not collected or not available to us. It also means that the data differs significantly between hospitals, since their computer systems represent billing information in different ways.
The challenge then is to mine this billing data to learn something new about patients and disease. To do this, we frequently use statistics and machine learning. Basically, we are using computers to spot patterns in the data that have biological meaning. A few examples of this are our work to identify different subtypes of long COVID using EHR data, and our work that demonstrated that a drug frequently taken for diabetes (metformin) is associated with lower COVID-19 severity.
Can you tell us about the KG-COVID-19 Project and your involvement?
KG-COVID-19 was our first effort at contributing to the response to the COVID-19 pandemic. One of the things that we are good at is making knowledge graphs (KGs), which are a modern way of integrating data so that it is more than the sum of its parts. Knowledge graphs like KG-COVID-19 can then be used for machine learning, for example in order to prioritize drugs that might have some effect on COVID-19, and also for things like data visualization and browsing by researchers.
When the COVID pandemic began, we realized that since we are experienced at handling biomedical data, we could create and share a knowledge graph that integrated all data related to COVID-19. We used this knowledge graph ourselves, of course, and we also provided it to the research community to reuse and apply to new research about COVID and other conditions. We have since reused the software that we built to construct KG-COVID-19 to build other knowledge graphs, for research into other diseases such as depression and cancer.
What is drug repurposing and how is machine learning making the process easier?
Drug repurposing means using an existing drug to treat a disease that it was not originally developed to treat. It is very expensive and time-consuming to develop a new drug, so reusing existing drugs is a big win. There are many thousands of drugs that have already been shown to be safe in humans, but obviously it isn't feasible to try every drug to see what might work for a given disease.
Machine learning can help. A knowledge graph is a very elegant way of organizing and navigating this information. Essentially, if you provide the computer the right information in the right way, it can identify the drugs that are most likely to have some effect on a disease of interest. For example, in the case of COVID-19, you provide information about all known drugs, the human proteins that are targeted by these drugs, human proteins that the virus interacts with, the diseases that each drug treats, the symptoms of each disease, and so on. The computer can then learn patterns between all these concepts, and use these patterns to make educated guesses about what drugs are most promising for a given disease.
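As a toy illustration of the "educated guesses" described above, here is a minimal guilt-by-association sketch: rank drugs by how many of their protein targets overlap with the host proteins a virus interacts with. This is not the actual KG-COVID-19 pipeline (which uses machine learning over a full knowledge graph); all drug and protein names below are hypothetical placeholders.

```python
# Hypothetical knowledge-graph edges: which human proteins each drug targets.
drug_targets = {
    "drug_A": {"protein_1", "protein_2"},
    "drug_B": {"protein_3"},
    "drug_C": {"protein_2", "protein_4"},
}

# Hypothetical edges: host proteins the virus is known to interact with.
virus_interactors = {"protein_2", "protein_4"}

# Score each drug by how many of its targets the virus also touches.
scores = {
    drug: len(targets & virus_interactors)
    for drug, targets in drug_targets.items()
}

# Rank drugs from most to least promising under this simple heuristic.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['drug_C', 'drug_A', 'drug_B']
```

A real system replaces this single overlap count with patterns learned across many edge types at once (drug–target, virus–protein, drug–disease, disease–symptom), but the core idea of scoring candidates by their connections in the graph is the same.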
How has the COVID-19 pandemic changed the way healthcare data is handled?
It is our hope that the silver lining of the COVID-19 pandemic will be a shift toward more and better sharing of healthcare data for research. We are involved in a project called the National COVID Cohort Collaborative (N3C), which I think has made great progress toward this goal by integrating data from (currently) 77 different hospital systems, and providing this data to COVID-19 researchers like us.
We are also contributing to efforts that will make it easier to translate healthcare data (which are geared toward billing and administration) into structured vocabularies called ontologies (for example, the Human Phenotype Ontology, which describes physical manifestations of human disease, and Mondo, which describes all known human diseases). This is important, because ontologies are purpose-built for research, and can accelerate progress by allowing the findings from different studies to be combined.
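The translation step described above can be pictured as a simple lookup from billing codes to ontology terms. The sketch below is purely illustrative: the mapping table is a made-up fragment, and real pipelines use curated mapping resources rather than a hand-written dictionary.

```python
# Hypothetical fragment of a billing-code-to-ontology mapping.
# Real mappings come from curated resources, not hand-written tables.
code_to_hpo = {
    "R05":   ("HP:0012735", "Cough"),
    "R50.9": ("HP:0001945", "Fever"),
}

# Billing codes pulled from a (made-up) patient record; unmapped
# codes are simply skipped in this sketch.
patient_codes = ["R05", "R50.9", "Z00.0"]

phenotypes = [code_to_hpo[c] for c in patient_codes if c in code_to_hpo]
print(phenotypes)  # [('HP:0012735', 'Cough'), ('HP:0001945', 'Fever')]
```

Once patient records are expressed in shared ontology terms like these, findings from studies at different hospitals can be compared and combined directly.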
What is your ‘path not taken’?
Good question! I have had something like 20 jobs in my life (including construction worker, restaurant cook, schoolteacher, and copy editor), so careerwise I think I may have immunized myself against the ‘path not taken’. I think I probably would be a writer if I were not doing science.
Anything else you would like to add?
Be kind and do improbable things!