Using machine learning to predict if cancer cells can repair themselves
Tempus has the largest set of DNA and RNA data paired with clinical data from cancer patients. This dataset has attracted some of the best data scientists in biotech, who have joined the company to develop algorithms that can improve outcomes for patients. Once an algorithm has demonstrated results, my team works with the data scientists and pathologists to ship it to patients. I’ll share the story of one of our algorithms: the HRD detection test.
The Problem: how do cancer cells develop?
Healthy cells have mechanisms that help them repair damage that naturally occurs over time. When a single strand of DNA breaks, cells can copy from the paired bases on the other strand of the same chromosome. But when both strands of the chromosome break, cells copy from the same gene on the other chromosome, in a process called homologous recombination repair, or HRR. The book Gene describes this well:
When DNA is damaged by a mutagen, such as X-rays, genetic information is obviously threatened. When such damage occurs, the gene can be recopied from the “twin” copy on the paired chromosome: part of the maternal copy may be redrafted from the paternal copy, again resulting in the creation of hybrid genes.
The pairing of bases is used to build the gene back. The yin fixes the yang, the image restores the original: with DNA, as with Dorian Gray, the prototype is constantly reinvigorated by its portrait. Proteins chaperone and coordinate the entire process- guiding the damaged strand to the intact gene, copying and correcting the lost information, and stitching the breaks together – ultimately resulting in the transfer of information from the undamaged strand to the damaged strand.
When homologous recombination repair fails to work, it’s known as homologous recombination deficiency, or HRD. BRCA1 and BRCA2 are two of the better-known genes that regulate this pathway, one of the central pathways that prevent cancer from developing. Mutations in the BRCA genes, among others, can trigger HRD. HRD is particularly common in ovarian cancer, but it is present in many other cancer types as well.
If a cell presents with HRD, then the cell will have a hard time repairing a double-strand break. Since the homologous repair pathway doesn’t work, the cell has to rely on other repair mechanisms, and if those fail too, the cell will likely die.
Patients whose tumors have HRD have a treatment available to them: PARP inhibitors. From this article:
If your tumour is positive for HRD, it means that certain treatments are more likely to be effective. One of those treatments are PARP inhibitors.
PARP (poly ADP-ribose polymerase) is a type of enzyme involved in many functions of the cell, including the repair of DNA damage. In healthy cells, when DNA is broken or damaged, PARPs act as a repair crew to help fix the damage. This means the cell lives.
PARP inhibitors are a type of targeted therapy that block the action of the PARP enzyme in cancer cells, which means they can’t repair DNA damage.
The combination of being HRD positive (which blocks the first repair mechanism) and taking PARP inhibitors (which blocks the second repair mechanism) means that PARP inhibitors can be more effective than in people whose tumours aren’t HRD positive.
For more information, this video describes the underlying process of HRD fairly well:
The model: Using Artificial Intelligence to identify HRD
If we can predict the presence of HRD in a tumor, then doctors can try a new treatment. At Tempus, HRD is a test to “predict the probability of a patient’s cancer having a phenotype characterized by the inability to repair DNA breaks via the homologous recombination repair (HRR) pathway, known as homologous recombination deficiency (HRD).”
There are two modalities of genomic data that we can use for the predictions:
- DNA, which shows what mutations the tumor has accumulated
- RNA, which shows how the tumor is behaving
Tempus published a preprint of a paper about our HRD test here: Validation of Genomic and Transcriptomic Models of Homologous Recombination Deficiency in a Real-World Pan-Cancer Cohort.
In that paper we described the use of ML to predict HRD. The simpler model is based on RNA. RNA is a window into the actions of a cell: unlike DNA, which is usually the same across the body (except for rare mutations), RNA expression changes dramatically based on the type of cell, what it is doing, and what kind of proteins it is creating.
To use a computer analogy, DNA is like the source code, whereas RNA is the runtime. All cells in the body have more or less the same DNA, but they may express dramatically different proteins via RNA. In particular, a healthy cell will express different proteins than one where the homologous repair pathway is not working. We can use that to infer the health of the repair mechanism from cell behavior.
Training data includes RNA from more than 16,000 clinical samples. For labels, we looked at the presence of particular genetic variations that are known to be correlated with HRD (“Samples were labeled based on their BRCA1, BRCA2 and selected Homologous Recombination Repair (HRR) pathway gene (CDK12, PALB2, RAD51B, RAD51C, RAD51D) mutational status”).
Tempus had a team of data scientists who worked for years to develop the above model. They experimented with many different underlying approaches until they found a model that worked across cancer cohorts. The important factors we were looking for were:
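The labeling rule quoted above can be sketched in a few lines of Python. This is only an illustration, not the pipeline from the paper: the function name, the input shape (a set of mutated gene symbols per sample), and the simplification that any mutation in a listed gene yields a positive label are all assumptions. Only the gene list comes from the quote.

```python
# Genes named in the quoted labeling rule.
HRR_LABEL_GENES = frozenset(
    {"BRCA1", "BRCA2", "CDK12", "PALB2", "RAD51B", "RAD51C", "RAD51D"}
)


def label_sample(mutated_genes):
    """Return 1 if the sample has a mutation in any labeling gene, else 0.

    Assumption: `mutated_genes` is an iterable of gene symbols that carry a
    qualifying mutation. What counts as "qualifying" is not modeled here.
    """
    return int(any(gene in HRR_LABEL_GENES for gene in mutated_genes))


# Hypothetical samples, for illustration only.
samples = {
    "sample_a": {"BRCA2", "TP53"},
    "sample_b": {"KRAS"},
    "sample_c": set(),
}
labels = {name: label_sample(genes) for name, genes in samples.items()}
```

Labels like these would then be paired with each sample's RNA expression values to train a classifier.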
- how will it generalize to other cancer cohorts?
- does it match the literature? is it easy to understand how it works?
- can it run on most samples? if we require more data then we may not be able to run it on samples that don’t have that data present
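The last criterion can be made concrete with a small coverage check. The sketch below is hypothetical: the function name, the modality names, and the toy cohort are assumptions chosen for illustration, not Tempus data or tooling. It shows how requiring more inputs shrinks the share of samples a model can run on.

```python
def runnable_fraction(samples, required_modalities):
    """Fraction of samples that have every modality the model needs."""
    required = set(required_modalities)
    if not samples:
        return 0.0
    runnable = sum(1 for available in samples if required <= set(available))
    return runnable / len(samples)


# Hypothetical cohort: each entry lists the data available for one sample.
cohort = [
    {"rna"},
    {"dna", "rna"},
    {"dna"},
    {"dna", "rna"},
]

rna_only = runnable_fraction(cohort, {"rna"})            # 3 of 4 samples
dna_and_rna = runnable_fraction(cohort, {"dna", "rna"})  # 2 of 4 samples
```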
Shipping the model
It’s all well and good to create a fantastic AI model for predicting a disease, but how do we roll it out to patients? The steps are roughly:
- Collect and verify upstream inputs
- Run the algorithmic analysis
- Publish results as a report
- Pathologist signoff
- Doctor views results & takes action, if relevant
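The steps above can be sketched as a small pipeline in which each stage refuses to run until the previous one has completed. Everything here is a hypothetical illustration: the `Case` class, the function names, the `rna_expression` input key, and the placeholder model are assumptions, and a plain string stands in for the real report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Case:
    """One patient sample moving through the (hypothetical) pipeline."""
    sample_id: str
    inputs: Dict[str, List[float]]
    score: Optional[float] = None
    report: Optional[str] = None
    signed_off_by: Optional[str] = None
    history: List[str] = field(default_factory=list)


def verify_inputs(case: Case) -> None:
    # Step 1: refuse to continue if the upstream data is missing or empty.
    if not case.inputs.get("rna_expression"):
        raise ValueError(f"{case.sample_id}: missing RNA expression input")
    case.history.append("inputs_verified")


def run_model(case: Case, model: Callable[[List[float]], float]) -> None:
    # Step 2: run the algorithm on the verified inputs.
    case.score = model(case.inputs["rna_expression"])
    case.history.append("model_run")


def publish_report(case: Case) -> None:
    # Step 3: render the result (a plain string stands in for the report).
    if case.score is None:
        raise RuntimeError("model has not been run")
    case.report = f"HRD result for {case.sample_id}: score {case.score:.2f}"
    case.history.append("report_published")


def sign_off(case: Case, pathologist: str) -> None:
    # Step 4: a pathologist must approve the report before release.
    if case.report is None:
        raise RuntimeError("no report to sign")
    case.signed_off_by = pathologist
    case.history.append("signed_off")


def release_to_doctor(case: Case) -> str:
    # Step 5: only signed reports are visible to the doctor.
    if case.signed_off_by is None:
        raise PermissionError("report has not been signed off")
    case.history.append("released")
    return case.report


def placeholder_model(expression: List[float]) -> float:
    # Stand-in for the real model: returns the mean of the inputs.
    return sum(expression) / len(expression)


case = Case("sample_a", {"rna_expression": [0.2, 0.4, 0.9]})
verify_inputs(case)
run_model(case, placeholder_model)
publish_report(case)
sign_off(case, "Dr. Example")
released = release_to_doctor(case)
```

Keeping the sign-off check inside the release step means an unsigned report cannot reach a doctor even if a caller skips a stage.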
Once we had an HRD model built, it was time to develop it into a product. One of the best parts of the job is working with these incredible data scientists and pathologists to make our product go live.
The HRD report itself is generated as a PDF. The PDF contains carefully chosen details that allow the result to be interpreted correctly. It also ensures that we have alignment with the legal & regulatory requirements. We have built an internal microservice that allows pathologists to view and sign out reports in a way that complies with all relevant rules. The final report ends up looking like this: