Alice Rathjen has a modest proposal: a new method of studying and sharing genetic information that could accelerate our understanding of COVID-19 and other health threats.
Rathjen is integrating the worlds of DNA research and mapping technology. Her approach is intuitive yet novel, and she may be the first person in the world to actively pursue it.
In 2006, Rathjen was granted a seminal patent for mapping the genome in 2D and 3D, establishing her as one of the pioneers in the field of computational biology. However, computational biology software soon became the purview of universities, and while Rathjen pursued her entrepreneurial career, she remained something of an academic outsider. The innovation she shares here may be a byproduct of that duality—the insider who understands the field’s limitations and the outsider who sees unexpected ways to overcome them. She has launched a company—DNA Compass—to use the digital mapping technology GIS (geographic information systems) in a way that could disrupt the practice of genetics.
WhereNext spoke with Rathjen to learn about this paradigm shift, the hurdles it faces, and why she thinks a new model of mapping could help researchers develop COVID-19 treatments more quickly.
A brief genetics refresher: Each strand of DNA is composed of base pairs, which come in two varieties: guanine pairs with cytosine, and adenine pairs with thymine. A stretch of DNA makes up a gene. Genes are strung along a chromosome, and chromosomes make up the human genome. Humans have 23 pairs of chromosomes, and the chromosomes differ in length. Chromosome one, for instance, is the longest, comprising about 249 million base pairs.
A Worldwide Health Challenge
WhereNext: You’ve designed a groundbreaking way to map human DNA. Before we explore that, I want to understand what motivated your work. Why did you want to change the way people study and share genetic information?
Alice Rathjen: Mainly to speed up health outcomes. When we face a global health risk like COVID-19, researchers need to collaborate quickly to understand the disease and people’s genetic vulnerability to it. And they need to do that on a global scale, which means they need simple tools to collaborate across borders.
WhereNext: And that’s not happening with COVID-19 research?
Rathjen: It’s happening, but not at the speed and scale it could. Researchers in the US and Europe are working hard to understand genetic risk markers for COVID-19, but their data is very limited. At last check, there were only 18 entities worldwide that were willing and able to upload COVID-19 patient genomes to US or EU servers for analysis. So this critical global initiative has fewer than 6,500 COVID-19 patient genomes to analyze, out of more than 20 million COVID-19 cases worldwide.
WhereNext: And the innovation you’re working on is meant to make it easier to share and analyze that data?
Rathjen: Yes. I want to turn patient genetic information into a kind of location data. That way, researchers could use existing GIS technology, instead of unwieldy text files and pivot tables, to visualize risk factors. The GIS method would also help them exchange genetic data. Countries could keep their patient information, but give outside researchers permission to examine it.
Identifying COVID-19 Risk through Data
WhereNext: Let’s talk about how researchers look for disease risk factors in DNA.
Rathjen: With COVID-19, genetic scientists have identified about 700 locations on human chromosomes that they think are associated with different levels of disease severity. The catch is that they need a lot more genomes to confirm that. The more genomes they have, the quicker they can find out what is statistically significant and pinpoint which locations are important as drug or vaccine targets.
WhereNext: You envision using GIS technology to address this, which would be a significant shift. Can you explain that?
Rathjen: Where the technology becomes critical is that with GIS, you can publish genomes as web services [data shareable via the internet], so that entities can make their genetic data accessible to each other without sending it across borders.
With GIS, you have enough controls in the software that researchers can set up role-based access. They can determine what raw data is shared and which annotations different users can add to genes. [GIS has] a collaboration layer that bioinformatics tools don’t have.
A New Way of Mapping DNA
WhereNext: So one of your goals is to make genetic information about COVID-19 and other health threats more shareable. The other is to make it easier for researchers to visualize genes. And to do that, you used GIS technology to map genes. Can you explain that?
Rathjen: All we did was pretty simple. We just created a micro-geocoding service.
WhereNext: That sounds like a new kind of location intelligence. Are you talking about creating addresses inside the human body? Why is that important?
Rathjen: I treated the cell as the area I was trying to map, because researchers organize genetic data by location. They [say] “On this chromosome, at this position, the person has an A, a T, a C, or a G.” Then they compare that with a reference genome, and where the person varies, [researchers note] a unique identifier for that location: “At this chromosome, at this position, there’s normally an A, and people that are sick have a T, and here’s what we think it means.” That’s how all genetic data worldwide is structured now. It’s location data. When you see all those identifiers mapped on one screen using GIS [see image below], you quickly understand a person’s genetic risk profile for COVID-19 and other diseases.
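The comparison Rathjen describes, checking a person's base at each chromosome position against a reference genome and recording where the two differ, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Variant` record, the `find_variants` function, the dictionary layout, and the toy positions are hypothetical and are not taken from DNA Compass or from any real dataset. The sketch also simplifies to a single base per position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One location where a person's genome differs from the reference."""
    chromosome: str
    position: int    # base-pair position along the chromosome
    reference: str   # base found in the reference genome
    observed: str    # base found in the person's genome


def find_variants(reference, sample):
    """Return the locations where `sample` differs from `reference`.

    Both arguments map (chromosome, position) to a base: "A", "C", "G" or "T".
    This layout is a hypothetical simplification for illustration only.
    """
    variants = []
    for (chromosome, position), observed in sorted(sample.items()):
        expected = reference.get((chromosome, position))
        # Skip positions the reference does not cover, and positions that match.
        if expected is not None and observed != expected:
            variants.append(Variant(chromosome, position, expected, observed))
    return variants


# Toy data with made-up positions, not real genetic markers.
toy_reference = {("1", 1000): "A", ("1", 2000): "G", ("3", 500): "C"}
toy_sample = {("1", 1000): "T", ("1", 2000): "G", ("3", 500): "C"}
toy_variants = find_variants(toy_reference, toy_sample)
```

Each resulting record is keyed by chromosome and position, which is the sense in which genetic data already behaves like location data.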
Finding COVID-19 Risk Locations
WhereNext: What are researchers doing to identify risk mutations for COVID-19?
Rathjen: As part of the COVID-19 Host Genetics Initiative, researchers are examining the genetic information of people who have died of COVID-19. As I mentioned, their work has been limited so far by the small number of genomes that are available for research.
But they have begun to identify certain genetic markers that may be associated with the severity of a patient’s response to COVID-19—locations that have to do with blood type, diabetes, obesity, immune response, and more. Researchers don’t understand what the mutation is doing that’s causing the higher risk, but they do have an early dataset showing those possible risk markers.
WhereNext: And why is it important to map those locations?
Rathjen: With COVID, we’re going to have vaccines and treatments soon, and there’s going to be a need to prioritize patient populations for treatment based on genomic risk. This is critical, life-saving information that needs to be distributed rapidly around the globe, but through authoritative channels and in a controlled manner, with every entity acting on its own priorities and its own ethics standards.
Researchers could compare COVID-19 risk markers against a patient’s genome and look for people that are high risk and prioritize them for vaccines, treatments, or isolation.
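As a rough illustration of that comparison, the sketch below looks up which candidate risk locations a patient's genome matches. It is a toy under stated assumptions: the function name `matching_risk_markers`, the data layout, and the positions are all hypothetical, each position carries a single base for simplicity, and a count of matches is not a validated measure of clinical risk.

```python
def matching_risk_markers(patient_calls, risk_markers):
    """List the risk locations where the patient carries the flagged base.

    patient_calls: maps (chromosome, position) to the patient's base.
    risk_markers:  maps (chromosome, position) to the base flagged as a
                   possible risk marker.
    Both layouts are hypothetical simplifications for illustration.
    """
    return sorted(
        location
        for location, risk_base in risk_markers.items()
        if patient_calls.get(location) == risk_base
    )


# Toy data with made-up positions, not real COVID-19 markers.
toy_risk_markers = {("3", 500): "T", ("9", 1200): "G", ("21", 77): "A"}
toy_patient = {("3", 500): "T", ("9", 1200): "C", ("21", 77): "A"}
toy_matches = matching_risk_markers(toy_patient, toy_risk_markers)
```

In practice, any prioritization built on such a lookup would depend on the markers being confirmed and on the standards of the entity holding the data.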
A New Address Service for DNA
WhereNext: That brings us back to your idea of creating a GIS address service, or micro-geocoding service. GIS technology has been used for decades to show where things are in the world, since everything has an X and Y coordinate. How did you apply that kind of mapping to mapping risk markers in our cells?
Rathjen: The trick was to create a coordinate system that would work for the genome. And I cut a corner here by adjusting the map of the world.
WhereNext: How so?
Rathjen: One DNA base pair is equal to 10 centimeters on the earth.
WhereNext: Okay. How did that help?
Rathjen: I did that because chromosome one is about 249 million units [base pairs] from the start to the end, and to get it to fit inside the coordinate system that we use for the earth, I had to divide the position on a chromosome by 10 [so that each base pair takes up 10 centimeters], and then I moved it 100 million units to the west. And lo and behold, now if I start chromosome one around Alaska, it fits inside the map of the earth.
WhereNext: And that’s how you created the micro-geocoding service you mentioned?
Rathjen: Right. If you think of a [GIS] geocoding service, it takes an address (in this case, the position of a base pair on a chromosome) and creates an XY coordinate for it. If we make that modification, then all the GIS mapping software that’s up and running around the earth can be used to analyze genomes.
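The arithmetic behind such a service can be sketched in a few lines. This is only one possible reading of the description above, not DNA Compass's actual implementation. The interview does not name the coordinate system or the units of the westward shift, so the sketch assumes Web Mercator meters and reads the shift as 100 million base pairs (10 million meters once scaled). Stacking chromosomes in rows along the Y axis, and every name in the code, are hypothetical choices made for illustration.

```python
# Assumption: Web Mercator, whose X axis spans roughly +/- 20,037,508 meters.
WEB_MERCATOR_HALF_WIDTH_M = 20_037_508.34

# From the interview: one base pair equals 10 centimeters, so 10 per meter.
BASE_PAIRS_PER_METER = 10

# Assumption: the "100 million units" westward shift is counted in base pairs.
WEST_SHIFT_BASE_PAIRS = 100_000_000

# Hypothetical: each chromosome gets its own row, spaced 500 km apart.
ROW_SPACING_M = 500_000.0


def geocode_base_pair(chromosome_index, position,
                      west_shift_bp=WEST_SHIFT_BASE_PAIRS):
    """Turn a (chromosome, base-pair position) address into an (x, y) in meters."""
    # Shift west first, then scale base pairs down to meters.
    x = (position - west_shift_bp) / BASE_PAIRS_PER_METER
    # Chromosome 1 sits on y = 0; later chromosomes are placed further south.
    y = (1 - chromosome_index) * ROW_SPACING_M
    if abs(x) > WEB_MERCATOR_HALF_WIDTH_M:
        raise ValueError("position falls outside the assumed map extent")
    return (x, y)


# The far end of chromosome one (about 249 million base pairs) still fits.
end_of_chromosome_one = geocode_base_pair(1, 249_000_000)
```

Under these assumptions, chromosome one runs from x = -10,000,000 to x = 14,900,000 meters, inside the assumed extent. Without the division by 10, its 249 million units would overflow a map that is only about 40 million meters wide.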
WhereNext: Have you already applied this mapping to COVID-19 data?
Rathjen: Yes. We’ve taken the possible risk markers that researchers have identified for COVID-19 and mapped them in GIS.
(The image above shows what this genetic map looks like.)
A Roadmap for a New Genetic Map
WhereNext: Do you see other ways location technology might be used in genetic analysis?
Rathjen: I’ve been watching how GIS technology, AI, and aerial imagery have evolved to help us understand what’s happening on earth. [Applications range from wildfires to business opportunities.] In the genetics field, microscopes are getting more powerful than ever. We’re starting to see detailed photos of cells with elevation data and papers on the spatial analysis of genomes. The fields of bioinformatics and mapping software seem to be converging.
WhereNext: What comes next for your approach to genetic mapping?
Rathjen: We’ve tested our technique and believe it can help researchers worldwide. Since it’s a new approach, it would need to be validated by an agency like the FDA or the CDC before it can be used as a product.
WhereNext: Why are you optimistic about this approach?
Rathjen: I think the bioinformatics community is looking for a federated solution where data can stay in its country of origin and still help scientists address COVID-19 and other diseases. And the simplicity of visualizing genetic markers through mapping technology really intrigues me.
I feel like we could be on the verge of a new approach to disease and genetic research—and that we could support the work of scientists who are improving human health. That’s what I’m excited about.
Photo courtesy of the National Human Genome Research Institute