Data matters. A great example of a smart use of data is genetic sequencing. This involves 3 billion base pairs, although scientists only know what around 1% of these do. The arguably most important ones are to do with creating proteins. By looking at people with traits, diseases or ancestry, scientists have been able to pick out those sets of genes which seem to match with those attributes. For example, breast cancer risk is 5 times higher if you have a mutation in either of the tumour-suppressing BRCA1 and BRCA2 genes.
Due to science, there are now commercial providers of DNA sequencing available, such as 23andme. They market this as a way to discover more about your ancestry and any genetic health traits you might want to watch out for. To try this out, I bought a kit to see how they surfaced the data in an understandable way. The process itself is really easy, you just give them money and post a tube of your spit to them.
After a few weeks wait for them to process it, you can look at your results. Firstly, you have your actual genetic sequencing. This is perhaps really only of interest (or any use) to geneticists. As part of their service, 23andme pull out the “interesting” parts of the DNA which have been shown (through maths/biology) to correspond to particular traits or ancestry.
They separate this out into:
- Genetic risks
- Inherited conditions
- Drug response
- Traits (eg hair colour or lactose tolerance)
- Neanderthal composition
- Global ancestry (together with a configurable level of “speculativeness”)
- Family tree (to find relatives who have used the service too)
Part of what is smart about this service is that while it uses DNA as underlying data, it almost entirely hides this from the end user. Instead, they see the outcome for them. They have realised that people don’t care about a sequence like “agaaggttttagctcacctgacttaccgctggaatcgctgtttgatgacgt” but they do care about whether they have a higher risk of Alzheimer’s. Because some of these things are probabilistic, they also put a 1*-4* scale of “Confidence”: again this is easy to read at a glance. It isn’t very engaging, but it looks something like this:
Perhaps more visually interesting is the ancestry stuff. Apologies that my ancestry isn’t very exciting:
I hope this has been interesting. Commercial DNA sequencing is a real success story not just for biochemistry and genetics, but also for the industrialisation of these processes and the mathematics and software that makes it possible. The thing that is especially cool, according to me at least, is the ability to make something as complex as genetics accessible, understandable and useful.