The Banfield Lab at UC Berkeley
It’s easy to think that we’ve discovered most of the species on the planet. In fact, the booming field of metagenomics is using big data to help scientists better understand a vast and largely unexplored region of the natural world: the microbiome. In the last few years, the price of genetic sequencing has plummeted to the point where scientists can now study ecosystems in their entirety, not just the parts that they already know how to identify.
The Banfield Lab at Berkeley recently collaborated with Stamen to bring their data to life, using advanced data interaction and interactive visualizations to make it easier to understand these vast new landscapes of genetic diversity.
What follows is an excerpt from an interview with Brian Thomas and Jill Banfield at the Banfield Lab. You can read the entire interview here.
What is metagenomics?
A metagenome is a collection of DNA sequences from the organisms present in an environment at the time a sample is taken. Metagenomics is the study of these genomic sequences. Metagenomics has been around in various forms since the early 2000s. Initially, the approach was referred to as “community genomics” because the sequencing approaches were used to study natural microbial communities. It wasn’t until the mid-2000s that it acquired its “meta” label. In the first studies, both genome reconstruction-based approaches and analyses of collections of genes without organism affiliation were used. Both of these methods are distinct from investigations of specific marker genes that had been used previously to phylogenetically “fingerprint” environments.
The primary difference is that genome sequences provide some insight into what all the organisms might be doing. Fingerprinting methods mostly tell us how closely organisms are related to each other.
This ecosystem was ideal because most of the DNA sequences came from a few abundant organisms, a feature that simplified the problem of working out which DNA sequences came from which organisms and made it possible to recover genome sequences from organisms outside of the laboratory — for the first time.
Over the next 10 years, this AMD (acid mine drainage) system became the most extensively studied ecosystem ever, with over 80 publications on topics ranging from how AMD organisms accelerate acid formation to the first measurements of evolutionary rates of bacteria in their natural settings. It was the “bedrock” research topic of dozens of PhD students and post-doctoral fellows, most of whom are now faculty members at institutions in the USA and other countries (Luxembourg, Sweden, Australia).
What role does visualization play in helping to do this science?
Visualization is critical, especially in metagenomics. You can’t see this life directly (without some pretty powerful microscope technology, and even then you don’t know who you’re looking at!). Having metagenomes from an environment, however, allows us to understand what organisms are doing and what they have the potential to do. We use a visualization called a “genome summary” extensively.
This visualization displays all the information we’ve collected about a sample in a clear, expressive manner that promotes hypothesis generation and experimentation. The amount of data that goes into a genome summary display is extensive. The system we’ve developed tracks it all, and this feeds the visualization. Metagenomics is definitely big data, and the genome summaries allow us to look at all the recovered organisms and all metabolic reactions of these organisms simultaneously.
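To make the idea of viewing all organisms and all metabolic reactions at once more concrete, here is a minimal sketch of the kind of organism-by-function table such a summary might rest on. It is only an illustration, not the lab's or Stamen's implementation: the organism names, function names, and the helpers `build_summary`, `organisms_with`, and `render` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical annotations: (organism, metabolic function) pairs.
# Real data would come from annotated genomes recovered from a sample.
annotations = [
    ("Organism A", "carbon fixation"),
    ("Organism A", "nitrogen fixation"),
    ("Organism B", "carbon fixation"),
    ("Organism C", "sulfur oxidation"),
]


def build_summary(pairs):
    """Group functions by organism: {organism: set of functions}."""
    summary = defaultdict(set)
    for organism, function in pairs:
        summary[organism].add(function)
    return dict(summary)


def organisms_with(summary, function):
    """Return the sorted organisms annotated with the given function."""
    return sorted(org for org, funcs in summary.items() if function in funcs)


def render(summary):
    """Return (sorted function names, one text row per organism).

    Each row marks a present function with 'X' and an absent one with '.',
    in the same order as the returned function names.
    """
    functions = sorted({f for funcs in summary.values() for f in funcs})
    lines = []
    for org in sorted(summary):
        row = "".join("X" if f in summary[org] else "." for f in functions)
        lines.append(f"{org}: {row}")
    return functions, lines


summary = build_summary(annotations)
functions, lines = render(summary)
print(functions)
print("\n".join(lines))
print(organisms_with(summary, "carbon fixation"))
```

A table like this is what lets a question such as "which organisms take part in carbon fixation?" become a single lookup across every recovered organism, rather than a genome-by-genome search.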
How is the work that Stamen built for you helping you do your work?
Stamen helped us to recreate the genome summary to allow us to visualize even more data simultaneously. Additionally, these new genome summaries are FAST. This is a common theme in biology (in all science these days, really): there’s more data out there, and investigating it requires adding more and more to your experiments. The enhancements Stamen provided brought new variations and interactions with the data that have helped users query the data in new ways. Including more data in the visualization allows us to ask bigger, more comprehensive questions of the data. For example, we can now survey which organisms in the environment contribute to carbon or nitrogen flow through the system across bigger samples.
"Stamen helped us to improve our existing data visualizations to allow us to study even more data simultaneously. Additionally, these new visualizations are FAST. This is a common theme in biology (in all science these days, really). There’s more data out there, and investigating it requires adding more and more data to your experiments. The enhancements Stamen provided brought new variations and interactions with the data that help users to query the data in new ways. They allow us to ask bigger and more comprehensive questions of our data than we could before."
—Brian Thomas, Technical Director, Banfield Lab, UC Berkeley