As a participant in the Tara Oceans pan-oceanic marine plankton exploration campaign, I strive to make the resulting genomics data accessible to the research community.
Because of the sheer volume of the resulting genomics datasets (Tara Oceans data already exceed 60 terabases and keep growing), turning such data into knowledge is difficult for biologists who lack high-performance computing skills and hardware. With my teammates at the Mediterranean Institute of Oceanology, we therefore built the Ocean Gene Atlas (OGA), an online tool for exploring marine metagenomes that allows instant data mining of the Tara Oceans treasure trove with nothing more than a web browser. Manipulating the full dataset (which combines sequence data, read-count abundance estimates and environmental metadata) is tricky even for bioinformatics experts, so by popular demand we have now added an API for programmatic access to the OGA toolset. More about the data, methods and visualizations can be found in the NAR paper “The Ocean Gene Atlas: exploring the biogeography of plankton genes online”.
The Ocean Gene Atlas is the end product of a decade-long project, starting with a three-year campaign at sea, followed by half a dozen more years of data acquisition (DNA/RNA extraction, sequencing), primary data analysis (assembly, annotation), and integration into usable products (analyses, papers, datasets, tools). I was privileged to take part in all stages, including two legs as a science crew member (Mediterranean Sea and South Atlantic) in charge of bacterial and viral sampling, and one leg as chief scientist (Arctic Ocean along the Siberian coast).
I guess this goes to show that being a bioinformatician doesn’t necessarily mean being glued to a computer screen all day…