Listen to the podcast
On episode 5, by popular demand, we discuss the theme of reproducibility in scientific computing. Today we hit on a lot of topics, but don’t be saddened by the current state of affairs! We also cover tools, standards, and attitudes that provide new hope for a brighter future.
The moral: Conversations on reproducibility need to continue.
If we were to record this episode again, its hosts would include:
- Neil Best (special guest)
- Katy Huff (inspiration from episode 1)
- Luis Ibanez (special guest)
- Anthony Scopatz (moderator)
Joining us this week is Luis Ibanez, who is a Technical Leader at Kitware, Inc. He has been one of the main developers of the Insight Toolkit (ITK) for the past ten years of the project. He is one of the Editors of the Insight Journal, the only Journal with reproducible papers in the medical imaging field.
Additionally, in a special double-guest treat, we also have Neil Best on the show. Neil is a database manager at the University of Chicago’s Computation Institute and a graduate student in Geography & Environmental Studies at Northeastern Illinois University. He is applying the reproducible research paradigm to the creation of a hybrid land-use/land-cover data set from multiple geospatial data sets using R and Sweave. The product of this work will initialize models of economy/agriculture/climate change interactions that will run in high-performance computing environments at the CI.
Intro/Outro Music: ‘Ten Thousand Strong’ -Iced Earth
Show Links:
- CDE (http://www.stanford.edu/~pgbovine/cde.html)
- Elsevier Executable Paper Challenge (http://www.executablepapers.com/)
- Sumatra, which we forgot to talk about (http://neuralensemble.org/trac/sumatra/wiki)
- PASS: Provenance Aware Storage Systems (http://www.eecs.harvard.edu/syrah/pass/)
Open Access / Reproducible Research Journals You Should Know About:
- Reproducible Research (http://reproducibleresearch.net/index.php/RR_links)
- Insight Journal (http://www.insight-journal.org/)
- Open Research Computation (http://www.openresearchcomputation.com/)
- Journal of Statistical Software (http://www.jstatsoft.org/)
- Theory of Computing (http://theoryofcomputing.org/)
Trap Marshall
2011/04/07
I wish y’all had touched on Scientific Workflows used mostly in bioinformatics. They encapsulate provenance, reproducibility, and ease of adoption. I’ve not implemented such a system, but I’d love to hear more from folks who’ve used them.
See myexperiment.org
Thanks for making a show on this issue!
Trapier Marshall
2011/04/07
I wish y’all had touched on Scientific Workflows used by the bioinformatics community. I haven’t implemented one, but I’d love to hear more from those who have. Workflows are an attempt to encapsulate provenance:
- Visual coding style which implies top level documentation.
- Can package data with code.
- Have a community for data and code swapping. See myexperiment.org.
While workflow engines have captured some important processes, they don’t yet seem well equipped in terms of revision control. Here I too fall prey to vested knowledge: if it can’t beat git, then I can’t commit.
Thank you so much for talking about these issues. Positively therapeutic. I wish the episode had been many times longer.
Where else are scientists using the internet to chew on this stuff?
Anthony Scopatz
2011/04/07
Agreed. The topic of ‘Scientific Workflows’ well warrants its own episode (though I would not want to restrict it solely to bioinformatics). As you mention, there simply wasn’t time to cover it in this episode.
I am glad that you found inSCIght cathartic. It is for me as well. Otherwise, I feel like I just complain a lot to my peers, but when you turn those criticisms into a podcast…
Unfortunately, there aren’t many outlets for this kind of thing. While we are not the only venue out there on the internet, we are one of the more centralized ones. There is a good Convore group for scientists working in Python, and the occasional blog post pops up. Science Podcasters has a lot of other interesting podcasts (http://www.sciencepodcasters.org/). But in terms of a broad-topic, language-independent forum, we might be it. Someone *please* correct me if I am wrong!