Episode 13: Data Array Summit – The (Near) Future of NumPy

Posted on 2011/05/18 by Anthony Scopatz

Listen to the podcast

On episode 13, we recap the events of the Data Array Summit that took place last weekend at Enthought HQ in Austin, TX. The summit was a chance for NumPy developers from the community to meet face-to-face and talk about the ‘labeled array’ or ‘data array’ concept that seems to crop up just about everywhere you look.

The current state of affairs in Python scientific and statistical computing is compared to that of other languages, like R. The major topic of discussion is the API needed to give NumPy arrays increased functionality in a sufficiently Pythonic syntax while, of course, retaining performance.
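
For readers new to the idea, here is a minimal sketch of what a ‘labeled array’ could look like. It is not the API discussed at the summit, nor that of pandas or any other library; the `LabeledArray` class, its methods, and the sample labels are all hypothetical and exist only to illustrate the concept of attaching names to the positions of a NumPy array.

```python
import numpy as np


class LabeledArray:
    """Hypothetical 1-D labeled array: a NumPy array plus a label-to-position map."""

    def __init__(self, values, labels):
        self.values = np.asarray(values, dtype=float)
        self.labels = list(labels)
        if len(self.labels) != self.values.shape[0]:
            raise ValueError("need exactly one label per value")
        # Map each label to its integer position in the underlying array.
        self._pos = {label: i for i, label in enumerate(self.labels)}

    def __getitem__(self, label):
        # Look a value up by name instead of by integer position.
        return self.values[self._pos[label]]

    def __add__(self, other):
        # Align on the labels both operands share, keeping the left operand's order.
        common = [lab for lab in self.labels if lab in other._pos]
        left = self.values[[self._pos[lab] for lab in common]]
        right = other.values[[other._pos[lab] for lab in common]]
        return LabeledArray(left + right, common)


# Illustrative data only; the labels and numbers are made up.
prices = LabeledArray([10.0, 20.0, 30.0], ["AAPL", "GOOG", "MSFT"])
shifts = LabeledArray([1.0, 2.0], ["MSFT", "AAPL"])
total = prices + shifts  # values are matched by label, not by position
```

The arithmetic itself still runs on plain NumPy arrays, which is one way to keep performance while adding labels; how alignment and missing labels should behave is exactly the kind of API question the episode discusses.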

Today’s hosts include:

Wes McKinney (special guest)
Travis Oliphant (special guest)
Fernando Perez (special guest)
Anthony Scopatz (moderator)

Wes McKinney is a PhD student in Statistical Science at Duke University, focusing on Bayesian methods for time series and other dynamic processes. After his undergraduate studies, he worked for three years at AQR, a quantitative hedge fund, where he developed many research and production systems in Python. Part of his work at AQR was released as the open source project pandas, which he continues to actively develop. He is dedicated to building tools to enhance the use of Python for statistical computing applications, especially those relating to time series and financial applications. Outside of his academic work in statistics, he also does Python consulting work in the financial industry.

Travis Oliphant received his PhD in Biomedical Engineering from Mayo Graduate School and taught Electrical Engineering at Brigham Young University for 6 years before devoting himself full-time to developing scientific software and managing customer relationships at Enthought. He is one of the original authors of SciPy and a major NumPy contributor, and he enjoys reading about neuroscience.

Fernando Perez is a research scientist working on the development of algorithms and computational tools for neuroscience at the University of California, Berkeley. After a PhD in particle physics and a postdoc in applied mathematics developing numerical algorithms, he currently works at the interface between high-level scientific computing tools in Python and the mathematical questions that arise in the analysis of neuroimaging data. He started the IPython project in 2001 and continues to lead it, now as a collaborative effort with a talented team that does all the hard work. He regularly lectures about scientific computing in Python.

Intro/Outro Music: ‘The Fear’ by Lily Allen

Show Links:

Corrections:

Our misguided host mistakenly declared Fernando as ‘Research Assistant’ when in fact he is a ‘Research Scientist.’ All apologies!

Posted in: Episode

4 Responses “Episode 13: Data Array Summit – The (Near) Future of NumPy” →

Dejan

2011/05/19

Just downloaded this podcast, and would like to comment (more on form than content) before I actually listen to it.

Download speed was < 30 KB/s, so really slow or server loaded

OTOH, scientific pragmatism is missing in providing a 128 Kb/s CBR stereo MP3 when the same (or perhaps better) quality can be achieved with a 4 times smaller file by encoding to mono ~64 Kb/s AAC.

Recent public listening tests: http://listening-tests.hydrogenaudio.org/igorc/Public%20Multiformat%20Listening%20Test%20@%2064kbps.htm

Regards

- Anthony Scopatz
  
  2011/05/19
  
  Sorry about the download rate being so slow. I’ll look into it. How far away from Austin are you?
  
  Unfortunately, your link to the audio tests wasn’t that informative. Could you send other information? (No quantitative information was presented.)
  
Dejan

2011/05/19

Hi Anthony,
thanks for your reply and moderation in connecting interesting people in this podcast

The link provided information about the settings for the tested samples. Beyond the carefully selected samples, it showed the interesting artifact that the majority of people could not ABX the first three ranked codecs at 64 Kb/s versus the original. It also opened the door to a new low-delay codec, CELT (http://people.xiph.org/~xiphmont/demo/celt/demo.html) from Xiph, which was released just in time before this test was organized by members of hydrogenaudio.org and was ranked best. You can find more info on various aspects on the hydrogenaudio.org portal.

Results are in the first link from previously linked page or here: http://listening-tests.hydrogenaudio.org/igorc/results.html

I meant to suggest AAC because MP4 is a broad standard, and so is Ogg Vorbis to some extent; both are far superior to MP3 simply because MP3 wasn’t designed for such use. There was an attempt at building a low-bitrate MP3 encoder based on L.A.M.E., but I guess it can’t compete with the features available in AAC or in Aoyumi’s version of the Vorbis encoder.

CELT was ranked best in this test, but unfortunately it’s new in the game and lacks supported decoders. Actually, foobar2000 is the only player that has released a CELT decoder, AFAIK.

BTW, I am on the other side of the globe.

Regards


1 Trackback For This Post

A Roadmap for Rich Scientific Data Structures in Python | Quant Pythonista →
July 21st, 2011 → 12:51
[...] The topic is important enough that Enthought hosted a gathering this past May in Austin, the DataArray Summit, to talk about these issues and figure out where to go from here. It was a great meeting and we [...]
