Prabhu Ramachandran: 2008

Sunday, September 21, 2008

Python list (in Cython) vs. NumPy

Taking my previous benchmark a little further I decided to see how well iterating over a Python list of doubles compares with using NumPy arrays. Here is an extremely simple example that implements the sum function in Cython and compares the result with NumPy's sum method.

Here is the Cython code:


# --- csum.pyx ---
def csum(list array):
  cdef int i, N=len(array)
  cdef double x, s=0.0
  for i in range(N):
      x = array[i]
      s += x
  return s

Here is a setup.py to build it:


# --- setup.py ---
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(cmdclass={'build_ext': build_ext},
    ext_modules = [Extension("csum", ["csum.pyx"])])

To time it in IPython I created a simple file called test.ipy like so:


# --- test.ipy ---
import csum
import numpy
for i in [10, 100, 1000, 10000, 100000, 1000000, 10000000, 25000000]:
  print '-'*80
  print 'N =', i
  a = numpy.linspace(0, 1, i)
  b = a.tolist()
  print "Cython:",
  %timeit csum.csum(b)
  print "NumPy:",
  %timeit a.sum()

I run it using IPython and here are the results (formatted a little):


N           Cython        NumPy
10          534 ns        10.1 micros
100         1.76 micros   10.8 micros
1000        15.3 micros   19.3 micros
10000       150 micros    101 micros
100000      1.75 ms       933 micros
1000000     19.7 ms       9.24 ms
10000000    198 ms        92.1 ms
25000000    499 ms        231 ms

This was done on a P4 3 GHz machine and clearly lists do quite well. At small sizes they outperform NumPy and at really large sizes they are about twice as slow. This is very good considering how general lists are.
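
Outside IPython, a similar measurement can be sketched with the standard timeit module. This is only a minimal sketch and not the benchmark used above: py_sum is a hypothetical pure-Python stand-in for csum (no Cython involved), best_time is an illustrative helper, and the code assumes Python 3.

```python
import timeit


def py_sum(values):
    """Pure-Python stand-in for csum: accumulate a list of floats."""
    s = 0.0
    for x in values:
        s += x
    return s


def best_time(func, arg, number=100, repeat=3):
    """Return the best per-call time in seconds, in the spirit of %timeit."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for n in [10, 100, 1000, 10000]:
        data = [i / float(n) for i in range(n)]
        print("N = %d: py_sum %.3g s, built-in sum %.3g s"
              % (n, best_time(py_sum, data), best_time(sum, data)))
```

Taking the minimum over several repeats reduces the influence of other processes on the machine.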

Saturday, September 20, 2008

Python vs. Cython vs. D (PyD) vs. C++ (SWIG)

In April 2008 there was a thread on the scipy-dev list regarding the inclusion of Cython code in SciPy. In that thread, I mentioned a particular use case of interest to me -- creating and manipulating an array of objects (rather than arrays of elementary data types) and being able to do that with Cython easily and efficiently.

The problem I was considering is a simple one. I create a list of "Vortex" objects and compute (naively) the velocity of a collection of these particles on one another. This is an O(N^2) computation since every particle influences every other. The idea was to create simple OO code to be able to perform these computations. Here is an outline of the Python code for doing this:


class Vortex(object):
 def __init__(self, pos=0.0, strength=1.0):
    # ...
 def eval_velocity(self, pos):
    return -1j*self.strength/(2*pi*(pos - self.position))

class VortexManager(object):
 def __init__(self, vortices=None):
    # vortices is a list of vortex objects.
    self.vortices = vortices

 def set(self, pos, str):
    # ...

 def velocity(self):
    for va in self.vortices:
        vel = complex(0, 0)
        for vb in self.vortices:
            if va is vb:
                continue
            else:
                vel += vb.eval_velocity(va.position)
        va.velocity = vel

Very straightforward code. Now, back in April I implemented this in pure Python, C++ and wrapped the C++ code to Python using SWIG. I also implemented it in D and wrapped that using PyD. I found that D was about 1.7 times slower than C++. C++ was about 300-400 times faster than the pure Python version.
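
For readers who want to run the outline, here is a minimal, self-contained sketch of it. The parts elided in the outline are filled in with assumptions: positions are taken to be complex numbers, the constructor simply stores its arguments, and the set method is left out. The real code is in the tarball mentioned below.

```python
from math import pi


class Vortex(object):
    def __init__(self, pos=0.0, strength=1.0):
        # Assumption: positions are complex numbers (x + iy) and the
        # constructor just stores its arguments.
        self.position = complex(pos)
        self.strength = strength
        self.velocity = complex(0, 0)

    def eval_velocity(self, pos):
        # Velocity induced at `pos` by this point vortex.
        return -1j * self.strength / (2 * pi * (pos - self.position))


class VortexManager(object):
    def __init__(self, vortices=None):
        # vortices is a list of Vortex objects.
        self.vortices = vortices if vortices is not None else []

    def velocity(self):
        # O(N^2): every vortex feels every other vortex.
        for va in self.vortices:
            vel = complex(0, 0)
            for vb in self.vortices:
                if va is not vb:
                    vel += vb.eval_velocity(va.position)
            va.velocity = vel


if __name__ == "__main__":
    mgr = VortexManager([Vortex(0.0), Vortex(1.0), Vortex(1j)])
    mgr.velocity()
    for v in mgr.vortices:
        print(v.position, v.velocity)
```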

I attended Robert Bradshaw's Cython tutorial at SciPy08 and really liked it. About 10 days ago I finally found the time to create a Cython version and the winner is ...

Cython!

I've put up all of the code here. To use the code, untar the tarball and do the following:


$ cd cpython_d_cpp
$ python setup.py build_ext --inplace

This requires SWIG, numpy and Cython to build. If you have PyD installed it will build the PyD extension also. To test the performance do this:


$ python test_perf.py

This produces the following output for me on a P4 (3 GHz):


dee(4000): 1.87730193138
(1411.53285812+1411.53285812j) (945.091286479+945.091286479j)
swig(4000): 1.10782289505
(1411.53285812+1411.53285812j) (945.091286479+945.091286479j)
cython(4000): 1.15034103394
(1411.53285812+1411.53285812j) (945.091286479+945.091286479j)
Pure Python(200): 1.14771318436
# N SWIG Cython Ratio
1000 0.071 0.069 0.967
2000 0.283 0.274 0.968
3000 0.638 0.619 0.970
4000 1.135 1.100 0.970
5000 1.767 1.720 0.973
6000 2.517 2.473 0.983
7000 3.474 3.370 0.970
8000 4.541 4.403 0.970
9000 5.698 5.575 0.978
10000 7.000 6.879 0.983

The first few numbers just test one single case of 4000 particles. D is slower than both C++ and Cython. Python is dog slow (or donkey slow as I like to say it)! For some reason I was getting segfaults when I tried to test the D wrapper for more than 3000 particles. On my Macbook the Cython version is actually 30% faster than the C++ version and on a Sempron 2800 Cython is about 25% slower. So different machines produce different numbers. However, C++ and Cython are both in the same ballpark.
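
The ratios can be checked with a little arithmetic on the single-case timings printed above. This is a rough sketch: the pure Python run used only 200 particles, so comparing it with the 4000-particle runs assumes the cost scales strictly as N^2, which is an assumption and not something the output measures.

```python
# Timings in seconds, copied from the test_perf.py output above.
t_d = 1.87730193138        # dee(4000)
t_swig = 1.10782289505     # swig(4000), i.e. C++
t_cython = 1.15034103394   # cython(4000)
t_python = 1.14771318436   # Pure Python(200)

# Assumption: cost grows as N^2, so scale the N=200 Python run to N=4000.
t_python_4000 = t_python * (4000.0 / 200.0) ** 2

print("D / C++      : %.2f" % (t_d / t_swig))
print("Cython / C++ : %.2f" % (t_cython / t_swig))
print("Python / C++ : %.0f (extrapolated)" % (t_python_4000 / t_swig))
```

On these numbers D comes out at roughly 1.7 times the C++ time, and the extrapolated pure Python time is a few hundred times the C++ time, the same order of magnitude as the figures I quoted from April.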

What I loved about the Cython code is that I use a Python list to manage the Vortex objects. This shows that we can use the normal Python containers to manage objects. This is extremely convenient. This isn't very surprising either since Python containers are also heavily optimized. "cython -a" was a huge help when getting things done. For more details on the Cython code look at the tarball.

Clearly, if you are building code from scratch and need speed, Cython is an excellent option. For this I really must congratulate the Cython and Pyrex developers.

Mayavi Screencast now on blogger

Finally, I've managed to upload the screencast I last posted about to Blogger.
If you are on a Mac or on Windows and downloaded the file I put up earlier, you might need to install the theora component from here.

I have no idea how it will eventually show up with the processing that goes into it. Depending on how this goes I'll put up new screencasts either here or perhaps at Showmedo. Anyways, here it is:

Friday, September 19, 2008

Mayavi2-3.x screencast

Here is a screencast (16 Mb Ogg Theora) of some of the new Mayavi2 features in the 3.x series.

http://www.aero.iitb.ac.in/~prabhu/tmp/videos/mayavi2-3-screencast.avi

It's an Ogg Theora video. This is my first screencast so it's going to be a little rough. There is too much noise in the sound recording and I'll try and fix that next time. BTW, I used the excellent istanbul for the video recording. I recorded the sound track separately and mixed the two with a gst script I got from here.

I was unable to upload this video on this blog because the IITB network at this time is unbelievably slow. If you know of another place I should consider uploading the video or would like to host the video elsewhere please let me know. Feel free to download it and host it elsewhere.

Thursday, September 18, 2008

Announcing the Mayavi2-3.x series

This is a long overdue announcement. ETS-3.0.0 was released just before the SciPy conference in August 2008. Mayavi-3.0.0 was released as part of this. There are a huge number of significant changes to Mayavi as compared to 2.x. Note that we are still calling this Mayavi2-3.x.y since Mayavi2 represents a departure from the older Mayavi-1.x series and Mayavi2-3.x is simply the next version of Mayavi2. The full details of the changes from the 2.x series are documented in the CHANGES.txt file in the mayavi documentation directory. Here is a summary of the major changes to this series thus far. The current release is 3.0.3.

Core Mayavi:

I've added all the modules and filters that were available in Mayavi1 into Mayavi2. The only module I didn't port is the Locator module which didn't seem very useful. Mayavi2 now has more modules and filters than Mayavi1 had. Now there isn't an excuse to continue using Mayavi1.5.
Users can now right-click on the nodes on the tree view to create new sources, filters and modules.
The menu entries for the modules and filters (on the app and on right-click) are all context sensitive. So if your data doesn't support a particular module you shouldn't be able to add it from the UI.
The file->open menu is far cleaner and exposes just one "Open" item that automatically lets you open any supported data.
Added a toolbar to the engine view that offers icons to make it easy to add new sources/filters and modules. Special "Adder nodes" are added to the tree view when a scene/source is empty, which makes it easy for new users to use the mayavi pipeline.
Added a -o/--offscreen option to the mayavi2 application so you can run mayavi offscreen if your VTK version supports it. This in combination with the -x command line option makes for a powerful combination.
New and much easier extension mechanism for the mayavi library and app via a user_mayavi.py and site_mayavi.py.
Added a tvtk_doc module/script that lets you search through the TVTK classes (with and/or keyword support), this is similar to Mayavi1's vtk_doc script.
Added a GenericModule that makes it very easy to put together a bunch of components/filters to create a new module.
Added Optional, Collection filters that let you easily build filters out of combinations of existing components or filters.
Added a new SetActiveAttribute filter that lets you choose the active scalar/vector/tensor attribute, this lets you do neat things like plot iso-contours of one scalar on top of the iso-contour of another, see examples/mayavi/contour_contour.py for an example.
Gaël sphinxified the documentation to make it look much nicer and fully searchable.
Better and more complete testing, these are unfortunately integration tests currently and will slowly be made into proper unit tests.
The mayavi2 application and plugins are now ported to use Envisage3 which is much cleaner and nicer to work with than Envisage2.
There is now a full-fledged preferences framework for Mayavi (to access the preferences use, from enthought.mayavi.preferences.api import preference_manager).
Some parts of the API and file organization have been cleaned up. This is mostly related to the location of some modules, the core scripting API hasn't really changed.
The project is now called Mayavi and not MayaVi as before. This avoids unnecessary confusion on how to pronounce the name and avoids any comparison with either Maya or Vi.
ETS itself is reorganized into a much smaller set of packages unlike the 40 odd packages in the ETS-2.x series. This makes dependency handling, packaging and installing much easier.

Mlab:

The enthought.mayavi.mlab.pipeline is complete and can be used to fully script mayavi.
The mlab API has changed to be more consistent with the naming style used in ETS, for example isosurface has become iso_surface, extractedges becomes extract_edges etc.
Added a show() function and decorator to allow users to easily create standalone scripts.
mlab.pipeline.open lets you open any supported data.
The mlab API can now take either engine or figure keyword arguments. This lets you avoid the global state set in the mlab engine. Mlab also now exposes a set_engine function.
It is easy to change visualized data using the .mlab_source attribute on objects created from mlab. This makes it very efficient and easy to create animations from mlab. See here for more details.
Mlab by default uses a MayaviScene that features a convenient Mayavi icon which brings up the engine view using which you can edit the pipeline from the UI (using the toolbar or right clicks). This gives mlab the full power of mayavi.

Apart from these significant feature additions there have been the usual round of bug fixes and new bugs introduced.

As you can see this is a very significant release that marks a very important phase for mayavi2. All the additions made at the sprint went into this release.

Currently it is probably easiest to install mayavi via either enthought's EPD or Python(x,y). Gaël has made available Ubuntu packages that are available at a link he mentions here. Dave Peterson has been making all of ETS available from PyPI, however you'll have to get all the dependencies installed (numpy, VTK, wxPython or Qt4). The best place to look for installation instructions is here, https://svn.enthought.com/enthought/wiki/Install

Enjoy.

Friday, July 11, 2008

Mayavi sprint July 2008: a summary

The first Mayavi sprint was held from 2nd to 9th July at Enthought's offices in Austin, Texas. It was a great success in my opinion. A large amount of work was done. Gaël Varoquaux and I were heading the sprint team. Several Enthought employees participated as did many of the interns. Jarrod Millman was also at the office at this time and it was great having him around as well. Overall it was great fun and I personally had a wonderful time.

Here is a picture of part of the sprint room at the Enthought office on day one (2nd July).

Part of the room on 8th.

We had identified various things to do and these are all put up on the Mayavi sprint wiki page. I'll summarize these in short below in no particular order (more details on the wiki page):

Generating the menu items in Envisage involved a ton of boilerplate code. In addition we wanted to be able to generate pure TraitsUI menus when a user right clicked on a node on the mayavi engine tree view. This has been implemented so we have automatically generated envisage and traits UI menus for all the sources, filters and modules.
Mayavi now supports context sensitive menus. Suppose for example you have loaded a file that contains unstructured grid data. In this case the mayavi menus for say a GridPlane module will be greyed out since it is not possible to use that module for unstructured grid data.
The file->load data menu has been cleaned up. There is only one "Open file" entry for all supported mayavi datasets. This is also true for the command line option. One may load any file using simply
```
mayavi2 -d file.ext
```
where the file extension ext is any supported data file extension (vtk, wrl, 3ds, jpeg, png etc.)
Gaël led a group (Field C, Ilan S and Chris C) which built a data visualization wizard that makes it easy for a user to create a TVTK dataset given numpy arrays. The dialog looks really cool and is not finished yet but getting there real fast.

Judah, Vibha and Eric worked very hard on improving the UI of several of the dialogs. They made modifications so it is easier to customize views and made several of the views much nicer.
Robert Kern ported the text editor plugin and the logger plugin to Envisage3. The text editor plugin is now used in mayavi.
Evan P fixed a few bugs with the scrollbars on Traits UI's.
Chris G fixed many packaging issues. He also removed the unnecessary multiple copies of the sphinx generated docs that were being checked in.
Jeff W and Patrick helped write scripts to generate the sphinx docs and have the mayavi docs show up on the web.
Ilan S. helped find several bugs and helped both Gaël and me in various areas.
Dave Peterson started some work on automatic scripting of mayavi based on recording UI actions but there wasn't enough time for this to get done considering Dave's many other responsibilities.
Dave Peterson pushed for an ETS-2.8.0 release during the sprint.
Dave Morrill helped fix several traits bugs.

My apologies if I missed someone in the list of contributors above. As you can see, a huge number of things got done. It was great fun. Enthought hospitality was, as always, outstanding.

Gaël and I thank all those who participated! Our thanks to Enthought who funded it and helped make it a success.

Given the success with our first sprint, we plan to hold a two day sprint at SciPy 2008. Please do join us!

Gaël and I are also teaching a two hour tutorial this year at SciPy 2008. Hurry, 11th is the last day for early registration!

Sunday, March 30, 2008

Uniform deviates on the surface of a sphere

Here is a simple example of how to generate points randomly on the surface of the sphere that are uniformly distributed. The reason I'm posting it here is that it shows how convenient mayavi's mlab is for this sort of thing.

Anyone who has studied pseudo random numbers should know that the following will not generate points uniformly on the sphere. Here is a demonstration of the fact:


$ ipython -wthread
In [1]: from numpy import *
In [2]: from enthought.mayavi import mlab
In [3]: p = random.random(10000)*2*pi
In [4]: t = random.random(10000)*pi
In [5]: x, y, z = sin(t)*cos(p), sin(t)*sin(p), cos(t)
In [6]: g = mlab.points3d(x, y, z, z,  mode='point')
In [7]: g.actor.property.point_size = 2

The point size is made larger so you can see the points more clearly. Here is the picture it produces:

Notice the clustering at the poles. OTOH, the following works nicely,


In [3]: p = random.random(10000)*2*pi
In [10]: mlab.clf()
In [12]: t = arccos(random.random(10000)*2.0 - 1.0)
In [13]: x, y, z = sin(t)*cos(p), sin(t)*sin(p), cos(t)
In [14]: g1 = mlab.points3d(x, y, z, z,  mode='point')
In [15]: g1.actor.property.point_size = 2

This is the picture this one produces:

As you can see the transformation has worked and generates what appears to be a uniform distribution on the surface of a sphere. Visual inspection isn't a good enough test of uniformity but that isn't the point here.
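
A simple numerical check can complement the pictures. For points uniformly distributed on the unit sphere, z is uniformly distributed in [-1, 1], so half the points should have |z| > 0.5. With the polar angle drawn uniformly in [0, pi], that fraction is 2/3 instead. The sketch below uses only the standard library rather than numpy to stay self-contained. The helper names are hypothetical and it assumes Python 3.

```python
import math
import random


def naive_z(rng):
    # Polar angle drawn uniformly in [0, pi]: clusters points at the poles.
    return math.cos(rng.random() * math.pi)


def uniform_z(rng):
    # Polar angle from arccos of a uniform deviate in [-1, 1].
    return math.cos(math.acos(rng.random() * 2.0 - 1.0))


def polar_fraction(sampler, n=20000, seed=0):
    """Fraction of samples with |z| > 0.5, i.e. in the two polar caps."""
    rng = random.Random(seed)
    count = sum(1 for _ in range(n) if abs(sampler(rng)) > 0.5)
    return count / float(n)


if __name__ == "__main__":
    # Expect about 2/3 for the naive sampling and about 1/2 for the fix.
    print("naive  :", polar_fraction(naive_z))
    print("uniform:", polar_fraction(uniform_z))
```

This only tests the distribution of z. A fuller test would also check the azimuthal angle.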

Using virtualenv under Linux

Mayavi and TVTK are part of ETS (Enthought Tool Suite). The stable version of ETS is 2.7.1. ETS-2.7.1 ships with Traits2 and Envisage2. However, the latest traits version is 3 and Envisage3 is the current development Envisage version. These are part of ETS-3.0. Thus far mayavi was being developed in the branches. Earlier, in order to test mayavi with both ETS-2.7.1 and ETS-3.0, I used to manage a bunch of directories with symbolic links to switch between the ETS versions. It has been a while since I did that and yesterday I wanted to try out virtualenv to see if it would solve my problem easily. I'm using a Gutsy i386 system. In short, virtualenv worked like a charm. This is what I did.

I first needed to clean up my older setup. In the past I used a .pydistutils.cfg file that was in my home directory to tell easy_install to use my custom directories (which were in my PYTHONPATH). I removed this. Then I downloaded the virtualenv tarball from pypi and did the usual python setup.py install dance.


$ cd virtualenv-1.0
$ sudo python setup.py install --prefix=/usr/local

Then I did the following:


$ mkdir -p ~/usr/virtualenv
$ cd ~/usr/virtualenv
$ virtualenv ets_stable # for ETS-2.7.1
$ ln -s ets_stable/bin/activate .
# edit .bash_profile and add "source ~/usr/virtualenv/activate" (or whatever for your particular shell) to it.

Now, either login afresh or source ~/usr/virtualenv/activate to use the newly created virtual environment.

Now you are all set. I then installed ETSProjectTools and installed ets==2.7.1 and everything else I needed in the stable environment (using either python setup.py [option] or via easy_install). You just need to make sure you use the corresponding virtual environment's python and easy_install.

To switch to the trunk I created another virtualenv like so:


cd ~/usr/virtualenv
virtualenv ets_trunk

Then I installed the trunk related packages here. To switch between virtual environments I have a small shell script that looks like this:


#!/bin/sh
# switch_ets.sh
cd ~/usr/virtualenv
rm activate
ln -s $1/bin/activate .

So to switch I simply run:

$ switch_ets.sh ets_trunk

and I'm all set to go when I start a new shell.
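
The shell script does no error checking, so a mistyped name leaves a dangling activate link. A slightly more defensive variant can be sketched in Python. The function name switch_env is hypothetical, and the sketch assumes a platform with symbolic links, as on Linux.

```python
import os


def switch_env(base_dir, env_name):
    """Point base_dir/activate at base_dir/env_name/bin/activate."""
    target = os.path.join(env_name, "bin", "activate")
    if not os.path.isfile(os.path.join(base_dir, target)):
        raise ValueError("no such virtualenv: %s" % env_name)
    link = os.path.join(base_dir, "activate")
    if os.path.lexists(link):
        os.remove(link)
    # Relative link, like `ln -s $1/bin/activate .` run inside base_dir.
    os.symlink(target, link)
    return link
```

For the layout above this would be called as switch_env(os.path.expanduser("~/usr/virtualenv"), "ets_trunk").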

Mayavi sprint in July 2008

This is to announce a Mayavi sprint from 2nd July to 9th July, 2008.
The sprint will be held at the Enthought Office, Austin, Texas.
Here are the details:

Dates: 2nd July 2008 to 9th July 2008
Location: Enthought Office at Austin, TX

Please do join us. Gaël and I will be at the sprint on all
days and there will be developers from Enthought joining us as well.
Enthought is graciously hosting the sprint.

The agenda for the sprint is yet to be decided.

This was announced earlier on the enthought-dev and mayavi-users lists.

Saturday, March 29, 2008

Debugging Python scripts

Elementary debugging

I've noticed that many students who submit assignments in Python for a course I teach don't know how to use a debugger. Here are some simple notes on debugging Python scripts to get people started. Let's say you have the following trivial code:


# -- junk.py ---
print "hello"
foo
a = 1
print a+1

Importing this from IPython produces this:

In [1]: import junk
hello
---------------------------------------------------------------------------
exceptions.NameError                      Traceback (most recent call last)

/tmp/<ipython console> in <module>()

/tmp/junk.py in <module>()
  1 print "hello"
----> 2 foo
  3 a = 1
  4 print a+1
  5

exceptions.NameError: name 'foo' is not defined

If you have a non-trivial program you can debug this right away using IPython's %debug magic.


In [2]: %debug
> /tmp/junk.py(2)<module>()
  1 print "hello"
----> 2 foo
  3 a = 1

ipdb>

This drops you into ipdb, which is an enhanced pdb. At this point you can print variables, step through your code, go up the stack of execution etc. For more details on ipdb read the pdb documentation.

If you use the %pdb magic then IPython automatically drops you into the debugger whenever it sees an error. This is often very convenient.
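
Outside IPython, the standard library offers something similar through pdb.post_mortem, which opens the debugger at the point where an exception was raised. Here is a minimal sketch, assuming Python 3. The wrapper run_with_postmortem and its debug flag are hypothetical helpers for illustration, and broken mimics junk.py.

```python
import pdb
import sys


def run_with_postmortem(func, debug=True):
    """Call func(); on an exception, optionally open pdb at the failure."""
    try:
        return func()
    except Exception:
        exc_type, exc_value, tb = sys.exc_info()
        if debug:
            # Interactive: inspect variables, move up/down the stack, etc.
            pdb.post_mortem(tb)
        # For illustration the exception is returned rather than re-raised.
        return exc_value


def broken():
    print("hello")
    foo  # NameError, as in junk.py
```

Calling run_with_postmortem(broken) prints hello and then drops into the pdb prompt at the offending line. For a whole script, python -m pdb script.py gives similar behaviour from the command line.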

Debugging a VTK-Python script

This is obviously a little more complicated (because VTK is implemented in C++ and you want to step through C++ code to figure out what is wrong) and there are a few ways of doing this. In the following I assume that you are doing this on a *nix/Linux/Mac machine. Suppose you run a Python script that segfaults and you need to figure out where it crashed. You can do the following (assuming you have gdb installed of course):


$ gdb python
(gdb) run your_script.py
[gdb messages]
Segmentation fault (or whatever)
(gdb) backtrace

The backtrace should typically give you more information about the crash.

Several weeks ago I needed to debug a VTK related problem where a window was not being resized if the off screen rendering mode was set. Debugging this can again be achieved by running the script using Python started from gdb and then setting a breakpoint in gdb. However, I thought I'd illustrate a very handy gdb feature: debugging running applications. In my particular case, I had a simple Python script that was demonstrating a VTK bug. At the appropriate location, I added a simple raw_input('DBG:') to give me a point where execution would be paused. I then ran the script like so:

$ python offscreen.py

On another shell I did the following:

$ ps uax | grep python

Using the PID of the process (30708) I did the following:


$ gdb
(gdb) attach 30708
[tons of output]
Loaded symbols for /usr/lib/libwx_gtk2u_gl-2.8.so.0
Reading symbols from /usr/lib/libGLU.so.1...done.
Loaded symbols for /usr/lib/libGLU.so.1
0xffffe410 in __kernel_vsyscall ()
(gdb)

On the gdb prompt I specify a break point where I want to stop execution:

(gdb) br vtkXOpenGLRenderWindow.cxx:1185
Breakpoint 1 at 0xb52846a0: file /skratch/prabhu/vtk/cvs/VTK/Rendering/vtkXOpenGLRenderWindow.cxx, line 1185.
(gdb) c
Continuing.
[Switching to Thread -1210595664 (LWP 30708)]

Breakpoint 1, vtkXOpenGLRenderWindow::SetSize (this=0x99bfb78, width=800,
height=800)
at /skratch/prabhu/vtk/cvs/VTK/Rendering/vtkXOpenGLRenderWindow.cxx:1185
1185      if ((this->Size[0] != width)||(this->Size[1] != height))
Current language:  auto; currently c++
(gdb)

At this point I can step through the code and try and figure out what is going on. This is extremely convenient and handy.
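
The pause point can also print the process ID, which saves the trip through ps. This is a minimal sketch, assuming Python 3 (where input replaces raw_input). The helper pause_for_debugger is a hypothetical name and is not part of the script I used.

```python
import os


def pause_for_debugger(prompt=input):
    """Print this process's PID and wait until the user presses Enter."""
    pid = os.getpid()
    print("PID %d: run `gdb`, then `attach %d`, set breakpoints, "
          "`c`, and press Enter here" % (pid, pid))
    prompt("DBG:")
    return pid
```

Place the call just before the code you want to step through. The prompt argument exists only so the pause can be replaced by a non-interactive function, for example in tests.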

Friday, January 25, 2008

Getting the current development snapshot of Mayavi

Hello world!

Mayavi is part of the Enthought Tool Suite (ETS). Enthought has reorganized its SVN repository in order to break up the various parts of ETS into smaller chunks with a very clear set of dependencies. This should make it easy for people to get parts of ETS rather than have to get all of it (which is quite huge). Ideally users will install released versions of ETS using prebuilt eggs or native packages (debs or rpms). However, if one wants to get the bleeding edge from SVN things can get messy. This is because mayavi depends on several other ETS packages and manually getting the right versions of each package from SVN can be a huge pain. In order to make life easier in this regard Dave Peterson has created a package called ETSProjectTools that lets a user/developer check out a particular package from ETS along with all its dependencies. Let us pick mayavi as an example to demonstrate how this works.

Say you want to get the latest, greatest version of mayavi. The current version that you should look to get would be the up and coming ets-2.7.0b1 release. The release isn't out yet but will be soon. Here is how you get it:

Get hold of ETSProjectTools like so:

svn co https://svn.enthought.com/svn/enthought/ETSProjectTools/trunk ETSProjectTools

Build and install it:

cd ETSProjectTools
python setup.py install

This will give you several new scripts (etsco, etsup, etsdevelop etc.) that let you check out the sources and so on. More information on these is here: https://svn.enthought.com/enthought/wiki/SVNScripts
Now checkout ets==2.7.0b1 like so:
```
etsco "ets==2.7.0b1"
```
Be patient, the above will check out all of ETS-2.7.0b1 (which is yet to be released).

Now get the egg_builder script (this should soon be part of ETSProjectTools but isn't yet) like so:

cd ets_2.7.0b1
svn cat https://svn.enthought.com/svn/enthought/sandbox/egg_builder.py > egg_builder.py

Build the eggs for ets-2.7.0b1 (the command will build the eggs and put them in a "dist" directory under the current directory):
```
python egg_builder.py
```
Now install mayavi:
easy_install -f dist -H dist enthought.mayavi

This will install enthought.mayavi from the built eggs. If you want to keep tracking enthought.mayavi as it develops in SVN you may want to instead do:

cd enthought.mayavi_2.0.3a1
python setup.py develop -f ../dist -H ../dist

This will install all the dependencies for mayavi as eggs and make an egg-link for mayavi alone so you can keep track of it via SVN.

If you want to keep track of every package you could always do:

cd ets_2.7.0b1
etsdevelop .

This will do a setup.py develop in each and every subdirectory of ets_2.7.0b1.

Note that the ets* scripts only get the sources of other ETS packages and not everything else! Building ETS requires several other external packages like a compiler, swig, numpy, etc. For a complete list check the pages here:
https://svn.enthought.com/enthought/wiki/Build

The dependencies needed for Ubuntu Gutsy are listed in the ubuntu specific page for the last release (2.6.0b1).

As you can see, all the complexity of getting the ETS sources is made easy via etsco and the other scripts. In the future, as mayavi grows at its own pace, these tools will let us specify dependencies within ETS and thereby make it easier to cut new releases. This also makes it relatively easy to develop different branches that have entirely different dependencies.

Phew! That was a massive first blogspot post!
Enjoy,
Prabhu
