time for((c=1;c<=20;c++)) do
./test2to3.sh -f isinstance -f raise -f zip -f raw_input -f buffer \
../django-3k/django/core/servers/basehttp.py
done

yielded:

real 0m10.324s
user 0m9.709s
sys 0m0.292s

The results of a control run with the original code were these:

real 0m11.237s
user 0m10.701s
sys 0m0.308s
(These were actually the best out of 10 runs, given time's inconsistency; the new code's results were still lower on average. I'll redo the tests with profile when I have more time.)
That's only a slight improvement (about 8% in wall-clock time), and there's a catch: basehttp.py is quite a large file. But I'm happy that it can at least run at roughly the same speed.
An issue I'm currently concerned with is that really common subpatterns may add significant overhead when they are matched. For example, )->trailer->power may be matched a few hundred times; this means that every single time we have to update the match sets of perhaps 20-30 fixers and check whether each fixer matches.
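To get a feel for the numbers, here is a rough sketch of the bookkeeping cost. Everything in it is made up for illustration: the index layout, the fixer names, and the figures 300 and 25 are assumptions picked from the ranges mentioned above, not measurements from the real matcher.

```python
from collections import Counter

# Hypothetical index: subpattern -> names of the fixers whose patterns
# contain it. The 25 generic fixer names are placeholders, not real fixers.
SUBPATTERN_INDEX = {
    (")", "trailer", "power"): ["fixer_%d" % i for i in range(25)],
    ("raw_input", "power"): ["raw_input"],
}

def count_updates(matched_subpatterns):
    """Count match-set updates caused by each matched subpattern."""
    updates = Counter()
    for sub in matched_subpatterns:
        # Every match touches the match set of every fixer sharing it.
        updates[sub] += len(SUBPATTERN_INDEX.get(sub, ()))
    return updates

# Assumed workload: the common subpattern fires 300 times, the rare one twice.
stream = [(")", "trailer", "power")] * 300 + [("raw_input", "power")] * 2
updates = count_updates(stream)
for sub, n in updates.items():
    print("->".join(sub), n)
```

Under these assumptions the common subpattern accounts for thousands of updates while the distinctive one costs almost nothing, which is the imbalance I'm worried about.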
If we remove common subpatterns, it will probably speed things up (for example, for raw_input we can throw away all subpatterns except the one starting with 'raw_input'), but then the matcher won't be 100% accurate (although only in very few cases on well-formed code). It can, however, send candidate nodes to the existing matcher (that's what it does right now anyway to fill the pattern variables), which can reject them. This could also deal with negated patterns: just ignore them and the original matcher will do the work. I'll give it a try next week to see if it justifies the added complexity.
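The idea boils down to a cheap, over-approximate filter followed by an exact check. Below is a minimal sketch of that two-stage scheme. It is a simplification, not the actual matcher: it works on lines of text instead of tree nodes, and exact_match is a hypothetical stand-in for the existing matcher, implemented here with a regular expression.

```python
import re

def exact_match(line):
    """Stand-in for the existing, exact matcher (an assumption).

    Accepts a bare call to raw_input(...), but not an attribute
    access such as obj.raw_input(...).
    """
    return re.search(r"(?<![\w.])raw_input\s*\(", line) is not None

def prefilter(line, key="raw_input"):
    """Cheap check keyed on one distinctive token; may over-match."""
    return key in line

def find_matches(lines):
    # Stage 1: keep only candidates that contain the distinctive token.
    candidates = [l for l in lines if prefilter(l)]
    # Stage 2: let the exact matcher reject the false positives.
    return [l for l in candidates if exact_match(l)]

source = [
    "name = raw_input('Name: ')",
    "x = foo(bar)",
    "y = obj.raw_input()",
    "print(name)",
]
candidates = [l for l in source if prefilter(l)]
matches = find_matches(source)
print(len(candidates), matches)
```

The filter lets a false positive through (the method call on obj), and the exact check rejects it, so accuracy is preserved as long as the filter never drops a true match.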
Plus I changed the blog's template to something that won't hurt your eyes as much.