Wednesday, June 30, 2010

Optimization changes, somewhat promising test results

I replaced the KeyError exception handling with a conditional inside the matcher's main loop; it turns out that's actually slower. I also added a flag on each node to avoid revisiting higher nodes on each run, which was a significant speedup since it reduces the comparisons by ~30%! Some new test results now look promising. I'm currently focusing on single-file runs since they are simpler. Running the command below:
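To make the two changes a bit more concrete, here is a minimal sketch in Python. It is not the matcher's actual code: `Node`, `step_try`, `step_check`, `count_visits` and `build_leaves` are hypothetical names, and the tree is a toy. The first part shows the two lookup styles (catching KeyError versus testing membership first). In CPython a try block that rarely raises is cheap, which is one plausible reason the conditional did not win. The second part shows how a "checked" flag can cut repeated leaf-to-root walks short.

```python
def step_try(table, token):
    # Exception style: only pays the extra cost when the key is missing.
    try:
        return table[token]
    except KeyError:
        return None


def step_check(table, token):
    # Conditional style: one membership test plus one lookup on every hit.
    if token in table:
        return table[token]
    return None


class Node:
    """Hypothetical tree node with a parent pointer and a 'checked' flag."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.was_checked = False


def count_visits(leaves, use_flag):
    """Walk from every leaf towards the root and count the nodes visited."""
    visits = 0
    for leaf in leaves:
        node = leaf
        while node is not None:
            if use_flag and node.was_checked:
                break  # an earlier walk already covered this node and above
            node.was_checked = True
            visits += 1
            node = node.parent
    return visits


def build_leaves():
    # Toy tree: root -> (a, b), each with two leaves.
    root = Node("root")
    a, b = Node("a", root), Node("b", root)
    return [Node("a1", a), Node("a2", a), Node("b1", b), Node("b2", b)]


print(count_visits(build_leaves(), use_flag=False))  # 12
print(count_visits(build_leaves(), use_flag=True))   # 7
```

The saving in this toy tree says nothing about the ~30% figure above; it only shows where the skipped comparisons come from.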

time for((c=1;c<=20;c++)) do 
./test2to3.sh -f isinstance -f raise -f zip -f raw_input -f buffer \
  ../django-3k/django/core/servers/basehttp.py
done

yielded: 

real 0m10.324s 
user 0m9.709s 
sys 0m0.292s  

The results of a control run with the original code were these:

real 0m11.237s 
user 0m10.701s 
sys 0m0.308s  

(These were actually the best out of 10 runs, given time's inconsistency; the new code's results were still lower on average. I'll redo the tests with profile when I have more time.)

That's only a slight improvement (about 8% less wall-clock time), and there's a catch: basehttp.py is quite a large file. But I'm happy that the new code can at least run at roughly the same speed.

An issue I'm currently concerned with is that really common subpatterns may add significant overhead when they are matched. For example, )->trailer->power may be matched a few hundred times; this means that every single time we'll have to update the match sets of perhaps 20-30 fixers and check whether each fixer matches.

If we remove common subpatterns, it will probably speed things up (for example, in the case of raw_input we can throw away all subpatterns except the one starting with 'raw_input'), but then the matcher won't be 100% accurate (although only in very few cases on well-formed code). It can, however, send candidate nodes to the existing matcher (that's what it does anyway right now to fill the pattern variables), which can reject them. This could also deal with negated patterns: just ignore them and the original matcher will do the work. I'll give it a try next week to see if it justifies the added complexity.
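The filter-then-verify idea can be sketched roughly as follows. This is only an analogy on lines of text, not on syntax-tree nodes, and nothing here is lib2to3 code: `cheap_filter`, `exact_match` and `find_matches` are made-up names, and the regular expression merely stands in for the existing, exact matcher.

```python
import re


def cheap_filter(lines, keyword):
    """Fast but over-eager: any line mentioning the keyword is a candidate."""
    return [i for i, line in enumerate(lines) if keyword in line]


# Stand-in for the slower, exact matcher: a call to the bare name raw_input
# (not an attribute such as obj.raw_input, not part of a longer identifier).
EXACT = re.compile(r"(?<![\w.])raw_input\s*\(")


def exact_match(line):
    return EXACT.search(line) is not None


def find_matches(lines, keyword="raw_input"):
    # Only the candidates pay for the expensive check; false positives from
    # the cheap filter are rejected here.
    return [i for i in cheap_filter(lines, keyword) if exact_match(lines[i])]


source = [
    "name = raw_input('Name: ')",
    "my_raw_input = 3",
    "print(name)",
    "x = obj.raw_input()",
]

print(cheap_filter(source, "raw_input"))  # [0, 1, 3]
print(find_matches(source))               # [0]
```

The trade-off is the one described above: the cheap pass may let false positives through, but it must not miss real matches, since only the candidates ever reach the exact check.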

Plus I changed the blog's template to something that won't hurt your eyes as much.
