Wednesday, July 21, 2010

Support for all default fixers and some great results!

The "characteristic path" approach proved excellent. Not only it proved sufficient for alternatives support, it also greatly simplified and speeded up the matcher. For one thing it eliminated the need for wildcards; it turns out that any meaningful pattern always seems to have a constant part that can be matched instead. Negated patterns are skipped as well. The results from running the new 2to3 on django-3k were:

real 1m39.585s
user 1m36.218s
sys 0m0.660s

The control run took:

real 3m18.960s
user 3m6.820s
sys 0m0.920s

That's around a 50% speedup. However there are still some bugs, and the output from the old and new algorithm won't match 100%(although in relatively few cases). Most of them look minor and I'll try fixing them this week. Re-transformed code is kind of an issue, and the current approach uses the old matcher for the transformed nodes(which adds around 10sec). And in some cases the old algorithm seems to be in error, changing for example x.values() to list(list(x.values())) instead of list(x.values()) (although I may be wrong here,  I'll check again with other versions).

'any' at higher levels is not supported, so the two(non-default) fixers that use won't work. I'm not sure if they are worth it, since supporting them may complicate the code and slow down the matcher. I'll look into that too.

I've  uploaded the code to the repository. You can test it using the instructions from the previous post(although there are now no debugging messages). The code is a mess of course and I'll try to clean it up the next days. But overall I'm really satisfied with these results, I think I've made some good progress and there is still plenty of time for more optimizations. Off for a beer.

No comments:

Post a Comment

Followers