Monday, June 7, 2010

Some benchmarking results, and an implementation plan

Well, I'm still not able to get profiling working for a full django-3k build, and I'm not sure what I'm doing wrong. But anyway, at least I got it to work for a single file. I used feedgenerator.py from the django-3k codebase, and here is the result, rendered with gprof2dot:

[gprof2dot call graph of the 2to3 run on feedgenerator.py]
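
For reference, the single-file profile can be reproduced with something along these lines (a rough sketch, and the exact invocation may differ; the stats file then goes through gprof2dot as in gprof2dot.py -f pstats 2to3.pstats | dot -Tpng -o profile.png):

import cProfile
from lib2to3.main import main

# Profile one 2to3 run over a single file and dump the stats in
# pstats format, which gprof2dot understands.
cProfile.run("main('lib2to3.fixes', args=['feedgenerator.py'])",
             "2to3.pstats")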
No surprises here: the program spends most of its time in the pattern-matching functions, as was already known (the green boxes at the bottom left). Since function calls are a significant overhead in Python, I'm still confident that a more iterative pattern-matching algorithm, one that reduces the number of recursive calls, will be able to speed things up (there's a toy illustration of that overhead after the timings below). Anyway, as a first reference, here are the results of time:
Command being timed: "/usr/local/bin/python3 /home/giorgos/dev/gsoc/django-3k/setup.py build"
User time (seconds): 386.96
System time (seconds): 0.87
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:37.14
....
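
As a toy illustration of that function-call overhead (synthetic code, nothing to do with 2to3), compare walking a 500-deep parent chain recursively and iteratively:

import timeit

class Node:
    __slots__ = ("parent",)
    def __init__(self, parent=None):
        self.parent = parent

def depth_recursive(node):
    # One extra Python stack frame per ancestor.
    if node.parent is None:
        return 0
    return 1 + depth_recursive(node.parent)

def depth_iterative(node):
    # The same traversal without the per-step call overhead.
    depth = 0
    while node.parent is not None:
        node = node.parent
        depth += 1
    return depth

# Build a chain 500 nodes deep and time both walks.
leaf = None
for _ in range(500):
    leaf = Node(leaf)
assert depth_recursive(leaf) == depth_iterative(leaf) == 499

print("recursive:", timeit.timeit(lambda: depth_recursive(leaf), number=2000))
print("iterative:", timeit.timeit(lambda: depth_iterative(leaf), number=2000))

That per-frame cost is exactly the kind of saving I'm hoping for in the matcher.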
Note that I've built Python with threads disabled, because they often mess up the profiler. It shouldn't matter though: the results are relative anyway, and 2to3 doesn't use threads internally.

Now, analyzing the algorithm I came up with, I'm pretty sure that in theory it should offer no benefit over the existing one, i.e. I believe it performs the same number of comparisons. Of course, preprocessing a few parts may help overall, but still. On the other hand, as I mentioned above, I think a major issue with the existing algorithm is its recursive nature; the algorithm I propose may benefit from being a bit more "linear".

Anyway, after running a few tests it seems to me that the most computationally intensive part in practice is obtaining the bottom-up strings, which has to be redone every time the tree is altered. Profiling showed that it can take up to 0.5 seconds to obtain the full set; that's quite a lot, and since usually one or two fixers get applied per file, multiply that by two. Not terrible, but I'll look for a faster way to obtain them in the future (one rough idea below).
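
One rough idea, nothing more than a sketch at this point (the node type is made up, and id() stands in for NodeId): cache the string on each node and, when a fixer alters the tree, invalidate only the altered node and its descendants, since the ancestors' strings don't change:

class CachedNode:
    def __init__(self, type, parent=None):
        self.type = type
        self.parent = parent
        self.children = []
        self._bottom = None   # cached bottom-up string

    def bottom_string(self):
        # [(NodeId, TOKENTYPE), ...] from this node up to the root.
        # Caching the parent's portion makes recomputing every leaf's
        # string O(n) for the whole tree instead of O(n * depth).
        if self._bottom is None:
            above = self.parent.bottom_string() if self.parent else []
            self._bottom = [(id(self), self.type)] + above
        return self._bottom

    def invalidate(self):
        # A node's string depends only on itself and its ancestors, so
        # altering a node stales its own cache and every descendant's,
        # while all the ancestors' caches stay valid.
        self._bottom = None
        for child in self.children:
            child.invalidate()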

So a portion of the algorithm currently looks something like this, in pseudocode:
get_bottom_strings:
  # Bottom-up strings will be lists of the form [(NodeId, TOKENTYPE), ...].
  for leaf in leaves:
    # recursively obtain the leaf's bottom-up string
    collect leaf.type + leaf.parent.bottom_string()
  return all collected strings

get_pattern_dict:
  for fixer in fixers:
    # Reduce repeaters to obtain a constant tree: a WildcardPattern is
    # reduced to its MIN count; this may actually create nodes.
    reduce_repeaters(fixer.pattern_tree)
    for leaf in fixer.pattern_tree.leaves:
      add the leaf's bottom-up string to pattern_dict
  return pattern_dict
Then we use the pattern dictionaries to build the AC automaton. Simulating the AC automaton in Python may be too slow for our purposes here, so perhaps a C version could be considered in the future, if that's an option. The automaton is constructed by adding one string at a time, as detailed in the Aho-Corasick paper, and nodes will be represented by an ACNode type with a dictionary as the transition table. Callbacks added on each final node will mark the position of the match, and a match dictionary keyed by NodeId will be used to check whether a fixer has been matched at any point.
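
To make that concrete, here is a minimal sketch of the automaton I have in mind; the construction and the matching loop follow the standard Aho-Corasick algorithm, while the symbols and the callback bookkeeping are placeholders for the real token types and the real match dictionary:

from collections import deque

class ACNode:
    def __init__(self):
        self.goto = {}        # symbol -> ACNode: the transition table
        self.fail = None      # failure link
        self.callbacks = []   # fired when a pattern ends at this node

def build_automaton(patterns):
    # patterns: iterable of (symbols, callback) pairs.
    root = ACNode()
    # Phase 1: add the strings one at a time, building a trie.
    for symbols, callback in patterns:
        node = root
        for sym in symbols:
            node = node.goto.setdefault(sym, ACNode())
        node.callbacks.append(callback)
    # Phase 2: breadth-first pass to compute the failure links.
    queue = deque()
    for child in root.goto.values():
        child.fail = root
        queue.append(child)
    while queue:
        node = queue.popleft()
        for sym, child in node.goto.items():
            queue.append(child)
            fail = node.fail
            while fail is not root and sym not in fail.goto:
                fail = fail.fail
            child.fail = fail.goto.get(sym, root)
            # A pattern ending at a proper suffix of this node matches too.
            child.callbacks.extend(child.fail.callbacks)
    return root

def feed(root, symbols):
    # Run the automaton over one bottom-up string; the callbacks on the
    # final nodes record where each pattern matched.
    node = root
    for pos, sym in enumerate(symbols):
        while node is not root and sym not in node.goto:
            node = node.fail
        node = node.goto.get(sym, root)
        for callback in node.callbacks:
            callback(pos)

# Tiny demo: record match positions keyed by a (hypothetical) fixer name.
matches = {}
root = build_automaton([
    (("NAME", "trailer"),
     lambda pos: matches.setdefault("fix_example", []).append(pos)),
])
feed(root, ("NAME", "trailer", "NEWLINE"))
assert matches == {"fix_example": [1]}

The dictionary transition table keeps the sketch simple; it's also the part where a C implementation would pay off if the Python simulation turns out to be too slow.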

So that pretty much covers it. I'm now at the stage of planning how to fit all of this into the existing code. I hope I'll have some time this week (my term exams start on Monday) to push this further.
