Monday, July 12, 2010

Instructions for running the new matcher algorithm so far

I've added some logging messages to the bottom matcher module to illustrate the matching process.

Now, in order to test the new module:
1) Checkout  a local copy:
svn checkout http://2to3-speedup2.googlecode.com/svn/trunk/ 2to3
2) Create a small file for testing. Add a few statements plus a statement that matches a fixer.
For example:


#test.py
import sys


buffer("Hello World")


(we'll try to match fix_buffer). More complex files can be tested of course, but it's harder to keep track of the process for a larger file.
3)Run ./test2to3.sh with -f options for the current working fixers(buffer, raw_input, isinstance, raise etc.). Add the -v option to get the tracing messages.


For example run:



./test2to3.sh -f buffer test.py -v

First the automaton will be printed in DOT format. You can paste the result in http://graph.gafol.net/ to get a quick rendering.

This is the output from running the above command(some is omitted):



RefactoringTool: Adding transformation: buffer
RefactoringTool: Refactoring test.py
RefactoringTool: Adding fixer patterns to the automaton
RefactoringTool: The automaton is printed below in DOT
digraph g{
0 -> 1 [label=buffer] //[]
1 -> 2 [label=power] //[(0, 1)]
0 -> 3 [label=8] //[]
3 -> 4 [label=trailer] //[]
4 -> 5 [label=power] //[(0, 2)]
0 -> 6 [label=7] //[]
6 -> 7 [label=trailer] //[]
7 -> 8 [label=power] //[(0, 3)]
}



The constructed automaton looks like this:

 Now the automaton has been constructed and we start matching from each leaf up to the root of the AST.


RefactoringTool: Running automaton for each leaf
RefactoringTool: -------------------------
RefactoringTool: Leaf ID:0, content is Leaf(7, '(')
RefactoringTool: -------------------------
RefactoringTool: Leaf ID:1, content is Leaf(8, ')')
RefactoringTool: -------------------------
RefactoringTool: Leaf ID:2, content is Leaf(1, 'power')
RefactoringTool: Matching failed; moving automaton back to state 0
RefactoringTool: Matching failed; moving automaton back to state 0
RefactoringTool: Matching failed; moving automaton back to state 0
RefactoringTool: Matching failed; moving automaton back to state 0
RefactoringTool: -------------------------
RefactoringTool: Leaf ID:3, content is Leaf(20, '<')
RefactoringTool: Matching failed; moving automaton back to state 0
RefactoringTool: Matching failed; moving automaton back to state 0
RefactoringTool: All nodes above have been checked; moving on to next leaf
RefactoringTool: -------------------------


The above leaves produced no results; the first couple are orphans, and the rest reset the automaton to the start state. The automaton won't even try matching higher when it's been reset, since there is no way a match can be produced. The process goes on until it gets some matches:


RefactoringTool: Leaf ID:44, content is Leaf(1, 'buffer')
RefactoringTool: Name 'buffer' matches : Moving from state node 0 to state node 1
RefactoringTool: Token power matches : Moving from state node 1 to state node 2
RefactoringTool: Subpattern #1 for fixer #0 matches for node 49
RefactoringTool: Matching failed; moving automaton back to state 0
RefactoringTool: -------------------------
RefactoringTool: Leaf ID:45, content is Leaf(7, '(')
RefactoringTool: Token 7 matches : Moving from state node 0 to state node 6
RefactoringTool: Token trailer matches : Moving from state node 6 to state node 7
RefactoringTool: Token power matches : Moving from state node 7 to state node 8
RefactoringTool: Subpattern #3 for fixer #0 matches for node 49
RefactoringTool: Matching failed; moving automaton back to state 0
RefactoringTool: -------------------------


While matching the automaton changes it's state until it reaches a matching state; in that case a subpattern's id is added to the current AST node and the node is tested for a whole fixer's match. For example, node 2 of the automaton contained a trigger for a subpattern #1 and node 8 for subpattern #3. Now node 49 is marked with both of them; if we can match subpattern #2 on the same node we get a full match.


RefactoringTool: -------------------------
RefactoringTool: Leaf ID:47, content is Leaf(8, ')')
RefactoringTool: Token 8 matches : Moving from state node 0 to state node 3
RefactoringTool: Token trailer matches : Moving from state node 3 to state node 4
RefactoringTool: Token power matches : Moving from state node 4 to state node 5
RefactoringTool: Subpattern #2 for fixer #0 matches for node 49
RefactoringTool: SUCCESS: All subpatterns of fixer #0 match for node 49.
RefactoringTool: The node's content is: 
Node(power, [Leaf(1, 'buffer'), Node(trailer, [Leaf(7, '('), Leaf(3, '"Hello World"'), Leaf(8, ')')])])
 which is effectively: 
buffer("Hello World")


The last subpattern for fix_buffer was matched for node #49, so the whole fixer matches. The node is sent to the old matcher for further processing.


RefactoringTool: Refactored test.py
--- test.py (original)
+++ test.py (refactored)
@@ -1,3 +1,3 @@
 import sys

-buffer("Hello World")
+memoryview("Hello World")
RefactoringTool: Not writing changes to test.py
RefactoringTool: Files that need to be modified:
RefactoringTool: test.py


This is roughly the whole process of the algorithm. The key point to note here is that the AST is traversed only once, using a combined-pattern automaton to match all fixers in a single pass. From an algorithmic point of view, the time complexity is O(n). Of course the extra effort needed to mark subpatterns and check the match sets for each fixer reduces somewhat this benefit, but for a sufficiently large number of patterns there should be a significant speedup.


On the issue of alternatives, I must admit I'm not very happy with my progress, as the task has proved much harder than I thought. The code is still a mess and I had to rewrite a large portion. Unfortunately I'll need some more time to get this over with, but hopefully it will be done soon.

No comments:

Post a Comment

Followers