This week I've been working primarily on speeding up the analysis script. Because the initial version of the code was written to examine critical-section changes, it checks out the old version of the code as well as the diff content and runs extensive analysis on both. Professor Lu and I determined that such extensive code reconstruction isn't necessary, but removing it forced me to make a lot of changes to the structure of the rest of the script.
I came up with the idea to write a separate script that determines which revision numbers contain synchronization-related content and to use its output as input for the analysis script. This way, the analysis script doesn't waste time reconstructing irrelevant code. The new script took a couple of days to write because I wanted to be extremely thorough in checking for all the synchronization-related commands (since the script takes a long time to run even on the faster server, I don't want to have to run it more times than necessary).
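To make the pre-filtering idea concrete, here is a minimal sketch in Python. It is not the actual script: it assumes the diffs are already available as a mapping from revision number to unified diff text, and the keyword list, `changed_lines`, and `sync_revisions` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical keyword list for illustration only; the real list of
# synchronization-related commands is more thorough.
SYNC_PATTERN = re.compile(
    r"\b(pthread_mutex_(?:lock|unlock|trylock)"
    r"|pthread_cond_(?:wait|signal|broadcast)"
    r"|synchronized|volatile)\b"
)


def changed_lines(diff_text):
    """Yield the text of added and removed lines in a unified diff.

    Simplification: any line starting with '+++' or '---' is treated as a
    file header and skipped.
    """
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            yield line[1:]


def sync_revisions(diffs):
    """Return the sorted revision numbers whose changed lines mention a keyword.

    `diffs` is assumed to map revision number -> unified diff text.
    Unchanged context lines are ignored on purpose.
    """
    return sorted(
        rev
        for rev, text in diffs.items()
        if any(SYNC_PATTERN.search(line) for line in changed_lines(text))
    )
```

The returned list can then be fed to the analysis script, so that only those revisions are ever reconstructed.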
Besides the major refactoring to improve performance, I also reworked the script a bit after analyzing the output it was producing. It counted many lines as MOVES because certain lines were removed and then added back in the same spot, just in a different context (i.e., the code surrounding those lines was changed and the developer happened to retype the line instead of working around it). While technically the line was moved, I decided this should be counted as a separate change category. However, it was surprisingly difficult to pinpoint this type of change in the script, so it took a while to get it worked out.
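One possible way to separate the two cases is sketched below in Python. This is only a heuristic I am assuming for illustration, not necessarily what the script does: a line that is removed and re-added inside the same contiguous block of changes (no unchanged line in between) is treated as retyped in place, while a line removed in one block and added in another is treated as a real move. The function name `classify_repeated_lines` is hypothetical.

```python
from collections import Counter


def classify_repeated_lines(hunk_lines):
    """Classify lines that are both removed and added within one diff hunk.

    Returns (retyped, moved):
      retyped - removed and re-added within the same change block
      moved   - removed in one change block and added in a different one
    Assumed heuristic: a change block is a run of '-'/'+' lines with no
    unchanged (context) line between them.
    """
    blocks = []
    current = {"-": [], "+": []}
    for line in hunk_lines:
        if line[:1] in ("-", "+"):
            text = line[1:].strip()
            if text:  # ignore blank lines, which match trivially
                current[line[0]].append(text)
        else:
            # A context line closes the current change block.
            if current["-"] or current["+"]:
                blocks.append(current)
            current = {"-": [], "+": []}
    if current["-"] or current["+"]:
        blocks.append(current)

    retyped = []
    leftover_removed, leftover_added = Counter(), Counter()
    for block in blocks:
        removed, added = Counter(block["-"]), Counter(block["+"])
        same = removed & added  # lines present on both sides of this block
        retyped.extend(same.elements())
        leftover_removed += removed - same
        leftover_added += added - same

    moved = list((leftover_removed & leftover_added).elements())
    return retyped, moved
```

For example, if `lock(&m);` is deleted and retyped while the statements around it change, it lands in `retyped`, whereas a `cleanup();` call that is deleted above an unchanged `return;` and re-added below it lands in `moved`.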