TASKS I'VE COMPLETED THIS WEEK:
- Developed an algorithm in the script that finds entire blocks of added or removed content containing condition variable routines and compares them to each other to distinguish MOVEs from ADD/REMOVE pairs. This is useful because in certain revisions the same line is added and removed in multiple locations, which makes line-by-line matching ambiguous. If an entire block of code (more than one consecutive line) is added in one place and removed in another, it can most likely be classified as a MOVE. I also wrote the part of the script that classifies a single added and removed line as a MOVE only if that line is added exactly once and removed exactly once. Otherwise the case is too ambiguous to classify as a MOVE and will require manual evaluation.
- Wrote a separate script that counts the references made to each condition variable in the HEAD revision of the Apache server project. For example, the line apr_thread_cond_wait(this_cond_var, this_mutex) counts as one reference to this_cond_var. The purpose of this script's findings is to help us later check whether the more commonly used condition variables are also changed more often during development. We predict that this will be the case.
- The findings of the above script were a bit surprising: there are only 19 condition variables, 5 of which are actually pointers (so really only 14 actual global condition variables), with a total of 44 uses throughout the project. This seemed low to me, so I looked back over the script, which appears to be correct, and considered whether I had missed any condition variable routines in my initial survey of the Apache Portable Runtime (APR) library, but I did not find anything. On the other hand, I have noticed in my manual analysis of the repository so far that Apache's source code is organized so that all handling of the request and response queues is done in only three directories rather than spread over many files. Since this is the part of the server that primarily uses the signal/wait routines, I can see how relatively few condition variables would be initialized. As a next step, Professor Lu and I have planned for me to write a script that searches explicitly for the initialization of global condition variables, as a check on the findings of this script.
- Unfortunately, my main script tracking ADDs/REMOVEs/MOVEs/MODIFs has still not managed to run all the way through the ~2 million revisions without encountering errors, so I have been addressing these as they come up. Since it often gets through ~50,000 revisions before hitting an error, fixing them one at a time is slow. Today I put a try/except block around the call to my script so that, instead of stopping, it prints the error for me to address later and moves on. This way I can fix all the remaining errors at once and have some results to analyze in the meantime. I'm not sure why I didn't think of this earlier, but it should now be just a couple of days until I have all the results and errors and can get close to finishing this initial version of the script!
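
The MOVE rules from the first item can be illustrated with a minimal sketch. This is not the actual script: the function name classify_blocks, the block representation (a tuple of consecutive stripped lines), and the AMBIGUOUS label are all assumptions made for illustration. It assumes that a multi-line block counts as a MOVE whenever an identical block is both added and removed, and that a single line counts as a MOVE only when it is added exactly once and removed exactly once.

```python
from collections import Counter


def classify_blocks(added, removed):
    """Classify diff blocks as MOVE, ADD, REMOVE, or AMBIGUOUS.

    `added` and `removed` are lists of blocks. Each block is a tuple of
    consecutive, whitespace-stripped lines (a hypothetical representation).
    """
    added_counts = Counter(added)
    removed_counts = Counter(removed)
    results = []

    for block in sorted(set(added_counts) | set(removed_counts)):
        n_add = added_counts[block]
        n_rem = removed_counts[block]

        if n_add and n_rem:
            if len(block) > 1:
                # Multi-line block seen on both sides: most likely a move.
                paired = min(n_add, n_rem)
                results.extend(("MOVE", block) for _ in range(paired))
                n_add -= paired
                n_rem -= paired
            elif n_add == 1 and n_rem == 1:
                # Single line added once and removed once: treat as a move.
                results.append(("MOVE", block))
                n_add = n_rem = 0
            else:
                # Single line added or removed several times: leave it
                # for manual evaluation.
                results.append(("AMBIGUOUS", block))
                n_add = n_rem = 0

        # Whatever is left over is a plain addition or removal.
        results.extend(("ADD", block) for _ in range(n_add))
        results.extend(("REMOVE", block) for _ in range(n_rem))

    return results
```

Blocks are compared by exact equality here; a real diff would probably need some normalization (indentation, trailing comments) before comparison, which this sketch leaves out.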
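
The reference counting from the second item could look roughly like the sketch below. It is an illustration under stated assumptions, not the script itself: it assumes a reference is any call to an APR routine whose name starts with apr_thread_cond_, and that the condition variable is always the first argument. The names COND_CALL and count_cond_var_references are hypothetical.

```python
import re
from collections import Counter

# Matches a call to any apr_thread_cond_* routine and captures its first
# argument, with a leading & or * stripped (assumption: the condition
# variable is always passed first).
COND_CALL = re.compile(
    r"\bapr_thread_cond_\w+\s*\(\s*[&*]?\s*([A-Za-z_][\w.\->]*)"
)


def count_cond_var_references(source_lines):
    """Count how often each condition variable is passed to a cond routine."""
    counts = Counter()
    for line in source_lines:
        for name in COND_CALL.findall(line):
            counts[name] += 1
    return counts
```

A quick usage example on a few hand-written lines:

```python
lines = [
    "apr_thread_cond_wait(this_cond_var, this_mutex);",
    "apr_thread_cond_signal(this_cond_var);",
    "apr_thread_mutex_lock(this_mutex);",
]
print(count_cond_var_references(lines))
```

Note that this version also counts the creation call as a reference, and it treats two different expressions naming the same variable (for example through a pointer) as separate entries, so the totals would need to be reconciled by hand.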