While manually classifying the changes sampled by my script, I've hit a few setbacks. Taking in-depth looks at the old and new versions of the code, along with the corresponding DIFF content for many revisions, let me evaluate how my original analysis script was classifying certain things. For example, it refused to classify a line as a MOVE if that line was added and removed in multiple places throughout the DIFF content for one revision. In general this was a reasonable idea, since certain lines were added and removed too often to be sure where they were actually meant to move. However, I realized that the DIFF content is conveniently divided into sections of code change (individual change blocks), and a single block is a more accurate scope for limiting the number of adds and removes. So instead of treating a line as an ADD+REMOVE rather than a MOVE when it is added and removed multiple times anywhere in the DIFF content, I now do this only when it is added and removed multiple times within one change block. So far this has tightened up the classifications a bit. Figuring out a better system was still a setback, though, because I didn't want my final graphical data to show a falsely large number of adds and removes and hardly any moves (which was previously the case and HAD been some cause for concern).
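A minimal Python sketch of the per-block rule described above. It is illustrative only and not the actual analysis script: the function names `split_hunks` and `classify_hunk` are hypothetical, and it assumes the DIFF content is in unified format, where each change block starts with an `@@` header.

```python
from collections import Counter


def split_hunks(diff_text):
    """Split unified-diff text into change blocks, one list of lines per '@@' header."""
    hunks, current = [], None
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            current = []
            hunks.append(current)
        elif line.startswith(("--- ", "+++ ", "Index: ", "diff ")):
            # Heuristic: a file header ends the current block (assumption).
            current = None
        elif current is not None:
            current.append(line)
    return hunks


def classify_hunk(hunk_lines):
    """Label each changed line within ONE change block.

    A line added exactly once and removed exactly once in the block is a MOVE;
    a line added and removed but more than once is left as ADD+REMOVE.
    """
    added, removed = Counter(), Counter()
    for line in hunk_lines:
        text = line[1:].strip()
        if not text:
            continue  # ignore blank changed lines
        if line.startswith("+"):
            added[text] += 1
        elif line.startswith("-"):
            removed[text] += 1

    labels = {}
    for text in set(added) | set(removed):
        a, r = added[text], removed[text]
        if a == 1 and r == 1:
            labels[text] = "MOVE"
        elif a and r:
            labels[text] = "ADD+REMOVE"
        elif a:
            labels[text] = "ADD"
        else:
            labels[text] = "REMOVE"
    return labels
```

Calling `classify_hunk` once per block returned by `split_hunks` means a line that moves once in each of two blocks is still counted as two MOVEs, whereas counting over the whole DIFF would see two adds and two removes and fall back to ADD+REMOVE.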
After figuring out a better system for that, I continued the manual analysis of the changes my script had found. Unfortunately, a majority of them turn out to be refactoring changes. Around revision 91,000, a new API appears to be introduced to replace the POSIX pthread_cond library. I already knew this API existed (it's the APR_cond API I mentioned in the beginning weeks), but the switch from one condition variable API to the other created a sudden influx of removes and adds that hardly count as adds or removes at all; they are really replaces. On the other hand, since the two APIs are slightly different and take different parameters, calling them REPLACES is kind of iffy. So I have a meeting scheduled at the beginning of next week with Professor Lu to talk through this problem.
Besides this problem, the other type of change I've found very commonly is a small move to avoid a race condition. For example, a signal will often be moved from after a mutex unlock to before the mutex unlock, which makes sense since it's generally good practice to hold the lock when signalling on a condition. So the changes I've found that actually seem to be debugging changes make sense to me. Unfortunately, there just don't seem to be as many of them as we had hoped in comparison to refactoring changes!
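The signal-before-unlock pattern could be flagged automatically with something like the sketch below. This is only an assumption about how such a check might look, not part of the existing script: the helper names are hypothetical, it looks only at the first signal and the first unlock in a snippet, and it matches the POSIX and APR call names by regular expression.

```python
import re

# Call names for the POSIX and APR condition-variable and mutex APIs.
SIGNAL = re.compile(r"\b(?:pthread_cond|apr_thread_cond)_(?:signal|broadcast)\s*\(")
UNLOCK = re.compile(r"\b(?:pthread_mutex_unlock|apr_thread_mutex_unlock)\s*\(")


def signals_under_lock(lines):
    """Return True if the first signal call precedes the first unlock call.

    Returns None when the snippet lacks either a signal or an unlock.
    """
    sig = next((i for i, l in enumerate(lines) if SIGNAL.search(l)), None)
    unl = next((i for i, l in enumerate(lines) if UNLOCK.search(l)), None)
    if sig is None or unl is None:
        return None
    return sig < unl


def is_signal_moved_under_lock(old_lines, new_lines):
    """True when the old code signalled after unlocking and the new code signals before."""
    return (signals_under_lock(old_lines) is False
            and signals_under_lock(new_lines) is True)
```

For instance, an old snippet with `pthread_mutex_unlock(&m);` followed by `pthread_cond_signal(&c);`, and a new snippet with those two lines swapped, would be flagged as this kind of move.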