Simply stated and enlightening. I am certainly going to incorporate these as soon and as much as possible. Thanks for sharing your thoughts.
I wrote code that let me read SQL 2000 deadlocks from the log by converting each one into text that showed the tables and indexes involved and the process that was running. It certainly was a bug we couldn't reproduce, but with the log we were able to figure out the odd locking order that was causing it between a read and an update.
The update locked the table page and then tried to lock the index. The read, meanwhile, held the index and was trying to use it to read the table. The update couldn't lock the index; the read couldn't look at the data page.
I altered the read to retrieve just the clustered PK field values (a covering index read), then read the table using the PK values, so it is delayed by the update, not deadlocked.
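For readers who want to see that locking order in miniature, here is a rough Python sketch. It is not the SQL Server code from the comment: the two `threading.Lock` objects standing in for the index and the table page, the `run_scenario` function, and the short timeout that plays the role of the deadlock monitor picking a victim are all assumptions made for illustration.

```python
import threading


def run_scenario(release_index_before_table_read: bool) -> dict:
    """Simulate the read/update pair with two locks (illustrative only)."""
    index_lock = threading.Lock()   # stands in for the index
    page_lock = threading.Lock()    # stands in for the table data page
    # Make sure each side holds its first lock before asking for the second.
    both_hold_first_lock = threading.Barrier(2)
    outcome = {}

    def update():
        # The update locks the table page, then wants the index.
        with page_lock:
            both_hold_first_lock.wait()
            if index_lock.acquire(timeout=2.0):
                index_lock.release()
                outcome["update"] = "ok"
            else:
                outcome["update"] = "timed out"

    def read():
        # The read starts on the index.
        index_lock.acquire()
        both_hold_first_lock.wait()
        if release_index_before_table_read:
            # Altered read: the covering index read is finished, so the
            # index is released before the table is touched.
            index_lock.release()
            with page_lock:          # merely waits for the update to finish
                outcome["read"] = "ok"
        else:
            # Original read: still holding the index while asking for the page.
            # The short timeout is an assumed stand-in for deadlock detection.
            got_page = page_lock.acquire(timeout=0.2)
            if got_page:
                page_lock.release()
            outcome["read"] = "ok" if got_page else "victim"
            index_lock.release()

    threads = [threading.Thread(target=update), threading.Thread(target=read)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome
```

Calling `run_scenario(False)` reports the read as the victim while the update completes; `run_scenario(True)` lets both finish, with the read only delayed. The barrier is there to force the unlucky interleaving every time, which is exactly what the production system would not do for us.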
PS: Your sign-in procedure exposed a bug. I wrote a comment, did not have a Google account, so I signed up for one. The Google sign-up procedure was successful, but then blew up when I tried to return to submit my comment. I had to rewrite the comment after going back through my web history to your article.
So, did I just find an unreproducible bug in your blog?
Hi Ken, Thanks for letting us know. Actually, this might belong in the reproducible category :) I reported this bug to the blogger team.
Most "unreproducible bugs" are caused by using the execrable "C" language, with its inbuilt design flaws.
Anyone programming in "C" (for a safety critical application) should be indicted for crimes against humanity.
The total cost to the world for avoidable bugs directly as a result of employing "C" code or libraries possibly exceeds several billions of dollars.
C++ is a willing accessory to these crimes against humanity.
Good one Ken :D
Well written Anthony - thought provoking... I have been in testing for more than 6 years now, and it happens all the time that a reported bug doesn't reproduce, especially when a developer updates you about it...
This is a great list of guidelines to follow... thanks for sharing...
Regards,
Aman
Great article, thanks for sharing it!
Interesting. We recently conducted an empirical study to characterize non-reproducible bugs. If you're interested, the paper is available here: http://salt.ece.ubc.ca/publications/docs/mona-msr14.pdf
I'm not going to agree with Kingsford Gray on the C language bias standpoint at all. That's not the topic. But it is the paradigm! Some parts of a tester's job do involve understanding the specific vulnerabilities the software under test has, based not only on historical knowledge of the kind of bugs your dev team injects, but also on the kind of bugs the software production process itself injects, as well as the compiler toolchain and finally the language chosen to implement the software. So back to taking out locks and synchronization: when it comes to test code, I'm going to disagree with the author, Mr Vallone.
Test code and instrumentation code introduce synchronisation windows in the software being tested a lot of the time. Normally the "window" is quite wide, like when turning on logging means all threads end up having to access one log file. The tester who knows the exact effect this has on the system, as well as all the other synchronization points in the software, will be able to exploit them to find timing bugs.
And on this point I am your keen follower, Anthony: being able to test timeouts properly, as if they are simply another axis in your test-driven approach or simply another table-driven test case, can uncover lots of bugs the devs might take years to find. Great guide, packed with nuggets :)
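To make the "timeouts as another table-driven test case" idea concrete, here is a small Python sketch. Everything named in it is hypothetical: `wait_for_reply` is an assumed stand-in for whatever operation is under test, and the rows in `TIMEOUT_CASES` are made-up values chosen with wide margins so the checks do not depend on scheduler luck.

```python
import threading


def wait_for_reply(reply_after: float, timeout: float) -> bool:
    """Hypothetical operation under test: True if a reply beats the timeout."""
    replied = threading.Event()
    # A timer thread plays the remote side and "replies" after a delay.
    timer = threading.Timer(reply_after, replied.set)
    timer.start()
    try:
        return replied.wait(timeout)
    finally:
        timer.cancel()  # do not leave a pending timer behind


# Each row: (description, seconds until reply, timeout in seconds, expected result)
TIMEOUT_CASES = [
    ("reply well inside the timeout", 0.01, 1.0, True),
    ("reply long after the timeout", 1.0, 0.05, False),
    ("zero timeout with a slow reply", 0.5, 0.0, False),
]


def run_timeout_table(cases=TIMEOUT_CASES):
    """Run every row and return the rows whose result did not match."""
    failures = []
    for name, reply_after, timeout, expected in cases:
        actual = wait_for_reply(reply_after, timeout)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures
```

An empty list from `run_timeout_table()` means every row behaved as expected. Adding a new timing scenario is then one more row in the table rather than a new test function, which is the appeal of treating the timeout as just another axis.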