Saturday, April 18, 2015

Hindenbugs, Heisenbugs and other types of software bugs

Different Types of Software Bugs

    Sorry, I haven't gotten around to creating a new post in close to a month (yikes!). I have just been super busy at work and have been trying to complete about 3 out of several not-quite-finished personal projects that I will be releasing on GitHub and blogging about

    1) A software bug that disappears or alters its behavior by the action of attempting to debug it.
    Possible sources: The effect of debuggers on the run-time code or JIT can be one source. Another might arise because of process scheduling or threading and race conditions. Or both, as stepping through the code with a debugger will affect the timing of threads.
    Derivation: Werner Heisenberg and his Uncertainty Principle.

    1) A bug whose causes are so complex it defies repair, or makes its behavior appear chaotic or even non-deterministic. Other symptoms include core dumps that, when viewed in an editor, form complex but repeating patterns or designs of characters and symbols.
    Possible sources: Operating system environment or configuration, the presence or absence of file system objects, time or scheduling-dependent code. Also, a very complicated state machine esp. one with more than one variable that hold state information and where transitions may not exists for every possible permutation of one state to another.
    Derivation: Benoît Mandelbrot and his research in fractal geometry.

    1) A bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place.
    Possible sources: A combination of naughty coding practices such as coding by coincidence and something else that allows for unexpected behavior such as a the effect of custom configurations or compiler optimizations, code obfuscation, or the JIT or on code. Also, a dependency on some arcane piece of technology that no one truly understands or employment of a black-box system or library, or some ancient, dinosaur system with a highly specific configuration that is so fragile that no-one dares update it because said updates could likely or is known to break functionality and where even a reboot is considered hairy.
    Derivation: Named for Erwin Schrödinger and his thought experiment.

    1) In contrast to the Heisenbug, the Bohrbug, is a "good, solid bug", easy to hunt down, or easily predicted from the description, esp. when a bug is the result of a classical programming mistake, or 'a rookie move'.
    Possible sources: Failing to initialize a variable or class to anything other than null, failure to RTFM, misuse of pointers or general abuse of memory, not coding defensively, coding by coincidence or not truly understanding how (or why) your code works.
    Derivation: Named after the physicist Niels Bohr and his rather simple atomic model.

    1) A bug with catastrophic consequences, esp. one that actually causes the server to burst into bright, hot flames.
    Possible sources: Code that dynamically builds and executes SQL scripts, esp. scripts that make use of the DELETE FROM command, code that is self modifying or replicates, or code that controls large machines or a physical system such as pumps, belts, cooling systems, aggregate pulverizer, fans, or any sort of thing with large, rotating blades. Also, any software that controls, monitors, tests, or is in any way whatsoever, wired up to a tactical nuclear defense system. In fact, lets just include anything with 'nuclear' or 'pulverize' in the name.
    Derivation: Refers to the Hindenburg disaster. The Hindenburg Blimp was, in turn, named after Paul von Hindenburg, the then-president of Germany from 1925->1934.

    Kishor S. Trivedi has a great talk/slideshow about Software Faults, Failures and Their Mitigations. He posits that it is not realistic to write software that is 100% bug free, or have 100% up-time. He displays the downtime in terms of minutes per year (MPY?) of several reliable corporations to support his claims, and it is true that as far as I am aware,there have not been any software ever written that did not have some downtime, even NASA.
    However, Trivedi shares a quote by one E. W. Dijkstra: "Testing shows the presence, not the absence, of bugs". Aww, logic. Now that's something I can get behind. Indeed; It is impossible to provably show that any software (of sufficient complexity) is absolutely error-free.
    Following this, he suggests we should not strive for the virtually impossible task building fault-free software, but rather aim to build instead fault-tolerant software.

Software Faults -> Software Errors -> Software Failures

Software Faults lead to Software Errors that lead to Software Failures.

Trivedi defines faults, errors and failures:
      -Software Failure occurs when the delivered service no longer complies with the desired output.
      -Software Error is that part of the system state which is liable to lead to subsequent failure.
      -Software Fault is adjudged or hypothesized cause of an error.Faults are the cause of errors that may lead to failures

    So, how does one test their tactical nuclear defense system code? VERY carefully, and it probably wouldn't hurt to comment out the Launch() method as well.

No comments: