Joe Ganley

I make software and sometimes other things.


Due to Blogger's termination of support for FTP, this blog is no longer active.

It is possible that some links from here, particularly those within the site, are now broken. If you encounter one of those, your best bet is to go to the new front page and hunt for it from there.

Most, but not all, of the blog's posts are on this page; the archives are here.

NP-completeness isn't just a theoretical notion of interest only to pointy-headed computability nerds. When faced with a problem, it is important to determine whether it is NP-hard (or not) because if it isn't, you don't want to write some heuristic to produce nonoptimal solutions when it is possible to produce optimal solutions efficiently. Conversely, you don't want to waste your time trying to devise optimal solutions to an NP-hard problem, because you will surely fail.

Often, the reduction from your problem to a known NP-complete problem is obvious. When it isn't is when things get interesting. If you have trouble proving a problem NP-complete, it is often - but certainly not always - because the problem in fact isn't NP-complete. After you've done a lot of these proofs, you start to get a feel for how many degrees of freedom in the solution space might still be apprehended in polynomial time, and how many will probably lead to NP-completeness. Often, the magic boundary is between 2 and 3 degrees of freedom: 2-coloring a graph can be done efficiently, while 3-coloring is NP-complete, and likewise bipartite matching is in P while three-dimensional matching is NP-complete. However, some of the NP-completeness proofs that have been devised are incredibly complicated and nonobvious, so don't take your difficulty in proving NP-completeness as strong evidence that the problem is not NP-complete.

A productive approach is often to bounce back and forth between trying to devise an efficient algorithm and trying to prove NP-completeness. Your failures at proving NP-completeness can produce inspirations about how to attack the problem algorithmically, and your failures at devising an efficient algorithm can lead to insights into how to establish NP-completeness. One of my favorite examples from my own work was to solve a system of rectilinear orientation constraints. You had a bunch of objects in two dimensions that could have one of the eight axis-parallel orientations, and a set of constraints between pairs of them: That they must have the same orientation, have orientation mirrored about one or the other axis, have orientation rotated 90 degrees from one another, or any compatible combination of these. My initial attempts to prove it NP-complete focused on looking at it as an 8-coloring problem, but the problem didn't seem quite general enough to encode a generic graph coloring problem as a system of these constraints. Ultimately these failed efforts to prove NP-completeness led to the crucial insight to solving the problem efficiently: That the constraints could be viewed as three independent systems of constraints with respect to the three elements of the orientations: mirroring about x, mirroring about y, and 90-degree rotation. Each of those could be formulated as a graph 2-coloring problem independent of the other two problems, and thus efficiently solved. The independence of these three 2-coloring problems was the source of the difficulty in proving NP-completeness, and I'm certain I made that cognitive leap as a result of my attempts to prove the problem NP-complete.

Once you've determined that your problem is NP-complete, that still doesn't necessarily mean it is hard in practice. NP-completeness means that the problem is hard in the worst case and for general inputs. There are some NP-complete problems for which algorithms have been devised that, while technically heuristics -- they cannot guarantee to find optimal solutions for all problem inputs -- nonetheless in practice almost always do find optimal solutions to problem instances that arise in the real world. This is true, for example, of the satisfiability problem: Determining whether there is an assignment of values to boolean variables that makes a particular logic formula evaluate to true.

Further, it is often the case that the inputs your problem will be presented with in whatever real-world situation you are trying to solve will not, in fact, be fully general. Often, an NP-complete problem is no longer NP-complete if the inputs are known to have some kind of structure more restricted than the general case. A column by David Johnson, for example, addresses what graph problems turn out to be efficiently solvable if the input graphs are restricted to specific classes of graphs, e.g. trees, planar graphs, chordal graphs, bipartite graphs, and many more. If you know that your inputs are restricted in some way, sometimes a problem that is NP-complete in general will turn out to be efficiently solvable for those kinds of inputs.

Subsequent posts will address what to do if your problem does turn out to be hard in practice for the kinds of inputs you really see.

Labels: , ,

Comments (0)
The comment stream over at Coding Horror has gotten quite toxic, with those defending the original post seeming to say that since it was just supposed to be a lay introduction, it doesn't need to be exactly correct. I find that attitude pretty incomprehensible, so I thought I'd try to write a really simple intro to NP-completeness that is actually right.

There is a class of problems known as NP-complete, as well as a similar class called NP-hard that are at least as difficult; for this discussion you can consider those terms interchangeable. These problems have the property that if any of them can be solved efficiently (that is, in polynomial time), then they all can be. Further, despite much effort by many smart people, no efficient algorithm has ever been devised for any of them. This leads to the prevailing opinion that none of the NP-complete/NP-hard problems can be efficiently solved. You can know that a problem you face is NP-hard if a magic oracle that solved your problem instantly could be used to solve another problem that is known to be NP-hard in polynomial time (hundreds of such problems are known). If this is true, you have a number of options (which I will elaborate upon in future posts), but the most likely one is to devise a heuristic that will produce [hopefully] good, but not necessarily optimal, solutions.

(Aside: The title of this post might be my most obscure ever; +1 to anyone who can identify it without consulting the web.)

Labels: , ,

Comments (0)
Recently Jeff Atwood posted about NP-completeness. I and a few other commenters complained that he made a bit of a mess of it. It's easy to be a critic, so I thought I'd give it a go myself. I intend to write a series of posts; this first one will define NP-completeness (correctly), and subsequent posts will talk about various ways of attacking NP-hard problems.

The executive summary is that there is a large class of problems that are NP-complete, all of which probably cannot be solved on general inputs more quickly than in time exponential in the input size. There is no known polynomial-time algorithm for any of them, and smart money says none exists. This is what is meant by the "P=NP" debate; if a polynomial-time algorithm for any NP-complete problem were found, that would imply that P (the class of polynomial-time solvable problems) and NP (the class of NP-complete problems, which we'll define in a moment) are the same. This almost certainly isn't true, though, which in practice means that none of these problems can be solved in time that is not exponential in the size of the input. For a few reasons, this isn't quite as grim as it sounds, as will become clear in subsequent installments.

For a problem to be in the class NP, it must be possible to verify the validity of a solution to the problem in polynomial time. Mostly in NP we see the decision versions of the optimization problems we actually care about: "Does there exist a traveling salesman tour for these points whose length is less than 100?" rather than "Find me the shortest traveling salesman tour for these points." You can easily see that the optimization problem can be used to solve the decision problem, and is thus at least as hard. Often the decision problem is in NP (given a set of points and a tour, it's trivial to determine whether the tour is shorter than the given bound), while the optimization problem is not (there's no polynomial-time way to verify that a tour is the shortest possible, unless P=NP).

To be NP-complete, a problem must be in NP, and it must also be possible to transform any instance of some known NP-complete problem into an instance of our problem, and to transform a solution to our problem back into a solution to that one. This is where we get the all-or-nothing nature of the P=NP conundrum: If you had a polynomial-time algorithm for any NP-complete problem, you could use it to solve every other NP-complete problem via these transformations. Devising such a transformation is at the heart of proving a problem NP-complete.

A problem that can be transformed to an NP-complete problem, but isn't (or isn't known to be) in NP, is referred to as NP-hard rather than NP-complete. The optimization versions of many of the NP-complete decision problems fall into this category.

After you've seen a number of NP-hard problems, you start to get an intuition for whether a new problem is likely to be NP-hard or not. Be careful, though; sometimes problems that 'feel' NP-hard actually aren't, and vice-versa.

A few of the classic NP-complete problems are:

  • Satisfiability (aka SAT): Given a boolean formula, is there a set of assignments to its variables that makes the formula true?
  • Vertex cover: Given a graph, is there a set of fewer than n vertices such that at least one endpoint of every edge is in the set?
  • Bin packing: Is it possible to pack some sized objects into a set of sized bins?
    See Wikipedia for a longer list, though many of them aren't defined. I may have to fix that.

    Garey and Johnson's Computers and Intractability: A Guide to the Theory of NP-Completeness is the book to read if you want more; it's one of the better-written CS books I've ever encountered. (I believe somewhere my wife has a photo of me sitting on the beach reading this book. I'm such a nerd.) David Johnson also wrote a column on NP-completeness for Journal of Algorithms for a number of years; it is also excellent reading, and is available online!

    Labels: , ,

    Comments (0)
    Via the Beautiful Code blog, a couple of nice sorting results: First, the proportion extend sort, a recursive sort that allegedly outperforms quicksort in both theory and practice. (Warning: Their code is pretty hard to understand, undercommented and obfuscated.) Second, the Python distribution contains a very nice writeup of the sorting routing Tim Peters used to implement listsort. The filename is listsort.txt, and it describes a nicely engineered adaptive mergesort with some elegant tricks to make it perform better on lists that already contain some order.

    Labels: ,

    Comments (0)
    Consider the problem of choosing a random element from a linked list whose length you do not know. Obviously, you could count the length of the list, then pick a random number between 1 and length, and then walk to that element, but there is a clever way to do this in one pass.

    The algorithm is as follows:

    Element choice;
    int count = 0;
    for (Element e = head; e != NULL; e = e->next) {
       if (rand01() <= (1.0 / ++count)) {
          choice = e;

    You can easily show that when this finishes, choice is equal to any given element with probability 1/n, where n is the number of elements: The probability of setting the ith element to be the new choice is 1/i. The probability of not selecting any of the subsequent elements to be the new choice is the product as j goes from i+1 to n of (1 - 1/j), which if you work out the math ends up as i/n. Multiply the two probabilities together and you get 1/n.

    Labels: ,

    Comments (0)