Labels: kids art robin personal
|
Our friend Robin gave us these wonderful paintings of our two younger daughters for Christmas. Thank you, Robin!
Labels: kids art robin personal
I've finally gotten around to reading Norman's book The Design of Everyday Things.
This book probably belongs on the list
of books programmers haven't really read, which is a pity; it's a fantastic book. It covers many
of the same complaints I made about alarm
clocks, and it also brought to a head another of my biggest design complaints: iTunes.
I love Apple. By and large, their products are the very model of good design: Simple, intuitive, and elegant. However, even after eight versions, they've completely fumbled with iTunes; almost every time I use it, I find something new to dislike about it. First, simply from a gestalt point of view, it's a mess. Some functionality is in the menus (including the dreaded Advanced menu), and some is hidden behind buttons bearing cryptic icons, many of which provide little clue what they do. Within the iTunes window, these icons are all over the place; when I need to perform a task, there is little affordance as to where I should look for the widget that makes it happen. The functionality of a music player is intermingled with that of a music-library manager in a most haphazard way. And the Podcasts pane is even worse than the main one. Playlists are simply a broken model for how to organize music. Consider my family: My wife, my oldest daughter, and I each have an iPod. To try to subdivide our music library into songs that belong on each iPod, I have to have a playlist for all of us, one for my wife and I, one for my wife and daughter, one for my daughter and I, and one for each of us alone. When my two youngest girls get iPods, this is going to exponentiate; I won't need all 31 possibilities, but the number will at least double. And this doesn't even cover the case of subdividing by mood or genre. The right model is tags. I could then specify that, for example, my daughter's iPod should be stocked with songs tagged megan first, and the remaining space filled with (joe and not explicit) (she likes a lot of the same music I do). I know some of this can be done with smart playlists, but it wouldn't be easy and simple and effortlessly "just work" like it could. Also, this would further allow visualization and manipulation of my collection via tag clouds and similar devices like you see on tag-based sites like Flickr. Initially building a library is always an unpleasant experience in iTunes. Recently I installed a new hard drive for music. I reset iTunes to point there, and asked it to consolidate my library, copying songs from anywhere on the machine to the new drive, and organizing them its own way. This took forever, and many hours into it, I got a dialog informing me that the drive was full, and that iTunes would point to the song in its original location. The buttons on this dialog are OK and Cancel. Presumably OK just acknowledges and dismisses the dialog, but what does Cancel mean? Don't add this song to the library, or abort the entire consolidation? So many hours in, I don't want to abort the whole operation, so I hit OK. Now I get the same dialog again, presumably for the next song (it doesn't tell me what song it is complaining about). I hit OK again, and again, and again. Finally, frustrated, I hit Cancel, but I continue to get these dialogs a bunch more times, and then they stop. I can't tell now if I aborted the consolidation, or if it finished. I see no recourse other than to erase everything and start over, and to hit OK for every one of those dialogs (there ended up being a few hundred of them). There is no "Yes to all" or "Don't show this again" option. Part of the reason the disk is full is duplicates: Songs that appear in multiple places on my machine. For some reason, iTunes copies and inserts in the library every copy. I can think of no reason to include duplicates in a library, especially when iTunes is making its own copies of the songs. Mind you, I'm not talking about songs with the same title or filename; I'm talking about songs that are bitwise identical on the disk. I don't want these taking up disk space or appearing multiple times in my library (though you can turn off the latter), but I know of no way to avoid it. Similarly, the dialog you get if you try to add songs to a playlist that are already in it. You are asked whether you want to add them multiple times, or to skip the duplicates. They should have done this the Apple way: Simply don't enable the rare corner case where someone might want the same song in their playlist multiple times, and skip them always. Or at least give the dialog a "Don't ask this again" option. As an extra added bonus, one of its automatic upgrades deleted all of my ratings data. It's not just me; if you Google "itunes sucks" you get almost 18,000 hits, which cover most everything I've complained about here and many other equally legitimate issues. I'm going to move my whole collection to a Linux box running some open-source music manager (I haven't figured out which one yet). I hardly ever buy music from iTunes anyway, because I can get it DRM-free from Amazon without paying extra for the privilege. I'll miss the free songs from Starbucks, but not enough to put up with this piece of garbage any longer. Given how hard it is to move a music collection to a new platform, you can guess how likely I am to ever come back to iTunes, even if they make it awesome. Congrats, Apple - your sorry usability has lost you a customer from the iTunes ecosystem for life. It'd be shameful even for a company who doesn't have such a long history of great design. Maybe I should send them my copy of Design of Everyday Things; apparently the iTunes team hasn't read it.
Well, not really. But I did just discover that my Lisp in JavaScript interpreter is part of Google Chrome's test suite.
Labels: chrome, google, javascript, lisp, testing
NP-completeness isn't just a theoretical notion of interest only to pointy-headed computability nerds. When faced with a problem, it is important to determine whether it is NP-hard (or not) because if it isn't, you don't want to write some heuristic to produce nonoptimal solutions when it is possible to produce optimal solutions efficiently. Conversely, you don't want to waste your time trying to devise optimal solutions to an NP-hard problem, because you will surely fail.
Often, the reduction from your problem to a known NP-complete problem is obvious. When it isn't is when things get interesting. If you have trouble proving a problem NP-complete, it is often - but certainly not always - because the problem in fact isn't NP-complete. After you've done a lot of these proofs, you start to get a feel for how many degrees of freedom in the solution space might still be apprehended in polynomial time, and how many will probably lead to NP-completeness. Often, the magic boundary is between 2 and 3 degrees of freedom: 2-coloring a graph can be done efficiently, while 3-coloring is NP-complete, and likewise bipartite matching is in P while three-dimensional matching is NP-complete. However, some of the NP-completeness proofs that have been devised are incredibly complicated and nonobvious, so don't take your difficulty in proving NP-completeness as strong evidence that the problem is not NP-complete. A productive approach is often to bounce back and forth between trying to devise an efficient algorithm and trying to prove NP-completeness. Your failures at proving NP-completeness can produce inspirations about how to attack the problem algorithmically, and your failures at devising an efficient algorithm can lead to insights into how to establish NP-completeness. One of my favorite examples from my own work was to solve a system of rectilinear orientation constraints. You had a bunch of objects in two dimensions that could have one of the eight axis-parallel orientations, and a set of constraints between pairs of them: That they must have the same orientation, have orientation mirrored about one or the other axis, have orientation rotated 90 degrees from one another, or any compatible combination of these. My initial attempts to prove it NP-complete focused on looking at it as an 8-coloring problem, but the problem didn't seem quite general enough to encode a generic graph coloring problem as a system of these constraints. Ultimately these failed efforts to prove NP-completeness led to the crucial insight to solving the problem efficiently: That the constraints could be viewed as three independent systems of constraints with respect to the three elements of the orientations: mirroring about x, mirroring about y, and 90-degree rotation. Each of those could be formulated as a graph 2-coloring problem independent of the other two problems, and thus efficiently solved. The independence of these three 2-coloring problems was the source of the difficulty in proving NP-completeness, and I'm certain I made that cognitive leap as a result of my attempts to prove the problem NP-complete. Once you've determined that your problem is NP-complete, that still doesn't necessarily mean it is hard in practice. NP-completeness means that the problem is hard in the worst case and for general inputs. There are some NP-complete problems for which algorithms have been devised that, while technically heuristics -- they cannot guarantee to find optimal solutions for all problem inputs -- nonetheless in practice almost always do find optimal solutions to problem instances that arise in the real world. This is true, for example, of the satisfiability problem: Determining whether there is an assignment of values to boolean variables that makes a particular logic formula evaluate to true. Further, it is often the case that the inputs your problem will be presented with in whatever real-world situation you are trying to solve will not, in fact, be fully general. Often, an NP-complete problem is no longer NP-complete if the inputs are known to have some kind of structure more restricted than the general case. A column by David Johnson, for example, addresses what graph problems turn out to be efficiently solvable if the input graphs are restricted to specific classes of graphs, e.g. trees, planar graphs, chordal graphs, bipartite graphs, and many more. If you know that your inputs are restricted in some way, sometimes a problem that is NP-complete in general will turn out to be efficiently solvable for those kinds of inputs. Subsequent posts will address what to do if your problem does turn out to be hard in practice for the kinds of inputs you really see. Labels: algorithms, np-completeness, theory
The comment stream over at Coding Horror has gotten quite toxic, with those defending the original post seeming to say that since it was just supposed to be a lay introduction, it doesn't need to be exactly correct. I find that attitude pretty incomprehensible, so I thought I'd try to write a really simple intro to NP-completeness that is actually right.
There is a class of problems known as NP-complete, as well as a similar class called NP-hard that are at least as difficult; for this discussion you can consider those terms interchangeable. These problems have the property that if any of them can be solved efficiently (that is, in polynomial time), then they all can be. Further, despite much effort by many smart people, no efficient algorithm has ever been devised for any of them. This leads to the prevailing opinion that none of the NP-complete/NP-hard problems can be efficiently solved. You can know that a problem you face is NP-hard if a magic oracle that solved your problem instantly could be used to solve another problem that is known to be NP-hard in polynomial time (hundreds of such problems are known). If this is true, you have a number of options (which I will elaborate upon in future posts), but the most likely one is to devise a heuristic that will produce [hopefully] good, but not necessarily optimal, solutions. (Aside: The title of this post might be my most obscure ever; +1 to anyone who can identify it without consulting the web.) Labels: algorithms, np-completeness, theory
Recently Jeff Atwood posted about NP-completeness. I and a few other commenters complained that he made a bit of a mess of it. It's easy to be a critic, so I thought I'd give it a go myself. I intend to write a series of posts; this first one will define NP-completeness (correctly), and subsequent posts will talk about various ways of attacking NP-hard problems.
The executive summary is that there is a large class of problems that are NP-complete, all of which probably cannot be solved on general inputs more quickly than in time exponential in the input size. There is no known polynomial-time algorithm for any of them, and smart money says none exists. This is what is meant by the "P=NP" debate; if a polynomial-time algorithm for any NP-complete problem were found, that would imply that P (the class of polynomial-time solvable problems) and NP (the class of NP-complete problems, which we'll define in a moment) are the same. This almost certainly isn't true, though, which in practice means that none of these problems can be solved in time that is not exponential in the size of the input. For a few reasons, this isn't quite as grim as it sounds, as will become clear in subsequent installments. For a problem to be in the class NP, it must be possible to verify the validity of a solution to the problem in polynomial time. Mostly in NP we see the decision versions of the optimization problems we actually care about: "Does there exist a traveling salesman tour for these points whose length is less than 100?" rather than "Find me the shortest traveling salesman tour for these points." You can easily see that the optimization problem can be used to solve the decision problem, and is thus at least as hard. Often the decision problem is in NP (given a set of points and a tour, it's trivial to determine whether the tour is shorter than the given bound), while the optimization problem is not (there's no polynomial-time way to verify that a tour is the shortest possible, unless P=NP). To be NP-complete, a problem must be in NP, and it must also be possible to transform any instance of some known NP-complete problem into an instance of our problem, and to transform a solution to our problem back into a solution to that one. This is where we get the all-or-nothing nature of the P=NP conundrum: If you had a polynomial-time algorithm for any NP-complete problem, you could use it to solve every other NP-complete problem via these transformations. Devising such a transformation is at the heart of proving a problem NP-complete. A problem that can be transformed to an NP-complete problem, but isn't (or isn't known to be) in NP, is referred to as NP-hard rather than NP-complete. The optimization versions of many of the NP-complete decision problems fall into this category. After you've seen a number of NP-hard problems, you start to get an intuition for whether a new problem is likely to be NP-hard or not. Be careful, though; sometimes problems that 'feel' NP-hard actually aren't, and vice-versa. A few of the classic NP-complete problems are: See Wikipedia for a longer list, though many of them aren't defined. I may have to fix that. Garey and Johnson's Computers and Intractability: A Guide to the Theory of NP-Completeness is the book to read if you want more; it's one of the better-written CS books I've ever encountered. (I believe somewhere my wife has a photo of me sitting on the beach reading this book. I'm such a nerd.) David Johnson also wrote a column on NP-completeness for Journal of Algorithms for a number of years; it is also excellent reading, and is available online! Labels: algorithms, np-completeness, theory A New York Times article mentions offhand that a Presidential election could be won with just 22% of the popular vote, and Kottke wonders if this could possibly be true. It is. For example: Sort the states by decreasing electoral votes per voter, then greedily remove states from the bottom of the list until you are left with 270 electoral votes. If a candidate won each of those states (the ones shown in red) by just one vote, he/she would win the election with just 21.56% of the popular vote (note that this requires more than half the votes in Maine and Nebraska, which aren't winner-take-all). This might be the source of the NYT mention, but the scenario there requires about 700,000 more votes than mine, and he also missed the Maine/Nebraska subtlety. I used the same raw data as in that article.
In today's Washington Post was what is perhaps my favorite infographic I've ever seen. It plots, as a timeline, GDP growth, unemployment, and who was president from Hoover forward. There's an enormous amount of information in it, but you can read it like a book; indeed, I spent about an hour today doing so. Unfortunately, WaPo doesn't seem to have posted the graphic online; the only place I could find it requires registration (it looks like they require money, but if you register you can see a few articles for free).Labels: design, tufte, visualization
I have long wanted to be able to create in-cell sparklines in OpenOffice calc. There are a few commercial tools to do this, but they are made for Excel, and also they are proprietary. I've put together enough of a solution to prove the concept. The idea is to create a font containing bars, lines, etc., and then an OpenOffice macro generates strings of characters in this font from the spreadsheet data. To create the font, I used SVG, since it allows fairly simple specification of fonts. I generated the SVG using a Python script. I then used FontForge to convert the SVG font to a TrueType font. Then I wrote some simple OpenOffice macros to generate characters in this font from the spreadsheet data.
To use it, install the font on your system, and import the macros into OpenOffice. Then, in a cell where you want a sparkline, do This is pretty raw; like I said, it's just a proof of concept. I figured I'd go ahead and put it out there because I know there's a demand, and I'm hopeful that someone with a little more time to spend on it might be able to polish it up. Labels: openoffice, sparklines, spreadsheet
A little scrap of JavaScript/SVG that I threw together, with an obvious inspiration:
Note that if you're running Internet Explorer, what you're seeing is an image and not the actual script-generated SVG, so if it is no longer 02008, it'll be wrong. This is because IE is a piece of crap, or more specifically, because it doesn't have native SVG support and doesn't allow scripts to interact with plugins. Labels: clock, javascript, long now, svg
My daughter brought home a fun math problem last week; it took a little mental effort to solve, and I haven't found a solution on the web, so I thought I'd post one.
The problem is this: What is the largest n such that 1005! is evenly divisible by 10n? Put another way, how many 0's does the expansion of 1005! end with? Highlight the following blank space for the solution: Imagine writing out the expansion of 1005!, and then replacing each number with its prime factorization. The prime factors of 10 are 5 and 2, so our question is equivalent to asking how many (5,2) pairs this expansion contains. It is easy to see that it contains far more 2's than 5's, so the question becomes: How many 5's are in the expansion? Every number in the original expansion of 1005! that is divisible by 5 contributes a 5 in the prime-factorization expansion; there are 201 of these. However, every number that is divisible by 52, i.e. 25, contributes a second 5; this adds 40 more. Every number that is divisible by 53, i.e. 125, contributes a third 5; add 8 more. Finally, the one number that is divisible by 54, i.e. 625, contributes a fourth 5, adding one more and bringing the total count to n = 250.
I've long had a good handle on the core JavaScript language, but the other night I set out to understand the DOM and the event-handling model, and the result was this toy sudoku app. If any more seasoned JavaScript programmers see anything stupid in there, please comment.
Labels: games, javascript, programming
Paul Graham just posted a history of the T language written by Olin Shivers, and it got me reminiscing. T was a Lisp/Scheme dialect. I took a compilers class my first year in grad school (ulp - 17 years ago!), and the semester project was to write a compiler for a subset of Pascal (all that was missing, IIRC, was user-defined types) into object code for a hypothetical microprocessor (for which we were provided a virtual machine). At the time, I was ramping up on a master's research track around optimizing the compilation of purely-functional languages like Haskell for doing numerical computations. My C chops were very strong, I figured I wasn't going to learn much more about C programming from this project, and my research would need me to know T, so I got the bright idea of writing the compilers project in T. I brought this idea to my compilers professor; he thought this was a very bad idea, and tried to talk me out of it, but ultimately capitulated with the understanding that I would be free-climbing - if I fell down, he wasn't going to be sympathetic. (This is a man who once said that the only legitimate excuse for turning in an assignment late is your own death.) I'd never really programmed in Lisp before, so learning the language (as any Lisp programmer knows, a major change in mindset) while under the gun on a long, difficult assignment was quite a challenge. However, ultimately I got it done; as far as testing could determine, it worked perfectly, and it had about 30% as many lines of code as my classmates' C implementations. I had fun, I learned a lot about both Lisp programming and writing compilers, and it brought me some small renown in the department (I presented on the project to a number of compilers and programming languages classes). Later, when I told this story in job interviews, my interviewers seemed to appreciate the accomplishment, but especially the attitude involved in bucking the system in this way. Not long thereafter, my would-be advisor left the department, and I ended up doing algorithms research instead; this would end up being a career, and I love it, but I still feel a little wistful about languages/compilers research. Still today, T is my favorite language I've ever written in, though Python is closing fast. BTW, in the unlikely event that anyone is interested, here is the source code for the compiler.
Labels: compilers, programming
Paul Graham lists startup ideas that he would like to fund, but the one I'd most like to see isn't on the list: personal information management. I'd like a service that puts all of my info in one place - contacts, bookmarks, notes, photos, etc. There would be a wide variety of ways to get information into the system: By email, as bookmarks, by phone, as paper, etc. It would be indexed by content, and also optionally by tagging it manually. There would be a (login-protected) search engine that I could use to find my information, and it would also be mirrored onto my local machine so that I could access it when offline. It would also serve me, via an email or RSS digest, changes to my information that it finds elsewhere, and content it finds on the web that might be interesting based on similarity to my own info. One could cobble together something that does most of this from various bits and pieces already available, but if someone consolidated all of that functionality into a simple, unified user experience, I'd pay for it.
From The Economist, another interesting infographic ruined by an incredibly distracting background image.
Update: Here's another, only worse: This time, the 'background' graphic is covering part of the chart.
Wordle creates really nice-looking tag clouds. Here is one using the entire text of my dissertation as input:
(Click to enlarge.) Labels: text, visualization
A trendy thing these days seems to be to represent your résumé as a tag cloud. Just for kicks, here's mine:
advanced algorithm application approximate architecture attribute automation award cad cadence california challenge chapter circuit code computer conference configuration congestion constraint contribution cost c++ cs custom data database design developer diagonal discrete dissertation doctoral electronic engineer european feature generate geometry graph heuristic image implemented inc included integrated interconnect international journal layout maintain manager measure member minimal model multiple nominated optimal partitioned patent phd physical placed placement platform precomputing presence problem processing product programming project quality received rectilinear region research routing science senior simplex software staff steiner student system team technical technology thesis tool tree university unix virginia vlsi wiring work wrote created at TagCrowd.com
Labels: text, visualization, work
The other day my daughter was watching me send a text message, and how when two successive letters are on the same key I had to pause a moment until it timed out, and she wondered aloud what was the longest word that did not have two successive letters on the same key. Searching the web, I was surprised not to be able to find an answer, so I fed the 480K-word dictionary on my Linux box through a few
tr and grep commands and a little Python code to find the answer, plus a number of other special words.
Labels: trivia
I just had a post on Slashdot about finding great programmers for the company where I work. My plug for the company got (understandably) dropped, so I'll put it here where hopefully some /. readers will end up.
The company is White Oak Technologies. It's a great place to work, and we're hiring. Be surrounded by really smart people, work on hard, interesting problems, and get better-than-market compensation and benefits. What we do is data exploitation work on very large volumes of data. It's really interesting stuff. If you're really good, exactly what languages you prefer aren't all that important; we have a lot of flexibility in that regard. However, among the technologies we use most are Python and C++. Not to put too fine a point on it, our standards are very high, so bring your A game. We are in the DC Metro area. Due to the nature of the business, telecommuting is not possible. Also, you must be a US citizen and will be subject to a clearance investigation. If you're interested, check out our openings (be sure to tell them how you found us!) or send me email. Labels: work
I saw where someone had posted their academic genealogy, so just for fun I looked up my own. Here it is, working backward: me - James Cohoon - Sartaj Sahni - Ellis Horowitz - George Collins - Barkley Rosser - Alonzo Church.
Via the Beautiful Code blog, a couple of nice sorting results: First, the proportion extend sort, a recursive sort that allegedly outperforms quicksort in both theory and practice. (Warning: Their code is pretty hard to understand, undercommented and obfuscated.) Second, the Python distribution contains a very nice writeup of the sorting routing Tim Peters used to implement listsort. The filename is listsort.txt, and it describes a nicely engineered adaptive mergesort with some elegant tricks to make it perform better on lists that already contain some order.
Labels: algorithms, programming ![]() Labels: funny
We went to Las Vegas last week (my first time there), and the one casino game that surprised me was War. Yes, just like the one you used to play as a kid until you realized that it is boring. I walked past a table, and then doubled back in disbelief that this was what I thought it was. This is about the most brain-dead card game I can think of, far dumber than anything else in the casino (except, of course, the slot machines).
![]() Labels: usability
Years ago, the first time I worked at Cadence, my username was "ganley". I left to go to a startup, where my username was "joe", and when Cadence acquired that startup I kept the username "joe". I quickly discovered that this was a mistake, since spammers send to name@domain for every common name. Only much later did I discover that "ganley" would have been better for another reason: I worked on an open source project, and my username still appears in some of the CVS tags in the source code. It would be much cooler if it was "ganley" instead of "joe" - after all, "joe" could be anyone, right?
Labels: software_development
Consider the problem of choosing a random element from a linked list whose length you do not know. Obviously, you could count the length of the list, then pick a random number between 1 and length, and then walk to that element, but there is a clever way to do this in one pass.
The algorithm is as follows:
You can easily show that when this finishes, choice is equal to any given element with probability 1/n, where n is the number of elements: The probability of setting the ith element to be the new choice is 1/i. The probability of not selecting any of the subsequent elements to be the new choice is the product as j goes from i+1 to n of (1 - 1/j), which if you work out the math ends up as i/n. Multiply the two probabilities together and you get 1/n. Labels: algorithms, programming
If I were inclined to start a company, something that I've always thought about is a company that applies Apple-style design sensibilities to relatively mundane consumer products. My company's first product would be an alarm clock. I've probably owned a dozen alarm clocks in my life, and used a couple of dozen more, and every single one has had an atrocious interface. Even the best of them lacked what I would consider some basic requirements. Few of them leave me confident that I've set the alarm properly and it is actually going to go off when I intended it to. It is difficult for me to turn them off in the dark, much less be able to actually set the alarm in the dark. For those of us not inclined to snooze, I've never seen one with a disable button as readily accessible as the snooze bar, but that kills the alarm until the same time tomorrow. And if there are multiple alarms or weekday-only alarms or such, forget it - the interface is so difficult that you'll probably never use those features. Someday...
Update (May 02007): Hmmm, WidgetStation might be just the thing. Looking forward to its release. Labels: design
A scary article in which the author takes an airline ticket stub out of the trash and uses it to find all sorts of personal data on its owner. I'm pretty paranoid about shredding, but I admit to occasionally considering tossing my airline stub. And I routinely toss the tag that they put on your luggage, and I bet that contains much (all?) of the same data. No more!
I disagree with the author's claim that all of this data collection is bad, however; the data itself isn't the problem, it's the fact that this data is used as authentication of who you are. Experts have long derided security through obscurity, and security techniques that rely on your personal data being secret are doomed in today's world. Labels: privacy
A number of times I've been faced with what I call the "Chinese menu enumeration" problem - you know, one from column A, one from column B, etc. Given an array num of N entries where each entry contains the number of items in that column, enumerate [the indices of] all of the possible meals. Equivalently, list every N-digit number in which the i'th digit is less than num[i]. This is not a hard problem, but it's all too easy to write ugly code for it. My latest attempt ended up like this:
Pretty succinct, but I can't shake the feeling that it could be
Labels: programming
One of the more comprehensive collections of bit-twiddling hacks I've seen.
Labels: programming
Yesterday I was reading Paul Graham's site, and wondered what other Lisp hackers might be around, so I went to one of the Lispier pages and Googled 'similar links.' To my surprise, one of the first hits was my friend Lawrence. That led me to revisit my friend Brad; he was on the programming team with me, and he and Lawrence were roommates (of each other, not of me). Somewhere in here I also lost half an hour to Drew's site; I don't know him at all, but he's a friend of Lawrence's. (I particularly liked the manually-raytraced sphere.)
The good that comes of this sort of jaunt is that Lawrence, Brad, and I are making plans to get together when I'm in California next month; it's been an overly large number of years since I've seen them. Labels: personal
A coworker and I were discussing the relative rarity of people like me, namely PhDs who are great software engineers and who are good at actually getting things done. He conjectured that there would be a strong correlation between those traits and how quickly one finished their PhD. Certainly that fits in my case; I did my PhD in 2.5 years. I'm curious as to how strong this correlation might be, and I'll be collecting data about it going forward.
Labels: software_development
Reader Bennet Yee pointed out that my Lisp in JavaScript interpreter failed on his Y-combinator code, as follows:
((lambda (x y) (x x y))
(lambda (me n)
(cond ((< n 1) 1)
(t (* n (me me (- n 1))))))
4)
< n 2", and thus it terminates before n reaches 0.
This task also led me to revisit the Y combinator, which is really beautiful. For those who don't want to wade through the details, it's a clever mechanism for implementing a recursive call to a lambda (i.e. nameless) function. Labels: lisp, programming
My wife called me into the room the other day because the Numb3rs episode that she was watching had mentioned Steiner trees (the principal subject of my Ph.D. dissertation).
As usual for that show, they got the math half wrong and half ornamented with extraneous jargon, but still it was a tiny thrill to see Steiner trees mentioned on TV.
Annoyed at having to look up, yet again, one of the couple of C++ operator precedence rules that I don't have memorized (this time, whether equality or ternary is tighter), I took the C++ precedence rules from here and formatted them into a 3x5 reference card. Here it is in PDF form, or in Word if anyone wants to tweak it. (My graphic-design skills are fairly meager, so if anyone makes substantial improvements, please let me know.)
Labels: c++, programming
Recently I wrote some code in which a templatized class declared one of its template parameters a friend. This worked fine in Visual C++, and I was surprised to find that GCC gave the rather unambiguous error, Template parameters cannot be friends. I was even more surprised to find that, indeed, this is not GCC being overly picky - it is not legal C++! There are extremely crufty workarounds, but in general it's clearly a bad idea to rely on platform-specific violations of the C++ standard in production code.
Labels: c++, programming It struck me the other day that fuse beads could be a good way to teach my daughter about cellular automata, so we used them to build this snowflake cellular automaton.Labels: cellular_automata, math
An interesting essay about why developers leave. In particular, I found the analysis in the morale section very interesting. This is one place where I think Google really has it right: Giving your people time to work on projects of their choosing pays huge dividends in morale. (Via particletree.)
Labels: software_development
Joel Spolsky's latest essay is on coding standards, and in particular why Hungarian notation is good. I've always liked Hungarian and similar notations, but as I read this article I found myself thinking that these classifications could be not only denoted but also enforced by making them classes instead of just variables of the same type with differently-denoted names. This requires a little finesse with simple types such as int, but it can be done. The result not only denotes the same information as Hungarian (and less cryptically) in the class names, but the compiler even enforces the relationships for you. No need to "watch" for types with mismatched notation (as in Spolsky's essay); it's part of the type system. Which is not to say that I no longer find Hungarian notation useful, but that in many situations and in many languages it should be supplemented (or in some cases replaced) by type definitions.
Labels: programming
Maciej Ceglowski writes an excellent, half-facetious rebuttal to Paul Graham's Hackers and Painters essay. I didn't buy this analogy when I read it, and still don't, and Ceglowski does a great job of enumerating why it doesn't fly. I've always thought of programmers more like 'design and build' contractors. The best of them are both good architects and good builders, and like programmers who are good at both of these things, they are rare. Many more are either good architects and spotty builders (their buildings look nice and function well but are poorly constructed) or vice-versa (their buildings are well put-together but don't flow well or are unattractive), and of course some aren't good at either.
Labels: software_development
I just ran across an Eric Sink article that I've somehow previously missed called The Hazards of Hiring. There are many good points in this article; in particular, he does a good job on one of my favorite points: in his words, The "very best" people never stop learning. ... They know their own weaknesses, and they're not insecure in talking about them. Many people seem afraid to say "I don't know." I love to say that, because it means we've just identified a gap in my knowledge, and almost always, it means that we're just about to fill that gap. That is the single most fulfilling thing in life, work-wise anyway.
Another interesting point in that article is his skepticism toward people with advanced degrees. I see this a lot, and in fact I spend a lot of my time in interviews trying to convince people that despite my having a Ph.D., I am not some sort of ivory-tower computer science researcher. First and foremost, my love is for writing software. Sure, I love to sink my teeth into a really hard CS problem from time to time, but there is also fun to be found in all of the other facets of the software development process, even those that many consider mundane. I really enjoy positions that offer a lot of variety, from hard algorithmic problems to user interface design to library architecture to low-level infrastructure. Labels: software_development
Among the entries in Google's recent usenet timeline is the original Tom Duff post describing Duff's Device, a loop unrolling technique that exploits a surprisingly legal C quirk. I remember seeing this for the first time in a job interview in 1990 or so, when my interviewer wrote the code on the board and asked me what it did. I still think it's quite cool; though compiler optimization technology has probably rendered this unnecessary, it's still a great illustration of the sorts of things that are made possible by the "anything goes" design philosophy of C.
Labels: programming
Somewhat in the same spirit as that last post, another great engineering story, this one from Damien Katz. This story illustrates perfectly why it makes me so crazy when job postings demand, "10 years experience with XYZ." Clearly if you can get a really great person who also has XYZ experience, that's optimal, but if you have to choose, you want a really great programmer (who doubtless can learn XYZ very quickly) over a mediocre programmer with lots of XYZ experience. (Link via Ned Batchelder.)
Labels: software_development
I just love this story of the Macintosh Graphing Calculator. It was part of a project that got cancelled, but its authors continued to sneak into Apple to work on it for free, and eventually got it shipped with MacOS. This is an extreme and wonderful illustration of the dedication a good engineer can have to an important project. I just wish it was available for Windows!
Labels: software_development
Ned Batchelder tells a C++ debugging story, the punchline of which is this:
Labels: c++, programming
Via Ned Batchelder I learned about Darcs, which is a fully peer-to-peer source code control system. This sounds ideal for home hobby work, as it doesn't require a dedicated server machine. Also, it surprised the heck out of me that it's written in the [almost] purely functional language Haskell!
Labels: software_development
I agree pretty much 100% with these Hallmarks of a Great Developer. In particular, the first point ("plans before coding") resonated with me. This is a former weakness of mine, and the area where I've made the most improvement in the past couple of years, since joining my current project. The point is even better made by Michael Abrash in his Graphics Programming Black Book (chapter 10) and by Ned Batchelder in his diamond cutter analogy.
Labels: software_development
Red Hat's Alan Cox on writing better software. There is a lot of good stuff in there, but my favorite quote is, "If it does everything it's complicated, and if it's complicated, it's broken."
Labels: software_development
Some interesting observations from Brian Cantrill on The Economics of Software.
Labels: software_development
In last month's C/C++ Users Journal, there was a clever article that described using templates to write loop constructs that are unrolled at compile time. In my last project I wrote some multiobjective optimization code that was full of loops that went from 0 to 2 or 3; this mechanism would've been great for that. On the other hand, five years ago I highly doubt that all the compilers I supported would have processed this kind of fancy template code correctly.
Labels: c++, programming
Michael Abrash has a series of articles appearing in Dr. Dobb's currently. The subject of the articles is code optimization, but the thing I found most interesting is the code being optimized. It is a software-only rasterizer. I wouldn't have thought it possible (even for Michael Abrash) to get DirectX7 performance in software. But most of all, it's noteworthy that state-of-the-art graphics hardware and drivers still have so many bugs and portability problems that Abrash finds it worthwhile to spend a lot of time and sacrifice two generations off the state of the art in order to avoid them. I certainly feel his pain, and I'm jealous that he is able to focus on a single type of processor and thus achieve this level of optimization.
The other thing that caught my eye was an illustration of how on modern processors, optimization can be extremely nonobvious. The example was a subproblem I've solved before: Given a value x between 0 and 1 and an array of ascending values in the same range, figure out which pair of values x falls between. I used a binary search. It turns out that on the x86, binary search is predicted poorly by the processor's branch prediction logic, while in a linear search every compare but the last one is predicted correctly. A mispredicted branch is very expensive. So, for a small number of bins a linear search is 30-50% faster (!), and the problem size has to reach 64 bins before the improved asymptotic efficiency of the binary search actually beats the linear search in real life. My problem had at most a few tens of bins, and the search was deep in an inner loop, so this would have been useful information to have known. Labels: graphics, programming
Ned Batchelder discusses rewriting
assert conditions to make them more readable. Since one of the purposes of assertions is as documentation, this is good advice.Labels: programming |
|