Hey, what smells like blue?: December 2009

Wednesday, December 30, 2009

Blame Yourself First

Why is it that the stupidest, most obvious bugs are the ones you end up spending half an hour on? Did I say half an hour? Make that several hours... maybe even a day or more, sometimes. Consider the following JavaScript code (written with jQuery):


$(function() {
  $(".clickme").click(function() {
    var link = $(this);
    link.css("color", "gray");

    finish = function() {
      link.css("color", "blue");
    };

    $.ajax({
      url: "/my/remote/url",
      success: function(data) {
        // ... process the result ...
        finish();
      },
      error: function(data) {
        // ... process the error ...
        finish();
      }
    });

    return false;
  });
});

I wrote code very similar to this recently, albeit a bit more complicated, but with the same bug. Can you spot the bug? What if I renamed the finish function variable to iAmABonehead? Well, if that doesn't make it obvious, you might want to brush up on JavaScript if you plan on using it much. Because finish is assigned without var, I wasn't creating a local reference to a closure named finish; I was creating a global reference to a closure named finish that gets overwritten every time the click handler fires. Thus, with a couple of quick clicks in a row on different links, the first request's callback ends up calling the finish that belongs to the second click, and you will end up with some permanently gray links. Declaring it as var finish = function() { ... } keeps each closure local to its own click.

Here's the funny part, though. I did what every good software engineer does... I saw the behavior in my browser of choice (Chrome at the moment), and thought... gee, that's an odd bug in jQuery with Chrome. My code surely doesn't cause that problem. How can I figure out a way to work around this odd bug?

Thankfully I caught myself, and gave myself a good reprimand:

Self! Don't be such an idiot! You are surely the cause of this bug... drill down a centimeter and you will find it!

So I took this quite excellent advice and decided to give the same bug a shot in Firefox. My reason being that surely our version of jQuery is thoroughly tested with Firefox... if the bug comes up again, it's either jQuery, or me. Except that it probably isn't jQuery.

A quick test indicated that ~~jQuery~~ I was the likely cause of the problem. A couple of minutes of closer inspection of the code and I found the accidental global and fixed my bug.

I guess this is the "Science" aspect of Computer Science. When dealing with a bug, you need to treat it like a (hopefully) repeatable experiment. Form a hypothesis on why you are seeing your experimental results, then conduct further tests and experiments to drill down until you have found and squelched the bug.

For your first hypothesis, don't think about which of your frameworks is likely causing the bug. Eliminate them as the possible issue. The bug is 99.99942% likely to be in your code. When I come across a bug in fresh code, it's weird how the frameworks and platforms I am using are the first things my mind blames. It took all those times being wrong with such accusations before I finally started to stop myself and reconsider which part of my code might exhibit the behavior.

The sooner you blame your own code, the sooner you will find the bug. It helps to take a breather and come back with fresh eyes, too. Usually the bug isn't just in your code, it's something so blatantly obvious that your eyes skip over it and can't see the semicolons for the braces.

Wednesday, December 23, 2009

Better Estimation

Your time estimations suck. I don't care how accurate you are sure they are, they suck. The reason they suck is not because you are a bad developer, or you don't have experience estimating... they suck because you don't know the whole story yet.

It is impossible to predict everything that will come up during the course of your development iterations, and many things will come up that you had no way of predicting. Even if you break down your problem to the tiniest, bite sized pieces, you will have emergencies come up. Your colleague's dog will get sick causing him or her to be out half a day, making it impossible for you to complete the planned integration between your 2 services. A user won't be able to input their Woozit into your Whatzit page because of some overly zealous input validation you shipped last month, and it will take a day to track it down and fix it. You will meticulously estimate each minute detail of how to build the new Bazzle Integration Service, and you will get it all perfectly done, but forget to leave some extra time to do some dry runs at the end, only to realize their system doesn't quite match up with yours the way you expected, and 2 more days will be lost patching up the differences.

I guarantee you, no matter what you do, your estimates cannot be perfect. So stop thinking they can be, and figure out a way to adjust for the unadjustable.

I can't tell you how much I would love to see Joel's Evidence Based Scheduling implemented at my workplace. However, it's always something to tackle next iteration. Beyond the lack of commitment from the whole team, it seems like a lot of work to track the time you spend on everything, even though the results, I am sure, are spectacular.

If you can't commit to such effort, I suggest a more low-tech technique. For every work item that you are a little unsure of, add in a task to figure things out. Break your work items down to the most meaningless, mindless 20-minute tasks you can, and set them as half hour tasks... hour tasks... or even 2 hour tasks. You will have to test those 20 minutes' worth of coding, after all. Understanding exactly what you are building will help prevent missing key aspects that might have been forgotten. More things will come up as you are in the details of your implementation, but I find it helps to try to think of as many contingencies as possible.

When you are done with your estimation, however you do it, double the final amount you have. This gives you a significant buffer to help you reach your milestone, and it will help address a lot of the unknowns that will crop up between now and the milestone. Then, adjust this 2x multiplier each iteration based on how long the last iteration actually took. If your initial estimate for the last iteration was 1 month of work, and you doubled it to 2 months, but you finished in 1.5 months, then your observed multiplier was 1.5; multiply the next estimate by 1.6, to be safe. Keep track of your multiplier at every milestone, and over time you should be able to develop a multiplier that works for your team to hit almost every milestone.
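The adjustment is simple enough to write down. Here is a minimal sketch in Ruby, assuming you track only two numbers per iteration: the raw (unpadded) estimate and the time actually spent. The helper names and the 0.1 safety margin are my own assumptions for illustration, chosen so the 1.5 months example lands on 1.6.

```ruby
# Hypothetical helper: derive the next iteration's multiplier from the last one.
# raw_estimate:  the estimate before any padding was applied
# actual:        how long the work really took (same unit as raw_estimate)
# safety_margin: extra padding on top of the observed ratio (assumed 0.1)
def next_multiplier(raw_estimate, actual, safety_margin: 0.1)
  observed = actual.to_f / raw_estimate
  (observed + safety_margin).round(1)
end

# Apply a multiplier to a fresh raw estimate.
def padded_estimate(raw_estimate, multiplier)
  (raw_estimate * multiplier).round(2)
end

# 1 month estimated, 1.5 months actually spent.
multiplier = next_multiplier(1.0, 1.5)
puts multiplier                        # prints 1.6
puts padded_estimate(3.0, multiplier)  # what to commit to for a 3 month raw estimate
```

Keeping the margin as a separate parameter lets the team shrink it as the multiplier stabilizes.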

This is a much cruder form of Evidence Based Scheduling, but consequently, much easier to employ. It may not produce perfect results, but if you are missing every deadline, it should help more than continuing down the same path over and over and over, like a broken record.

Monday, December 21, 2009

Protect Your Environment, Part 2

Last time I talked about separating your environment, and automating it such that almost nothing needs to be customized outside your source control. Today, I want to expand on that thought a bit and specifically talk about your build environment. If you aren't using an automated build server like CruiseControl, you really should be. By automating your build, you can quickly detect when you've changed code that causes a regression test to fail, or if you've otherwise unintentionally broken things.

I'm not really trying to espouse the benefits of an automated build system, though, so I'm going to assume you have one set up. The next step is to protect your build environment.

With the build environment, it's especially important to keep everything in source control, and contained within the root directory. If you can't have 2 parallel checkouts of your source simultaneously running your build without failing in some way, you are doing it wrong.

This means a few things. Primarily, do not depend on external files or libraries. If your code looks for a file in /usr/share/myconfig.properties, then move that file to within your root checkout and change your code to point to that properties file with a relative path instead. Commit every jar or other runtime library into source control, and point your build to load the files locally.

What this gains you is flexibility to have parallel development. You can have a branch that represents what is currently deployed, and a branch that represents your current development. You can update that external jar you depend on in your current development branch and it won't affect your currently deployed branch. This means you can have 1 build server constantly running your tests on both branches, and neither will need to worry that they share the same file system.

If you have any kind of branching going on, you run a much higher risk of hitting build issues if you depend on something outside what is committed to source control. If you have a test that needs to look for a file in a special location on your file system, there will be a problem the moment that file diverges between your branches.

If you must depend on some custom environment variables, prepare those environment variables within the build. Just like diverging files, it's entirely possible a separate branch will need to tweak those variables. Once you need to, your other branches will be hosed.
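As a rough sketch of what "prepare those environment variables within the build" can look like, here is a small Ruby helper that derives every variable from the checkout root and hands them only to the child process. The variable names (MYAPP_CONFIG, MYAPP_LIB_DIR) and the directory layout are hypothetical; substitute whatever your build actually reads.

```ruby
# Build the environment for one checkout. Every value is derived from the
# checkout root, so two parallel checkouts never share a setting.
# The variable names and paths here are hypothetical.
def build_env(checkout_root)
  {
    "MYAPP_CONFIG"  => File.join(checkout_root, "config", "myconfig.properties"),
    "MYAPP_LIB_DIR" => File.join(checkout_root, "lib")
  }
end

# Run one build step with those variables set only for the child process,
# leaving the build server's own environment untouched.
def run_build_step(checkout_root, *command)
  system(build_env(checkout_root), *command, chdir: checkout_root)
end
```

Because the hash is computed per checkout, a branch that needs a different value changes it in its own copy of the build script, and the other branches never notice.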

So, once again, once you've prepared your environment to be bulletproof, protect it vigorously. Don't let your build depend on anything outside what is committed to your source control repository, and your automated build server will thank you.

Friday, December 18, 2009

Protect Your Environment

Recently, I have been thinking of the environment a lot. Not global warming. Not the ozone layer disappearing. Not the ocean. Not the humpback whale or the spotted owl. I'm talking about your build environment. Your development environment. And, of course, your production environment.

These environments need protection just as much as our physical environment around us. Well, I suppose our physical environment should matter more, but as someone who lives on the computer, my computer environments matter a whole heck of a lot to me.

I'm here to tell you to set up an environment that automates just about everything you do, store it in your source control system, and then protect it vigorously. The moment someone strays and tries to let strands of wild growth choke your environment with custom tweaks, special path settings, and other such nonsense, trim those vines and get back to a clean garden. It's really not that hard.

I can't vouch for any fancy tools that manage this stuff automatically, because thus far I've stuck with living within my basic build environment. I've heard of tools that can manage your environment for you, but sometimes simple will work just as well. At my day job, we use Ant to run our builds, and little by little I've been molding our kudzu of a system into a manageable environment that requires little more than a Subversion client and a few additional packages. Ant can manage the classpath, so there's really no need to keep throwing jars into your shell's CLASSPATH variable.

My most recent addition was a simple custom Ant task to mimic Rails' environment setup. I created an environments directory and stored all the custom files that differ between our production environment, development environment, and any environment in between. It holds custom properties files that are loaded at runtime, and Tomcat's context.xml so we don't have a local diff pointing to our development database. Within the environments directory is a directory for each environment, and the Ant task will figure out which environment you are using, then copy over files from the appropriate environments directory to the correct location. This is much like Rails' environment.rb, with the different scripts in the environments directory.
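To make the idea concrete, here is a minimal sketch of that copy step written in Ruby rather than as an Ant task. It is not the task described above; the layout it assumes (environments/NAME/ mirroring the paths under the project root) and the function name are my own for illustration.

```ruby
require "fileutils"

# Copy every file under <root>/environments/<env_name>/ to the same relative
# path under <root>, overwriting whatever is there. Returns the relative
# paths that were copied. The directory layout is an assumption.
def apply_environment(root, env_name)
  source = File.join(root, "environments", env_name)
  raise ArgumentError, "unknown environment: #{env_name}" unless File.directory?(source)

  Dir.glob("**/*", base: source).sort.each_with_object([]) do |relative, copied|
    from = File.join(source, relative)
    next unless File.file?(from)  # skip directories, they are created on demand

    to = File.join(root, relative)
    FileUtils.mkdir_p(File.dirname(to))
    FileUtils.cp(from, to)
    copied << relative
  end
end
```

Calling apply_environment(".", "development") before a build would drop the development context.xml and properties files into place, and the same call with "production" would swap them.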

A colleague recently set us up with Rails migrations, and I can't express how direly we needed this. It is important to version control all your database changes, but if you don't have a very orderly process to migrate from one release to the next, you are in for a world of hurt. Not knowing what has been run on your live database means you might miss something that could have drastic consequences with the code you are releasing along with it. Forget an important index and your queries will grind to a halt. Forget a key table, and your code could fail in an unexpected manner that might trigger other unintended failures, create corrupted data, or lose unrecoverable data.

By keeping your environment consistent, you can be more confident that deploying your new code will work exactly as you tested. By automating your environment, removing all of those special files you need to move to the right location or custom environment variables you need to tweak, you make it a lot easier to work within your team. You also can then bring in new colleagues and have a new development environment up and running for them faster.

The moment you tweak some kind of configuration that everyone in your team should do, automate it and put it in source control. If your tweak should only be in development or test, set up a simple environment switch in your build system, and grab the right file automatically. The production version and the development version should both be checked in.

So there... protect your environment. Who wants to be working in a jungle, anyways?

Wednesday, December 16, 2009

A Simple Ruby Pattern

By being a multi-paradigm language, Ruby provides numerous possible styles to write your programs with. I am a big fan of metaprogramming in Ruby, and one style strikes my fancy especially. By defining methods on your classes themselves (that is, class methods), your subclasses start to act like a domain specific language. You end up setting up a structural framework for how your subclasses behave, then declare the behaviors specific to each subclass.

To illustrate this, consider a simple calculator application. You could tackle such an application any number of ways, but we can come up with an interesting solution using class methods.

For starters, we need our base CalculatorBase class, which will provide the plumbing for dealing with input, handling if the input is numeric versus operations, and overall flow. I am not interested in these details for this post, so I will leave them as a homework problem for you to play with. Don't worry, it's fun!

First, your base calculator needs to expose a way to define operations:

class CalculatorBase
  def self.operation(op, arity, &block)
    define_method(op.to_sym) { |*args|
      unless args.size == arity
        raise "Wrong number of arguments"
      end
      block.call *args
    }
  end
end

From this, you can declaratively define what operations your calculator may support:

class Calculator < CalculatorBase
  operation(:+, 2) { |x, y| x + y }
  operation(:-, 2) { |x, y| x - y }
  operation(:/, 2) { |x, y| x / y }
  operation(:*, 2) { |x, y| x * y }
  operation(:sin, 1) { |x| Math.sin x }
  operation(:cos, 1) { |x| Math.cos x }
  operation(:tan, 1) { |x| Math.tan x }
end

Now, you can invoke operations with ease:

calc = Calculator.new
calc.+ 2, 3
calc.sin 3.14159

However, I feel we can do better than this. There is a lot of repetition going on with the arity. To make our calculator implementation more domain specific, let's allow binary and unary operators to be defined more easily:

class CalculatorBase
  def self.operation(op, arity, &block)
    define_method(op.to_sym) { |*args|
      unless args.size == arity
        raise "Wrong number of arguments"
      end
      block.call *args
    }
  end

  def self.binary(op, &block)
    operation op, 2, &block
  end

  def self.unary(op, &block)
    operation op, 1, &block
  end
end

With these new methods, we can make our operation declarations a bit easier:

class Calculator < CalculatorBase
  binary(:+) { |x, y| x + y }
  binary(:-) { |x, y| x - y }
  binary(:/) { |x, y| x / y }
  binary(:*) { |x, y| x * y }
  unary(:sin) { |x| Math.sin x }
  unary(:cos) { |x| Math.cos x }
  unary(:tan) { |x| Math.tan x }
end

This isn't a particularly difficult approach in Ruby, but I really like the results whenever I am able to employ it. It can remove a lot of repetitive method declaration, and make your main class very clearly declare the behaviors it employs.
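If you want a head start on the plumbing homework, here is one possible sketch, not the only way to do it. It assumes input arrives as postfix tokens (numbers and operator symbols), and it adds an arities registry, which is my own addition: methods created with define_method and a splat don't report a useful arity, so the evaluator needs somewhere to look it up.

```ruby
class CalculatorBase
  # Remember each operation's arity so the evaluator knows how many
  # operands to pop. Stored per class that declares the operations.
  def self.arities
    @arities ||= {}
  end

  def self.operation(op, arity, &block)
    arities[op.to_sym] = arity
    define_method(op.to_sym) do |*args|
      raise ArgumentError, "Wrong number of arguments" unless args.size == arity
      block.call(*args)
    end
  end

  def self.binary(op, &block)
    operation(op, 2, &block)
  end

  def self.unary(op, &block)
    operation(op, 1, &block)
  end

  # Hypothetical plumbing: evaluate postfix tokens such as [2, 3, :+].
  def evaluate(tokens)
    stack = []
    tokens.each do |token|
      if token.is_a?(Numeric)
        stack.push(token)
      else
        arity = self.class.arities.fetch(token.to_sym)
        operands = stack.pop(arity)
        stack.push(public_send(token, *operands))
      end
    end
    stack.last
  end
end

class Calculator < CalculatorBase
  binary(:+) { |x, y| x + y }
  binary(:*) { |x, y| x * y }
  unary(:sin) { |x| Math.sin x }
end

calc = Calculator.new
puts calc.evaluate([2, 3, :+, 4, :*])  # (2 + 3) * 4, prints 20
```

Note that the registry lives on the class that declares the operations, so a subclass of Calculator would need its own lookup strategy; that part I really will leave as homework.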

Of course, with great power comes great responsibility. It is easy to obfuscate the behavior you are creating with this pattern. Used appropriately, you can design an API and then speak the language of your domain in a much easier to comprehend fashion.

Sunday, December 13, 2009

Common Java Implementations

Java makes a lot of tasks more difficult than they should be. For example, checking if 2 objects differ can be cumbersome when you take into consideration that values could be null. Consider the following example implementing equals for a Name object (yes, some of that duplication could be simplified, but ignore that for now):

public class Name {
    private String first;
    private String middle;
    private String last;

    ...

    @Override
    public boolean equals(Object other) {
        if (other == null || !(other instanceof Name)) {
            return false;
        }

        Name otherName = (Name) other;

        if ((first == null) != (otherName.first == null)) {
            return false;
        }

        if ((middle == null) != (otherName.middle == null)) {
            return false;
        }

        if ((last == null) != (otherName.last == null)) {
            return false;
        }

        // Past the checks above, each pair is either both null or both non-null.
        return (first == null || first.equals(otherName.first)) &&
               (middle == null || middle.equals(otherName.middle)) &&
               (last == null || last.equals(otherName.last));
    }
}

Now consider the task of reading all the lines of a file:

File file = ...;
BufferedReader br = new BufferedReader(new FileReader(file));
List<String> lines = new ArrayList<String>();

try {
    String line = br.readLine();
    while (line != null) {
        lines.add(line);
        line = br.readLine();
    }
} finally {
    // Always release the file handle, even if a read fails.
    br.close();
}

Maybe you need to build a string that is a colon delimited list of the items in a list:

if (list.isEmpty()) {
    return "";
}

String result = "";
result += list.get(0);

for (int i = 1; i < list.size(); i++) {
    result += ":";
    result += list.get(i);
}

return result;

Well, whenever you start cracking your knuckles, preparing to dig in and write a bit of code that you feel should be easier, stop. Take a deep breath. Consider who might have solved this problem first. It has likely been done before, so you can bank on another's implementation, with the benefit of it being thoroughly tested and likely highly performant.

In particular, the Apache Commons projects have a suite of useful tools to bank on. The equals example can be simplified to the following (thanks to the Commons Lang project):

@Override
public boolean equals(Object other) {
    if (other == null || !(other instanceof Name)) {
        return false;
    }

    Name otherName = (Name) other;
    return new EqualsBuilder()
            .append(first, otherName.first)
            .append(middle, otherName.middle)
            .append(last, otherName.last)
            .isEquals();
}

The reading of lines, meanwhile, can be simplified to the following (thanks to the Commons IO project):

File file = ...;
List<String> lines = (List<String>) FileUtils.readLines(file);

Besides the Apache Commons projects, you could try out the Google Collections Library to solve the colon delimited list:

return Joiner.on(":").join(list);

You may also want to browse Google's Guava Libraries project.

Whenever these libraries can be applied, you can cut the amount of code you are writing to a fraction of what it would have been, and what remains will likely be a lot easier to understand and maintain.

Thursday, December 10, 2009

Don't make V1 TOO Crappy

Avery is a company that provides some... paper needs. I don't know all of what they provide, but I know they provide labels. I know this because my fiancée and I were printing out labels for our upcoming wedding. They provide an online tool that could be... very good. Instead, it is... so so.

They provide web tools to construct a PDF which can be printed on to your labels. This is an awesome idea because it is a very portable way to print to their paper products, and could provide highly customized behavior geared towards the type of paper you are printing to. Labels, in our case, could have some really cool software designed specifically around producing the exact labels you want on the specific Avery label product you purchased. If Avery wanted to be a leader, they could even support their leading competitor's products, so when you think of a good experience for your printing needs, you think Avery. This means you might be more likely to go for the Avery products the next time you wanted to buy some labels, if they can leave that good of an impression on you. Never mind the ad revenue they could rake in on top of that.

There is no reason this can't work on any browser with any operating system, right? Well, that is their first failure. The tool (which seemed to be primarily built with Flash) wouldn't work on Chrome or Firefox in Ubuntu. I see no compelling reason why it had to be built with Flash, except that it probably was easier to develop their customized UI. Well, I don't believe Flash really gains you much there, but I can understand the thinking that it does. Regardless, some of the controls just failed to respond in my browsers. Namely, a checkbox to toggle the first row in a mail merge as header names vs a separate data row wouldn't respond. Also, the button to add a new span of text on the label didn't respond. Ultimately, I was forced to use IE within my Windows VM, as disgusting as that was. Firefox might have worked on Windows (and maybe even Chrome), but after 2 failed browser attempts, I wanted to go to the crappiest browser that too many unskilled developers still seem to target exclusively.

Ok, force me to use a shitty browser to get my work done... that's a huge mark against you, but if the rest of the user experience is just mind blowingly awesome, I can forgive you. But seriously... I have to have a blown mind by the time I'm done for me to forgive you. It's just too easy these days to write cross browser HTML and JavaScript to excuse the lack of portability. This is not an application that screams the need for Flash, so I think it's an indication of a lack of talent and creativity that they resorted to Flash.

Moving on, I ran across another bug that showed the lack of polish they put into this product. They have a nice mail merge feature that allowed me to upload an Excel spreadsheet and use the rows as addresses. This is a must have for printing labels, but also one of those features that makes you happy to be using a computer instead of doing this stuff before our digital age, like attempting homebrew calligraphy with a ballpoint pen. So, they had a nice feature that I already mentioned. You could mark the first row as the row specifying your field names, effectively starting your data on the second row. Neat! It even worked! Except... for our recent batch of labels, when selecting fields to place in the label, only the first field was showing, and truncated as "Mr." That's odd. We tried dropping the header row, and the problem still showed up. I guess I should add that the first column value of the first row was something like "Mr. & Mrs. John Doe." It dawned on me that maybe these mindless developers were so bad that a simple & character could screw up the mail merge. I moved the row down and put in a more innocuous row at the top. Sure enough, it worked smoothly! So, let me get this straight... a mail merge that could have completely arbitrary data in it breaks down because of a simple & character???!?!? I shudder at the thought of what security holes might be present in this application.

Ok, I have saved by far the most egregious issue I have with their application for last. Their login process. Now, I can't for the life of me understand why they need any of my information to let me generate a PDF. However, they had multiple required fields, including my name and email. Why do you, Avery, need this information to construct a PDF to help me utilize your real product? It baffles me. Well, I gave the information, and then had to go back and start again due to some of the issues above. I figured this process had implicitly registered me, since the form had looked like a registration form. When I went back and attempted to log in, I discovered that this was not the case at all. That irked me, so I decided to register. And, FYI, the registration form looked exactly like the form to just use their web tools directly... just with an added password field. I registered and then clicked the link to get back to printing labels, and guess what? I got the form again to get my name and email to start the label making process again! Are you KIDDING ME?!?!?!?!!!!!

In what could have been an amazing experience, Avery thoroughly dashed my hopes that more companies are starting to understand that cross platform web tools are the future. Instead of being a leader in the specialized paper printing business, they have made me shudder at the incompetent developers that they likely hired to do this job. There were nuggets of a really cool product, so there must be some smart people there... but they are likely drowning from a few really bad apples.

I think this kind of disproves Jeff Atwood's recent post that you should release a crappy version 1 and iterate. Or at least, it reinforces the thought that he included that you shouldn't release total crap for version 1. I guess Avery could redeem themselves in my view if they are iterating and actually listening to good feedback. If I come back in a couple months, and all of these issues are somehow tackled... I could forgive these transgressions. I seriously doubt it will be the case though.

I think Avery is stepping in the right direction with their online tools. They have a lot of potential there. Unfortunately, their execution makes me embarrassed as a fellow web developer.

Friday, December 4, 2009

A Few Worthy Bytes

Bandwidth is a lot cheaper now than it was 10 years ago. Dialup is a dying breed. So please, please, please don't craft your HTML to save every last byte. It is a lot more worthwhile to make your code readable and immediately intuitive than to prevent an extra 10 bytes from going to the user. If you find that you are causing huge downloads of extra HTML, and it is becoming an issue based on real statistics... then start looking into addressing the issue.

For example, consider the following snippet:

<a href="/path/elsewhere"><%
  if @condition
    %>True text<%
  else
    %>False text<%
  end
%></a>

It may not look that bad alone, but when you try to save every byte, the result months from now will be a garbled mess that is difficult to comprehend. Instead, ignore those extra bytes and produce the following:

<a href="/path/elsewhere">
  <% if @condition %>
    True text
  <% else %>
    False text
  <% end %>
</a>

If outputting all that extra useless whitespace makes you feel too icky, Rails has an option for you. If you close your scriptlet tag with a dash, as "-%>", Rails will strip some of the whitespace.

<a href="/path/elsewhere">
  <% if @condition -%>
    True text
  <% else -%>
    False text
  <% end -%>
</a>
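If you want to see the effect outside of a Rails view, Ruby's standard ERB library supports the same trailing dash when you pass trim_mode: "-". The sketch below is only an approximation of what Rails does for you: it swaps @condition for a local named condition (my assumption, so it can run standalone) and prints the rendered result.

```ruby
require "erb"

# Same markup as above, but using a local `condition` instead of @condition
# so it can be rendered without a Rails view (an assumption for this sketch).
TEMPLATE = <<~HTML
  <a href="/path/elsewhere">
    <% if condition -%>
      True text
    <% else -%>
      False text
    <% end -%>
  </a>
HTML

# trim_mode "-" tells ERB to drop the newline that follows a "-%>" tag.
def render_link(condition)
  ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(condition: condition)
end

puts render_link(true)
```

The indentation in front of each tag still comes through; only the line breaks after the dashed tags are removed, which matches the "some of the whitespace" caveat.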

Just remember that someone has to maintain the code, and making code more difficult to comprehend will end up costing you more than the bandwidth you saved.

Wednesday, December 2, 2009

Free and Loving It

I have a soft spot for Open Source Software. When I compare two software products, I will almost always lean towards the Open Source alternative, if it can accomplish my task with some amount of pleasure.

Just tonight, I was working with Open Office with my fiancée. We ran into several kinks along the way, ultimately costing us the night. We couldn't finish our task, though I mostly blame my procrastination.

In particular, we were trying to print out some labels, given a spreadsheet of addresses. I had no problem finding howtos and working through the problem. However, we ran into small problems here and there. For example, I couldn't create a database out of a spreadsheet right away because Ubuntu didn't package the Database portion of Open Office along with the rest. I then innocuously named our database SomeName_12.1.2009. When trying to print, this ended with an error that it couldn't connect to the database. Renaming the database to SomeName_12_1_2009 solved the problem. Something we could figure out easily enough, but not something I would expect an average user to think of (nor do I think they should have to think of it). She became frustrated, rightfully so, and ultimately exclaimed that she's going back to Windows (for her Office needs).

At this point, it dawned on me that I don't mind giving Open Source the benefit of the doubt. The hacker in me enjoys figuring out how to work with the software (as long as what I want to do is possible, and reasonably easy). The hacker in me also appreciates that the project was built by people that are likely much more passionate about writing software, and much more passionate about getting something useful to people (instead of making a bunch of money). This is not to say there aren't people in the proprietary world that love software and want to deliver awesome stuff to people, but I just can't identify with that crowd of enthusiastic developers as much.

This is also why I foam at the mouth when I read comments like Jeff Atwood's:

as predicted, Google's "let's copy how Microsoft does phones, but open source!" model is a fail: http://is.gd/589U8

I've read the article he links to, and I consider it complete bullshit. I have a G1, and I love it. I have played with the Droid, and I drool over it. I know several people that have one (or some other Android powered phone), and none are unhappy about the pick. Jason Calacanis of This Week in Startups (among many other things) has commented on his show that he loves his new Droid. Browsing some of the comments on that negative post, I see several that point to rogue processes as the likely culprit of the device slowdown discussed in the article. I've found this to be true of my G1. Sometimes I will discover my battery drained much faster with little reason during the day, or it will become extremely sluggish. In both cases I've had more than enough reason to believe it was a rogue process from something I had installed. With greater power comes greater responsibility.

On the Linux side of things, I have become so attached to Emacs and the powerful command line based applications that I would never willingly go back to a Microsoft prompt. The Free world gets me, and I get them. I tend to believe in live and let live sort of philosophies, which have no room for restrictive licenses and the likes of DRM. Software patents scare me because I want to be able to develop anything I want, without having to worry that someone else may have already thought of the idea and patented it. I also don't care if someone takes my ideas and tries to make them better. I may be a bit envious, but I firmly believe the meat of a product is in the execution, not the imagination. Ideas are a dime a dozen, but passion for your users and the desire to develop something of quality and value is truly rare.

I don't understand the Microsoft world, and I don't want to. Which world do you identify with, and why?
