Newspapers can't go away quickly enough. I blogged about newspapers recently, but I was speaking merely as an indifferent party at the time. I don't read the paper, and so I don't really care whether they survive or perish. Now I have a strong opinion. I want them to go away, and I want them to go away soon. Well, this isn't entirely fair, because I'm basing this strong hatred on the actions of a single particular paper... the San Jose Mercury News. Presumably not all papers act in such a sleazy, annoying manner, but any that do... I hope they disappear tomorrow. Scratch that, I hope any business at all that acts this way perishes tomorrow.
Ok, enough teasing. Rewind a few months. My doorbell rang, and I went and looked out the peephole. There was a kid I didn't recognize... and I foolishly opened the door. He was going door to door selling subscriptions to the San Jose Mercury News in an effort to get help going to college. I had actually helped a kid doing the same thing a year or two before, and I didn't mind helping this kid too. The last time I had given cash, but this time I was out. Then I made my second mistake... I paid with a check. The kid needed my phone number, supposedly so the Mercury News could verify I was indeed helping him out. I reluctantly gave him my number... my third mistake. Somehow, I knew I was making a mistake, and immediately wished I had just closed the door on him. I had half a mind to call him back and write a new check out to him directly, letting him cash it and keep it, or buy a paper for his school or something. I indicated I didn't actually want the paper, and that he could give it away... I think he ended up giving it to a neighbor next door who wasn't home.
Fast forward back to the present. The whole thing ended up being a scam. I'm sure the kid got some help from the Mercury News towards college, but at the cost of the Mercury News getting my phone number. I got a call shortly after the subscription ended with a request to resubscribe. Damn, are they aggressive when calling! It took a few minutes to dash her confidence and finally end the call. I thought that was that, but I have since received at least two more calls, one coming just this last Saturday. A holiday weekend, no less! You would think they would stop calling after I emphatically told them I am not interested, and even explicitly said I have never read the newspaper and never intend to. I guess I'm just a number in a big list of possible sources of revenue now. Next time I'll need to tell them to make sure I'm off their list. And the next time some poor kid comes door to door selling the San Jose Mercury News? Sorry! I will send 'em packing. I don't mind helping a kid go to college, but not if it just gets my number on a list to cold call every other month.
I think I walked away with another lesson on how to treat your customers. If you might be bugging them, stop and reconsider what you are doing. Your revenue should come from people who love what you are doing for them, not people so annoyed by you that they buy your product just to shut you up. It's definitely a no-no to set up a faux charity just to turn around and annoy the contributors.
Monday, November 30, 2009
Tuesday, November 24, 2009
Closures for Java 7: DOA
To start off today's (probably) brief post, I want to quote Stephen Colebourne's blog:
JDK 7 closures will not have control-invocation statements as a goal, nor will it have non-local returns. He also indicated that access to non-final variables was unlikely. Beyond this, there wasn't much detail on semantics, nor do I believe that there has been much consideration of semantics yet.
Now, I can't vouch for his facts, but they seem accurate, so I am going forward with the assumption that I'm getting it from the horse's mouth.
That said, I want to state what closures in Java 7 will amount to if they are implemented as described above. Can you guess? Yeah... syntactic sugar for anonymous classes.
As a Java developer, I don't want to complain. Anonymous classes are a major pain in the ass. Any time I attempt some functional programming with them, I always look back and think... that would be so much more elegant with a foreach loop, or some such thing. Anonymous classes are a lot of syntax for very little meat. Getting rid of the painful parts of that syntax will definitely be a good thing.
As a Ruby developer, these so-called "closures" are a laughing stock. Can anyone really claim these are actually closures? Let's just stop kidding ourselves and call them elegant anonymous classes.
Without control-invocation statements or non-local returns, you can't turn a foreach into a method call with a closure. Without access to non-final variables, you have to either move the variable into the class, wrap it in an object, or resort to the horrible one-element array trick. You know... construct a final array of length one, through which you can both get and set the element from within the anonymous inner class... it blows.
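To make the trick concrete, here's a minimal sketch (the class and variable names are hypothetical) of the one-element array workaround that pre-closures Java forces on you:

```java
import java.util.Arrays;
import java.util.List;

public class ArrayTrick {
    static int sumOf(final List<Integer> numbers) {
        // An anonymous class can only capture final locals, so the mutable
        // counter is smuggled in as the single element of a final array.
        final int[] sum = {0};
        Runnable adder = new Runnable() {
            @Override
            public void run() {
                for (int n : numbers) {
                    sum[0] += n; // read and write through the final reference
                }
            }
        };
        adder.run();
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(sumOf(Arrays.asList(1, 2, 3, 4))); // prints 10
    }
}
```

Compare all that ceremony to a one-line block in Ruby, and it's clear why calling the proposal "closures" feels generous.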
If my vote matters, it is for waiting until Java 8 and doing closures right, or at least giving us a little meat in Java 7's closures. At a bare minimum, non-final variables have to be accessible.
Sunday, November 22, 2009
Smart XML Processing with Regexes
Recently, Jeff Atwood wrote about parsing HTML with regular expressions. I want to speak about it briefly, because I ran into this very issue last week. The lesson I gathered from his post is to consider your options with an open mind, and only rule out a possible solution if you really understand the alternatives. Use facts and knowledge to choose your implementation details, not superstition and theoretical best practices. Best practices usually exist for a reason, but that's not to say there's never a reason to turn your back on them.
This post hit home with me because I had an XML file to parse that was over a gigabyte. From this XML file, I needed a very small handful of the data, and it was very regular XML. XML parsing is a solved problem, but most XML libraries I've used would easily choke on such a file.
Instead of even considering attempting to process this data with a normal XML processor, I wrote a simple Ruby script to extract the information. It looped over each line, looking for key parts of the data with lines like:
if line["<expectedTag>"]
  # deal with this tag
end
Then, I processed the key tags and data I was looking for with regular expressions, such as:
data = line[/<expectedTag>(.+)<\/expectedTag>/, 1]
The above was done within the if blocks. The key point is that regexes alone would have been too slow, so I used the simple indexer method to quickly determine whether the line contained something that mattered to me. Only then did I use the regex to pull out the data I actually wanted.
Can you write XML to break my processing? Of course! The question is... does it matter? And that answer was no. I only need to process this data once, maybe another time sometime in the distant future, but the XML is so regular that I know it will work for all the data. On top of this, if I missed some data, it wouldn't matter in the slightest for my purposes. So, in short, proper XML processing would have severely slowed me down (ignoring all lines that don't contain a keyword is much faster), and it would have produced no real benefit.
I ended up processing all the data in a little over a minute or two, and I considered it a huge success. Over a gigabyte of XML to process seemed a rather daunting task initially!
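Putting the two fragments together, the whole script boiled down to something like this sketch (the tag name and sample data here are made up for illustration):

```ruby
require 'stringio'

# Stream the input line by line; a cheap substring check filters out the
# vast majority of lines, and the regex runs only on likely matches.
def extract_values(io, tag)
  marker = "<#{tag}>"
  values = []
  io.each_line do |line|
    next unless line[marker] # fast String#[] substring check
    value = line[/<#{tag}>(.+)<\/#{tag}>/, 1] # regex only when it matters
    values << value if value
  end
  values
end

sample = StringIO.new(<<XML)
<root>
  <expectedTag>alpha</expectedTag>
  <otherTag>ignored</otherTag>
  <expectedTag>beta</expectedTag>
</root>
XML

p extract_values(sample, "expectedTag") # => ["alpha", "beta"]
```

In the real script the IO was the gigabyte-plus file, but the shape was the same: never hold the document in memory, never build a tree.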
Wednesday, November 18, 2009
Serve Your Users
I'm a bit upset. Some friends and I were planning a trip to San Francisco soon, and a few of them had booked a night at the Sheraton Fisherman's Wharf (don't worry, I will tie this in to software in a bit, trust me). I needed to book a night for my fiancée and me, so I brought up their website. Uh oh! The hotel was booked solid that night. This was bad... what if we had a hard time finding a place? This wasn't what made me mad, though... well, besides being mad at myself for not booking earlier.
What if the website wasn't accurate? I dialed up the hotel, just to be sure. It went something like the following (though it's coming from memory, so expect a bit of embellishment):
Me: Hi! Do you have a room available for the night of X?
Them: I'm sorry, I don't see anything available. Is the night flexible?
Me: Well, my friends already booked the night with you, soooo...
Them: I can check the Starwood Hotels, Le Meridien. It is about a mile away. Shall I check availability for you?
Me: Uuuuh, well, my friends are already staying at your hotel. Is there anything nearby that might have a room?
Them: ... It's only a mile away. Shall I look that up for you?
Me: Sure.
... She proceeds to book a night at Le Meridien, informing me of an offer comparable to what my friends had, though I made sure I had a refundable option so I could think it over ...
Ok, so this may seem like pleasant help from the reservations department at the Sheraton, but it's not quite why I'm angry. You see, after I hung up, I first checked on a map how far apart the two hotels were. It ended up being 1.4 miles... not exactly walkable for a night on the town. This wasn't why I was steaming, though.
I then did a quick search of the hotels near the Sheraton. I zoomed in on Google Maps, and all the nearby hotels were listed right on the map. The Hyatt, a block away. Holiday Inn, a block away. Best Western, across the street. Radisson, across the street. This was when my anger bubbled up. I called up the Best Western and found out that not only was a room available, but I could get the same price my friends got (and which was offered to me at Le Meridien). I quickly cancelled the night at Le Meridien, quite thankful I didn't rush into the no-refund deal I was initially offered.
Let's be clear, I fully understand where the Sheraton employee was coming from. They may get some kind of commission for redirecting my business to their sister hotel. They want to ensure they get my money. What irks me, though, is that I made it clear I preferred to be near my friends, yet she proceeded to push an option on me when she very likely knew full well there were alternatives that would have suited me better. It may be that any of the hotels I saw would do the exact same thing in a heartbeat, but I feel it is a grave mistake.
First, the Sheraton had a great opportunity to turn me into a fan. Had they pointed me to one of the numerous walking distance competitors, I would have remembered that fondly, and told everyone about my experience. Not many companies clearly have your best interests at heart. Instead, I remember it angrily... and tell everyone about my experience.
This is how I feel this story relates to software... well, more about business, but same thing if you are a software company. The way you need to treat your customers is as if your goal is to see their goal achieved in the way that best makes them happy. If that means pointing them to a competitor who would solve their problem better... then happily point them to your competitor's open arms. Don't treat your customers (or potential customers) as if their money is the only thing you care about, like the Sheraton did in this case. Your users will find out you weren't being completely honest, and they will hate you for it. They will speak out and write on some puny but public blog and tell everyone about the experience. Ultimately, your users will find their way to the option that is aligned with solving their problem, not extracting their money.
Wednesday, November 11, 2009
Cautious Development
One of the most appealing features of Test Driven Development for me is that it helps you write code that actually works when you are done. If you are testing at each step, you end up with code that works for all those features that you explicitly tested. This is not to say you will end up with bug free code, of course. Nobody but a pointy haired boss would expect that.
All too often, I see code that is supposedly done, but a cursory run through some simple examples reveals earth-shattering bugs. Bugs where the basic features being implemented don't even work. What possesses a developer to think something is done when it hasn't even been run in a while to show some level of completion? I guess we all fall into the trap of simple changes that "can't possibly cause a problem," only to have some bug crop up precisely because we aren't looking. Sometimes manual (or even automated) tests would be so tedious to set up that they just don't seem worth the effort. But I've seen cases where it's clear the code was not carefully crafted in any regard, with no reason it couldn't have been done better.
The most absurd example came from someone I helped interview quite a while ago. I distinctly remember going in and running the beginnings of the code with him. I put in some input and saw some output... all was good. When I returned a while later, the IDE still showed the exact session we had run as much as an hour earlier. Not only did the code not even compile, it had no hope of working even if we could trick the compiler into giving us a green light. Is it a rare quality among developers to actually play with the code as you go along?
I'm not saying you need to exercise the code in a particular manner, just that you exercise it in any manner. Run the code at every step, playing with the new features you are working on, and sanity testing some older features. Write it test first and watch new tests succeed as older tests continue to succeed. Hell, even take a waterfall approach with a big bang batch of code, then furiously cycle through short runs to find and fix a bug. I don't care, as long as when you say you are done, it isn't trivial to find a bug in the code.
Unsurprisingly, I'm also a fan of writing your code in the most cautious manner possible. Some developers seem to like to go into old code and just run wild in it, tearing things out and replacing them left and right with no regard or respect for something that might be doing the job well, or at least well enough. Sometimes a piece of code proves so obscure and hard to maintain (or even understand) that it makes little sense to do anything but throw it out and start from scratch. That should be the exception rather than the rule, though.
When it comes to refactoring, I like to go in and be very careful that the new code preserves equivalent functionality, especially if the code isn't covered by a good suite of tests. Don't go in and replace a whole class with a new class that does the same thing in a way you like better. Instead, move code around and massage it into the shape you need, slowly but surely preparing it for the new feature you need to add. Continually ask yourself if the shape of the code does exactly what was being done before (unless of course you discover a bug). Pretend that if you introduce a new bug as a result of your changes, the entire company and all the users will come and yell at you and deride you for making such a mistake. Worry for every millisecond that you might be making a breaking change.
Care for your code. Envision the code is your lover. Would you do a single thing that might hurt your lover's feelings? Would you want her to stub her toe because you moved the table to the wrong location? Then don't make sweeping changes without being extremely cautious, because you will only guarantee to stub your code's toes. If you treat your code right, she will only get more beautiful, while learning new and exotic tricks.
Monday, November 9, 2009
View Sanitizing and Micro-Optimizing
Maurício Linhares posted an intriguing response to my recent post about auto-sanitizing Rails views. I was just gearing up to respond via a comment when I realized I could probably turn it into a full post, so here goes my response!
So, first of all, let me applaud Maurício for actually writing some code and sharing it, rather than keeping the discussion academic and merely flinging arguments around the interwebs. I can't say I always do it, but I have the most fun writing a blog post where I show some code to achieve a solution to a problem I am having. He even went the extra mile to create a plugin for his idea.
That said, I must respectfully disagree with his approach. The gist of how he tackles the problem is to sanitize data as it comes in rather than when you are displaying it. He even argues in his blog post that it is "more adherent to the MVC":
Now, as the data is cleanly stored in your database, you don’t have to waste CPU cycles cleaning up data in your view layer (and you can even say that you’re more adherent to the MVC, as cleaning up user input was never one of it’s jobs).
He makes a convincing argument that the view layer should not sanitize input. My big problem with this is that it actually takes very specific view details and moves them into the controller and model layers, contrary to what he is claiming. Namely, you have introduced into the controller/model layers the idea that your data is going to be displayed as HTML. What if you want to expose a JSON API later on via the same controllers, but with new views? Now you will need to unsanitize the data and resanitize it for JavaScript output! You have inadvertently snuck view information into the database! Your data is pigeonholed as HTML data, and it now takes double the effort to use that data in another manner (such as JSON data).
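A quick sketch of that problem in plain Ruby (the record value here is hypothetical): once data is stored HTML-escaped, every non-HTML consumer has to undo the view-layer transform first.

```ruby
require 'cgi'
require 'json'

raw = %(Tom & Jerry's <b>great</b> adventure)

# Sanitize-on-input stores the HTML-escaped form in the database:
stored = CGI.escapeHTML(raw)

# An HTML view can dump it straight out, but a JSON API now leaks
# HTML entities into its payload...
puts({ "title" => stored }.to_json)
# ...unless it unescapes first, undoing work done at write time:
puts({ "title" => CGI.unescapeHTML(stored) }.to_json)
```

The escaping happens either way; sanitize-on-input just adds an unescape step for every consumer that isn't an HTML page.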
This last point deserves some extra attention. Consider if our transform on the data were a lossy transform. This case isn't, because you can easily unsanitize sanitized HTML, but forget that for a second. For example, let's say we wanted all data to be sanitized and censored, such that words like "ass" and "crap" got changed to "***". If we had a bug that caused "crass" to be changed to "cr***", we would have lost information that is irretrievable. If we saved the sanitizing and censoring for the view, where it belongs, we could always fix the censoring code, and our "high fidelity" representation would allow us to correctly show "crass". Joel explained this same position on a Stack Overflow podcast.
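Here's that scenario in a few lines of Ruby (the word list and the bug are hypothetical):

```ruby
def buggy_censor(text)
  text.gsub(/ass|crap/, "***") # bug: matches inside longer words
end

def fixed_censor(text)
  text.gsub(/\b(?:ass|crap)\b/, "***") # word boundaries fix the bug
end

original = "what a crass remark"
stored   = buggy_censor(original) # => "what a cr*** remark"

# If we censored at write time, fixing the code later changes nothing;
# the original word is gone for good:
fixed_censor(stored)   # => "what a cr*** remark"

# If we stored the raw text and censored in the view, the fix just works:
fixed_censor(original) # => "what a crass remark"
```

The stored form is lossy, so no amount of code fixing can bring "crass" back; the raw form lets a view-time fix apply retroactively to every record.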
Yes, it is tedious and error prone to use "h" everywhere, but that is the exact same problem I was trying to address in my post. However, I feel training myself to use <%: foo %> over <%= h foo %> builds better muscle memory than marking all input for sanitizing. Let's consider the consequences of forgetting to apply the new scriptlet versus forgetting to sanitize inputs. If you forget the new scriptlet, you have a new XSS hole that can be closed by simply changing "=" to ":" (or alternatively adding a call to "h"). If you forget to sanitize inputs, you have two major problems. You have an unknown number of XSS exploits (everywhere you display that data, which could be in many places), and you have a bunch of data that is now invalid. You then need to either add sanitizing to all the view locations where you output the information (which would be tedious and contrary to the whole point of Maurício's approach), or update all existing records to be sanitized just before enabling sanitizing of input.
There is another issue with this approach that a new scriptlet tag avoids. By making the sanitize decision in the view layer, you retain control over exactly what you sanitize. Consider a site like a blog or Stack Overflow. In such applications, you want some amount of HTML displayed, though not necessarily on all fields of the model. You might want to whitelist-sanitize the blog post, question text, or answer text, yet fully sanitize the labels or tags. Granted, you could update the plugin to allow such complexity, but it will be just that... complexity that bleeds through how you invoke the plugin. You would need to specify not only which actions or controllers use the sanitizing, but also which parameters are excluded from it.
So, first of all, let me applaud Maurício for actually writing some code and sharing it, rather than keeping the discussion academic and merely flinging arguments around the interwebs. I can't say I always do it, but I have the most fun writing a blog post where I show some code to achieve a solution to a problem I am having. He even went the extra mile to create a plugin for his idea.
That said, I must respectfully disagree with his approach. The gist of how he tackles the problem is to sanitize data as it comes in rather than when you are displaying it. He even argues in his blog post that it is "more adherent to the MVC":
Now, as the data is cleanly stored in your database, you don’t have to waste CPU cycles cleaning up data in your view layer (and you can even say that you’re more adherent to the MVC, as cleaning up user input was never one of it’s jobs).
He makes a seemingly convincing argument that the view layer should not sanitize input. My big problem with this is that you have actually taken very specific view details and moved them into the controller and model layers, contrary to what he is claiming. Namely, you have introduced into the controller/model layers the idea that your data is going to be displayed via HTML. However, what if you want to expose a JSON API later on via the same controllers, but with new views? Now, you will need to unsanitize the data, and resanitize it for JavaScript output! You have inadvertently snuck view information into the database! Your data is pigeonholed as HTML data, and it now takes double the effort to use the data in another manner (such as JSON data).
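To make the double-encoding trap concrete, here is a plain-Ruby sketch (the stdlib's CGI escaping stands in for Rails' helpers, and the field name is made up):

```ruby
require "cgi"
require "json"

name = "Tom & Jerry <3"

# Sanitize-on-input: the database row now holds HTML-flavored data.
stored = CGI.escapeHTML(name)   # "Tom &amp; Jerry &lt;3"

# To serve the same field through a JSON API, we first have to
# unsanitize it, or every consumer sees HTML entities.
api_value = CGI.unescapeHTML(stored)
json = { :name => api_value }.to_json

# Sanitize-on-output instead keeps the high fidelity value in the
# database; each view (HTML, JSON, email, SMS) encodes it as needed.
```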
This last point deserves some extra attention. Consider what happens if our transform on the data were lossy. This case isn't, because you can easily unsanitize sanitized HTML, but forget that for a second. For example, let's say we wanted all data to be sanitized and censored, such that words like "ass" and "crap" got changed to "***". If we had a bug that caused "crass" to be changed to "cr***", we have irretrievably lost information. If we saved the sanitizing and censoring for the view, where it belongs, we could always fix the censoring code, and our "high fidelity" representation would let us correctly show "crass" again. Let me quote a Stack Overflow podcast, where Joel explains this same position:
Spolsky: Here's my point. Uhh, in general, my design philosophy, which I have learned over many years, is to try and keep the highest fidelity and most original document in the database, and anything that can be generated from that, just regenerate it from that. Every time I've tried to build some kind of content management system or anything that has to generate HTML or anything like that. Or, for example, I try not to have any kind of encoding in the database because the database should be the most fidelitous, (fidelitous?) highest fidelity representation of the thingamajiggy, and if it needs to be encoded, so that it can be safely put in a web page then you run that encoding later, rather than earlier because if you run it before you put the thing in the database, now you've got data that is tied to HTML. Does that make sense? So for example, if you just have a field that's just their name, and you're storing it in the database, they can type HTML in the name field, right? They could put a < in there. So, the question is what do you store in the database, if they put a < as their name. It should probably just be a < character, and it's somebody else's job, whoever tries to render an HTML page, it's their job to make sure that that HTML page is safe, and so they take that string, and that's when you convert it to HTML. And the reason I say that is because, if you try to convert the name to HTML by changing the less than to &lt; before you even put it in the database. If you ever need to generate any other format with that name, other than HTML - for example you get to dump it in HTML to an Excel file, or convert it to Access, or send it to a telephone using SMS, or anything else you might have to do with that, or send them an email, for example, where you're putting their name on the "to" line, and it's not HTML - in all those cases, you'd rather have the true name. You don't want to have to unconvert it from HTML.
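Back to the censoring example, a minimal Ruby sketch of the bug (the word list and helper names are invented for illustration):

```ruby
BAD_WORDS = %w(ass crap)

# Buggy: censors substrings, so "crass" becomes "cr***". If we stored
# this result, the original text would be gone for good.
def censor_buggy(text)
  BAD_WORDS.inject(text) { |t, word| t.gsub(word, "***") }
end

# Fixed: match whole words only. The fix only helps because the view,
# not the database, applies the censoring to the high fidelity text.
def censor_fixed(text)
  BAD_WORDS.inject(text) { |t, word| t.gsub(/\b#{word}\b/, "***") }
end

censor_buggy("crass")      # => "cr***" (information destroyed)
censor_fixed("crass")      # => "crass"
censor_fixed("what crap")  # => "what ***"
```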
Yes, it is tedious and error prone to use "h" everywhere, but that is the exact same problem I was trying to address in my post. However, I feel training myself to use <%: foo %> over <%= h foo %> builds better muscle memory than marking all input for sanitizing. Let's consider the consequences of forgetting the new scriptlet versus forgetting to sanitize the inputs. If you forget the new scriptlet, you have a new XSS hole that can be closed by simply changing "=" to ":" (or alternatively adding a call to "h"). If you forget to sanitize the inputs, you have two major problems. You have an unknown number of XSS exploits (everywhere you display that data, which could be in many places). You also have a bunch of data that is now invalid. You need to either add sanitizing to every view location that outputs the information (which would be tedious and contrary to the whole point of Maurício's approach), or update all existing records to be sanitized just before enabling sanitizing of input.
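To see the asymmetry, here is the view-side escape with the stdlib's ERB (ERB::Util's h is the same helper Rails gives your templates):

```ruby
require "erb"
include ERB::Util  # provides h (html_escape)

name = "<script>alert(1)</script>"

# Forgot to escape: the raw markup lands in the page. One XSS hole,
# closed by fixing one template.
unescaped = ERB.new("<p><%= name %></p>").result(binding)
# => "<p><script>alert(1)</script></p>"

# Escaped at render time: the database value stays pristine.
escaped = ERB.new("<p><%= h name %></p>").result(binding)
# => "<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>"
```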
There is another issue with this approach that a new scriptlet tag avoids. By making the sanitize decision in the view layer, you can choose exactly what you sanitize. Let's consider a site like a blog or Stack Overflow. In such applications, you want some amount of HTML displayed, though not necessarily on all fields of the model. You might want to whitelist-sanitize the blog post, question text, or answer text, yet fully sanitize the labels or tags. Granted, you could update the plugin to allow such complexity, but it will be just that... complexity that will bleed through how you invoke the plugin. You will now need to specify not only which actions or controllers use this sanitizing, but also which parameters are excluded from being sanitized.
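In the view layer that per-field decision is one helper call away; a sketch with invented field names (sanitize's :tags and :attributes whitelist options ship with Rails):

```erb
<%# Rich text fields: whitelist a handful of tags. %>
<%= sanitize @question.body, :tags => %w(p a strong em code), :attributes => %w(href) %>

<%# Plain fields: escape everything. %>
<h1><%= h @question.title %></h1>
<% @question.tags.each do |tag| %>
  <span class="tag"><%= h tag.name %></span>
<% end %>
```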
All of the above pales in comparison to the biggest sin of sanitizing the parameters, and it is one of my biggest pet peeves. It is one of the key points for why Maurício chose the path he did. Premature optimization.
The argument goes that rather than waste the CPU cycles every time you load the page (which is hopefully a lot), you should waste the cycles once, as the input is being passed in and saved to the database. Premature optimization usually rears its ugly head in the form of much more insane choices, like insisting on how you should concat your strings. Thankfully, Jeff Atwood has already published benchmarks showing us that it doesn't matter.
Is sanitizing as quick as string concatenation? Probably not. I would be willing to bet, though, that it is fast enough for a small website. Why waste extra consideration on it until you have the awesome problem of having too many users?
Let's take a step back. Is pushing View logic into Model/Controller territory even worth the possible performance benefit? If I am going to throw proper MVC separation concerns out the window, it better be for a damn good reason. Allowing us to get orders of magnitude more pageviews might be worth it (if metrics proved that it was the best possible improvement, which is doubtful, but let's consider it). The whole concept is to cache something you are doing a lot by doing it once, before it even goes to the database. Let's extrapolate that concept. Instead of sanitizing first so we don't have to sanitize on every page view, why don't we cache the result of the action invocation itself? For every action/params pair that produces a view based strictly on data in the database, we could invoke the action once and just cache the rendered view for future use. Then, simply blow the cache away and re-render when you change the related database row(s). It is typically much more common to view than to update, so I expect this approach would give significantly better performance benefits than simply avoiding a bunch of sanitization calls.
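The extrapolated idea, as a toy in-memory sketch keyed on the action/params pair (Rails' own action caching and cache sweepers are the production version of this; everything below is invented for illustration):

```ruby
# Cache rendered views per action/params pair; expire on row changes.
class ViewCache
  def initialize
    @store = {}
  end

  # Render (via the block) only on a cache miss; afterwards serve the
  # cached copy without touching the database at all.
  def fetch(action, params)
    @store[[action, params]] ||= yield
  end

  # A related row changed: blow away every cached rendering of the action.
  def expire(action)
    @store.delete_if { |(cached_action, _), _| cached_action == action }
  end
end

cache = ViewCache.new
renders = 0
view  = cache.fetch(:show, :id => 1) { renders += 1; "<h1>Post 1</h1>" }
again = cache.fetch(:show, :id => 1) { renders += 1; "<h1>Post 1</h1>" }
# renders == 1: the second request never re-rendered.
cache.expire(:show)  # the row was updated; the next fetch re-renders
```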
With some careful thinking, we now have a much better solution to remove all the redundant sanitize invocations... and we've even removed redundant calls to the database, and any other costly algorithms we have done within our actions! All while preserving proper MVC separation of concerns. You can bet that I will explore this space when I have the fortunate problem of having too many users (and I wouldn't be surprised if there are available solutions that match my description).
Sorry for going off so much on your very well-meaning post, Maurício! I think you brought a very interesting possible solution, and it's always great to see code brought to the table. However, I do feel we should all seriously consider all approaches, and fully consider the consequences of the path we choose... not just to this particular problem, but any problem. It's best to drill down early and think about what issues our code may cause for us in the future. Don't consider this an excuse to dwell on issues to the point of failing to release useful functionality, though.
Thursday, November 5, 2009
Easy Partials in Rails
I created something at my job that has proven to be extremely useful, and I think many people could benefit from it. I like to call it Easy Partials, and the goal is to make using partials a bit easier (in case the name didn't make that glaringly obvious). The problem is that rendering a partial requires a verbose method call, when a little extra work allows simpler and more readable partial invocation via convention.
You are probably lost, so let me give you some examples. As it stands today, to render a partial you would do:
<%= render :partial => "my_partial" %>
Which would take the partial "_my_partial.erb" from the same directory and render it.
It works, but what if you could take the whole "convention over configuration" idea and change it into:
<% _my_partial %>
A lot simpler, and pretty intuitive, right? Since partials by Rails convention start with "_", it makes sense to name a method as such to render the partial. Note that there is no "=" in the scriptlet: the _my_partial method concats the partial itself, so it doesn't return a string for you to render.
We could create a helper method for every single partial we want to render like that, but that's rather cumbersome, isn't it? It's also not very DRY. You won't have exactly the same code, but you will find yourself with a lot of helpers that look rather similar. Instead, let's try overriding method_missing in our application_helper so we can avoid all those repeated helpers!
module ApplicationHelper
  def method_missing(method_name, *args, &block)
    method_str = method_name.to_s
    if method_str =~ /\A_.+\z/
      partial_name = method_str[/\A_(.+)\z/, 1]
      concat_partial partial_name
    else
      # Not a partial invocation; fall back to the original behavior.
      super
    end
  end

  # Concat the given partial.
  def concat_partial(partial_name)
    content = render :partial => partial_name
    concat content
    nil
  end
end
What we've done here is check on method_missing to see if the method name starts with "_", and, if so, treat it as a partial and concat it. If the method doesn't start with "_", we fall back to the original method_missing implementation.
This works, but what if the partial needs some local variables? Before you would do:
<%= render :partial => "my_partial", :locals => { :var => "123" } %>
Instead, let's do:
<% _my_partial :var => "123" %>
Again, a lot simpler, and quite intuitive. To achieve this, our code will look like this:
module ApplicationHelper
  def method_missing(method_name, *args, &block)
    method_str = method_name.to_s
    if method_str =~ /\A_.+\z/
      partial_name = method_str[/\A_(.+)\z/, 1]
      concat_partial partial_name, *args
    else
      # Not a partial invocation; fall back to the original behavior.
      super
    end
  end

  # Concat the given partial, passing any options through as locals.
  def concat_partial(partial_name, options = {})
    content = render :partial => partial_name, :locals => options
    concat content
    nil
  end
end
Now we are passing the Hash passed in from the view on to concat_partial so we can specify the locals we want to render. We could check that there is no more than 1 argument passed into method_missing, but I prefer not to (feel free to use and improve anything you see here, in case that wasn't clear).
The next improvement we can make is to allow blocks to be passed in. There is no direct Rails equivalent for this, that I know of, short of building it yourself with helper methods. It was inspired by Ilya Grigorik.
Here is an example of what we will build:
<% _my_partial :var => "123" do %>
<p>
Some block content.
</p>
<% end %>
This will allow us to effectively pass in a block to the partial, so we can abstract some of the content in the partial so that the caller can define it. And now for the code:
module ApplicationHelper
  def method_missing(method_name, *args, &block)
    method_str = method_name.to_s
    if method_str =~ /\A_.+\z/
      partial_name = method_str[/\A_(.+)\z/, 1]
      concat_partial partial_name, *args, &block
    else
      # Not a partial invocation; fall back to the original behavior.
      super
    end
  end

  # Concat the given partial. A block, if given, is captured and handed
  # to the partial as the "body" local.
  def concat_partial(partial_name, options = {}, &block)
    options = options.merge(:body => capture(&block)) unless block.nil?
    content = render :partial => partial_name, :locals => options
    concat content
    nil
  end
end
Within your partial, use the "body" local variable to output the contents of the block that was passed in. Note that if you also pass a local named "body" to a partial block, the block's content will override that variable, so keep that in mind.
For the final improvement, consider a partial that belongs to more than one controller. What do we do then? Well, how about we maintain a shared directory that we pull from if the partial cannot be found within the local directory. Thus:
module ApplicationHelper
  def method_missing(method_name, *args, &block)
    method_str = method_name.to_s
    if method_str =~ /\A_.+\z/
      partial_name = method_str[/\A_(.+)\z/, 1]
      begin
        concat_partial partial_name, *args, &block
      rescue ActionView::MissingTemplate
        # Not in the local directory; fall back on the shared directory.
        concat_partial "shared/#{partial_name}", *args, &block
      end
    else
      # Not a partial invocation; fall back to the original behavior.
      super
    end
  end

  # Concat the given partial. A block, if given, is captured and handed
  # to the partial as the "body" local.
  def concat_partial(partial_name, options = {}, &block)
    options = options.merge(:body => capture(&block)) unless block.nil?
    content = render :partial => partial_name, :locals => options
    concat content
    nil
  end
end
So, if you use the _my_partial examples from above within the views for "person_controller", but there is no views/person/_my_partial.erb, it will fall back on views/shared/_my_partial.erb.
Using Easy Partials, you can avoid redundant helper methods, keep HTML in easy-to-access ERB templates, improve the readability of views that use partials, and make your code more accessible to your non-programmer UI designers. Note that you can even invoke Easy Partials from within helper methods.
Special thanks to Ilya Grigorik's post on block helpers, which planted the seeds for Easy Partials.
Wednesday, November 4, 2009
Dev Days: Mobile
Let's revisit Dev Days San Francisco. Last time, I talked about the Microsoft talk, and how I was able to incorporate auto sanitization into Rails. This time, I would like to discuss the three mobile talks that were presented. Just for full disclosure, I am a big fan of Android, so you will likely see that bias come through. I know I felt the bias while taking in the three talks.
iPhone
First up was Rory Blythe talking about iPhone development. Of the three presenters, he was the most at ease speaking in front of a crowd. I'm sure Rory is a nice guy, but he had a swagger that... felt very Apple-esque. He was the hipster Apple guy smugly looking at us and telling us why we should bow down to the almighty Steve Jobs, and why we should be happy to do so. He made the audience laugh the most, often at flaws in the iPhone development platform that we all know are issues. Cheekily telling us three or four times that you cannot develop for the iPhone on anything but an Apple machine sure doesn't sit well with an Ubuntu fan like me. I don't think he was trying to sell the platform, though, just give us a taste for what it was like.
Most of the talk was him walking through the steps to develop a simple application in Xcode. It was very visual. Click here and drag over there to hook up a click handler of some kind. Drag this widget in line with these others. Ultimately, we got to peek at some actual Objective C. If you weren't at Dev Days and have never seen Objective C, be thankful, because it ain't pretty. He joked about how many places you have to adjust the code just to set up a simple property. As a Rails web developer, it made me want to claw my eyes out. I would never want to develop in such a redundant, cumbersome language.
There was good news, though. He introduced us to MonoTouch, a development environment for producing iPhone applications written in C# with the .NET platform. I'll take Ruby any day, but between the choices of C# and Objective C, it is a no brainer (C# of course... from someone who generally despises Microsoft technology). Too bad I can't use Ruby to develop an iPhone app on an Ubuntu machine. I might consider iPhone development then, though not seriously. If you wonder what is wrong with me... well, I choose an open platform, because I feel in the long run it will win out to a closed platform. I am convinced all the bad press towards the App Store approval process fiasco, and Apple's draconian attitude towards the 3rd party development community will ultimately be the iPhone's undoing.
Maemo/QT
The next mobile talk came from Daniel Rocha of Nokia. He talked about Qt, specifically in the context of Maemo. He was very dry. Drier than a desert when compared to Rory. Some people just aren't built for speaking... I'm sure I would freeze up and be just as bad, but thankfully I don't seek such torture for myself.
It was a running theme to do some actual development on stage to demo the platform for the audience. The core message of his talk was "Look! Cross platform! Same code for all the major platforms, including mobile!" Alright, C++ cross platform I guess is kinda cool, except it's not that impressive in today's context. Java has been highly cross platform from the beginning, and all the primary scripting languages are quite cross platform as well.
The most impressive aspect of his cross platform demo is that the UI is part of this cross-platformness, which I suppose is the whole point. However, he admitted to still needing macros to segregate the platforms now and then. And I really don't believe that a mobile UI should be the same as a desktop UI. Given the state of mobile devices, I think it's wiser to develop a UI with a small screen (and touch screen capabilities) in mind. Negative points also for running Linux in a VM inside Windows instead of the other way around.
The Qt IDE he was showing off was also quite unimpressive. It looked like Visual Studio circa 2000. The UI designer looked exactly like an old Visual Studio UI designer, with that grid of dots and all. I got the feeling that anyone used to a good IDE like Eclipse would feel shackled in this thing. Anyone used to an awesome editor like Emacs will... well, an Emacs user would never stoop to a lesser editor of any kind.
The funny thing about this talk is that it was followed by an Android presenter who pointed out how most major manufacturers, except Nokia, were coming out with devices with Android. All I can say is, Nokia is dropping the ball if they stick with this platform over hopping on Android. I saw no compelling reason to develop for Qt/Maemo.
Android
James Yum finished the Mobile hat trick with an Android talk. I wanted to be blown away by this talk, but unfortunately Rory was the only awesome speaker of the three. To be fair, James was up front that he was asked to do this talk last minute, replacing the original Google speaker, who probably would have crushed it.
He started off by showing the "Droid Does" Verizon commercial, which was an awesome commercial (despite not actually showing the device, and despite coming from a company I'm not a fan of). He then read some of the YouTube comments for the commercial, to humorous effect. I really hope T-Mobile comes out with a phone this compelling. According to James and the commercial, it has an 800x480 screen with a physical keyboard and a powerful chipset.
Great start, lousy finish. He followed this up with an uninspiring demo of how to deal with threads in Android, with the specific goal of making a snappier UI. He went from a simplistic but completely incorrect way to do threading to an object oriented way using the Android API. It made developing with Java look almost as bad as Objective C. I'm not exactly a fan of Java (though I work in it almost every day), but it really isn't this bad to work with Java, and specifically Android. I think he would have better shown off the platform by taking a real (though small) application idea, and implementing it on stage. This is what Rory did, and it had a much greater effect than slogging through the most complicated aspect of modern programming you could possibly think of.
Overall, James was far too green for the speech. He was (understandably) clearly nervous, and he was unable to answer most of the questions at the end of his talk. I was silently rooting for him, hoping he could show all the would-be iPhone developers what they are missing, but I was disappointed. Maybe Google will learn from this and keep a good speaker on hand as a backup, should the primary speaker have to drop out last minute.
iPhone
First up was Rory Blythe talking about iPhone development. Of the three presenters, he was the most at ease speaking in front of a crowd. I'm sure Rory is a nice guy, but he had a swagger that... felt very Apple-esque. He was the hipster Apple guy smugly looking at us and telling us why we should cow down to the almighty Steve Jobs, and why we should be happy to do so. He made the audience laugh the most, most often with flaws in the iPhone development platform that we all know is an issue. Cheekily telling us three or four times that you cannot develop for the iPhone on anything but an Apple machine sure doesn't sit well with an Ubuntu fan like me. I don't think he was trying to sell the platform, though, just give us a taste for what it was like.
Most of the talk was him walking through the steps to develop a simple application in Xcode. It was very visual. Click here and drag over there to hook up a click handler of some kind. Drag this widgit in line with these others. Ultimately, we got to peek at some actual Objective C. If you weren't at Dev Days and you have never seen Objective C, be thankful you didn't have to, because it ain't pretty. He joked about how many places you have to adjust the code just to set up a simple property. As a Rails web developer, it made me want to claw my eyes out. I would never want to develop in such a redundant, cumbersome language.
There was good news, though. He introduced us to MonoTouch, a development environment for producing iPhone applications written in C# with the .NET platform. I'll take Ruby any day, but between the choices of C# and Objective C, it is a no brainer (C# of course... from someone who generally despises Microsoft technology). Too bad I can't use Ruby to develop an iPhone app on an Ubuntu machine. I might consider iPhone development then, though not seriously. If you wonder what is wrong with me... well, I choose an open platform, because I feel in the long run it will win out to a closed platform. I am convinced all the bad press towards the App Store approval process fiasco, and Apple's draconian attitude towards the 3rd party development community will ultimately be the iPhone's undoing.
Maemo/QT
The next mobile talk came from Daniel Rocha of Nokia. He talked about Qt, specifically in the context of Maemo. He was very dry. Dryer than a desert when compared to Rory. Some people just aren't built for speaking... I'm sure I would freeze up and be just as bad, but thankfully I don't seek such torture for myself.
It was a running theme to do some actual development on stage to demo the platform for the audience. The core message of his talk was "Look! Cross platform! Same code for all the major platforms, including mobile!" Alright, C++ cross platform I guess is kinda cool, except it's not that impressive in today's context. Java has been highly cross platform from the beginning, and all the primary scripting languages are quite cross platform as well.
The most impressive aspect of his cross platform demo is that the UI is among this cross-platformness, which I suppose is the whole point. However, he admitted about still needing macros to segregate the platforms now and then. And I really don't believe that a mobile UI should be the same as a desktop UI. With the state of mobile devices, I think it's wiser to develop a UI with a small screen (and touch screen capabilities) in mind. Negative points also for running Linux in a VM inside Windows instead of the other way around.
The Qt IDE he was showing off was also quite unimpressive. It looked like Visual Studio circa 2000. The UI designer looked exactly like an old Visual Studio UI designer, with that grid of dots and all. I got the feeling that anyone used to a good IDE like Eclipse would feel shackled in this thing. Anyone used to an awesome editor like Emacs will... well, an Emacs user would never stoop to a lesser editor of any kind.
The funny thing about this talk is that it was followed by an Android presenter who pointed out how most major manufacturers, except Nokia, were coming out with devices with Android. All I can say is, Nokia is dropping the ball if they stick with this platform over hopping on Android. I saw no compelling reason to develop for Qt/Maemo.
Android
James Yum finished the Mobile hat trick with an Android talk. I wanted to be blown away by this talk, but unfortunately Rory was the only awesome speaker of the three. To be fair, James was up front that he was asked to do this talk last minute, replacing the original Google speaker, who probably would have crushed it.
He started off by showing the "Droid Does" Verizon commercial, which was an awesome commercial (despite not actually showing the device, and despite coming from a company I'm not a fan of). He then read some of the YouTube comments for the commercial, to humorous effect. I really hope T-Mobile comes out with a phone this compelling. According to James and the commercial, it has an 800x480 screen with a physical keyboard and a powerful chipset.
Great start, lousy finish. He followed this up with an uninspiring demo of how to deal with threads in Android, with the specific goal of making a snappier UI. He went from a simplistic but completely incorrect way to do threading to an object oriented way using the Android API. It made developing with Java look almost as bad as Objective C. I'm not exactly a fan of Java (though I work in it almost every day), but it really isn't this bad to work with Java, and specifically Android. I think he would have better shown off the platform by taking a real (though small) application idea, and implementing it on stage. This is what Rory did, and it had a much greater effect than slogging through the most complicated aspect of modern programming you could possibly think of.
Overall, James was far too green for the talk. He was clearly (and understandably) nervous, and he was unable to answer most of the questions at the end. I was silently rooting for him, hoping he could show all the would-be iPhone developers what they're missing, but I was disappointed. Maybe Google will learn from this and keep a good speaker on hand as a backup, should the primary speaker have to drop out at the last minute.
Tuesday, November 3, 2009
Accidental Complexity
I thought I had some interesting insights in my last post about Postgres and why I was going back to MySQL, at least for the time being. So much so that I posted it on Reddit. Alas, it didn't go over well: lots of downvotes, and most of the comments were fairly negative. I still think I'm right about what I did and why, but I can better articulate the reason in two words:
Accidental Complexity.
It's definitely not a new concept, and I'm sure many people have talked about it before, but in reading the responses on Reddit, it became clear that this is precisely the reason why I switched to MySQL, and precisely why I probably won't switch back anytime soon.
Wikipedia defines accidental complexity as "complexity that arises in computer programs or their development process (computer programming) which is non-essential to the problem to be solved."
This perfectly describes the issues I was having. Imagine if I could sudo apt-get install postgres, hook it up in Rails, and be on my way. I could focus nearly all of my time on essential complexity (i.e., developing features for my actual application), or at the very least on accidental complexity arising from other applications or from my own coding choices.
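To be clear about where the friction sits: the Rails side of "hooking it up" really is close to zero configuration already. A typical config/database.yml looks something like this (the names here are hypothetical, not from my actual project):

```yaml
development:
  adapter: postgresql
  database: myapp_development   # hypothetical project name
  username: myuser              # must match a role Postgres will actually accept
  password:
  host: localhost
```

All of the pain lives on the other side of that username line, in Postgres's own authentication config.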
The choice may bite me later on with other accidental complexity that Postgres tackles well, but right now I want to actually get the project to a point where I can start showing people, and maybe start getting some users. After all, getting users is the whole point. To that end, reducing the hassle being caused by the database is a Good Thing.
So, I really think this should be a lesson for all the applications I write, and all the applications you write as well. Make sure your application reduces the accidental complexity you are forcing on your users. If a significant portion of them are spending a lot of their time configuring your application instead of solving the problem they are trying to solve, you have failed them.
Don't take this to the extreme, though... you still have to see the big picture. Who knows, maybe the default user settings are good for the majority of users, and I just happen to fall in the unlucky minority, but comments like "pg_hba.conf can be a bitch, but updating it to allow transparent local access is just a couple of lines away" tell me that I'm not the only person who has had trouble configuring Postgres, even if the configuration changes are small.
Sunday, November 1, 2009
Postgres... is it worth it?
In working on my side projects, I needed to pick a database. I decided to pick Postgres, and it has been nothing but problems since the beginning. It sucks, because half the problems don't really seem to be Postgres's fault, though they all seem to stem from the one issue that IS their fault.
So, when I first started with Postgres, it was a pain right off the bat. Every single time I need to install it (usually on a new development machine), it's a pain in the butt! They provide a lot of flexibility for database user security. But the last time I installed MySQL (admittedly a long time ago), you set it up, it asks you for a root password, and that's it. The rest seemed a snap, at least through the rose-colored glasses of the past. With Postgres, each and every install requires me to search to figure out what settings to change in which config file just to get the damn users to be able to connect! Give me some reasonable defaults! If I connect to my database locally as a specific user, it would be awesome if the system gave that user full permissions on any databases I create. Then things like Rails would work out of the box, no configuration necessary. I will lock down the system to my heart's content if I need to, but make it JUST WORK first.
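Concretely, the dance I keep having to re-discover on Ubuntu goes roughly like this (the 8.4 path is from my current install; your version and paths may differ):

```
# 1. Create a Postgres role matching your Unix user (run as the postgres superuser):
#      sudo -u postgres createuser --superuser myuser
# 2. In /etc/postgresql/8.4/main/pg_hba.conf, allow local connections:
local   all   all                  trust
host    all   all   127.0.0.1/32   trust
# 3. Restart Postgres so the new rules take effect.
```

Yes, trust is wide open, which is exactly my point: it's fine for a dev box, and I'd happily tighten it later... after things JUST WORK.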
That was probably the extent of what I can actually blame Postgres for. It's a big fault in my opinion, though. Software that just works out of the box with no configuration necessary is nirvana. That is why Rails typically feels like heaven to me.
The next issue was probably HostMonster's fault. They give Postgres back-seat treatment. They run an old Postgres version (I think 8.1), with the excuse that they need to wait until cPanel (their site management web app) upgrades what it supports. OK, I can live with that, but it sucks. When I actually tried to create a database, though, Rails couldn't connect to it! Grrr. HostMonster was on it, though, and actually fixed my configuration issue. That was cool of them; go HostMonster! I have been a fan of their support, but that's not what this blog post is about. The fix ended up being a Rails configuration change, I think to how it connected... probably because HostMonster's user connection settings were set up differently from what my Rails configuration assumed.
Things were smooth until I needed to create a new database for a new website I wanted to create. It just didn't work. I couldn't create the database, and there was nothing I could do about it... possibly something HostMonster had done, but I didn't want to wait for a fix, so I took the dive and just used a MySQL database.
But I stuck with Postgres on my development machines. Why change? I still wanted it more than MySQL.
Until now.
I just upgraded to Karmic Koala on my netbook. I think the upgrade went well, except that the move from Postgres 8.3 to 8.4 (which was part of it) didn't. Both versions are now installed, and when I try to load Rails, I can't connect to the database. Even running rake db:migrate fails. No nirvana, no more Postgres. I'm fed up with the issues. I'm going to the dark side of MySQL, and I'm not coming back until it's dead simple to set up a Postgres database with Rails.
Postgres intrigued me so much because of the recent drama surrounding MySQL... namely that the Evil Empire (Oracle) now owns them. Perhaps it's a bit superficial, but it's drama I didn't want to get caught up in. I want to know my database system will be open and free throughout the life of my products. However, MySQL makes things so easy that I just can't resist switching back. I will reconsider if it becomes easier to set up a database on Postgres out of the box.
Lesson? Make defaults that work towards your users' goals without them needing to dig into any documentation. If it's not easy to start, your users won't start, and so you won't have users (at least not as many as you could have).