
Using Chef to Automatically Configure New EC2 Instances


This is a follow-up to my post about using Puppet to achieve the same result. In the comments to that post a few people told me that Chef could make my life easier, so I decided to give it a try. Here's what I came up with.

In this post, as in the previous one, our goal is to be able to start a new EC2 instance with a single command and have it come up with Apache installed and running.

First of all, instead of setting up our own server to tell the newly created instances what to do, we are going to use a hosted Chef server on Opscode's servers. The hosting is free for up to 5 nodes, so you can try this out without paying. Go to Opscode's site, register a new user, and then add a new organization.

On our system, we need to start by installing Chef. You will also want to install the dependencies needed to make Chef talk to EC2 (these are not installed automatically with the gem because they're optional):
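Something along these lines (a sketch; the exact gem list varies between Chef versions, so treat these names as assumptions):

    # Install Chef itself
    sudo gem install chef

    # Optional gems knife needs for its EC2 commands (names may differ by version)
    sudo gem install fog net-ssh net-ssh-multi highline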

Now we need to set up a Chef repository. This repository will contain, among other things, our cookbooks (libraries that contain recipes, which are scripts for doing things like installing Apache) and roles (which map recipes to nodes). To get it, run:
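For example (the repository URL is an assumption and may have moved since):

    git clone git://github.com/opscode/chef-repo.git
    cd chef-repo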

In the repository, create a .chef directory. Back on Opscode's site, you need to download 3 files: your organization's validator key, your user's key and a generated knife.rb. Once downloaded, copy them all to the .chef directory:
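Roughly like this (the downloaded filenames are placeholders for your organization and user names):

    mkdir .chef
    cp ~/Downloads/ORGNAME-validator.pem .chef/
    cp ~/Downloads/USERNAME.pem .chef/
    cp ~/Downloads/knife.rb .chef/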

These will be used by the new instances to connect to Opscode and identify themselves as truly being created by you (this saves us from hacking together an awkward solution like we had to with Puppet). Add your AWS credentials to your knife.rb file:
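Something like the following, with placeholders standing in for your actual keys:

    # .chef/knife.rb
    knife[:aws_access_key_id]     = "YOUR_AWS_ACCESS_KEY_ID"
    knife[:aws_secret_access_key] = "YOUR_AWS_SECRET_ACCESS_KEY"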

We will now fetch the apache2 cookbook, which will let us install Apache on our instances by adding a single configuration line. To download an existing cookbook, do the following:
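With the knife of that era this looked roughly like the following (newer versions renamed the subcommand to site install):

    knife cookbook site vendor apache2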

You can see what other cookbooks are available by looking around here. Now we'll create a role for our instances. Create the file roles/appserver.rb with this content:
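A minimal role along these lines (the description text is just an example):

    # roles/appserver.rb
    name "appserver"
    description "A web application server running Apache"
    run_list "recipe[apache2]"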

And to update our Opscode server with the new cookbook and role:
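These are the standard knife commands for uploading a cookbook and a role:

    knife cookbook upload apache2
    knife role from file roles/appserver.rb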

We're getting really close now! You should have a security group defined in AWS that has port 22 (SSH) open, so knife can connect to the instance and configure it, and port 80 (HTTP), so our Apache will be reachable. I called mine "chef". You will also need to decide which AMI (image) to use; you can find a list of AMIs supplied by Opscode here. And now, to create an instance with one command line, as promised:
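Something along these lines (the flags follow the knife-ec2 options of that era and may differ in newer versions; the AMI ID and key pair name are placeholders):

    knife ec2 server create -G chef -I ami-xxxxxxxx -S your-key-pair -x ubuntu -r "role[appserver]"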

This will take a while, as knife will create the instance, connect to it, install Ruby, Chef itself, Apache and so on. Once it says it has finished, simply copy the public DNS of the newly created instance (it should be printed once knife finishes) and open it in your browser. My, what a sense of accomplishment one gets from seeing the string "It works!"

I find this a lot easier, cleaner, more streamlined and more fun. I'm still learning the ropes with Chef, but it has already surprised me by being easy to change, being completely git-integrated, and by Opscode's fast support (even for non-paying customers). You can dig further in these links.

You should subscribe to my feed and follow me on twitter!

Fake It Till You Make It - Team Edition


Fake it till you make it is a well-known pattern in Test Driven Development: you make the code act as if it already knows the answer (say, by returning a hard-coded value) and then gradually generalize it into the real implementation. This is a powerful technique, and I've already written about how using the same trick at the individual scale can help you make your team better.

I just recently realized that I had already seen this principle applied by a whole team, which then caused a whole department to follow suit.

Back in 2005, I had the luck to join a particularly interesting team. Just hanging around the section the team was part of made it clear that all the other teams regarded that team (let's call it the A Team) as highly skilled. People said they were the XP (Extreme Programming) team, and it was generally looked at as an example of how a good team should work.

After joining the team I got an inside look at what was really going on. All the developers were highly talented, but being "The XP Team"? Hah! Two guys had read Kent Beck's (amazingly awesome) Extreme Programming Explained and simply started pairing and writing automated unit tests before the code.

Simply starting with those 2 small parts of the XP way of doing things got them improved results, which then got the rest of the section interested. By simply saying they were going to try that XP thing, and then saying it made their lives better, the A Team got the ball rolling for the whole section without ever even trying to start an Agile transition.

And this wasn't a one-trick pony! About 2 years later, the same thing happened with Scrum. One teammate read a good intro to it (back when it was still a free PDF) and told the rest of the team, which then decided to give it a try. After a few sprints of seeing how organized standup meetings and the like actually helped our process, we decided to keep it.

We didn't try to "get everyone to realize this is the best way". Some people happened to come into our room during standups, or see the scrum board. That alone got people interested, and from then on, once again, the A Team got the section to advance nicely.

This is a marvelous story, and only now do I realize how rare it is. Simply because the team looked to the rest of the section like they knew what they were doing, it got all of them to try agile without having to break down walls or bust open doors. Sometimes just doing what feels right is enough.

Fake it till you make it is just another way of saying “If you build it they will come”!

You should subscribe to my feed and follow me on twitter!

You Owe it to Yourself to be Old-School


I love watching House. My favorite episodes are those where he manages to debug an illness not by knowing some obscure disease, but by having a holistic knowledge of how the body works, and thus being able to deduce the real problem.

I find this correlates strongly with a set of tools and knowledge that a lot of coders are missing, and that has tremendous value. Joel Spolsky wrote years ago that developers should learn C in order to have a thorough understanding of their environment. I actually think this should be taken a few notches further.

Learn C and some systems programming and you gain the ability to grasp the basics of most tools you use. How can you spot and truly understand memory leaks without ever having managed memory allocation yourself?

What would you do if some code you wrote or an application you use suddenly blurts out that it has a connection error? Or the Apache server you're installing is acting up on you? My #1 power tool for these situations is simply opening Wireshark and looking at what goes over the wire. Learn the basics of TCP/IP and you'll be able to debug most network problems swiftly.

And don't get me started on using the shell. No matter what you think, having shell-fu pays off daily. Any text manipulation you're thinking of, most simple processing tasks – you can whip up a one-liner to do it in less time than most IDEs take to start up.
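For instance, a hypothetical one-liner for listing the ten most frequent error messages in a log file (assuming the message starts at the fourth whitespace-separated field):

    grep ERROR app.log | cut -d' ' -f4- | sort | uniq -c | sort -rn | head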

And the reasons just go on and on. Reading important functions from the Linux kernel will help you understand why Java suddenly won't fork child processes. Knowing how common security issues work (injections, buffer overflows, etc.) is the only way for you to catch security mistakes at the drawing-on-the-board stage and not at the shit-the-DB-is-stolen stage.

I don't care if you're doing Rails and never need to see the outside of a pointer. There's nothing like having a holistic grasp of things to help you solve problems quickly, à la Dirk Gently. All the points I've made in this post? All real problems solved in the last couple of months with some old-school chops.

Do yourself some good – read K&R for some C understanding. Read the first chapters of TCP/IP Illustrated. Read Linux Kernel Development (3rd Edition) for a nice walk-through of the interesting parts. This knowledge won't become obsolete anytime soon. Can you say that about your favorite framework?

You should subscribe to my feed and follow me on twitter!

Stop Wasting My Code


During my service in the army I had the opportunity to move some electronic equipment around from place to place. A lot of it was pretty old (and by that I mean it predates me), but worked perfectly where it was. We had systems running for decades without a problem, but once we unplugged them and moved them to a different room they went dead.

Over time we identified this phenomenon and simply noted that things that aren't in use stop functioning. It used to puzzle me, but eventually I came to accept it. What is still hard for me to accept, though, is that this is exactly the same with software as with hardware, if not worse.

I thought I had learned this lesson a few years ago, after reading The Pragmatic Programmer and having it hammer YAGNI and KISS into my head, but I keep getting surprised every time I find out I've just done it again.

Actually, learning Git has made this problem rear its ugly head again. Git makes it easy to write up some code and then keep it around. I'd either stash some changes or keep a side branch with some work I started. The really bad part is adding this code to production code, simply because it's there. The problem is that code gets stale if it isn't really used, and fast.

I can't think of a single case where we added code before it was actually needed and got something good out of it. The fact is, every line of code you write before there's a real use case or actual need for it is just you guessing. And we're mostly guessing anyway about the stuff we actually need to get done, so why add more ambiguity?

As I read in Growing Object-Oriented Software, code isn't sacred simply because it's there, and it won't take as long to write it again if you ever need it. Don't be afraid to delete code that isn't actually needed just because you put two hours into it. The time you'd spend maintaining it would cost much more.

This is exactly the Lean definition of waste – everything that doesn't add value to customers – and adding code just so you feel better isn't helping your customers. I now consider waste one of my sworn enemies. At work I've decided to take on the role of the do-we-really-need-that dude. It means being a PITA sometimes, but it pays off tenfold.

Next time you feel tempted to commit that code you're not sure you'll ever need, keep in mind that the best code is no code.

You should subscribe to my feed and follow me on twitter

Book Review: Growing Object-Oriented Software


Starting with a test means that we have to describe what we want to achieve before we consider how.

2010, for me, was a year with quite a good reading list. It was when I first got to read some really good books such as Clean Code, Agile Software Development, TDD by Example and Apprenticeship Patterns. These are all stellar books I highly recommend.

Yes, indeed it was an awesome year, and yet I can tell you that the best book I read this year was Growing Object-Oriented Software, Guided by Tests (GOOS for short).

I had actually never heard of the authors before 2010. Unlike books by authors such as Kent Beck and Robert Martin, whom one hears about regularly, I was quite astonished to keep hearing about this book in so many different places.

I heard talks mention it, I saw lots of tweets about it, and quite a few people whose opinions I highly value were praising it. This piqued my interest and boy, am I glad I decided to add it to my pile.

I've read a lot about better development, better testing and better everything. And yet I've never come across a book as thorough and as comprehensive as GOOS. If you read my other reviews you'll see that what usually wins me over is a good code walk-through. Now let me tell you, you haven't seen a good walk-through until you've seen GOOS.

Code isn’t sacred just because it exists, and the second time won’t take as long.

On the one hand, the book is loaded with practical tips for making your tests better, faster, more readable and maintainable. It covers the nuances of testing ORM systems, GUIs, multi-threading problems and more.

On the other hand, every page turn is greeted with more nuggets of OOP lore. Actually, seeing all this wisdom clustered so tightly by people that have been struggling with these problems for over a decade now seems illegal to me. Are we really allowed to learn so many secrets of the profession this fast? Surely some sort of blood sacrifice has to be made?

Once we start a major rework we can’t stop until finished. There’s a reason surgeons prefer keyhole surgery to opening up a patient.

I’ve read GOOS over the course of a few months, consuming chapters little by little and letting the knowledge sink in. I was amazed at how much this affected my way of thinking about OOP and TDD, pretty much right off the covers. I already blogged about how my new OOP-Spidey-Sense helped us improve our architecture.

I'll finish by saying this book is a game-changer for me, even though I've been doing TDD for a few years now. To the authors, Nat and Steve, I take my hat off. They have earned a place of honor on my Deserve-A-Beer list.

And to sum up all these great quotes from GOOS, here’s another gem:

The last thing we should have to do is crack open the debugger and step through the tested code to find the point of disagreement.

You should subscribe to my feed and follow me on twitter!

Using Puppet to Automatically Configure New EC2 Instances


Note: I posted an update about doing the same with Chef here.

This is a quick techie post that summarizes a few hours of learning I wish someone else had put up on the web before me. I assume some knowledge of Puppet; I recommend the Pro Puppet book, and have heard good things about the Puppet 2.7 Cookbook.

So, I wanted to use Puppet to define how our new instances should be configured, and then be able to easily spawn new instances that get configured by it. The first part is installing puppetmaster. I decided to manually set up an EC2 instance that will act as the puppet master:
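On Ubuntu this boils down to something like the following (package names may differ between releases):

    sudo apt-get update
    sudo apt-get install -y puppetmaster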

Under /etc/puppet/manifests/site.pp we place the "main" entry point for the configuration. This is the file responsible for including the rest of the files. I copied a structure I saw elsewhere, where the actual classes are placed under /etc/puppet/manifests/classes and imported in site.pp. Do note that this setup currently only supports a single type of node, but supporting more should be doable using external nodes to classify the node types.
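A sketch of what site.pp can look like in this layout (the apache class name is a placeholder for whatever classes you define):

    # /etc/puppet/manifests/site.pp
    import "classes/*.pp"

    node default {
      include apache
    }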

Auto-signing new instances

A common problem with Puppet setups is that whenever a new puppet agent connects to the puppet master it hands it a certificate, which you then have to sign before the puppet master will agree to configure it. This is problematic in setups like mine, where I want to be able to spawn new instances with a script and not hassle with jumping between machines right after the certificate is sent just to approve it. I found two ways to circumvent this:

1. Simply auto-signing everything and relying on firewalls

If you can allow yourself to firewall the puppetmaster port (tcp/8140) so that it's only accessible to trusted instances, you don't actually need to sign the certificates. You can tell Puppet to trust whatever it gets and leave the security in the hands of your trusty firewall. With EC2 this is extremely easy:

  • Set up a security group; I'll call mine "puppets"
  • Add a rule on the puppetmaster that allows access from all instances in the "puppets" group
  • Create all puppet instances in the "puppets" security group
  • Configure puppet to automatically sign all requests: echo "*" > /etc/puppet/autosign.conf

I decided to go with this solution since it’s simpler and less likely to get broken. I didn’t see it documented anywhere else. The downside is that you’ve got to have your puppetmaster on EC2 too.

2. Automatically identifying new instances and adding them

This is a solution I saw mentioned a few times online. Using the EC2 API tools, write a script that gets the DNS names of all your trusted instances and writes them to autosign.conf. Once you have that, running it from a cron job every minute will do the trick. This can be done with more sophisticated scripts, but for my (very initial) testing, something like this seemed to work:
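A hypothetical sketch of such a script (depending on your setup, the agents' certnames may be the internal rather than the public DNS names, so adjust the field accordingly):

    #!/bin/bash
    # Allow every currently known instance to have its certificate auto-signed
    ec2-describe-instances | awk '$1 == "INSTANCE" { print $4 }' > /etc/puppet/autosign.conf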

Getting new instances to connect to the master

The last piece of the puzzle. Since we use Ubuntu, we can simply use the Canonical-supplied AMIs. These support user-data scripts that are executed as root once the system boots. The script needs to do the following (a sketch of it follows the list):

  1. Update the instance
  2. Add the "puppet" entry to DNS – puppet expects the master to be reachable via the hostname "puppet". This little snippet resolves the master's current IP from its DNS name and writes it to /etc/hosts
  3. Install & enable puppet and voila!
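A rough reconstruction of such a user-data script (puppetmaster.example.com is a placeholder for the master's DNS name, and the details assume the Ubuntu puppet package of that era):

    #!/bin/bash
    # 1. Update the instance
    apt-get update && apt-get -y upgrade

    # 2. Make the "puppet" hostname resolve to the master's current IP
    MASTER_IP=$(getent hosts puppetmaster.example.com | awk '{ print $1 }')
    echo "$MASTER_IP puppet" >> /etc/hosts

    # 3. Install and enable puppet
    apt-get install -y puppet
    sed -i 's/START=no/START=yes/' /etc/default/puppet
    service puppet start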

Once all of this is up and running, creating a new instance is as easy as:

ec2-run-instances -g puppets --user-data-file start_puppet.sh -t m1.small -k key-pair ami-a403f7cd

Happy puppeting!

You should subscribe to my feed and follow me on twitter!

Adding GOOS Sauce to GWT MVP


For a few months now I've been using Google Web Toolkit. One thing that was bothering me was that even when following the praised MVP (Model-View-Presenter) pattern as per the documentation, you pretty quickly get into messy territory.

Here’s a snippet from the official GWT MVP tutorial:
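A paraphrased sketch in the spirit of the tutorial's code (the interface and method names here are illustrative, not the tutorial's exact ones):

    import com.google.gwt.event.dom.client.ClickEvent;
    import com.google.gwt.event.dom.client.ClickHandler;
    import com.google.gwt.event.dom.client.HasClickHandlers;

    public interface Display {
        // The view exposes its raw widgets...
        HasClickHandlers getSaveButton();
    }

    // ...and the presenter reaches into them to wire behavior:
    public void bind() {
        display.getSaveButton().addClickHandler(new ClickHandler() {
            public void onClick(ClickEvent event) {
                doSave();
            }
        });
    }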

In this example, you can see that our presenter, when bound, registers a click handler on a button in order to perform some action when the button is clicked. This might seem nice and all, but there's a smell: it's a violation of the Law of Demeter (the missing SOLID rule, one might say). It also makes things harder to test, since we now have another layer of indirection between the SUT and its collaborators. Instead of making the view a tiny bit smarter, we use it as a dumb collection of widgets that the presenter manages. This is clearly not in "Tell, don't ask" form.

The thing that really bothers me is how coupled the presenter gets to its view. Take the above example, and say you decided it would be better to have two "save" buttons in the UI. Does the presenter really care? Should it even change? And what if you actually want the save button to change into a remove button when the user has picked something? Should the presenter now deal with getSaveOrRemoveButton()? Of course not.

GOOS it up

After beating around this bush for quite some time, I decided to try to find a better way. I'm currently reading the brilliant Growing Object-Oriented Software book, and decided to use its approach to push for a better implementation. After a bit of refactoring I got this:
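Something along these lines (a sketch with illustrative names; the point is that the view now exposes intent instead of widgets):

    public interface Display {
        interface SaveListener {
            void onSave();
        }

        // The view tells its listener when the user asks to save,
        // without exposing which widget triggered it
        void setSaveListener(SaveListener listener);
    }

    // In the presenter:
    public void bind() {
        display.setSaveListener(new Display.SaveListener() {
            public void onSave() {
                doSave();
            }
        });
    }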

This might seem like a tiny change. And it is. But it makes all the difference in the world in how much more responsive your design gets, especially in our world where the view is likely to change a dozen times before settling on something. Once there are enough of these, I push the presenter as a dependency into the view and let it call the presenter directly. The funny thing is that this style is actually implicitly mentioned in the second part of the GWT MVP tutorial. Just some GOOSing helped us get to a better, more malleable design!

Don’t be afraid to do something differently than the documentation, especially if you gave it a fair shot and it didn’t work out.

You should subscribe to my feed and follow me on twitter!

Notes from the 5th Israeli Software Craftsmanship Meeting


This week I had the pleasure of attending the 5th meeting of our Software Craftsmanship group, and boy, what a meeting it was. For the first time we tried a different format that was 100% hands-on and reduced someone-talking-with-slides time to a minimum.

The meeting was composed of 3 tables: a code review table, where people brought code from home/work and discussed it with others; a TDD table, where a veteran TDDer talked people through a kata; and a third table, led by yours truly, that aimed to introduce people to the concept of DRY and to tackling some duplication problems.

Unfortunately for me, this meant I couldn't take part in the other tables, but from what I could pick up, I must say it was really fun getting so much positive feedback for a meeting.

The DRY table worked on a kata I composed specifically to raise issues of duplication just a few minutes in, which you can find here. It was amazing to see people keep working on the kata after the hour dedicated to it was up, and I never expected to see such variety (Java, C#, Python, Ruby, JavaScript and PHP were all spotted)!

The slides from my (extremely) short introduction to DRY are available here:

You should subscribe to my feed and follow me on twitter!

Serializer Kata: Practicing DRY


This kata is intended to help one practice the DRY principle (Don’t Repeat Yourself). You can read more about DRY here.

A few notes:

  • After completing a step in the kata, and before moving on to the next, take the time to make sure your code's duplication is down to 0
  • For the sake of focus, you may ignore matters of character escaping, encoding, error handling, object graph cycles and the likes
  • Our focus is on reducing duplication, not on finishing the kata

In this kata, our goal is to implement 2 simple object serializers: one to XML, the other to JSON.

  1. Support serializing a class without any members
    1. To XML: EmptyClass -> <EmptyClass></EmptyClass>
    2. To JSON: EmptyClass -> {}
  2. Add support for serializing a class' integer members
    1. To XML: IntClass(a=1, b=2) -> <IntClass><a>1</a><b>2</b></IntClass>
    2. To JSON: IntClass(a=1, b=2) -> { "a": 1, "b": 2 }
  3. Add support for serializing a class' string members
    1. To XML: StrClass(a="first", b="second") -> <StrClass><a>first</a><b>second</b></StrClass>
    2. To JSON: StrClass(a="first", b="second") -> { "a": "first", "b": "second" }
  4. Add support for serializing a class' other class members
    1. To XML: CompositeClass(inner=(a=1)) -> <CompositeClass><inner><a>1</a></inner></CompositeClass>
    2. To JSON: CompositeClass(inner=(a=1)) -> { "inner": { "a": 1 } }

If you found this interesting subscribe to my feed and follow me on twitter!

Liskov Substitution Principle Violation Spotted in the Wild


The Liskov Substitution Principle (LSP) states that "if S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program." This principle is actually so important it's part of SOLID.

At work we've just wasted quite some time chasing down a bug that was caused by a violation of LSP, in the JDK! Everyone knows the Set collection, which simply makes sure a collection can't contain the same object twice ("same" as defined by Java's equals). Set itself is unordered, which can be a bummer, but fortunately the JDK people were nice enough to add SortedSet.

Given LSP, one might assume that wherever you use a Set you can simply replace it with a SortedSet in order to get the same thing but with sorted output. Well, think again! (Tam, tam, tammmm)

Suppose you have this nice class:
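Something like this (a reconstruction with assumed field names; equals() and hashCode() consider both fields):

    public class Account {
        private final String id;
        private final String name;

        public Account(String id, String name) {
            this.id = id;
            this.name = name;
        }

        public String getId()   { return id; }
        public String getName() { return name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Account)) return false;
            Account other = (Account) o;
            return id.equals(other.id) && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return 31 * id.hashCode() + name.hashCode();
        }
    }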

This was all fine and dandy, but somewhere in our code we wanted to replace a Set of Accounts with a SortedSet, so the accounts would be displayed sorted by their names (and only their names). So we whipped up this simple Comparator:
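Along these lines (again, a sketch with assumed names):

    import java.util.Comparator;

    public class AccountNameComparator implements Comparator<Account> {
        public int compare(Account first, Account second) {
            return first.getName().compareTo(second.getName());
        }
    }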

This looked very cool and everything worked, until we attempted to add 2 accounts with the same name but different IDs to the SortedSet. We expected both to be inserted, since they are not "equal", but were surprised by the result of this:
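A sketch of the failing test (assuming JUnit and the classes sketched above):

    import java.util.SortedSet;
    import java.util.TreeSet;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    @Test
    public void keepsAccountsWithSameNameButDifferentIds() {
        SortedSet<Account> accounts = new TreeSet<Account>(new AccountNameComparator());
        accounts.add(new Account("1", "John"));
        accounts.add(new Account("2", "John"));
        assertEquals(2, accounts.size()); // fails: the TreeSet treats the second "John" as a duplicate
    }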

The above test fails. After much digging and debugging, we realized that the TreeSet uses the Comparator to determine whether the account was already in the set. Once it found an object in the set with the same "comparison value" (compareTo returned zero), it decided the account was already there. This is stupidly stupid, since we felt the natural behavior would be that returning zero means we don't really care about these objects' relative order, and that equals() would be used to determine which are actual duplicates. Switching the code to use a non-sorted Set (e.g. HashSet) makes the test pass.

This violation of LSP has caused us much frustration and wasted efforts. Making me feel even worse, we found out after the fact this is documented behavior:

The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

Edit: So, yeah, had we read this beforehand we would have known. The problem is that we shouldn't have to read it. Some commenters helpfully pointed out that we could simply make our Comparator look at "id" as well. Indeed we can, but this is a simplified case. When an object's equals() looks at all members, even private ones, how can you provide just a comparator that sorts by one public attribute? The behavior of SortedSet means you simply can't, which makes the whole point of having pluggable Comparators a bit misleading, since most of them will have to re-implement equals(). Indeed, the docs for Comparator#compare (as opposed to Comparable#compareTo) recommend an ordering that is consistent with equals(), but sometimes that's just not possible. In those cases, it turns out, one can't sort!

To that, all I’ve got to say is “shame on you!”

You should subscribe to my feed and follow me on twitter!