codelord.net

Code, Angular, iOS and more by Aviv Ben-Yosef

Shell Hackery: The Use of “cd .”


I have a nasty habit of going over my bash history every once in a while. Usually I sort commands by frequency to find stuff I can automate/alias. Last time I came across cd . and thought I’d write up a little explanation of why I find this seemingly useless command useful.

So what does it do? cd . literally means “change directory to the current directory”, which sounds like a no-op. The point is that sometimes the current directory is no longer the current directory! Let’s start with an example.

Say I have a git repository in my_repo/. On its master branch there's a my_repo/folder directory, and on its bugfix branch that directory doesn't exist. Now imagine I have a terminal window open after performing the following command:

cd my_repo/folder # now on branch master

And now, while that terminal is open I need to switch to the bugfix branch for a few minutes, do my thing and return to it. If I switch branches using a different terminal or some GUI tool, what becomes of my terminal’s shell? When I switched to the bugfix branch, git essentially removed that directory the shell was in, and when I returned to the master branch, the directory was put back into place.

So, one might expect that after switching back and forth between branches and returning to my original terminal, simply executing ls -l would show that everything is ok. But it won't. What I'd actually see when running ls -l is that the current directory is empty!

Oh no! Are all our files lost? Nope. They’re right there in my_repo/folder, but our shell doesn’t know that. To understand why, we need to dig a bit deeper. When a unix process accesses any file or directory, it obtains a file descriptor to it. That includes a shell’s current directory – all throughout its lifetime, it has an open fd of the current dir. You can see that by running lsof -p [your shell pid].

When process A holds an open fd to a file or directory and process B removes it, what should happen? Unix doesn't have the file locking mechanism Windows does. What it does instead is remove the file's name from the filesystem, but keep the data around until process A finishes working with it. This means that if, for example, you've got a file open in some program and accidentally “rm”ed it, you can still recover it, because it's held open by that program. You can see an example of restoring files this way on Linux here.
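
Here's a tiny Java demo of that behavior (a hypothetical sketch, unix-only, with an invented filename): it opens a file, deletes it, and keeps reading happily through the still-open descriptor.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;

public class OpenFdDemo {
    public static void main(String[] args) throws Exception {
        File file = new File("victim.txt");
        try (FileWriter writer = new FileWriter(file)) {
            writer.write("still here!");
        }

        try (FileInputStream in = new FileInputStream(file)) {
            // Delete the file while we hold an open fd to it.
            file.delete();
            System.out.println(file.exists()); // false - the name is gone

            // ...but the data survives as long as our descriptor is open.
            byte[] buffer = new byte[32];
            int count = in.read(buffer);
            System.out.println(new String(buffer, 0, count)); // "still here!"
        }
    }
}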

Back to our problem! Our shell process is now sitting with its current directory actually being some phantom directory that is no more. So even after we've checked out the master branch again and the directory is back in place, no one has updated our shell about it. It does know its path is “my_repo/folder”, though.

That means that in order to quickly get our terminal back to being usable (say, we want ls to actually show stuff) we can, of course, be all lame and close the shell and open a new one. Or, we can “refresh” the file descriptor to the current directory. How?

cd .

Hope you learned something new!


Why I Regret Choosing RightScale


A few months ago we had to decide on some framework/environment to use for our devops needs. I’ve blogged about my experiences with Puppet and Chef on EC2. Somehow, we eventually ended up using RightScale.

Quick disclaimer: this is not a rant and I don’t intend any bashing. It’s just a report of my impression from using it.

RightScale provides a system for configuring and managing your cloud infrastructure, from defining how servers are created to monitoring and changing them. It has a few nice features: a pretty nice MySQL clustering solution for EC2, plus decent monitoring and alerting capabilities.

My main problem with it, though, is that they basically took a few steps backwards from all other known solutions, making my life so much harder. I've pointed out most if not all of these issues to RightScale on Twitter and in private emails, yet I can't imagine seeing them solved any time soon.

Scripting (Dis)Abilities

If you’ve used Chef or Puppet, you probably got hooked on the ease of managing and creating your own set up scripts. RightScale’s solution, RightScripts is a weaker, 1990ish kind of solution:

  • No templates – remember the days you had files with placeholders like @@REPLACE_HERE@@ to sed out? Know how nice real templates are in Chef, for example, where you can use .erb files? Well, with RightScale it's all gone again. Sed away.
  • No dependencies – RightScale does have a nice RightScript to install MySQL. Problem is, it depends on a bunch of other scripts and there just isn't any link to them. Install it, hopefully find the dependency's name in the README, install the dependency, then look for its dependencies. Error-prone and tiresome.
  • Made-up version control model – no longer can you use git to update and manage your scripts. RightScale has a dumbed-down version control system where you can “commit” the changes you make to scripts. These aren't accessible locally on your machine and lack all the nice features of a real VCS: you can't grep, you can't search history, and you can't do a git status and see what has changed across your servers. Chaos.
  • Scripts are edited in text areas – that's right. That means I'm constantly copying a script from the browser to vim, editing it, copying it back and saving.
  • No easy sharing of scripts – with Chef you can download cookbooks from all over the internet. With RightScale you're limited to a closed and pretty empty market of RightScripts.
  • No composability – say you've got a generic script to attach an EBS volume to a server. Want to attach 2? Thought you could just call the script twice with different parameters? Wrong! You can't. The only option is to copy and paste the script with a new name and new parameter names.

Some of these issues might be solved soon, since RightScale seems to be working on enabling the use of Chef for scripts. We've tried to set up this beta on our installation but got a lot of exceptions, so we've left it alone for now.

Mouse Control

The UI is centered around way too much clicking. Their pretty nice per-machine monitoring dashboard is not configurable. That means that for each and every server we have a routine of going over a few graphs, clicking and dragging stuff the way we like it. Want to change a server's alert type? Click through them all, one by one. Need to run a script on all your servers? Click, click, click. This is a painstakingly slow process that makes me feel undervalued each and every time.

No Automatic Updates

The beauty of systems like Chef and Puppet is that you can make a change in the configuration and it will automatically get to all of your servers. That's not the case here. You have to go over each server, figure out its state and then run the proper scripts.

Bottom Line

If you have decent coding ability and know your way around a server, chances are you'd be better off not using RightScale. There's just so much you'd be missing out on, and it's a major waste of time. I truly hope to see these issues taken care of, but I think we're far from it.


In the Mind of a Master Programmer


He would probably object to me calling him that, but I long ago realized that Kent Beck is one of the precious few who deserve the title “mastermind”. With Extreme Programming, Test Driven Development, Responsive Design, the Four Elements of Simple Design and more under his belt, who can claim otherwise?

After attending a workshop of his about a year ago and listening to him talk for a whole day, I was astonished. I tried to pick his brain to understand what makes him tick. Of course there are many factors here – reading over 10,000 books and being smarter than most would help anyone. But in my opinion something a bit less common plays a major part in this: what Kent told me he has, a “habit of desperately wanting things to make sense”, and his ability to take things apart until they do.

I recently picked up another of Kent's books, Implementation Patterns. I love this book because it shows exactly that: his process of thinking and breaking things apart in order to understand them. The book provides a rare glimpse into his method of decomposition. Since I've been coding for years, a lot of the patterns made sense to me or seemed trivial. But the “magic” is the fact that he was able to put into words things that for me were just hunches. Actually explaining what makes you sense that a method is too long, or what a proper name for a variable is, is something I've never seen done with such care for specifics.

Because it’s such a quick read, I think anyone will benefit by reading Implementation Patterns. More than helping you understand our craft better, it will provide a new outlook on decomposing and judging your designs and pretty much everything else.


Input Validation Means More Than JavaScript


So much has been written about security before that I never thought I'd end up writing something about it. Then again, I never thought one of the top U.S. banks would get hacked simply by twiddling digits in a URL.

Basically, the only thing you should take away from this post is that when it comes to external data – trust no one. And I mean absolutely no one.

I think and hope that by now most web developers know not to trust data that users enter in input fields. That trust is what gave birth to SQL injection. Nowadays just about no one should be exposed to such a lame problem, especially since pretty much every ORM framework out there protects you from it. But checking your input fields is just the beginning.

Every form of input you accept, even indirect input, is still untrusted input. I just want to go over a few examples, because you all should have this in mind:

URLs – Just like I mentioned above, CitiBank got hacked simply because someone noticed an integer in his browser's address bar and started incrementing it. Any parameter you accept from a URL should be examined. Accessing an email by id? Make sure it corresponds with the current user. Always. (See the first sketch after these examples.)

Form arguments/JSON – These are just the same as validating input fields. Everyone should know by now that it's wrong to trust any validation done on the client side, since every moderately capable person can craft his own POST/GET requests and bypass it. Validate everything on the server. And don't use the client as a place to put state in, unless it really belongs there. I can't tell you how many ecommerce sites I've seen that pass the prices of products along your regular forms as hidden input fields. From that point it's just a few right clicks in Firebug and you're gonna get that LCD TV for $1.

Cookies – Again, these are inputs coming from your clients. Yeah, you put the cookie there in the first place, but since then your users have had the chance to do whatever they want with it. So, putting any kind of integer in a cookie means it has to be validated again on the server side, just like a URL parameter. Any data you put there might have been mangled. The solution is to either not use cookies for anything like that, or sign your cookies the way Rails does (see the second sketch after these examples).

Really anything possible – Have you ever used a service that allowed you to update certain stuff via email? That’s, for example, another form of input. You wouldn’t want someone to change some URL/number in the email when he’s replying and get access to a different user’s data, would you?
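
To make the URL point concrete, here's a minimal sketch of an ownership check (hypothetical, every name in it invented): the id comes straight from the URL, so we never serve the resource without tying it to the authenticated user.

import java.util.Map;

class EmailController {
    private final Map<Long, StoredEmail> emailsById;

    EmailController(Map<Long, StoredEmail> emailsById) {
        this.emailsById = emailsById;
    }

    // requestedEmailId arrived in the URL - check ownership before serving it.
    StoredEmail show(long currentUserId, long requestedEmailId) {
        StoredEmail email = emailsById.get(requestedEmailId);
        if (email == null || email.ownerId != currentUserId) {
            // Same answer for "missing" and "not yours", so we don't leak existence.
            throw new IllegalArgumentException("not found");
        }
        return email;
    }
}

class StoredEmail {
    final long ownerId;
    final String body;

    StoredEmail(long ownerId, String body) {
        this.ownerId = ownerId;
        this.body = body;
    }
}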
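
And for cookies, here's a rough sketch of signing with an HMAC (the idea behind Rails' signed cookies, not its actual format): the server appends a signature on the way out and rejects anything that doesn't verify on the way back in.

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class CookieSigner {
    private final SecretKeySpec key; // server-side secret, never sent to clients

    CookieSigner(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    String sign(String value) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        byte[] sig = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
        return value + "--" + Base64.getEncoder().encodeToString(sig);
    }

    // Returns the original value only if the signature checks out.
    String verify(String signedValue) throws Exception {
        int sep = signedValue.lastIndexOf("--");
        // A real implementation should compare signatures in constant time.
        if (sep < 0 || !sign(signedValue.substring(0, sep)).equals(signedValue)) {
            throw new SecurityException("tampered cookie");
        }
        return signedValue.substring(0, sep);
    }
}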

These are really just the tip of the iceberg, but I'm constantly surprised to see how many people around us are popping up web sites with no thought given to these problems. Just a tiny bit of thinking can prevent you from topping reddit for being a lame developer.


Statistics of 62K Passwords


A couple of days ago, LulzSec published a batch of 62K random logins (emails and passwords). At first, I grabbed it in order to make sure that neither I nor anyone in my contacts had their passwords revealed. Later I decided to run a few stats on this rare dump of data. Following are a few interesting facts.

Password length

The dump’s average password length is 7.63. I was surprised, because I thought most users would use something like 4 characters, but then remembered a lot of sites nowadays enforce a a 6-8 character limit minimum, so this makes sense. As you should know, and as you can find in Hacking: The Art of Exploitation, longer passwords are greatly harder to crack, so this is definitely a case where size does matter.

Here’s a short graph depicting the distribution of password length (Note that edge groups have less than 10 passwords and so aren’t really seen here):

Passwords by length

Common Passwords

Not surprisingly, the most common password is 123456 with 569 occurrences, followed by its “more secure” cousin 123456789 with 184. The 3rd most common password is… “password” (132 occurrences)! The other top-10 passwords are interesting – some are plain words such as “romance”, “mystery”, “tigger” and “shadow”, and “102030” makes quite a few appearances.

The 10th most used password is quite intriguing, actually – “ajcuivd289”. Everyone on the internet seems baffled as to the source of this password. My guess would have been some worm that resets the passwords of the accounts it hacks into. Edit: As Marc comments below, the logins with these passwords seem “clustered”, which makes it more likely that they're actually the result of some bot creating accounts. Thanks Marc!

A couple hundred passwords are just not-so-random keyboard taps (“123qwe”, “asdf1234”, etc.). 789 passwords are taken exactly from the username, and twice that many are part of the username followed by some digits (most seem like birth years).

Inside Passwords

12179 of the passwords are all numeric, some of them 14 digits long! That's just crazy. And while 34717 of the passwords (that's more than half) contain digits, only 1262 contain capital letters and just 533 contain special characters!
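
For the curious, counts like these take only a few lines to reproduce. Here's a hypothetical sketch of that sort of classification, run over stand-in data instead of the actual dump:

import java.util.List;

class PasswordStats {
    public static void main(String[] args) {
        // Stand-in data; the real run would read the dumped passwords from a file.
        List<String> passwords = List.of("123456", "Passw0rd!", "shadow", "102030");

        long allNumeric = passwords.stream().filter(p -> p.matches("\\d+")).count();
        long withDigits = passwords.stream().filter(p -> p.matches(".*\\d.*")).count();
        long withCapitals = passwords.stream().filter(p -> p.matches(".*[A-Z].*")).count();
        long withSpecials = passwords.stream().filter(p -> p.matches(".*[^A-Za-z0-9].*")).count();
        double averageLength = passwords.stream().mapToInt(String::length).average().orElse(0);

        System.out.printf("all numeric: %d, with digits: %d, capitals: %d, specials: %d, avg length: %.2f%n",
                allNumeric, withDigits, withCapitals, withSpecials, averageLength);
    }
}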

Some Common Words

418 passwords contain the word “love”. “sex” is in 125, “jesus” in 67. More people prefer cats (414) to dogs (291). And the language battle – 6 javas, 2 pythons and 17 “ruby”s (guess which one is also a name).

I’d like to sum this up with urging you to never use the same password twice and use a password manager in order to generate secure passwords! Using a password manager ensures that even if a certain site is breached, it doesn’t mean all of your passwords are revealed, and secure paswords are just harder to brute force.


Sometimes Tests Have to Fail


A friend asked me about a common problem that pops up in real-world projects and testing: What do you do when you test code with random properties?

A simple example might be handing out jobs to a few workers. If your algorithm for doing that is random, you can usually assert that, say, no single one of 3 workers gets all 10 jobs. But, being random, that assert is bound to eventually fail. We'll assume that, with the frequency the team runs the tests, a failure is expected every few days.

Surely no one wants to see the tests fail a couple of times a week (especially if you're keeping score of who broke the build). On the other hand, you'd like to keep the tests. What is a pragmatic coder to do?

If you’re not that meticulous to your suite rarely failing, you might just leave it as it is, which, I think, sucks.

The mega-tester’s approach, which I’ve tried in the past, is usually to stub out the random number generator with values that make sure the failures won’t happen. This is usually cost-effective only for the simplest of cases, and the more complex ones results in brittle tests that are coupled to the implementation and that might need to be changed frequently.

What I'd rather do is postpone the problem! Say we change our test's parameters to 10 workers and 3000 jobs. The chance of one worker getting all the jobs becomes quite minor. This tweak of the test's parameters is usually simple to do and provides quite a safety net.

And still, sometimes bad stuff happens. 64-bit hash collisions are out there, somewhere in the world. If you're one of those guys who are bugged by that chance, I give you a simple JUnit rule that will retry a specific test in case it fails, squaring the odds against a failure. Those 64-bit collisions are now more like 128-bit! Woohoo!

The rule allows you to simply annotate a test to make it retry in case it fails. Usage looks roughly like this (a sketch, with made-up class and test names):
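
import org.junit.Rule;
import org.junit.Test;

public class JobDistributionTest {
    @Rule
    public RetryRule retry = new RetryRule();

    @Test
    @Retry // defined below, next to the rule's implementation
    public void noSingleWorkerGetsAllTheJobs() {
        // the probabilistic assertion over the random job distribution goes here
    }
}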

And the implementation is about as simple as this (a rough sketch built on JUnit 4's TestRule):
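
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Retry {}

public class RetryRule implements TestRule {
    @Override
    public Statement apply(final Statement base, Description description) {
        if (description.getAnnotation(Retry.class) == null) {
            return base; // tests without @Retry run as usual
        }
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                try {
                    base.evaluate();
                } catch (Throwable firstFailure) {
                    base.evaluate(); // one more chance - fail only if this fails too
                }
            }
        };
    }
}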

With the tests so unlikely to fail, I’d start a lottery at work for whoever breaks them.

Happy testing!


Testing Techniques: Managing External Resources


A friend approached me with one of the known problems in the testing world – How do you keep external resources under a test harness? Having heard the question a few times before, I thought I’d share my thoughts, and mainly put together the common advice that drifts around the web.

The Dilemma

Nowadays, it’s hard to get more than a 100 lines of code before adding an external resource to our code. It might be a web service to manage something, or some convoluted API to receive data from or just about anything. Usually, writing tests for code that directly talks with these resources using the resources themselves is very problematic, for numerous reasons:

  • It significantly slows the tests, because it requires network access and processing on the service’s side.

  • It might cost you money, send emails, tweet stuff and do things you’d rather not do 300 times a day as you run your tests.

  • Making your code handle error conditions with the service is hard or impossible, as you can’t control when those occur.

Basically, all of these factors usually add up to you having crappy tests that you rarely run. That sucks.

Decouple & Isolate

The best solution I'm aware of is simply isolating the thing. We usually strive to wrap whatever service we're using with a single-point interface. The decoupling is great, since I've yet to encounter a service whose API matched my thinking of the domain problem. Wrapping it up allows us to keep using our own language and logic throughout the system.

A benefit of that is we now have a simple interface or facade we need to stub/mock out during tests. That's usually relatively easy, and it allows us to run our tests blazingly fast and to test all those hard-to-reach corner cases.
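
In code, such a wrapper might look something like this (a hypothetical sketch; the payment service and every name in it are invented):

interface PaymentGateway {
    String charge(String customerId, long amountCents) throws PaymentFailedException;
}

class PaymentFailedException extends Exception {
    PaymentFailedException(String message) { super(message); }
}

// The production implementation delegates to the real, slow, costly service.
// In tests, the code under test gets this fake instead:
class FakePaymentGateway implements PaymentGateway {
    private boolean failNext = false;

    // Lets a test trigger the error conditions the real service won't produce on demand.
    void failNextCharge() { failNext = true; }

    @Override
    public String charge(String customerId, long amountCents) throws PaymentFailedException {
        if (failNext) {
            failNext = false;
            throw new PaymentFailedException("simulated outage");
        }
        return "receipt-" + customerId + "-" + amountCents;
    }
}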

But what if the service changes?

That’s the finishing touch. You should still maintain a suite of tests that run against the real service. Those should be the plain tests that make sure you’re using the API right and that would break if anything you’re relying on changes. These tests won’t be part of your regular suite that gets run constantly. Instead have your CI server run them daily/weekly and let you know when something changes.

This basically puts us in a win-win situation: we're able to run our tests quickly, and yet we have the assurance that we won't miss API changes and the like.

Happy testing!


Design is Simpler Now: Embrace the Extract


For the past 5 years or so I've been searching for ways to produce better-designed code. I hate the fact that I basically can't put my finger on why certain designs aren't as good as others.

That’s why I was really blown away when I first learned about the SOLID principles and started practicing TDD. At last I have found rules that gave me the capability to weigh designs, and a process that helped push me towards what feels like better code.

But even 5 rules were too much for me!

SOLID, no doubt, drives better design. My problem was incorporating it natively into my everyday coding. Call me dumb, but I just can't bring myself to contemplate 5 different aspects whenever I whip up a class. I still find it an excellent checklist to go through when I'm considering refactorings, but thinking about it constantly just drained a big part of my concentration.

For a few months now I've been getting the feeling that my OOD toolset has been reduced quite a lot, down to the very essence. That feeling was also magnified by reading GOOS and pretty much everything written by J. B. Rainsberger here and here.

The first tool I use heavily (and I mean heavily, my mind has managed to get OCD about it) is duplication – or DRY. This tool alone makes any codebase a magnitude better. I’ve written plenty about DRY before.

But just yesterday I realized that, other than that, I mainly concentrate on one thing, as I contemplated on Twitter:

I think I can sum up all my OOD skills with “wait, shouldn’t this be in a different class/method?” Wondering if that’s a good thing…

Yup, that’s the trick. I was quickly assured by two amazing guys that have been doing this longer than I’ve been breathing, agile manifesto authors:

Ron Jeffries: Yes it is a good thing. I would suspect you also note duplication?

James Grenning: Think of the alternative.. you are asking the right question

You see that? Noticing duplication and moving stuff somewhere else. That's all there is to it. This simple question points you at the Single Responsibility Principle and generally, along with DRY, covers most of the bases needed to adhere to the elements of simple design.
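
To make that concrete, here's a tiny, made-up example of the kind of move this question triggers: noticing that price math doesn't belong in a printing method, and extracting it.

import java.util.List;

// Before: rendering and price math crammed into one method.
class ReceiptPrinter {
    String print(List<Double> prices) {
        double total = 0;
        for (double price : prices) total += price; // wait, shouldn't this be in a different class?
        return String.format("Total: $%.2f", total);
    }
}

// After asking the question: the calculation gets its own home,
// leaving two short, cohesive classes.
class PriceCalculator {
    double total(List<Double> prices) {
        double total = 0;
        for (double price : prices) total += price;
        return total;
    }
}

class CohesiveReceiptPrinter {
    private final PriceCalculator calculator = new PriceCalculator();

    String print(List<Double> prices) {
        return String.format("Total: $%.2f", calculator.total(prices));
    }
}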

The main question I ask myself now every time I think of a problem, start changing a function, write a test, and at just about anytime I’m coding is “is this the right place for this?” And quite often the answer is “no.” Push this forward and beautiful designs show up, designs of short, cohesive classes. So, to sum it up: Embrace the Extract.


Crafting Up - Community is Key


It’s been almost a year now since the founding of our local Software Craftsmanship group. This, for me, is a huge dream-come-true.

For years I’ve been looking for a good community around here to join, went to several meetups and looked around to no avail. My frustration grew about a year ago when I noticed the Chicago community is so buzzing with activity, people there have a meetup every day almost. That’s why when Uri started organizing the first meeting I jumped in whole-heartedly.

In just a few months the meetings have influenced me quite a lot. First of all, I got to meet a lot of new, smart and interesting people I never would have met otherwise. It's not easy to find people who are as passionate about our profession as I am, yet our group didn't disappoint me.

The meetings also supply my need to pair with new people. Pair programming is a magical way of working and sharing knowledge, and I've yet to have a session with a new pair without picking up something new. I love the first minutes where we have to find a common language to get things started, and even more so the high fives after getting a green bar.

Also, a good community is the best way to get feedback. I can say I'm trying to leech off this to the max: I've already given talks/sessions at 2 meetings, and I bug people frequently on Twitter and the mailing list. A varied community of like-minded people allows you to get different outlooks and insights into things you've been neck-deep in for a while.

And last but not least, a good community might make magic stuff happen. I don't know how, but I'm sure our group had something to do with the fact that some of us got to have dinner with Uncle Bob and Brett Schuchert, two awesome coders and Clean Code authors, on their last visit here.

Bottom line: be part of a community, and if there isn't one around you, help start it! It's a great source of kindred spirits, an invaluable and rare resource!


Making Embedded GitHub Gists Show Up on RSS Readers


Just a quick let-you-know: I found out that the gists I use to embed code in my posts don't show up in RSS readers (e.g. Google Reader).

I know how annoying it is not to be able to read a blog fully from my reader, so I found a nice WordPress plugin called Embed GitHub Gist that handles embedding gists elegantly and also automatically makes sure the code will be displayed even in readers.

I’ve even updated my latest post (about Chef and EC2) to work with it, and new posts from now on will look good too :)