Take MD5 for example, a few years ago MD5 was one of the most popular hashing algorithms for websites; However, it has quickly gone from a fairly secure algorithm to a big security no-no in the space of a few years. With large companies running huge distributed networks with custom software, it’s usually easier said than done to upgrade the system to use a newer hashing algorithm (It can take months and and even years to modify all the code required across all the different systems). With that in mind, I’ll explain the problem using MD5 and a hypothetical company still using such algorithm due to a lengthy upgrade process.
MD5 hashes are 16 bytes (32 hexadecimal chars which are half a byte each) in length and each byte contains 8 bits that can each be zero or one. That is (2^8)^16 or 2 ^ 128 combinations, in decimal: 340,282,366,900,000,000,000,000,000,000,000,000,000.
Now let’s take our theoretical site which has a 20 char password limit. Assuming the password can contain alphanumeric characters, is case sensitive, and allows the use of standard symbols: that’s 94 combinations for each character (94 ^ 20) which in decimal is 2,901,062,400,000,000,000,000,016,168,360,584,816,944.
What have we noticed already? 94 ^ 20 is far bigger number than 2 ^ 128. Sparing you any more big numbers, a 19 char pass (94 ^ 19) is significantly smaller than 2 ^ 128, so 20 chars is the shortest length of password that still (theoretically) produces more combinations than an MD5 hash can.
Because the MD5 algorithm takes input of any length, but all hashes are fixed at 16 bytes, multiple passwords can hash to the same value (see: hash collisions, collision attack). That is, your password could be the entire works of shakespear but a hacker bruteforcing the hash could theoretically find a match that is less than 20 characters (making allowing any more, a waste of resources).
Disclaimer: I use the word “theoretically” because MD5 does not generate hashes in a linear fashion, it is totally possible that multiple 20 char or less passwords could hash to the same value, but no 20 char or less password could hash to a given value; However, allowing more than 20 chars in this case would only slightly improve security as beyond this number collisions are a certainty.
Let’s say our software is written in C and has to hash passwords of variable length, how do we know the length of the password? Well, in C the end of a string is specified by the presence of the byte 0x00 (null byte), to get the length of the string; an application would count each byte of the string until it finds the null byte (slllllooooowwww). To speed up things we could limit passwords to a certain length, then just pad all passwords less than that length with a predefined byte. As a result we only have to handle a buffer of fixed length and not have to worry about working out the length of someones novel of a password.
If the software was storing plain-text passwords in a database (terrible idea), a length limit would also be a must, because: for reasons I’m not going to explain, databases can be handled easier and with greater speed if all fields in each row are of fixed length (in a theoretical high speed database, if one person had a 64 char password, every other user in the database should have a 64 char password field to make each row of equal length, which would result in am unnecessarily huge database).
The Ape Condition Problem
Let’s say you put 4 apes in a metal cage and hang a banana on a string from the roof. Every time an ape pulls on the banana: it triggers a mechanism that electrifies the cage, shocking ALL the apes. Eventually the apes learn that touching the banana results in them all getting electrocuted. You then take one of the apes away and replace it with a new one, as soon as the new ape reaches for the banana, the other 4 apes beat the shit out of him. Quickly the new ape learns that touching the banana results in a beat down from the other apes. If slowly over time you keep swapping the apes out until none of the apes were there in the beginning: you now have a bunch of apes beating the shit out of any ape who touches the banana, and not one of them knows why.
The programming community sometimes works in a similar way. In the past due to software and hardware limitations there were many reasons to limit passwords and little reason to have long passwords (poor cracking hardware meant even short password cracking was near impossible). Over time these reasons become invalid, but some programming continue to implement such limits, their reason? Other people do it (or some 1990s forum thread tells them they’d be an idiot not to).
Alternate Security Means
A simple problem with people typing long passwords: It’s likely they make a mistake while typing it. Think about credit card pin numbers, A 4 digit pin which you’re hardly going to mistype. If everyone is limited to a 4 digit pin, the chances of mistyping or forgetting it are slim, which allows the system to implement harsh security measures such as locking your account after 3 failed attempts.
I should point out that such means of security only apply to systems where password hash databases are unlikely to be leaked, or the cracked database won’t result in mass compromise of accounts.