Saturday, May 10, 2008

Auto Increment Values and mysql_insert_id()

The other day I was working on a project, inserting data into a MySQL database, nothing out of the usual. I had a function set up to do the inserts, and it returned the auto-increment value of the insert, if the insert failed I returned false. In the page I was calling the function I was running an if statement on the insert value to make sure that the insert was successful. And for some reason the if statement kept failing. So the first thing I did was check my query. It all looked good, so I checked my if statement and the function call. Everything was good. So I checked the database, and the insert was successful. But for some reason my if statement was failing. So I echoed the return value from the function, and it said (which can mean false too, which is why my if statement was failing). But why was the function returning ? So I started returning different things from the function to see where it was getting from. Turns out it was the mysql_insert_id() function I was using to get the auto-increment value of the insert.

So I looked that up on php.net and it basically says that if the previous MySQL query didn't perform an insert, then it returns zero. So I double checked and there was no other queries in between my insert and the call to mysql_insert_id(). So why it was returning zero, I'm still not sure. But the workaround I used at the time was essentially I ran an if statement in the function like:
$newid = mysql_insert_id();
if($newid === 0) {
return true;
}
else if($newid === false) {
return false;
}
else {
return $newid;
}

This just makes it so when I call the function, I don't get a failure notice when the insert was successful. However, this doesn't give me the actual id of the inserted item. In this case it wasn't that important to me, but it still bugged me that mysql_insert_id() wasn't doing what it was supposed to. So I was looking around, and found another workaround. I haven't tried it yet, but I am confident that it will work a lot more reliably. Here is a link to the original article where I found it. It goes into other ways of knowing the id of an insert (but the last one is really the only reliable one). To summarize it though, instead of using a function inside PHP to get the insert_id, you use a MySQL query.
INSERT INTO `comments` (`comment`, `username`) VALUES ('first comment', 'jojonaloha');
SELECT last_insert_id();


I would think that it would be best to do this in the same query, just for reliability. I also think this article brings up another interesting thing, which I've also been thinking about. That is with auto_increment fields or even INT fields in general; what is the maximum value of an int field? If you read MySQL's documentation, you can see the range of values that can be stored in MySQL numeric fields (tinyint, smallint, mediumint, int, big int), you can see what these values are and the corresponding ranges.

Typically I've used INT fields, and I've always seen them as size: 11, and have continued with the same thing. You might think that means the maximum is 99,999,999,999 but that isn't right. The maximum value (if you are leave the field as signed) is only 2,147,483,648. If you change the field to unsigned the max is 4,294,967,295 or 4^16. This is still not 11 digits long. So INT 11 is a little misleading on auto_increment fields. Of course, if you need a signed value (allowing negative numbers), then you need that 11th digit for the negative sign. But on auto_increment fields, you should make them unsigned and INT 10, because INT 11 just doesn't make sense.

So why does all that matter? Well, what happens when you get that 4 billionth item in the database? Basically the insert fails and returns an error that basically says "duplicate value on primary key". In normal circumstances you'll probably have a while before you have to worry about that, unless you are running a log of everything that happens on your site, then you could run into problems a lot sooner. In that case you may have to rethink how you set up your primary keys. One option (which is also in the previous article) is to not use auto_increment fields wherever possible. So, for example, instead of using an auto_increment id for a user, use their username as the primary key (which you should be making sure is unique anyway). That may not be the only or best solution. This is especially true if you have line item tables like the comments table, which contains the user who made the comment, because if the user changes their username, then you have to change all the line item tables as well.

I'm sure there are many solutions to this, but for now, it's something to think about. And if I come up with some really good ones, I'll let you know.

No comments: