Category Archives: Databases

Manage Transactions in MySQL – Lesson 1

MySQL has become the database of choice for many Web developers over the last few years and for good reason. It’s fast, free, easy to use, and has great community support.

But many experienced developers refused to touch MySQL because, they complained, the product didn’t implement features that were absolutely critical in an SQL server. MySQL’s most egregious omission, according to some, was its lack of transaction support. But thanks to recent developments in MySQL land, that’s no longer the case.

When it first hit the cyber-street, MySQL offered only one table type for data storage: the ISAM table, since upgraded to the MyISAM type in all recent versions of MySQL. But MyISAM tables were limited. Very limited.

Then the folks from Sleepycat Software came into the picture. Sleepycat creates and sells a database storage engine which is used mostly with embedded devices. The storage engine comes with an API that allows developers to integrate Sleepycat’s data storage software into their products. And that’s just what the folks at MySQL did: they integrated the Berkeley DB (or BDB) table from Sleepycat. This was the first transactional table type available to MySQL users.

Berkeley DB tables were followed shortly by two other transactional table types: InnoDB and Gemini. Gemini tables are adapted from another embedded storage mechanism, this one from NuSphere, a Progress Software property. InnoDB tables were designed specifically for MySQL.

But we’re getting ahead of ourselves here. Before we take a closer look at each of these different options, we need to start at square one: Why you want transaction support in the first place.

Contents

  1. Getting in on the Transactions
  2. Are You on ACID?
  3. Lockdown!
  4. Using MyISAM Tables

 

Getting in on the Transactions

If you’re relatively new to databases, or if MySQL has been the database on which you’ve cut your teeth, you may not even know what a transaction is. Put simply, a transaction is a series of SQL statements that are executed as a single unit; either all the statements are executed completely or none are executed at all.

Why are transactions so important? Consider a fairly typical Web application: a shopping cart. In a simple shopping cart application, you’re likely to have tables that look a little something like this (only less simplified, and with real info instead of the dummy data):

products
+------------+-------------------+---------------------------+---------------+
| product_id | product_name      | product_description       | product_price |
+------------+-------------------+---------------------------+---------------+
|          1 | Old Pair of Socks | Plenty of holes           |          4.25 |
|          2 | Old T-shirt       | Ketchup stains everywhere |          6.75 |
+------------+-------------------+---------------------------+---------------+

inventory
+------------+----------+
| product_id | in_stock |
+------------+----------+
|          1 |       25 |
|          2 |       12 |
+------------+----------+

buyers
+----------+-------+---------+------------+------------+
| buyer_id | fname | lname   | phone      | fax        |
+----------+-------+---------+------------+------------+
|        1 | John  | Doe     | 7185551414 | NULL       |
|        2 | Jane  | Johnson | 7185551414 | 2126667777 |
+----------+-------+---------+------------+------------+

orders
+----------+----------+-------------+
| order_id | buyer_id | order_price |
+----------+----------+-------------+
|        1 |        1 |       11.00 |
+----------+----------+-------------+

order_items
+----------+------------+---------------+
| order_id | product_id | product_price |
+----------+------------+---------------+
|        1 |          1 |          4.25 |
|        1 |          2 |          6.75 |
+----------+------------+---------------+

You can see how these tables fit together in this diagram:

[schema.jpg: diagram showing the relationships among the five tables]

When it comes time to process an order, you have to run several SQL statements within a script (written in PHP, Perl, Java, or whatever language you prefer). In the script, you want to take a look at what items the buyer wants, see if there’s adequate inventory to complete the order, and if there is adequate inventory, you want to complete the order. In pseudo-code, the script used to complete the order would look something like this:

get buyer data and shopping cart data from web forms
insert buyer data into buyer table
start order by creating row in orders table
get current order_id
for each item desired
   check available inventory
   if inventory is available
      write item to order_items table
      decrement inventory
   endif
end for loop
get total for items for the order
update orders table

I simplified this listing so that it’s easier to spot a potential problem. Consider what would happen if the power failed on the machine hosting the database just as it was in the middle of checking the inventory of the first item in the order. You’d restart your machine to find a row in the orders table without any child rows in the order_items table. It’s quite possible you’d be left with data that was, to a large extent, incomprehensible. You wouldn’t know what orders had been placed unless your customers sent you email wondering just when you planned to send them the old pairs of socks they had requested. That’s not the best way to run a business.

If you had transaction support in place, however, you could treat this group of statements as a single unit. If for any reason the database failed to complete all of the statements in their entirety, the data would revert (or roll back) to the condition it was in prior to the first statement’s execution.

Of course there’s a bit more to transactions. A transaction-capable database must implement four specific properties. Wanna know what they are? Turn the page.

 

Are You on ACID?

Transaction-capable databases must implement four properties, collectively known by the mnemonic acronym ACID.

  • Atomicity: Transactions must be singular operations. Either all statements are executed as a single (atomic) unit or none are executed at all.
  • Consistency: In a transactional database, data will move from one consistent state to another. There will never be a point during a transaction when data is partly processed.
  • Isolation: The dealings of one transaction will not be visible to other clients until the transaction is completed successfully or rolled back. You can be sure that the data available to one transaction is accurate because it is isolated from changes other clients might make.
  • Durability: When a transaction completes successfully, the changes are permanent. Nothing, not even a disk crash or power failure, will erase the changes made by a successfully completed transaction.

SQL servers that allow transactions make use of several keywords: BEGIN WORK, COMMIT, and ROLLBACK. The phrase BEGIN WORK lets the SQL server know that the SQL statements that follow are part of a transaction. The transaction is not completed until either a COMMIT or ROLLBACK statement is executed. COMMIT writes the changes to the database. Once a transaction has been successfully committed, only another successfully committed SQL statement can alter the data. No crashes or concurrently run SQL statements will affect the data. The ROLLBACK command tells the database that all of the statements within the transaction should be ignored and the database should revert to the state it was in prior to the start of the transaction. In the case of a crash, all transactions that were not expressly committed are automatically rolled back.
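
You can try the COMMIT/ROLLBACK cycle outside MySQL too. Here’s a minimal sketch using SQLite through Python’s sqlite3 module (not MySQL, but the semantics are the same): the driver opens the transaction implicitly, and commit()/rollback() play the roles of COMMIT and ROLLBACK.

```python
import sqlite3

# In-memory database; the sqlite3 driver implicitly begins a
# transaction before the first data-changing statement.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, in_stock INTEGER)")
cur.execute("INSERT INTO inventory VALUES (1, 25), (2, 12)")
conn.commit()

# A rolled-back transaction leaves no trace behind.
cur.execute("UPDATE inventory SET in_stock = 0 WHERE product_id = 1")
conn.rollback()
print(cur.execute("SELECT in_stock FROM inventory WHERE product_id = 1").fetchone()[0])  # 25

# A committed transaction makes the change permanent.
cur.execute("UPDATE inventory SET in_stock = 24 WHERE product_id = 1")
conn.commit()
print(cur.execute("SELECT in_stock FROM inventory WHERE product_id = 1").fetchone()[0])  # 24
```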

You can now revisit the pseudo-code first presented on the previous page. I’ve improved the listing by incorporating a transaction.

get buyer data and shopping cart data from web forms
insert buyer data into buyer table
BEGIN WORK
start order by creating row in orders table
get current order_id
for each item desired
   check available inventory
   if inventory is available
      write item to order_items table
      decrement inventory
   else
      set error variable to true
   endif
end for loop
if error variable is true
    ROLLBACK
else
    get total for items for the order
    update orders table
    COMMIT

Notice the ROLLBACK that I’ve added. This ensures that if inventory isn’t available to complete even a portion of a user’s request, the entire order is ignored. No row will be written to the orders or order_items table. Are you beginning to see how good transactions can be?
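
To make the pseudo-code concrete, here’s a sketch of the same order flow in Python against SQLite. The table and column names follow the schema above, but the process_order helper and its error handling are my own illustration, not code from the tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory   (product_id INTEGER PRIMARY KEY, in_stock INTEGER);
CREATE TABLE orders      (order_id INTEGER PRIMARY KEY AUTOINCREMENT,
                          buyer_id INTEGER, order_price REAL);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, product_price REAL);
INSERT INTO inventory VALUES (1, 25), (2, 12);
""")

def process_order(conn, buyer_id, items):
    """Run one order as a single transaction. `items` is a list of
    (product_id, price) pairs. If any item is out of stock, every
    statement in the order is rolled back."""
    cur = conn.cursor()
    try:
        cur.execute("INSERT INTO orders (buyer_id, order_price) VALUES (?, 0)",
                    (buyer_id,))
        order_id = cur.lastrowid
        total = 0.0
        for product_id, price in items:
            row = cur.execute("SELECT in_stock FROM inventory WHERE product_id = ?",
                              (product_id,)).fetchone()
            if row is None or row[0] <= 0:
                raise ValueError("out of stock")
            cur.execute("INSERT INTO order_items VALUES (?, ?, ?)",
                        (order_id, product_id, price))
            cur.execute("UPDATE inventory SET in_stock = in_stock - 1 "
                        "WHERE product_id = ?", (product_id,))
            total += price
        cur.execute("UPDATE orders SET order_price = ? WHERE order_id = ?",
                    (total, order_id))
        conn.commit()          # all the statements take effect ...
        return order_id
    except Exception:
        conn.rollback()        # ... or none of them do
        return None

ok  = process_order(conn, 1, [(1, 4.25), (2, 6.75)])  # commits
bad = process_order(conn, 1, [(1, 4.25), (99, 1.0)])  # rolls back entirely
```

The failed order leaves nothing behind: no orphaned row in orders, no half-written order_items, and the inventory decrement it made along the way is undone.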

In a Web environment, you can expect multiple users (database clients or threads) to be accessing the script simultaneously. Therefore, the inventory of a given item will be changing continually. But when using transactions, you don’t have to worry that a user will complete an order for an item that is actually out of stock. That’s because of the I (for Isolation) in ACID. In a transactional environment, each transaction is isolated from the other, so one transaction cannot see the working of another transaction until the first one is complete.

The last sentence was a bit of a simplification, but it’s pretty close, and it’s good enough for now. In order to isolate one transaction from another, database systems must implement some sort of locking scheme. That is, the database needs a way for one client (or thread) to lock out (or isolate itself) from all other clients.

Locking is a key element of transactions. I’ll be talking about this subject extensively throughout this tutorial, starting on the next page.

 

Lockdown!

Locking makes some portion of data the property of a single client. That client says, in effect, “this data here is mine, and the rest of you can only do what I expressly permit.” Locks can have one of two effects: A lock may prevent other clients from altering data (with UPDATE or DELETE statements) or a lock may prevent all access to some data preventing UPDATEs, DELETEs, and even SELECTs.

To understand locking mechanisms in MySQL, you first need to recognize that MySQL is an unusual product. It isn’t really a single, unified piece of software. Rather, it uses technology from several different sources, and the way you implement your transactions in MySQL largely depends on the table type you use. Each table type uses a different method of locking, and the differences in those locking mechanisms will affect how you write your code.

MyISAM tables are very fast for SELECTs, but they have some serious drawbacks when it comes to locking. These shortcomings are what prevented MySQL from implementing some key database features, including transactions.

Looking at the way MyISAM struggles with locking, you really begin to appreciate the power and value of actual transactions. So before we get into BDB, Gemini, and InnoDB, let’s first take a look at the limitations of MyISAM’s table-level locking.

 

Using MyISAM Tables

I already mentioned that MyISAM tables (usually MySQL’s default table type) don’t support transactions. This is largely because MyISAM tables offer only table-level locking, which means that locks can only be placed on entire tables. So if you want to prevent a single row in a table from being changed, you need to prevent all rows in the table from being changed.

Take our inventory table as an example. If one client is buying an item, you’ll want to check the inventory, and if the item is there in sufficient quantity, you’ll decrement the number available after the sale. To make sure the quantity doesn’t change between the time you check the availability and the time you change the inventory, you’ll want to put a lock on that row. But because MyISAM offers only table-level locking, you’ll have to cut off access to all other rows in the table.

MyISAM offers two types of table-level locks: a read lock and a write lock. When a read lock is placed by a client, all other clients are prevented from making changes to the table via INSERT, DELETE, or UPDATE statements, though they can still read from it.

To see how a read-level lock works on a MyISAM table, open up two copies of the MySQL command-line client. Then create a table and insert some data with the statements below:

create table inventory (
    product_id int not null primary key,
    in_stock int not null,
    index index_on_in_stock(in_stock)
)type=myisam;

INSERT INTO inventory (product_id, in_stock) VALUES(1,25), (2,12);

Now in one client, place a read lock on the inventory table with the following command:

LOCK TABLES inventory READ;

Now in the second copy of the client run a SELECT * FROM inventory. You’ll see that the command executes just fine. However, if you try to run an UPDATE, it’s a different story. Try the following command with the lock in place.

UPDATE inventory set in_stock=24 where product_id=1;

You’ll see that this copy of the client does not respond. It’s locked out and can’t execute the command. Only the client that placed the lock can change the table. Now go back to the first copy of the command-line client and release the lock with:

UNLOCK TABLES;

Once the lock is released, the second client will be free to run the UPDATE command and change the row.

A write lock prevents other clients from even running SELECTs on the locked table. You can place a write lock with the following command:

LOCK TABLES inventory WRITE;

Go back and run the previous experiment but this time issue a WRITE lock. You’ll see that the other client is prevented from doing anything, even reading from the locked table with a SELECT.

So now you know all the locks that are possible with MyISAM tables. As I mentioned earlier, the folks at MySQL AB (who run MySQL development) often argue that transactions really aren’t necessary. They say that by properly applying locks and writing clever SQL, you should be able to avoid the need for transactions. Below I’ve written some pseudo-code that adds transaction-like abilities to the shopping cart pseudo-code using MyISAM tables.

INSERT query into buyers table.
run last_insert_id() to get buyer_id
run INSERT into orders table
run last_insert_id() to get order_id
get write lock on inventory table
for each of the items in the order
    get quantity from the inventory table
    if quantity is > 0
        insert into order_items table
        update inventory table subtracting the ordered item
    elseif quantity = 0
        delete all items from order_items with current order_id
        delete item from orders table with current order_id
        update inventory table to replenish items previously subtracted
        set error variable to true
        break from for loop
release write lock on inventory table
if error variable is not true
    update orders table with the current order_id, adding the order_total
else
    output error

I might be able to clean this up a bit and remove a few of the SQL statements, but I think you see the way you’d need to go writing transaction-like code with MyISAM tables. While this code might add a degree of isolation (remember the I in ACID) by way of locks, there are other ACID properties missing here. Most notable is the lack of consistency: The data moves through several inconsistent states before the script is done. If the power goes out during one of these inconsistent phases, you’ve got problems.
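
A runnable caricature of this approach, using SQLite in autocommit mode to stand in for a non-transactional table (the helper name and schema details are my own): every statement commits immediately, so the only way to "roll back" is to run compensating statements by hand, and a crash at the wrong moment would freeze the data mid-cleanup.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: each statement
# is durable the instant it runs, like a non-transactional table.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE inventory   (product_id INTEGER PRIMARY KEY, in_stock INTEGER);
CREATE TABLE orders      (order_id INTEGER PRIMARY KEY AUTOINCREMENT, buyer_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
INSERT INTO inventory VALUES (1, 25), (2, 0);
""")

def place_order_no_tx(conn, buyer_id, product_ids):
    cur = conn.cursor()
    cur.execute("INSERT INTO orders (buyer_id) VALUES (?)", (buyer_id,))
    order_id = cur.lastrowid
    done = []
    for pid in product_ids:
        in_stock = cur.execute("SELECT in_stock FROM inventory WHERE product_id = ?",
                               (pid,)).fetchone()[0]
        if in_stock == 0:
            # Manual "rollback": compensate for every statement already run.
            cur.execute("DELETE FROM order_items WHERE order_id = ?", (order_id,))
            cur.execute("DELETE FROM orders WHERE order_id = ?", (order_id,))
            for undone in done:
                cur.execute("UPDATE inventory SET in_stock = in_stock + 1 "
                            "WHERE product_id = ?", (undone,))
            return None
        cur.execute("INSERT INTO order_items VALUES (?, ?)", (order_id, pid))
        cur.execute("UPDATE inventory SET in_stock = in_stock - 1 WHERE product_id = ?", (pid,))
        done.append(pid)
    return order_id

result = place_order_no_tx(conn, 1, [1, 2])  # product 2 is out of stock
```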

While MyISAM locks are better than nothing, they are really no substitute for true ACID transactions. In Lesson 2 of this tutorial, I’ll discuss locking and transactions in BDB, InnoDB, and Gemini tables.

Manage Transactions in MySQL – Lesson 2

In Lesson 1 of this tutorial, you learned how to approximate transaction-like behavior with MyISAM tables. In this lesson, you’ll see how you can achieve real transactions with each of MySQL’s three transactional table types: BDB, InnoDB, and Gemini. But before I get to specifics surrounding each table, I need to revisit the notion of locking.

Remember that MyISAM supports only table-level locks of two types: read locks and write locks. The transactional table types offer more granular locking mechanisms: They can place locks on specific subsets of data. You’ll read more about each table’s locking mechanisms as I get to them in the following pages.

Contents

  1. Using BDB Tables
  2. Working with Gemini Tables
  3. Using InnoDB Tables
  4. The Right Choice for You

 

Working with MySQL’s Transactional Table Types

Transactional table types can apply two types of locks. These are known as shared locks and exclusive locks. A shared lock is similar to the MyISAM read lock. When a client places a shared lock on data, other clients cannot alter that data with UPDATE or DELETE statements, but they can read the locked data via SELECTs. When a client gains an exclusive lock, other clients can neither alter the locked data via UPDATEs and DELETEs, nor can they read the data with SELECTs.
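
The shared-versus-exclusive distinction can be modeled in miniature with a readers-writer lock. This is an illustration of the semantics only, not how MySQL is built: any number of "readers" may hold the lock at once, but a "writer" needs it all to itself.

```python
import threading

class RWLock:
    """Toy model of shared (read) vs. exclusive (write) locks."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:          # readers wait only for a writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:   # writers wait for everyone
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = RWLock()
lock.acquire_shared()
lock.acquire_shared()      # a second reader gets in just fine
lock.release_shared()
lock.release_shared()
lock.acquire_exclusive()   # a writer now has the data to itself
lock.release_exclusive()
```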

Before we get to the actual code, there’s a MySQL setting I need to mention. MySQL contains an environment variable called autocommit. By default, autocommit is set to 1. When autocommit is set to 1, you cannot run transactions with any of the MySQL table types. In this mode, MySQL treats every SQL statement as its own transaction. To use groups of statements as transactions you need to turn autocommit mode off. So in all the code snippets that follow, you’ll notice that I start each with the line:

SET autocommit=0;

OK, now on to the rest of the code, starting with BDB tables.

 

Using BDB Tables

To check if you have BDB tables available, run the following query:

mysql> show variables like '%have%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| have_bdb      | YES   |
| have_innodb   | YES   |
| have_isam     | YES   |
| have_raid     | NO    |
| have_symlink  | YES   |
| have_openssl  | NO    |
+---------------+-------+
6 rows in set (0.00 sec)

If the first row in the listing says “NO,” you don’t have BDB installed. If you’ve got either InnoDB or Gemini, that’s fine; you can just use one of those, and I’ll talk about those tables soon (but please stick around, since what you learn here will be useful there). If you decide to work with BDB tables, you can either install a MySQL-Max binary or recompile MySQL from source using the --with-bdb flag. BDB tables are available for most operating systems, including Windows, Linux, and Mac OS X.

To create BDB tables, you can include type=bdb at the end of the CREATE statement. For example:

create table inventory(
    product_id int not null primary key,
    in_stock int not null
)type=bdb;

Or, if you wish to move data from a MyISAM to a BDB table, you can run an ALTER statement, like so:

ALTER table table_name type=bdb;

Note that before you run an ALTER statement like this, you should back up the table. This type of ALTER statement actually dumps the table data, creates a new BDB table, and then loads the dumped data into the new table. It’s an involved process that can fail, and if it does fail you want to be protected. Also, be prepared to wait a while. If you’ve got a large table, the dump-and-reload process can take a bit of time.

BDB tables offer page-level locking. What’s a page? It’s a group of rows. The exact number of rows in a page varies depending on the size of the rows in your tables, the operating system in use, and other factors. During a transaction, MySQL will apply a lock to a page’s worth of data. All other rows are unaffected by locks and ready to process whatever queries you throw at them.

Page-level locking is a vast improvement over the table-level locking used by MyISAM tables. If only groups of rows — small subsets of whole tables — are locked by a single client, that means many clients can access different portions of the table simultaneously. In fact, it’s acceptable, even commonplace, to have many clients applying various locks to different portions of a table at the same time. The simultaneous access of data by different clients is usually referred to as concurrency. In database land (where the men have hairless backs and the women shapely legs) the greater the concurrency the better.

One of the great things about transactions is that for the most part you don’t need to worry about manually applying the locks. Just start a transaction, run your SQL statements as you normally would, and the clever MySQL engine will mostly take care of the locking for you. Neat-o!

Take a look at a simple transaction in a BDB table working with the following table:

mysql> select * from inventory;
+------------+----------+
| product_id | in_stock |
+------------+----------+
|          1 |       24 |
|          2 |       47 |
+------------+----------+
2 rows in set (0.05 sec)

To see the effect of a transaction, open up two copies of the command-line client. In one, start a transaction and run a SELECT on the first row:

mysql> set autocommit=0;
mysql> begin work;
mysql> select * from inventory where product_id=1;

Now in the second client, start a transaction and try to update row 1.

mysql> set autocommit=0;
mysql> begin work;
mysql> update inventory set in_stock=23 where product_id=1;

You’ll see that the second client, the one trying to do the UPDATE, doesn’t respond. That’s because the first client has a shared lock on that row. To release the lock, commit the transaction in the first client:

mysql> commit;

You’ll notice that the second client is immediately freed to run the update operation.

Take a look at the following improvement to the pseudo-code I used in Lesson 1 of the tutorial. Note the use of the BEGIN WORK-COMMIT/ROLLBACK syntax.

set autocommit=0
INSERT query into buyers table.
run last_insert_id() to get user_id
BEGIN WORK
run INSERT into orders table
run last_insert_id() to get order_id
for each of the items in the order
    get quantity from the inventory table
    if quantity is > 0
        insert into order_items table
        update inventory table subtracting the ordered item
    elseif quantity = 0
        set error variable
if error variable is not set
    update orders table with the current order_id, adding the order_total
    COMMIT
else
    ROLLBACK
    output error

Pretty nice, eh? Much cleaner than the MyISAM transaction-equivalent, I think. Keep in mind that you get all the ACID properties with this code if you’re using BDB tables. There is only one line of this code that doesn’t really work well and needs to be improved.

Notice that for each item in the order, I first check the quantity from the inventory table. That quantity will come from a SELECT statement, something like SELECT in_stock FROM inventory WHERE product_id=1. But in most cases, the inventory is going to change just after this SELECT is run. If there’s sufficient quantity in stock, this person is going to buy the item and therefore the quantity listed in the table will decrement by one when the transaction is complete. You’d hate to see other clients get data that will be inaccurate in a matter of milliseconds.

In a case like this, you’d be much better off putting an exclusive lock on a page containing the row in question until the transaction is complete. Remember, with an exclusive lock, all other clients are forbidden from even reading the locked rows. But unfortunately, BDB tables have no way to execute exclusive locks on pages of data. Only by instituting a table-wide write lock can you safeguard the data in the inventory table.

When revised with the table-wide write lock the pseudo-code looks like this:

set autocommit=0
INSERT query into buyers table.
run last_insert_id() to get user_id
BEGIN WORK
run INSERT into orders table
run last_insert_id() to get order_id
LOCK TABLES inventory WRITE
for each of the items in the order
    get quantity from the inventory table
    if quantity is > 0
        insert into order_items table
        update inventory table subtracting the ordered item
    elseif quantity = 0
        set error variable
UNLOCK TABLES;
if error variable is not set
    update orders table with the current order_id, adding the order_total
    COMMIT
else
    ROLLBACK

The LOCK TABLES command here is a bit messy. But with the other two MySQL tables you won’t have to worry about this kind of lock.
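
One way to shrink that messy locked window, whatever the table type, is to fold the availability check and the decrement into a single conditional UPDATE, so no separate SELECT is needed at all. Here’s a sketch of the idea using SQLite through Python (the sell_one helper is my own name; the SQL pattern carries over to MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, in_stock INTEGER);
INSERT INTO inventory VALUES (1, 1), (2, 12);
""")

def sell_one(conn, product_id):
    # Check and decrement in one atomic statement: the WHERE clause
    # only matches while stock remains.
    cur = conn.execute(
        "UPDATE inventory SET in_stock = in_stock - 1 "
        "WHERE product_id = ? AND in_stock > 0", (product_id,))
    conn.commit()
    return cur.rowcount == 1   # True if we actually got the item

print(sell_one(conn, 1))  # True  (last unit sold)
print(sell_one(conn, 1))  # False (out of stock; row untouched)
```

Because the check lives inside the UPDATE itself, no other client can sneak in between the read and the write.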

 

Working with Gemini Tables

It’s difficult to talk about Gemini tables without touching on the controversy surrounding the maker of the Gemini table, NuSphere. The NuSphere corporation is currently in a legal dispute with MySQL AB, the folks who code the MySQL core engine. The issues include copyright infringement and abuse of the GNU GPL. You may want to read further about these issues before choosing the Gemini table type.

Gemini offers a more granular level of locking than BDB. With Gemini (and InnoDB, which I’ll discuss momentarily), you get the advantage of row-level locking. With Gemini, you can apply both share and exclusive locks to one row at a time. A row-level locking scheme will generally achieve higher concurrency than page-level locking, and much greater concurrency than table-level locking.

You can check if you have Gemini installed by running show variables like '%have%'. And if you don’t have Gemini, you can get an installer that includes MySQL, PHP, mod_perl, and other goodies from the NuSphere website. Note that the most current release of the Gemini table seems to have some bugs running on Windows 98, so if any of the following experiments crash your NuSphere MySQL installation on Windows 98, don’t blame me.

To create Gemini tables include the type=gemini syntax at the end of the CREATE statement, and to transfer a current table to Gemini, run an ALTER TABLE query. The same caveats I mentioned for BDB apply here. For example:

create table inventory(
    product_id int not null primary key,
    in_stock int not null
)type=gemini;

ALTER table table_name type=gemini;

Unlike BDB, Gemini tables implement both shared and exclusive locks. If a thread runs an UPDATE or DELETE statement on a row, Gemini will place an exclusive lock on that row, preventing other threads from either reading or altering the locked row until the transaction is complete. If you think about it, this makes perfect sense. If a row is about to change, other threads should be protected from seeing potentially inaccurate data.

If a thread runs a SELECT statement on a row, a shared lock will be placed on that row so other threads can read the row with SELECTs but they won’t be able to alter the row with UPDATE or DELETE statements.

You can do some quick experiments with Gemini tables to see how this locking works. Assume you have the following Gemini table.

mysql> select * from inventory;
+------------+----------+
| product_id | in_stock |
+------------+----------+
|          1 |       24 |
|          2 |       47 |
+------------+----------+
2 rows in set (0.05 sec)

Open two copies of the command-line client. In the first start a transaction, then run a SELECT on row 1.

mysql> set autocommit=0;
mysql> begin work;
mysql> SELECT * FROM inventory WHERE product_id=1;

Now in the second client, start a transaction and try two statements, a SELECT and an UPDATE.

mysql> set autocommit=0;
mysql> begin work;
mysql> SELECT * FROM inventory WHERE product_id=1;
mysql> UPDATE inventory SET in_stock=23 WHERE product_id=1;

You’ll see that the SELECT executes immediately, but the UPDATE doesn’t execute. The UPDATE won’t execute until you COMMIT or ROLLBACK the transaction started in the first client, which releases the shared lock.

You can re-run this experiment trying an UPDATE in the first client. You’ll notice that the second client is prevented from reading the locked row with a SELECT. When a client runs an UPDATE statement, it gets an exclusive lock on that row.

When I discussed BDB tables, I pointed out that at times when inventory numbers are changing fast, you need a SELECT to obtain an exclusive lock so your users access the most up-to-date information. When you run a SELECT on the inventory table to find out the number of items currently in stock, you’re better off locking out all other threads from reading that row because that number is likely to change very quickly — an UPDATE will almost always follow that SELECT.

With Gemini tables, you can get an exclusive lock while running a SELECT by using the FOR UPDATE keywords. For example:

SELECT * FROM inventory WHERE product_id=1 FOR UPDATE
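
If you want to experiment with the same "lock before you read" idea outside MySQL, SQLite offers a rough analogue: it has no FOR UPDATE, but BEGIN IMMEDIATE reserves the database’s write lock at the start of a transaction. This is a sketch of the idea only; note that MySQL’s FOR UPDATE locks individual rows, while SQLite locks the whole database.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, timeout=0.1)
b = sqlite3.connect(path, timeout=0.1)
a.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, in_stock INTEGER)")
a.execute("INSERT INTO inventory VALUES (1, 24)")
a.commit()

a.execute("BEGIN IMMEDIATE")          # client A reserves the write lock up front
a.execute("SELECT in_stock FROM inventory WHERE product_id = 1")

locked_out = False
try:
    b.execute("BEGIN IMMEDIATE")      # client B cannot also reserve it
except sqlite3.OperationalError:      # "database is locked"
    locked_out = True

a.execute("UPDATE inventory SET in_stock = 23 WHERE product_id = 1")
a.commit()                            # lock released; B may now proceed
print(locked_out)                     # True
```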

Using this added power, the shopping cart pseudo-code can be re-written like so:

set autocommit=0
INSERT query into buyers table.
run last_insert_id() to get user_id
BEGIN WORK
run INSERT into orders table
run last_insert_id() to get order_id
for each of the items in the order
    get quantity from the inventory table with SELECT ... FOR UPDATE
    if quantity is > 0
        insert into order_items table
        update inventory table subtracting the ordered item
    elseif quantity = 0
        set error variable
if error variable is not set
    update orders table with the current order_id, adding the order_total
    COMMIT
else
    ROLLBACK

This is much cleaner, and much safer, than non-transactional code. And you can do something similar with InnoDB tables.

 

Using InnoDB Tables

InnoDB offers row-level locking and works on a wide variety of operating systems, including most UNIX variants, Mac OS X, and Windows. This table type is now included by default in most MySQL installations, so if you get a current binary from mysql.com, you’ll have InnoDB tables. You can check if you have InnoDB installed by running SHOW VARIABLES LIKE '%have%';.

You can create InnoDB tables by using type=innodb in the CREATE statement or by running an ALTER TABLE command. For example:

create table inventory(
    product_id int not null primary key,
    in_stock int not null
)type=innodb;

ALTER table table_name type=innodb;

The InnoDB transaction model is not quite as straightforward as the Gemini model. If you don’t understand how InnoDB implements transactions, you could end up with some serious problems with your data.

InnoDB uses what’s called a multi-versioning concurrency model. Essentially, this means that each transaction is locked to a view of the data at a specific point in time. When a thread initiates a transaction, InnoDB gets a snapshot of the data based on the exact moment the transaction started. To understand how this works, take a look at the following example. You have a simple InnoDB table:

mysql> select * from inventory;
+------------+----------+
| product_id | in_stock |
+------------+----------+
|          1 |       24 |
|          2 |       47 |
+------------+----------+

In one copy of the command line client, you start a transaction:

mysql> set AUTOCOMMIT=0;
mysql> begin work;

This thread now has a snapshot of the table seen above. If you were to then open another copy of the client and start another transaction, that second client would have the same snapshot of the same table. At this point, the first client runs an UPDATE statement and commits the transaction.

mysql> UPDATE inventory set in_stock=23 WHERE product_id=1;
mysql> commit;

Even after this COMMIT, the second client will not see the results of the UPDATE. The second client receives a snapshot of the data at the moment its transaction began. Until the second client finishes its transaction (with a COMMIT or ROLLBACK), it will not be able to view the changes implemented by other threads that took place while the transaction was active.
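To see this in action with the table above, keep the second client’s transaction open while the first client commits its UPDATE; the sequence looks something like this (exact output will vary by version):

```sql
-- Second client, inside the transaction it began before the COMMIT:
mysql> select in_stock from inventory where product_id=1;
+----------+
| in_stock |
+----------+
|       24 |
+----------+
-- Still the snapshot value, even though the first client committed 23.

mysql> commit;

mysql> select in_stock from inventory where product_id=1;
+----------+
| in_stock |
+----------+
|       23 |
+----------+
-- With the old transaction closed, the committed change is visible.
```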

Things get really sticky if the second client is also running an UPDATE on the same row. If the first and second clients start their transactions at the same time, they get identical snapshots of the data. The first client could run an UPDATE on the row WHERE product_id=1, but the second client would be unaware of the change. The second client could then run its own UPDATE against that same row, WHERE product_id=1. After both clients commit their transactions, only one of the values can survive. As it turns out, the last one to COMMIT wins.

As you can see, this is potentially a dangerous situation. If you’re not careful, some of your threads could see out-of-date information.

But there are steps you can take to make sure the rows obtained by your transactions are up to date. It may not surprise you that staying current involves locking. With InnoDB, threads can apply the same types of locks that were available in Gemini.

In Gemini tables, when you run a SELECT within a transaction, a share lock is placed on the rows read by the SELECT. By default, however, InnoDB does not place any locks on rows during a SELECT. But you can let InnoDB know that you want to obtain a share lock by adding the clause LOCK IN SHARE MODE at the end of a SELECT.

When a thread issues a SELECT with a LOCK IN SHARE MODE clause, InnoDB goes beyond the snapshot of data taken at the start of the transaction and looks for the latest committed data. If another thread is operating on that data (via an UPDATE, DELETE, or another LOCK IN SHARE MODE), the first thread won’t be able to obtain the share lock and will wait until the other thread finishes its transaction and releases its locks. By attempting to obtain the share lock and receiving the latest committed data, the LOCK IN SHARE MODE clause ensures that the data read by a SELECT will be accurate and up to date.

If you’d like to see how this works, try the following, using the same table shown above. In one client, start a transaction and then run an UPDATE statement:

mysql> set autocommit=0;
mysql> begin work;
mysql> update inventory set in_stock=22 where product_id=1;

Now in the second client, run two different SELECTs. The first you will run without the LOCK IN SHARE MODE clause. Try SELECT * FROM inventory. This will execute immediately, but the data is potentially inaccurate because there’s no telling if the first client will COMMIT or ROLLBACK the transaction. If you alter that SELECT statement to include the LOCK IN SHARE MODE clause, the client will not respond until the first client issues a COMMIT or ROLLBACK.

Threads can obtain exclusive locks in SELECT statements by using the same FOR UPDATE clause that is used with Gemini.

The pseudo-code shown for Gemini will also work for InnoDB tables. But be very careful when writing your SELECT statements for InnoDB. You want to make sure you’re getting the latest, most accurate data.
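As a rough sketch, the earlier pseudo-code might translate into InnoDB SQL like this (the order_items and orders columns shown here are illustrative assumptions, not from the original listing):

```sql
set autocommit=0;
begin work;

-- Lock the row and read the latest committed quantity
select in_stock from inventory where product_id = 1 for update;

-- If in_stock came back greater than 0, record the sale:
insert into order_items (order_id, product_id) values (3, 1);
update inventory set in_stock = in_stock - 1 where product_id = 1;
update orders set order_total = order_total + 19.95 where order_id = 3;

commit;
-- If in_stock was 0, or any statement failed, issue ROLLBACK instead.
```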

 

The Right Choice for You

At this point, InnoDB is the most logical choice for most users. It offers row-level locking, is available on almost every operating system, and is getting a lot of support from MySQL AB. InnoDB is also the first table type to enforce foreign key constraints. But if you’re working with an installation that supports one of the other transactional table types, either of those will work just fine.

When you decide on the table type you wish to work with, you’ll have more technical matters to consider. You should read everything the MySQL manual has to say on the table type you end up using.

Pay especially careful attention to your logs. The transactional tables use logs to help recover from crashes. If a hard-disk failure causes a crash on your system, you could be in trouble if your logs are on the same hard disk as your MySQL data.

If you’re unhappy with the types of locks initiated by specific statements, you can adjust the locking behavior of Gemini and BDB tables with the SET TRANSACTION command. I won’t cover it in detail here because the defaults discussed in this tutorial are usually preferable. Note that InnoDB tables are unaffected by the SET TRANSACTION command.
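For reference, a minimal sketch of the syntax; check your MySQL version’s manual for which isolation levels each table type actually honors:

```sql
-- Affect only the next transaction in this session:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Or change the default for the whole session:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```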

Transactions are a great improvement to the MySQL database. If you’re not using them now, consider using them in the future. Because when that backhoe works its way through your power cord, you’ll be awfully glad you did.

Django 1.2 Alpha Offers Multiple Database Support, Improved Security Features

Django, the popular web development framework written in Python, has released the first alpha for its much-anticipated new version, Django 1.2.

Among the new features coming in Django 1.2 are support for multiple databases — a key feature for larger websites running Django — improved security features and a messaging framework that works much like Ruby on Rails’ “flash” feature.

The multiple database support will likely be the most important part of the next version of Django, since it will allow for much easier application scaling. Django 1.2 lets you target individual databases within your apps, using new queryset methods to read from and write to a specific database.
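As a sketch of how that might look (the database aliases and model name here are illustrative, not from the release notes):

```python
# settings.py: Django 1.2 replaces the old single-database settings
# with a DATABASES dictionary of named connections:
DATABASES = {
    'default': {'ENGINE': 'django.db.backends.mysql', 'NAME': 'main'},
    'replica': {'ENGINE': 'django.db.backends.mysql', 'NAME': 'replica'},
}

# In application code, route individual queries with using():
articles = Article.objects.using('replica').all()  # read from 'replica'
article.save(using='default')                      # write to 'default'
```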

The security features include much-improved protection against Cross-Site Request Forgery (CSRF) attacks. For more details on how the CSRF protection works, have a look at the new CSRF documentation page.

If you’d like to test out Django 1.2, or see how your apps run on the new release, head over to the downloads page or update your Subversion checkout. Keep in mind though that this is still an alpha release and should not be used on production sites. The final release of Django 1.2 is scheduled to arrive in March 2010.

Manage Transactions in MySQL

In this tell-all tutorial, Jay Greenspan, author of MySQL Weekend Crash Course and co-author of MySQL/PHP Database Applications, starts with a tour of the basics: He answers the age-old Q: “What’s the big deal with Transactions?”; investigates the four properties that a database must have to be considered transaction-capable; takes a closer look at locking mechanisms; and finishes up with a look at MyISAM tables, the lesser cousin of fully transaction-capable tables.

Once you have a taste of the limitations of MyISAM tables, you’ll be hungry for the real deal. In Lesson 2, Jay satiates that hunger with a thorough introduction to MySQL’s different transactional table types: BDB, Gemini, and InnoDB.

Cake and Trax for Beginners

In my last article, Building With Ajax and Rails, I made a faintly disparaging joke about some new web frameworks that have been created in fond imitation of Rails. I got a lot of feedback about that joke. I’m not allowed to comment here about the pending lawsuits, but I would ask that the drive-by eggings of my house and threats to my family please cease. (They’ve been relocated to a secret Webmonkey farm anyway.)

Today we’re going to take a look at a couple of those frameworks for PHP: Trax and Cake. Both attempt to bring the quick, easy, helpful tools and easily understood, easily maintained structure of Rails to PHP — a boon to web developers who know PHP and perhaps have some keeper code in that language, but can’t resist the Rails buzz. Both Trax and Cake use the same model-view-controller pattern and Active Record ways of addressing data that Rails does. Makes one curious, no? I don’t have time to get deeply into them today, but both stress “rapid development,” so let’s see if I, your average not-too-bright web developer, can get a little app off the ground before the end of this article.

Contents

  1. Cutting into Cake
  2. Makin’ Trax
  3. Hardly the End of the Story
  4. Helpful Links

 

Cutting into Cake

This is 100 percent true: At lunch yesterday, my fortune cookie said “A nice cake is waiting for you.” So, I am going to start with Cake first and try Trax later.

I go to the Cake site and download the latest stable release (discouragingly called 1.1.19.6305) onto my Debian web server running Apache and PHP 5. Unzipping the tar archive, I immediately see some names that look familiar from Rails: app, config and something called dispatcher.php. The app directory looks even friendlier. It contains views, controllers, model – just like Rails!

There’s also a webroot directory, which I assume is going to be the root of my application. Indeed, look at that! If I browse to http://my.web.server/trax_and_cake_playground/cake/app/webroot, I see a friendly boilerplate intro page. Later, if I decide I want to use Cake full time, I’ll add a line to my Apache configuration to set that directory as the whole server’s root directory. At the moment, the page is extolling Cake’s coolness and warning me, “Your database configuration file is not present.”

Correct, it’s not. I find a sample database configuration in app/config/database.php.default. I’ll copy that sample to an actual working file, database.php, in the same directory. The file includes an array of settings, which I change to suit my own environment:

var $default = array('driver'   => 'mysql',
                     'connect'  => 'mysql_pconnect',
                     'host'     => 'localhost',
                     'login'    => 'paul',
                     'password' => 'passw0rd',
                     'database' => 'bookmarker' );

Now, when I reload the intro page, it tells me, “Your database configuration file is present. Cake is able to connect to the database.” Yay!

Now, I’ve told it I have a MySQL database called “bookmarker,” but that’s not true yet. I still have to create it. I’m eager to get rolling, so I’ll just recycle the code from Webmonkey’s Ruby on Rails for Beginners article in which I built a bookmarking app. This time I also want to populate the table with a little starter data, to give us something to work with. So I type:

$ mysql

mysql> CREATE DATABASE bookmarker;
mysql> USE bookmarker;
mysql> CREATE TABLE bookmarks (
    ->  id SMALLINT (5) NOT NULL auto_increment,
    ->  url VARCHAR (255),
    ->  name VARCHAR (100),
    ->  description VARCHAR (255),
    ->  PRIMARY KEY (id)
    -> );
mysql> INSERT INTO bookmarks (url, name, description) VALUES
('http://webmonkey.com/', 'Webmonkey', 'Not the only monkey that makes
websites, but the best!');
mysql> INSERT INTO bookmarks (url, name, description) VALUES
('http://ubuntu.com/', 'Ubuntu', 'Linux for human beings (no monkeys
please!)');
mysql> exit;

At this point in developing a Rails app, I’d run the built-in scaffold-generating tool. Here in Cakeland, I see a file called bake.php in cake/scripts, and instantly I pick up on the metaphor – bake a cake, get it? Turns out that’s a false start, as bake.php doesn’t seem to work in the pre-beta. It’s giving me all sorts of fatal errors and files not found. I’m going to have to create my model, view and controller by hand-editing code.

The app/models directory contains an empty file called put_models_here. So, I do just that, creating a file in that directory called bookmark.php:

<?php
class Bookmark extends AppModel
{
    var $name = 'Bookmark';
}
?>

The model can stay that minimal. In Cake, the query itself belongs in a controller action, which pulls the rows with findAll() and hands them to the view as $data using the built-in set method. You might want to read along in the Cake API documentation – I certainly am!

Then I create a controller, app/controllers/bookmarks_controller.php, with a display action that fetches the bookmarks:

<?php
class BookmarksController extends AppController
{
    var $name = 'Bookmarks';

    function display()
    {
        $this->set('data', $this->Bookmark->findAll());
    }
}
?>

I will also need to create a view. In the app/views directory, I create a subdirectory called bookmarks and a file called index.thtml inside of the subdirectory. The “t” in the file extension is for template, I assume. Note that I have had to scratch around in some slightly opaque documentation to glean all this – not as polished as Rails, not yet at least! In index.thtml I just put together a little foreach loop to display my content:

<?php
echo "<p><table>";
foreach ($data as $bookmark):
    echo "<tr><td>";
    echo $html->link($bookmark['Bookmark']['name'],
                     $bookmark['Bookmark']['url']);
    echo "</td><td><b>";
    echo $bookmark['Bookmark']['description'];
    echo "</b></td></tr>";
endforeach;
echo "</table><p>";
?>

Finally, I need to add routing in order to tell Cake where my little app is and what to do with it. In app/config/routes.php I add a line:

$Route->connect ('/bookmarks/*', array('controller'=>'bookmarks',
                                       'action'=>'display'));

Note that the action, “display”, corresponds to the method I defined in my controller.

Now all the pieces are in place at last. I browse to http://my.web.server/trax_and_cake_playground/cake/index.php/bookmarks (I can clean up that ugly URL later) and lo! The boilerplate page is now displaying the preliminary bookmarks I put in my table! I am up and running. In Rails, at this point, I would have some nice auto-generated methods to add and delete bookmarks. Hopefully Cake’s auto-bake feature will come together soon.

Next, let’s take a look at Trax.

 

Makin’ Trax

Quick now, we know the drill. Go to the Trax site, download the archive, untar it. Like all of these prima-donna frameworks, Trax wants to be the top level of my web hierarchy, or rather it wants its public directory to be, and its trax directory to be parallel at the same level, out of sight of prying eyes. If my web server were publicly accessible, I’d put it up there, or find a way around that limitation. For now I’ll just unzip it in my playground (so to speak). Whoo!

Already now, if I go to http://my.web.server/trax_and_cake_playground/trax/public/, I have a page telling me “Congratulations, you’ve put PHP on Trax!” I feel good.

Trax does have a working auto-generate function, which is very nice, because I am getting tired of hand-editing all these files. If I wanted to do that, I’d be writing Perl or something. Still, there are a few more steps. I need to install the PEAR modules that Trax requires, which I do on the command-line:

1 pear install Mail_Mime

I’ll use the same database and table I created back at the beginning, so long ago. I need to configure trax/config/database.ini to tell the application where they are:

phptype = mysql
database = bookmarker
hostspec = localhost
username = paul
password = passw0rd
persistent = 1

And then I need to edit public/.htaccess to tell the application where in my file system its own configuration is. The start of that file reads:

# php include path for Trax ( Pear & Trax config dir )
php_value include_path .:/usr/local/lib/php:/home/<username>/trax/config

I change it to suit my Debian setup:

# php include path for Trax ( Pear & Trax config dir )
php_value include_path .:/usr/share/php:/var/www/html/trax/trax/config

Now I am ready to — make Trax? Lay down Trax? There doesn’t seem to be a catchy verb for auto-generating models and controllers and views here. I, um, look both ways and prepare to cross the Trax.

The script that does the work is trax/script/generate.php. First, I need to edit the first line of that file to point to where my PHP executable actually is. Not /usr/local/bin/php, as the default has it. Mine is at /usr/bin/php. Also, the script seems to be happiest if I add my path information there as well, so toward the beginning of the file I enter the following:

ini_set("include_path", ".:/usr/share/php:/var/www/html/trax/trax/config");

While I’m at it, I’ll add the same line to public/dispatch.php.

Then I need to make generate.php executable with a deft chmod a+x.

Now I’m ready to run the generate script, which I do from the command line. If your server doesn’t like PHP scripts to be run from the command line, you’ll have to open the script in a browser. I type this:

./trax/script/generate.php model Bookmark

and Trax replies:

created /var/www/html/trax/trax/app/models/bookmark.php

Then I type:

./trax/script/generate.php controller bookmark display add

and Trax replies:

created /var/www/html/trax/trax/app/controllers/bookmark_controller.php
created /var/www/html/trax/trax/app/helpers/bookmark_helper.php
created /var/www/html/trax/trax/app/views/bookmark/display.phtml
created /var/www/html/trax/trax/app/views/bookmark/add.phtml

“Display” and “add” are the initial views I want. Finally (I hope), I just need to add routing, so I can view my views in a browser. I add the following to trax/config/routes.php:

$router->connect( "", array(":controller" => "bookmark") );

and voila! I can see a page! True, it’s blank, which means that I need to flesh out my actions and views. But it is a page! Let me just put in a couple of basics.

In trax/app/controllers/bookmark_controller.php, there’s an empty definition of the display() method. I’ll put this in:

function display() {
    $bookmark = new Bookmark();
    $this->bookmarks = $bookmark->find_all();
}

and in app/views/bookmark/display.phtml I put a PHP table similar to the one I used for Cake, to loop through the bookmarks table and display each one.

Whew. Two starter semi-apps built in as many hours. Not too bad.

 

Hardly the End of the Story

Would I do it again? Is either framework 100 percent ready for prime time? Documentation is getting better every day, but neither framework has anything like the whopping user community of Rails.

I was able to iron out my own snags — several had to do with permissions, Apache directory overrides, and stuff like that – but if I couldn’t, it would be nice to have more resources to turn to. Conversely, PHP has a considerably larger support community on the Web than Ruby does, although that tide may slowly be turning.

Trax and Cake aren’t the end of the PHP dev framework story: there’s Symfony, Zend has a framework, and a quick trawl of the web turns up things like PHP2Go, Qcodo and doubtless many more.

Rails is a pleasure to use, while Trax and Cake combine pleasure with a bit of effort. If you’re a PHP wiz, though, the fact that they’re based in a language you know should more than outweigh any difficulties brought on by the frameworks’ immaturity. It’s easier to debug a stuck app than to learn a whole new language, even one as perspicuous and enjoyable as Ruby.

So tune in next time when we cover Brails, the new web programming framework for the visually impaired! That’s a joke. Don’t sue.