Definition of Oldest Interesting Transaction

Abstract: Definition of Oldest Interesting Transaction

Problem: Definition of Oldest Interesting Transaction Solution: The oldest interesting transaction in a database is the oldest transaction not marked as committed. InterBase keeps a record of the outcome of every transaction on its TIPs (Transaction Inventory Pages). The states recorded on the TIP are: 0 -- active (or not yet started) 1 -- prepared (and abandonned there) 2 -- rolled back (or died in action) 3 -- committed When a transaction starts, it scans the TIP(s), looking for the first transaction it finds that is not marked as committed. It then makes a list of the states of all subsequent transactions up to itself and carries the list around with it for reference. Whenever a transaction reads a record, it compares the transaction id on the newest version of the record with its list. If the state of that transaction is listed as: COMMITTED, that's the version our transaction reads, and everyting is easy. ROLLED BACK, our transaction removes the rolled back version of the record from the database and starts on the next version back. ACTIVE, and the transaction listed as active is still alive, our transaction goes for the next oldest version. (If our transaction wants to modify the record, it has more to do, but that's really another topic.) If the transaction listed as active has died without telling anybody, our transaction changes that transaction's state to rolled back, removes the record version it was looking at, and starts on the next version back. PREPARED, our transaction waits for the prepared transaction to decide what it's going to do with its life. If the prepared transaction dies in the prepared state, our transaction gets an immediate error (unless it's declared that it doesn't care about records in limbo, in which case it reads the back version). If the prepared transaction commits or rolls back, our transaction handles the record as if it had always been committed ir rolled back The longer the list of interesting transactions, the longer the scan takes, and the more space the list takes. Even if none of the apparently interesting transactions are actually interesting (because none of them changed the database) their existence slows everybody else down. Sweeping the database, automatically or with Gfix, removes all rolled back records from the database and then changes the state of rolled back transactions to committed. Sweeping the database is the cheapest way to reduce the number of interesting transactions. (Backing it up and restoring it has the same effect, at much greater cost.) DEAD or ALIVE? Transactions frequently have to figure out whether other transactions are dead or alive. A transaction indicates to others that it is alive by holding an exclusive lock on its transaction id. A transaction queries the health of another by requesting a shared lock on the transaction id of the other. If the lock request fails, the transaction is alive. If it succeeds, the other transaction has met some end. If it closed itself cleanly, it will have recorded its final state. If not, then the transaction that just got the lock will change the state of the deceased transaction to rolled back. OLDEST ACTIVE The oldest active transaction is the oldest transaction whose interests have to be considered when garbage collecting old versions of records. In our multi-generational records, old versions gradually wither, die, and are pruned out by new transactions as they read records. Easiest case -- one is one and all alone. If my transaction (number 13) is starts alone and runs alone in the database, then the oldest active transaction in the database is 13. If my transaction reads a record and finds that the newest version was committed by transaction 12, keeping old versions written by 10 and 8, my transaction will recognize that no one cares about the two old versions and remove then, releasing space for reuse. Easy case. This gets more complicated when there are several transactions active. If my transaction (still 13) starts alone, but is soon joined by transaction 14 which creates a new version of that record, then there will be versions 14 and 12. Transaction 14 will see version 14, and my transaction will continue to read 12. If 14 commits and 15 then starts, 15 will see record version 14 as fine for him, but needs to leave version 12 around for me. So both 14 and 15 need to know that there is an older active transaction and protect its interests. That's the easy case, and could be handled by having each transaction look quickly at the TIP and remember the number of the oldest transaction marked as active. Reality is harder. Consider the case that transactions 12, 13, and 14 are all running together. An interesting record was created by transaction 10, which committed before any of them started, so they can all read that version. Number 12 modifies the record and commits. Number 14 commits. Number 13, still around, still wants the record version created by transaction 10. When transaction 15 starts, it has to know that transaction 13 can't read record versions created by transaction 12. Just knowing that the oldest transaction running around at the moment is 13 isn't good enough. What 15 (and every new transaction starting up) has to remember is the oldest transaction active when any transaction currently active started. Coping with reality. The way that works is that each transaction notices who's the oldest one alive and wandering around when it starts. It records that elderly transaction's id with its own transaction lock. It then queries all the other transaction locks to see who's the oldest any of them know about. The second number is the number the new transaction uses as its 'oldest active' when it decides what versions of records are really too moldy to keep.