Monday, April 21, 2008

Can You Hitch in a Cab?

Sometimes when repackaging an msi for an enterprise it's necessary to add files to an installation using a transform. For example, you may wish to drop out a common file that has been captured by the original vendor with a custom component code, and install it using its proper merge module.

It would be great to be able to embed the cab into the transform, so that the transform is self-contained. Well, there's good news and bad news.

The good news is that you can embed the cab in the transform. The bad news is that the transform can't be used during installation. Which is pretty bad news.

So effectively, you can hitch in a cab, but it's illegal. At least in the msi world.

Here's the lowdown. You can embed a cab in the transform as long as the cab's binary field is listed in a regular table entry. For example, it can be listed in the Binary table. If the cab is listed only in the _Streams table, it won't get saved into the transform.

If you generate a transform file with an embedded cab, and apply it to an msi in InstEd, you will successfully be able to extract the cab again, proving that the transform contains the cab file. In fact, if you apply the transform to an msi, and perform a Save Transformed command, the resultant msi will install fine, including any files from the added cab.

However, if you apply the transform with the embedded cab to the msi during an installation:
msiexec /i msi_file TRANSFORMS=mst_file
then you get an error related to the installation not being able to find the cab.

My guess is that this is related to this little snippet in the msdn docs for MsiDatabaseApplyTransform:
The MsiDatabaseApplyTransform function delays transforming tables until it is necessary. Any tables to be added or dropped are processed immediately. However, changes to the existing table are delayed until the table is loaded or the database is committed.

When installing an msi, it seems that the OLE structured storage streams are extracted "before" the table that references the cab is transformed. This is surmised from the fact that the transform contains the stream for the cab (you can pull it out of the transform in InstEd), but the stream is not accessible during the installation process. So, a likely scenario is that the table that references the cab's binary data (and hence the underlying stream) is transformed after the streams are extracted.

Can you hitch in a cab and get away with it?
Could you force the table that references the cab to be transformed before the installation code extracts the streams? Well, I haven't tested it, and it would be unsupported, but you might be able to do so by adding a custom action to the transform that reads from the table that references the cab. This would force the table to be transformed.

If this custom action could be run early enough in the InstallExecuteSequence (or even the InstallUISequence) then perhaps the table would be transformed before the streams were extracted. But if it did work, it would be unsupported and could possibly break in future releases of msi.

Having said that, it would be nice if Microsoft did officially allow cabs to be embedded in transforms.

Tuesday, April 8, 2008

Care for a Date?

While file system timestamps are not the ultimate arbiter of whether or not files have been edited, they are a useful at-a-glance indicator.

Unfortunately, the Windows Installer API forces the Last Write Time timestamp to update just by opening a file in anything other than read only mode. Orca, and InstEd, don't (by default) open files in read only mode. Therefore just by opening a database file (not transforms) in Orca, even if no changes are made and a File->Save is never executed, the timestamp of the file will be changed when the file is closed.

This can be unfortunate when customising an msi installation, and trying to determine at a glance whether the source msi has been changed or whether all the changes have been kept in mst files (as they ideally should).

Why is this?
As previously discussed, the Windows Installer file format is based on OLE Structured Storage. The Structured Storage API provides a "transaction" mode, whereby changes to a file can be discarded. Coincidentally, the Windows Installer API also provides a transaction mode. It is almost certain that the Windows Installer transaction mode utilises the OLE Structured Storage transaction mode, rather than implementing transactions itself.

It is this transaction mode with which Orca, and InstEd, open files by default. Using this mode, the tool can make many changes to the file, and they get discarded unless specifically committed (File->Save). This allows fast saving (no need to save all the tables when saving, just call Commit), and easy discarding (simply don't call Commit).

But the underlying OLE Structured Storage transaction mode can, if required, save changes in "scratch" areas in the file, until such time as they are committed. Therefore, in transaction mode, the file must be opened with write permissions, even if Commit is never called.

And at the NTFS file system level, as soon as a file handle that has been opened with write access is closed, the Last Write Time timestamp is updated.

The good news is that InstEd preserves the Last Write Time timestamp if no changes are made to the file. It does this by storing the timestamp when the file is opened, updating the stored timestamp whenever Save is called, and resetting the Last Write Time to the stored version whenever the file is closed.
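The store-and-restore approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not InstEd's actual code: the TimestampGuard class and its method names are made up for this example, and it uses os.stat and os.utime in place of whatever Win32 calls InstEd really makes.

```python
import os
import tempfile


class TimestampGuard:
    """Hypothetical helper: restore a file's last-write time on close
    unless a real save has happened in the meantime."""

    def __init__(self, path):
        self.path = path
        self._remember()  # store the timestamps as they are at open time

    def _remember(self):
        st = os.stat(self.path)
        self._atime_ns = st.st_atime_ns
        self._mtime_ns = st.st_mtime_ns

    def saved(self):
        # Call right after a real save: the new write time is now the
        # one worth keeping.
        self._remember()

    def close(self):
        # Call after the file handle is closed: put the stored timestamps
        # back, undoing any update caused purely by the write-access handle.
        os.utime(self.path, ns=(self._atime_ns, self._mtime_ns))
```

Usage would be along the lines of: create the guard when the file is opened, call saved() after every Save, and call close() once the underlying handle has been released.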

Transform files are never opened with write permissions until they are saved, and therefore don't suffer the same problem.

Wednesday, April 2, 2008

Killer Whales can be dangerous

Don't get me wrong, Orca has been the mainstay of anyone wanting to rapidly edit Windows Installer files for a long time. And does an excellent job. Mostly.

The problem is that there are a few nasty things that Orca does silently. So you won't even know the msi file being worked on has been corrupted. See my previous entry about the _Streams table.

One other danger is the Copy and Paste Rows functionality.

When a row is copied, its fields are placed as tab-delimited strings onto the clipboard.
When multiple rows are copied, each row's string is separated by appropriate end of line characters.

However, if a string field in a row contains a tab, or an end of line character, then that row cannot be pasted back into the database.
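To see why, here is a small Python sketch of an unquoted tab-delimited clipboard format like the one just described. It is a model of the format only, not Orca's code, and the function names and the use of "\r\n" as the row separator are assumptions made for the example.

```python
def rows_to_clipboard(rows):
    # One line per row, fields joined by tabs, with no quoting at all.
    return "\r\n".join("\t".join(fields) for fields in rows)


def clipboard_to_rows(text):
    # Split back on the same delimiters.
    return [line.split("\t") for line in text.split("\r\n")]


clean = [["Key1", "plain value"]]
tricky = [["Key2", "value\twith a tab"]]

print(clipboard_to_rows(rows_to_clipboard(clean)))
# [['Key1', 'plain value']]
print(clipboard_to_rows(rows_to_clipboard(tricky)))
# [['Key2', 'value', 'with a tab']]
```

The second row comes back with three fields instead of two, so it no longer matches the table's column count.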

Unfortunately, the user is not made aware when pasting rows that Orca has stopped pasting them because it has found an invalid number of tabs (fields) in the row for the table.

This becomes dangerous because an expected behaviour, copying and pasting rows, silently doesn't work.

InstEd resolves this problem by quoting fields that contain tab or end of line characters, and escaping quotes within such a quoted field. This is compatible with Excel, so pasting rows back into InstEd or Excel results in correct behaviour.

Furthermore, it only quotes fields that contain tab or end of line characters, or that have a quote at the start or end of the field. This provides as much compatibility for copying from InstEd and pasting into Orca as is possible.
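The quoting rules above can be sketched in Python as follows. This is a minimal illustration rather than InstEd's implementation: the function names are invented, rows are assumed to be separated by "\r\n", and quotes inside a quoted field are assumed to be escaped by doubling them, which is the convention Excel uses.

```python
def quote_field(field):
    # Quote only when needed: an embedded tab or end of line character,
    # or a quote at the start or end of the field.
    needs_quotes = (
        any(ch in field for ch in "\t\r\n")
        or field.startswith('"')
        or field.endswith('"')
    )
    if not needs_quotes:
        return field
    # Assumption: quotes inside a quoted field are escaped by doubling.
    return '"' + field.replace('"', '""') + '"'


def rows_to_text(rows):
    return "\r\n".join("\t".join(quote_field(f) for f in row) for row in rows)


def text_to_rows(text):
    rows, row, field = [], [], []
    in_quotes = False
    at_field_start = True
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"' and i + 1 < n and text[i + 1] == '"':
                field.append('"')  # doubled quote is a literal quote
                i += 1
            elif ch == '"':
                in_quotes = False  # closing quote
            else:
                field.append(ch)   # tabs and newlines are kept as data
        elif ch == '"' and at_field_start:
            in_quotes = True       # opening quote of a quoted field
            at_field_start = False
        elif ch == "\t":
            row.append("".join(field))  # end of field
            field = []
            at_field_start = True
        elif text.startswith("\r\n", i):
            row.append("".join(field))  # end of field and end of row
            rows.append(row)
            row, field = [], []
            at_field_start = True
            i += 1
        else:
            field.append(ch)
            at_field_start = False
        i += 1
    row.append("".join(field))
    rows.append(row)
    return rows
```

With these rules, a row such as ["Key", "a\tb", "line1\r\nline2"] survives a round trip through rows_to_text and text_to_rows, while a plain field like "Key" is written out without quotes, so it would still look the same to a tool that does no quoting.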

The upshot is that InstEd will always copy and paste rows correctly within itself, and with Excel, whereas Orca has the potential to (silently) lose information when pasting rows.