Wednesday, March 26, 2008

Killer Wail

Recently I received an email complaining that Orca lost information when doing a Save Transformed As.

The sender was repackaging an msi that had to be finished under extreme time pressure, and Orca was losing files when doing the Save Transformed As. He subsequently downloaded and installed InstEd, and in short order had his transformed msi saved with all data intact.

Orca's "Save As" and "Save Transformed As" issues are well documented, but no less dangerous for that.

From the "Special Considerations when editing Databases" page in the Orca Help:

Embedded Streams and Storages
When a database is saved using the Save As… or Save Transformed As… command, embedded binary streams (such as embedded cabinet files) are not saved to the new database unless they are part of a data row. Embedded sub-storages (nested install files) are never saved to the new database.

But the question remains, why doesn't it save the sub-storages? From my previous post, you might remember that the _Streams table is an abstraction of the subset of the underlying OLE structured storage that represents all the binary fields in the database.

The critical thing to understand is that while all binary fields in regular tables (such as the Icon table) are backed in the _Streams table, not all rows in the _Streams table are represented in regular tables.

When Orca does a Save As or Save Transformed As operation, it creates a new database and copies the data into it, table by table, row by row. But it does not copy the _Streams rows that aren't represented in regular tables. Nor does it copy OLE structured storage entities that aren't represented by regular tables. Therefore, this data never makes it into the target database.
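To make the loss concrete, here is a toy Python model of a row-by-row copy. It is only a sketch of the behaviour described above, not Orca's code and not the real msi format: ToyMsi, save_as_row_by_row, lost_entities, the "stream" key and the sample stream and storage names are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMsi:
    """Toy stand-in for an msi file (hypothetical, not the real format)."""
    tables: dict = field(default_factory=dict)    # table name -> list of row dicts
    streams: dict = field(default_factory=dict)   # stream name -> bytes
    storages: dict = field(default_factory=dict)  # sub-storage name -> bytes


def save_as_row_by_row(source):
    """Copy tables row by row; a stream survives only if some row references it."""
    target = ToyMsi()
    for table_name, rows in source.tables.items():
        target.tables[table_name] = [dict(row) for row in rows]
        for row in rows:
            stream_name = row.get("stream")
            if stream_name is not None:
                target.streams[stream_name] = source.streams[stream_name]
    # Sub-storages are never visited, so they are never copied.
    return target


def lost_entities(source, target):
    """Return (lost stream names, lost storage names), each sorted."""
    lost_streams = sorted(set(source.streams) - set(target.streams))
    lost_storages = sorted(set(source.storages) - set(target.storages))
    return lost_streams, lost_storages


source = ToyMsi(
    tables={"Icon": [{"Name": "app.ico", "stream": "Icon.app.ico"}]},
    streams={"Icon.app.ico": b"icon", "Data1.cab": b"cabinet"},
    storages={"1031": b"language transform"},
)
target = save_as_row_by_row(source)
print(lost_entities(source, target))
```

In this model the icon survives because an Icon row points at it, while the embedded cabinet and the sub-storage are silently dropped.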

So, while the newly created database contains all the persistent tables, it can be missing data that is critical to the msi.

Note that critical data can be stored in underlying OLE storage entities that aren't in the _Streams table. For example, language transforms that are applied when an msi is installed are stored as OLE structured storage entities, but are not represented in the _Streams table.

So why would you ever use the Save As feature in Orca? Well, the only advantage is that it writes a fresh database, which means that the wasted space from many additions, deletions and edits gets trimmed out. But while the msi may be smaller, you have to be sure that all important data in the msi is represented in the persistent (regular) tables, otherwise it will be lost.

It seems to me that, given the risks associated with this command, it would have been better named "Compact Database As" rather than "Save As". Similarly, "Save Transformed As" should warn the user that it may lose information.

There is no way with Orca to perform a "Save Transformed As" without losing the information, unlike "Save As", where you can copy the source msi before editing and saving.

Could a tool compact the database and not lose the critical information? Well, yes, but only if it completely understood how the Windows Installer API uses the OLE structured storage. It could create a new database, copy all the tables, copy all the _Streams rows unique to the _Streams table, and then copy the remaining OLE structured storage entities. The problem is that there is no documentation on how to tell which entities have already been copied when the tables were copied.

How does InstEd implement Save As and Save Transformed As? InstEd copies the underlying database file to the target, and then applies all changes that have been made since the underlying database file was last saved. This is equivalent to copying the underlying database to the target and then editing it. In this way, all the data that lives outside the persistent tables is preserved, but you don't get a fresh database with optimal space saving.
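The copy-then-edit idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not InstEd's implementation: the database is a plain dict, copy.deepcopy stands in for copying the underlying file, and save_as_copy_then_edit, rename_product and the sample data are hypothetical names made up for the example.

```python
import copy


def save_as_copy_then_edit(source, pending_edits):
    """Copy everything first, then replay the unsaved edits on the copy."""
    target = copy.deepcopy(source)  # stand-in for copying the underlying file
    for edit in pending_edits:
        edit(target)                # each edit mutates only the copy
    return target


source = {
    "tables": {"Property": {"ProductName": "Old Name"}},
    "streams": {"Data1.cab": b"cabinet"},     # not referenced by any row
    "storages": {"1031": b"language transform"},
}


def rename_product(db):
    """An example pending edit made since the last save."""
    db["tables"]["Property"]["ProductName"] = "New Name"


target = save_as_copy_then_edit(source, [rename_product])
print(target["tables"]["Property"]["ProductName"])
print(sorted(target["streams"]), sorted(target["storages"]))
```

Because nothing is rebuilt table by table, the unreferenced stream and the sub-storage come along for free. The price is that any wasted space in the source is copied too.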

InstEd uses a similar mechanism for "Save Transformed As" to ensure that no data is lost.

If you wanted to achieve equivalent behaviour to Orca's Save As, you could export all the tables and import them into a new, empty database.


c-l-b said...

Could a tool compact the database and not lose the critical information?
try this

Neil said...

Thanks c-l-b,

For other readers, the tool c-l-b links to is SolidWorks's Unfrag.exe, which is a generic OLE Compound File defrag app.

An alternative is EcoSqueeze.

I haven't tried these tools, but on the face of it they should work (if coded correctly). What they won't do is reduce fragmentation within tables, as the tables are black boxes to the OLE Compound File format.