Wednesday, September 2, 2009

Speed Demons

There is a whole genre of utilities written as quickly as possible, to satisfy a short term need, never to see the light of day past next week. But somehow they persist eternally as the worst example of the author's code, simply because they are the only thing that fills that need, and no one has the motivation to re-write them.

I suspect that Orca was probably written as such a utility. Somehow it ended up in the SDK, and now is the tool de rigueur for editing msis when you just want to make a quick change, or find some info quickly.

One unfortunate decision when it was written was to use the listview in non-virtualised mode. A virtualised listview basically means the listview is responsible for layout of the items, but not for managing the memory of the data in the listview. The big advantage is that items don't need to be added to the listview. When items are added to the listview in a non-virtualised mode, they are added one by one, and the strings for each cell are copied (i.e. memory is allocated off the heap, the data is copied, and likely the original string is de-allocated by the listview owner). This happens field by field, with the scrollbar details being updated after each row.

If you open a table in Orca that has thousands of rows, you can often see the thumb tab in the vertical scroll bar visibly shrink as the table is loaded into the listview. This is an indication that the rows are being added one by one.

In a virtualised listview, the rows and fields aren't "added". The listview is simply told: "there are 11654 rows and 9 columns".

At that point the listview adjusts the scrollbar, and calls back to the parent window to get the strings for rendering into the visible cells. The parent doesn't have to copy the strings for the listview to render them. It can simply pass a pointer to its own copy of the string, saving on heap allocation and de-allocation.
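
To make the difference concrete, here is a minimal sketch in Python (not Win32 code, and not how Orca or InstEd are actually written). The class names `CopyingListView` and `VirtualListView` are made up for illustration. In a real Win32 listview the virtualised mode corresponds to the LVS_OWNERDATA style, with the control asking for cell text through LVN_GETDISPINFO notifications.

```python
from typing import Callable, List, Sequence


class CopyingListView:
    """Non-virtualised sketch: the control keeps its own copy of every cell."""

    def __init__(self) -> None:
        self.rows: List[List[str]] = []
        self.scrollbar_updates = 0

    def add_row(self, cells: Sequence[str]) -> None:
        # Store each field, one by one, in storage owned by the control.
        # (In a native control this is a heap allocation plus a string copy.)
        self.rows.append([str(cell) for cell in cells])
        # The scrollbar range is recalculated after every row.
        self.scrollbar_updates += 1

    def visible_cells(self, first: int, count: int) -> List[List[str]]:
        return self.rows[first:first + count]


class VirtualListView:
    """Virtualised sketch: the control knows only the size and a callback."""

    def __init__(self, get_cell: Callable[[int, int], str]) -> None:
        self.get_cell = get_cell
        self.row_count = 0
        self.column_count = 0
        self.scrollbar_updates = 0

    def set_size(self, rows: int, columns: int) -> None:
        # "There are N rows and M columns": one call, one scrollbar update.
        self.row_count = rows
        self.column_count = columns
        self.scrollbar_updates += 1

    def visible_cells(self, first: int, count: int) -> List[List[str]]:
        # Ask the owner only for the cells that are actually on screen.
        last = min(first + count, self.row_count)
        return [[self.get_cell(r, c) for c in range(self.column_count)]
                for r in range(first, last)]


# The owner's data, standing in for a table read from an msi.
table = [[f"row{r}_col{c}" for c in range(9)] for r in range(11654)]

copying = CopyingListView()
for row in table:
    copying.add_row(row)

virtual = VirtualListView(lambda r, c: table[r][c])
virtual.set_size(len(table), 9)

print(copying.scrollbar_updates)  # one update per row added
print(virtual.scrollbar_updates)  # a single update
print(virtual.visible_cells(0, 2) == copying.visible_cells(0, 2))
```

Both versions render the same visible cells. The virtualised one simply never builds a second copy of the table.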

So, why am I blogging about this boring topic? While recently testing some new field editors in InstEd Plus, I pulled out my largest handy msi, and ran some speed tests in Orca and InstEd. Orca loads it very quickly, which is nice. My guess is that it simply enumerates the tables and puts them into the listview on the left.

As you scroll down the tables on the left, the memory usage increases. Presumably Orca is loading the table data from the msi to put into the listview. As I clicked on the File table, there was a 12 second wait for it to load approximately 40000 rows. However, the rub comes when clicking on another table and back onto the File table. It's another 12 second wait. So, in my test msi, moving from the File table to the FeatureComponents table and back again is a 17 second round trip. Every time you do it. That can take its toll on the "quick" editing that you want to do.

In comparison, InstEd completely loads the msi into memory ready for near instantaneous rendering of the tables in less than 3.5 seconds (over a network). It calculates the entire range of relationships in 6 seconds. The soon to be available InstEd Plus field editors can generate a smart editor complete with 80000 file table entries and 22000 registry table entries available for auto-completion in about 1 second.
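
The idea of paying the load cost once, rather than on every click, can be sketched as a small cache. This is only an illustration of the principle, not InstEd's actual code. `TableCache` and `fake_loader` are hypothetical names, and the loader stands in for whatever reads a table out of the msi.

```python
from typing import Callable, Dict, List

Row = List[str]


class TableCache:
    """Load each table at most once and keep its rows in memory."""

    def __init__(self, load_table: Callable[[str], List[Row]]) -> None:
        self._load_table = load_table
        self._tables: Dict[str, List[Row]] = {}
        self.loads = 0

    def rows(self, name: str) -> List[Row]:
        if name not in self._tables:
            # First visit: pay the loading cost once.
            self._tables[name] = self._load_table(name)
            self.loads += 1
        # Later visits return the rows already held in memory.
        return self._tables[name]


def fake_loader(name: str) -> List[Row]:
    # Hypothetical stand-in for reading a table out of an msi.
    return [[name, str(i)] for i in range(3)]


cache = TableCache(fake_loader)
cache.rows("File")
cache.rows("FeatureComponents")
cache.rows("File")  # served from memory, no second load
print(cache.loads)
```

Clicking from File to FeatureComponents and back triggers two loads here, not three.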

Don't get me wrong. I understand the choice to use a non-virtualised listview when Orca was written. It was probably never intended to become such a popular tool for editing databases. But it was an unfortunate decision, one that doesn't take much work to fix, and probably needs fixing given Orca's popularity over the years.

Or just use this: InstEd.

Thursday, April 2, 2009

New InstEd Plus video

I have added a new video to the InstEd Plus page showing the source tree management options in the latest version, such as extracting files from the msi or merge module, and adding files to the source tree.

Monday, March 30, 2009

Burgling the _Storages table.

I was recently asked why InstEd doesn't allow the user to save binary fields from the _Storages table. The simple answer is that the Windows Installer API doesn't support it.

The long answer is that InstEd does let you extract them, you just have to hoodwink it.

As some background, the _Storages table is a representation, in Windows Installer table form, of the OLE structured storage list of "storages". Some of the things that live as "storages" are transforms and cabinets in patches, and language transforms in msi's.

For example, this article discusses how instmsi.msi (contained in instmsiw.exe) manages to install using the appropriate language gui. It embeds as "storages" transforms that change the gui text strings, and names them with the appropriate language id.

(As a side note, while the article says that Windows Installer can automatically detect, extract, and apply the relevant language transform (from version 2 forward), the instmsi(a/w).exe files that redistribute Windows Installer actually contain another stub exe called instmsi.exe that launches the instmsi.msi. This stub exe calls functions such as StgOpenStorage and GetUserDefaultLangID. This makes me wonder if it is only with this stub exe that the language transforms are automatically applied, or whether it is only because of the bootstrapping that this functionality is required in the stub exe. That is, language transforms may be automatically handled by Windows Installer for all subsequent msi's. Does anyone know the answer?)

The MSDN article on the _Storages table explicitly declares: Data cannot be read from the _Storages table.

Basically, it's an unusual table, and InstEd doesn't have the logic yet to treat it differently from the other tables. So it can only be used as a way to list the storages.

However, because most of the useful data in a patch file is stored as "storages", InstEd provides the option, when opening a patch file (from the File->Open menu), to extract all the storages.

So, by renaming an msi to give it a .msp extension and then opening it in InstEd, you will be given the option to extract all the "storages" from the file.

It's not a very pretty way to do it, but it works. In the future, this table may receive some work to make it easier to work with.

msidb.exe also allows you to extract/add/remove storages. And there are any number of OLE Structured Storage editors available on the web.

Microsoft's Heath Stewart has also written some blogs and tools related to this.

Thursday, March 5, 2009

InstEd Plus: Just what you need!

I am pleased to finally announce the availability of InstEd Plus.

InstEd Plus is a plugin module to InstEd.

InstEd continues to be the most effective table editor available and will remain free to download and use in any (legal) scenario. However it is acknowledged that some tasks are just time consuming when dealing with tables alone (e.g. adding files to an installation, editing dialogs, etc).

Rather than pollute InstEd with a whole raft of features that may or may not get used, a plugin architecture is being developed so that InstEd remains fast to load, fast to run, and has minimal runtime dependencies so that you can "just grab it and go" no matter where you are.

So while InstEd remains the excellent FREE table editor that it has always been, InstEd Plus is the first plugin module available.

The motivation behind InstEd Plus is to provide fast, and yet completely customisable, methods of doing the "time consuming" tasks.

Its File Manager feature (the first of hopefully many) provides an intuitive yet advanced view of the Directory table and the installation's files. It allows drag drop addition of files to the installation, and allows easy CAB manipulation.

You can see it in action here:

InstEd Plus is the first foray into plugins, and will hopefully lead to an SDK allowing 3rd party plugins. One of the advantages of plugins is that their runtime dependencies don't impact InstEd's. So while InstEd Plus takes advantage of some of the nice features available with .NET's WPF to provide a first class interface, and therefore has a .NET Framework dependency, InstEd will still run fine without it.

Running InstEd Plus will require a license, available for a nominal fee. However please note that InstEd will continue to be actively developed, and will continue to remain free.

I have a whole raft of features in mind to be added to both InstEd, and InstEd Plus. Very shortly, I will be releasing a new version of InstEd with some bug fixes and hopefully a new feature or two.

So please, have a look at InstEd Plus, grab a copy if you think it valuable, and watch this space for further development of both tools.

Friday, February 27, 2009

Searching the easy, but hard, way

The Find dialog.

Accessible via CTRL+F.

Shrouded in mystery (because someone forgot to give it a title).

Contains a checkbox that may have no meaning for some users:
Use Regular Expressions

For many people Regular Expressions may simply refer to some form of toilet humour. However for others it is a very powerful tool in the search arsenal.

Regular Expressions are a method of describing very powerful pattern matching algorithms using text. I will not attempt to give a tutorial in this blog. Rather I hope to encourage you to investigate further so that you might be able to utilise the power of regular expressions in order to make your packaging more productive.

As always (well, often) the Wikipedia article on regular expressions is a good place to start.

In InstEd, when you check the Use Regular Expressions checkbox, the Find text is interpreted not as a literal string to find, but rather a "regular expression" that describes a pattern to find.

In the simple case, the pattern can be the literal text. For example, searching on "InstEd" will find the same results regardless of whether regular expressions are used. This is because the string "InstEd" is a regular expression for the literal text "InstEd". Confused?

Perhaps a more complicated case will be useful. This string "^InstEd$" is a regular expression that will only find entries where no other text than "InstEd" is in the field. Specifically, "^" matches the start of the field and "$" matches the end of the field. So, the regular expression indicates that it will only match:

Start of field, followed by "InstEd", followed by end of field.
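
If you want to experiment outside InstEd, the same two patterns can be tried with a few lines of Python. Note the assumption: this uses Python's `re` module rather than the boost engine that InstEd uses, so it is only a rough stand-in, although plain text, "^" and "$" behave the same way for a simple case like this. The field values are made up.

```python
import re

fields = ["InstEd", "InstEdIt", "TryThisInstEd", "Use InstEd today"]

# Plain pattern: found anywhere in the field, just like a literal search.
contains = [f for f in fields if re.search(r"InstEd", f)]

# Anchored pattern: start of field, "InstEd", end of field.
exact = [f for f in fields if re.search(r"^InstEd$", f)]

print(contains)
print(exact)
```

The first list contains all four fields. The second contains only the field that is exactly "InstEd".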

Suppose you want to find all the fields that contain the word "InstEd" but exclude terms such as "InstEdIt" or "TryThisInstEd". You could check the Match whole word checkbox. But under the hood, that checkbox builds a regular expression to do the heavy lifting. The regular expression would be "(^|\s)InstEd($|\s)". (Actually it's a bit more complicated than that but the detail is not necessary here.)

Now you will recognise the ^ and $ characters from before. They match the start and end of the field. The | character has added an alternative, an "OR" if you like. And the \s is shorthand for whitespace (spaces, tabs etc). So the regular expression now indicates:

Match the start of the field OR whitespace, followed by "InstEd", followed by the end of the field OR whitespace.

In other words, only match when InstEd is the complete word, excluding things such as "InstEdIt" or "TryThisInstEd".
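
Here is that simplified whole-word expression run against a few made-up fields. As before, this is a sketch using Python's `re` module as a stand-in for InstEd's regular expression engine, and it uses the simplified pattern from the text, not the more complicated one InstEd really builds.

```python
import re

# Start of field OR whitespace, then "InstEd", then end of field OR whitespace.
whole_word = re.compile(r"(^|\s)InstEd($|\s)")

fields = ["InstEd", "Use InstEd today", "InstEdIt", "TryThisInstEd"]
matches = [f for f in fields if whole_word.search(f)]

print(matches)
```

Only "InstEd" and "Use InstEd today" are matched. One limitation of this simplified form is that a field such as "InstEd." would not match, because a full stop is neither whitespace nor the end of the field.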

Note that if you explicitly wrote "(^|\s)InstEd($|\s)" AND checked the Match whole word checkbox, you wouldn't match anything, because under the hood the search regular expression would become "(^|\s)(^|\s)InstEd($|\s)($|\s)" and it would never find two "start of the field"s (but it might possibly find two spaces before and after).

Similarly if you wrote "(^|\s)InstEd($|\s)" and forgot to check the Use regular expressions checkbox, you would be lucky to find a field that contains such an arcane string.

This has just scratched the surface of the power of regular expressions, but I warn you, they can become awfully complicated.

For example, the regular expression InstEd uses to find references to properties, components, files etc in Formatted fields is:

That is not pretty, but trying to write code to find such fields would be way more complicated than using that regular expression.

(On a technical note, it's a little more permissive than required, but works fine.)

Internally, InstEd uses the boost regular expression engine, and the syntax is described here.

Some things that regular expressions are useful for finding:
  • File.File entries that don't have short filenames: "[^\|]*" (remember to use the Table and Column filter dropdowns in the Find dialog).
  • Strings that span other strings:

    "Microsoft.*97": would find references to excel, word, outlook etc

    "((SOFTWARE\\Classes\\CLSID)|(CLSID))\\{MyGuid}": would find references to {MyGuid} in reg keys only in "SOFTWARE\\Classes\\CLSID" (HKLM) or "CLSID" (HKCR)
If you have other useful regular expressions, please post them here as a comment.
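
The example patterns listed above can be tried out before you run them against a real msi. The sketch below uses Python's `re` module as a stand-in for InstEd's regular expression engine, and the field values are invented. For the short filename pattern it assumes the pattern has to match the whole field (hence `fullmatch`) and that the field holds "short|long" when a short name is present. Both are assumptions made for the illustration.

```python
import re

# "Microsoft.*97": "Microsoft", then anything, then "97".
office = re.compile(r"Microsoft.*97")
office_hit = bool(office.search("Microsoft Excel 97"))
office_miss = bool(office.search("Microsoft Word 2000"))

# {MyGuid} under either "SOFTWARE\Classes\CLSID" or "CLSID".
# Each "\\" in the pattern matches one literal backslash.
clsid = re.compile(r"((SOFTWARE\\Classes\\CLSID)|(CLSID))\\{MyGuid}")
hklm_hit = bool(clsid.search(r"SOFTWARE\Classes\CLSID\{MyGuid}"))
hkcr_hit = bool(clsid.search(r"CLSID\{MyGuid}\InprocServer32"))
other_miss = bool(clsid.search(r"SOFTWARE\Classes\Interface\{MyGuid}"))

# No "|" anywhere in the field, i.e. no short filename part.
no_short = re.compile(r"[^\|]*")
long_only = bool(no_short.fullmatch("setup.exe"))
short_and_long = bool(no_short.fullmatch("SETUP~1.EXE|setup program.exe"))

print(office_hit, office_miss)
print(hklm_hit, hkcr_hit, other_miss)
print(long_only, short_and_long)
```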