Thursday, May 29, 2008

Is Null Null or Not Null

One of the people on the AppDeploy forums recently asked for the option of having null fields be represented in the tables by an empty string instead of the string "<null>".

Another person responded that this would be bad, because a null string is not the same as an empty string.

Well, usually that is correct, and it can be quite an important distinction. In most databases, a null field is certainly different from an empty string. However, in the case of Windows Installer databases, after storing an empty string in a field, calling MsiRecordIsNull on that field will return true. So a non-null (empty) string can be a null string, at least for Windows Installer.

Of course, I have already examined the issue of null integers being represented by a special number. Again, a non-null integer can be a null integer.

But by the time we get to binary fields (the last remaining fundamental type), we suddenly find that null binary fields really are null, and are distinct from every other binary value.

That is, the only way to store a null binary field is to pass NULL to the MsiRecordSetStream function. Passing a path to an empty file results in a non-null binary field of 0 bytes.

Actually, it's not strictly true that passing NULL to MsiRecordSetStream is the only way to set a binary field to null. You can also call MsiRecordSetInteger with the MSI_NULL_INTEGER value, which will happily set a binary field to null.

So null is not null but it is null.

As for the original request to use an empty string to display null values, I added the option to the next release (accessible only in the registry settings at the moment). In fact you can use any string you like to represent null now.

Unfortunately, using an empty string to represent null values makes scanning the rows a bit difficult (there is no grid to help out), and it is also indistinguishable from a string of white space, which is rare but probably not quite as rare as "<null>". Back to the problem of special values.

Friday, May 23, 2008

Orca Eats Bugs, Again

Just to provide an update to the Vista SDK Orca bug, here is the Windows Installer Team Blog's description of the bug:

Updates that can be found in the Windows SDK for Windows Server 2008:
  • Orca crashes when a transform is generated and a row is deleted from the current table: Orca crashes if the user attempts to generate a new transform and deletes a row from an existing table. (Using Orca, select New Transform, then delete a row from a table.)

Of course, it is more serious than that, because Orca can't even view a transform that deletes a row.

Just to be clear, the bug was not introduced as a special feature in the 2008 SDK version. It was fixed in that version.

Wednesday, May 14, 2008

Orca Eats Bugs

It's good to know that Orca is at least getting bug fixes.

Orca 4.0.5299.0, which comes with the Vista Platform SDK, suffered a problem where deleting certain rows in a transform would cause it to crash.

The repeatable example (and the bug may not be confined to this) is a transform that deletes the only row in a table. You can reproduce it by starting Orca and opening "c:\program files\orca\orca.dat" (or equivalent). This is Orca's template msi. Perform a File->Save As to save the msi somewhere (not over orca.dat). Add a dummy row to the LockPermissions table (for example) and then File->Save.

Then select Transforms->New Transform, and delete the LockPermissions row. Bang, Orca beaches itself.

This came to my attention because Orca would crash when clicking the LockPermissions table when viewing a transform (built by another tool) that deleted the only row in the table. So it's not just that Orca can't generate these transforms, it can't even view them.

But, the good news is that the Orca version released in the Windows Server 2008 SDK seems to have fixed this problem. So if you must use Orca, definitely get the latest.

Of course, you will be far more productive with InstEd, at no extra cost (and a much smaller download).

Thursday, May 1, 2008

OpenMP utility code

The relationship building code in InstEd was running a bit slow on my very large test msi. It was taking 22 minutes to build the table relationships.

This was a bit long, so having noticed that Visual C++ 2005's compiler came with OpenMP support, I thought I would have a go at utilising the dual cores on my dev machine to speed this up.

The first task at hand was to determine whether the loops involved were suitable for breaking down into parallel partitions. Some were, and some weren't. My starting point was this loop :

// pseudocode
foreach( row in refing_table.rows() )
{
    // get rows that row references
    // store the relationships
}


This loop was suitable because except for storage of the relationships, the majority of the work didn't write any data anywhere, and was fairly computationally expensive.

Ideally, the OpenMP version would look like this:
// pseudocode
#pragma omp parallel
{
    #pragma omp for
    for( Rows::iterator row = refing_table.rows().begin();
         row != refing_table.rows().end();
         ++row )
    {
        // get rows that row references

        #pragma omp critical
        {
            // store the relationships
        }
    }
}


However, the #pragma omp for directive has some major limitations, primarily that the iteration variable must be of an integer type. If Rows were a type with random access iterators, the loop could be changed to use integers for iteration. However, Rows is actually a std::list, so advancing an iterator from begin() on every iteration could be expensive.

What I needed was a function that would split the rows() list into partitions suitable for each thread working on the parallel block. And it would be great if the calculation of the partition suitable for a given thread could happen inside the parallel block, thereby further utilising the OpenMP benefits.

(Naturally, splitting the rows() list would happen as an abstraction through iterator ranges, not by actually copying the list items.)

This would be a common problem for any C++ programmer wanting to utilise OpenMP in loops that iterate over STL containers.

So, in an effort to make the solution generic and easy to utilise, I wrote some code for just such a purpose. It in turn utilises the excellent Boost.Range library and concepts, which allow the code to work on any container that models the Range concept, including native C style arrays, and STL containers.

The first task was to write a function to evenly split a given range into partitions. After my first ugly, clumsy, and inefficient effort, Nathanael Rensen came up with an excellent algorithm.

///////////////////////////////////////////////
//
// The returned sub range is such that if this function is called
// for each partition [0,partition_count), the entire "range"
// will be covered by all returned sub ranges, and distributed
// amongst the partitions in the most even size distribution possible.
//
// The size parameter must specify the size of the range.
// This overload, accepting a size, is preferable where
// range.size() may be expensive.
//
template<typename Range>
inline
boost::iterator_range< typename Range::iterator > split_range(
    const Range& range,
    int partition_count,
    int partition,
    int size )
{
    typename Range::iterator begin = boost::begin( range );
    typename Range::iterator end = boost::end( range );

    if( partition_count > 1 )
    {
        int remainder = size % partition_count;
        int quotient = size / partition_count;

        if( partition < remainder )
        {
            std::advance( begin, partition * ( 1 + quotient ) );
            end = begin;
            std::advance( end, quotient + 1 );
        }
        else
        {
            std::advance( begin, remainder + partition * quotient );
            end = begin;
            std::advance( end, quotient );
        }
    }

    return boost::make_iterator_range( begin, end );
}


///////////////////////////////////////////////
//
// The returned sub range is such that if this function is called
// for each partition [0,partition_count), the entire "range"
// will be covered by all returned sub ranges, and distributed
// amongst the partitions in the most even size distribution possible.
//
// Use this overload where range.size() is not expensive
// (i.e. Range::iterator models random_access_iterator )
//
template<typename Range>
inline
boost::iterator_range< typename Range::iterator > split_range(
    const Range& range,
    int partition_count,
    int partition )
{
    return split_range( range, partition_count, partition, range.size() );
}


Having got the partitioning out of the way, the next part was allowing it to be easily used from within an omp parallel block. This turned out to be surprisingly easy.



///////////////////////////////////////////////
//
// This function should be called within a #pragma omp parallel
// block, and returns a sub_range of the input range.
//
// The returned sub range is such that if this function is called
// by each thread in the parallel thread group, the entirety of "range"
// will be covered by all threads, and distributed amongst the threads
// in the most even size distribution possible.
//
// The size parameter must specify the size of the range.
// This overload, accepting a size, is preferable where
// range.size() may be expensive.
//
template<typename Range >
inline
boost::iterator_range< typename Range::iterator >
split_range_openmp(
    const Range& range,
    int size )
{
    int thread_count = omp_get_num_threads();
    int thread = omp_get_thread_num();

    return split_range( range, thread_count, thread, size );
}


///////////////////////////////////////////////
//
// This function should be called within a #pragma omp parallel
// block, and returns a sub_range of the input range.
//
// The returned sub range is such that if this function is called
// by each thread in the parallel thread group, the entirety of "range"
// will be covered by all threads, and distributed amongst the threads
// in the most even size() distribution possible.
//
// Use this overload where range.size() is not expensive
// (i.e. Range::iterator models random_access_iterator )
//
template<typename Range >
inline
boost::iterator_range< typename Range::iterator >
split_range_openmp( const Range& range )
{
    return split_range_openmp( range, range.size() );
}




Each thread operating on an OpenMP parallel block gets a thread number from 0 to thread_count - 1, which is perfect for the partitioning.

So, the usage is now as simple as:

#pragma omp parallel
{
    boost::iterator_range< Rows::iterator > range =
        split_range_openmp(
            refing_table.rows(),
            refing_table.rows().size() );

    for( Rows::iterator row = boost::begin( range );
         row != boost::end( range );
         ++row )
    {
        // get rows that row references

        #pragma omp critical
        {
            // store the relationships
        }
    }
}




And voila, the rows() list is kindly split into equal sections for each OpenMP thread to work on.

Well, almost. Because the utility functions don't modify the input range, they accept it as a const reference. However, in the case of containers, this results in const_iterators being returned, which is incompatible with the stated return type of Range::iterator.

In order to work around this, you could either add non-const versions of each of the utility functions, add a template parameter to each for the desired iterator type, or utilise make_iterator_range with non-const iterators:

boost::iterator_range< Rows::iterator > range =
    split_range_openmp( boost::make_iterator_range(
                            refing_table->rows().begin(),
                            refing_table->rows().end() ),
                        refing_table->rows().size() );


But there it is. Some utility functions to make it easy to utilise multiple threads on loops with non-integer iterators.

The full header file can be found here: split_range.hpp.

Performance
And what was the result of utilising OpenMP for this loop? Well, I actually made the biggest improvement by comparing the binary data of each field for relationships, instead of comparing their display strings which were built each time. This dropped the time from 22 minutes to 6:45. Adding the OpenMP loop dropped it again to 5:05, but raised the total processor time by about 2 minutes. This was probably due in part to the overhead of splitting the range, and in part to the overhead of the OpenMP code.

But, it's worth noting that there can often be big performance increases found before resorting to multithreading.

Update
I subsequently found this paper, which discusses OpenMP and non-conformant (non-integer-iterator) loops, and provides some alternative solutions.

Try this new InstEd instead

The next version of InstEd has been released.

Roll over to the InstEd web site to see the new features and download it.

I am sure you will find it the fastest Windows Installer Editor.