Invantive limits on WHERE IN () clause for performance optimizations

We are building a very specific script. In this script we have the following statement:

select * 
from   transactionlines@sqlserver
where id IN (select id from temp@inmemorystorage)

We don’t want to use a JOIN statement because transactionlines@sqlserver contains more than 3 million rows; a join would trigger a full load of the SQL Server table and then perform the join on Invantive’s side.
temp@inmemorystorage should be rather small, maybe fewer than 20K records.

We found that if temp@inmemorystorage has a small set of records (<150), the above statement works fine.

However, if temp@inmemorystorage contains more results, e.g. 14K rows,
we get a truncated result: it seems that Invantive does not forward all the results of the IN clause to the SQL Server container.

But if we execute the following:

select *
from   transactionlines@sqlserver
where  id IN (select id from temp@inmemorystorage LIMIT 100000)

then Invantive forwards all the results in the IN clause and we get the results as expected.

Is the above approach (adding a LIMIT clause to force forwarding all results to SQL Server) a good one?

The statement should work fine with say 14K rows in temp@inmemorystorage.

A more efficient and better way is probably to switch the join strategy using the join_set hint (see Invantive UniversalSQL Grammar 23.0). join_set is used by default when the left-hand side has few entries (such as 5,000) to be joined upon.

A join_set uses an IN on the respective platform instead of a full join, and is typically used, for instance, when joining a list of open sales orders with its associated lines. It can be hundreds of times faster when there are many sales orders but only 1% are open.

An example:

select /*+ join_set(t1, id, 20000) */ *
from   temp@inmemorystorage t1
join   transactionlines@sqlserver t2
on     t2.id = t1.id

The documentation on Invantive UniversalSQL grammar describes the functionality of join_set.

Some other use cases are available at:

Please check in advance that both id columns have the same data type. Otherwise an implicit data type conversion will occur, triggering a very expensive join.
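As an illustrative sketch only (the assumption that temp.id is numeric while transactionlines.id is a string, and the use of to_char, are hypothetical; check the actual data types in your containers first), an explicit conversion on the small in-memory side avoids the implicit conversion:

```sql
-- Sketch: convert on the small in-memory side, so the 3M-row SQL Server
-- table keeps its native data type and the forwarded IN list stays cheap.
select /*+ join_set(t1, id, 20000) */ *
from   temp@inmemorystorage t1
join   transactionlines@sqlserver t2
on     t2.id = to_char(t1.id)
```

Converting on the large remote side instead would force the conversion on every row of transactionlines@sqlserver and can defeat any index on the id column.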