Summary
Combinations of a data sets can be ranked according to criteria establishing a probability of a combination referring to the same realworld object. An Invantive SQL statement is presented to calculate the match probability based upon multiple criteria.
Choosing the Right Data Set on a Realworld Object or Person
A certain group of users of Exact Globe value the ease of online web use of Exact Online above the superior features of Exact Globe. A continuous stream of users flows from Globe to Online and many remain on the Online platform.
Despite being engineered within the same company, most engineering of Exact Online was done years later and alternative choices were made. Therefor, Exact Online has a companywide list of counter parties, named “Accounts”, whereas it seems that in Globe two sets exist, namely Customers and Suppliers, where the same number may refer to either a customer, a supplier or both.
The default conversion translates these numbers into companywide unique Exact Online account numbers. However, when loading additional years of financial history from XML Audit Files, it is necessary to translate the customer and supplier number into their respective Exact Online account number.
By matching on multiple match probability aspects, a mapping can be made that even across large data sets correlates the parties across Globe and Exact Online.
Approach
The probability is assessed for a party in Exact Online and Globe to be the same based upon a combination of the following criteria:
 Exact Online account name ends with the account name of Globe.
 Exact Online number ends with the number of Globe.
 Exact Online behavior as a customer or supplier matches the group in Globe.
The more of these criteria are matched, the higher the probability of being an identical company.
Probability are multiplied in this scenario, but nonlinear approaches or correlated weights can also easily be introduced.
Query to Determine Probability
The following query determines for each combination between the Globe and Exact Online account whether they refer to the same counter party.
A value of match probability is assigned to each criteria. The reverse probabilities (so nonmatch probability) are all multiplied and then reversed again. A combination scoring on all criteria will therefore return a value closer to 100% than another combination.
create or replace table mapcustsupcandidates@inmemorystorage
as
select distinct
cpy.custsupid
, act.code act_code
, 1

( 
 Chance of wrong account name.

( 1

case
when act.name like '%'  cpy.custsupname
then 0.4
else 0
end
)
*

 Chance of wrong account code.

( 1

case
when act.code like '%'  trim(cpy.custsupid)
then 0.6
when trim(cpy.custsupid) like '%'  act.code
then 0.6
else 0
end
)
*

 Chance of wrong account properties.

( 1

case
when trim(cpy.custsupid) like 'D%'
then case
when act.status = 'C'
then 0.4
else 0
end
when trim(cpy.custsupid) like 'C%'
then case
when act.issupplier
then 0.4
else 0
end
else 0.2
end
)
)
chance_right
from CounterParties@xaf
join exactonlinerest..accounts@eol act

 When a Creditor and Debtors with the same number after
 the initial C or D, they
 are split into two different numbers in Exact Online.
 Exact Online conversion then adds a '10' to one of them
 but maintains the original number in the name of the account.

on act.name like '%'  cpy.custsupname
and case
when cpy.custsupid like 'D%'
then case
when act.status = 'C'
then true
else false
end
when cpy.custsupid like 'C%'
then act.issupplier
else true
end
= true
List Most Probable Matches
A simple query reports the most likely match:
select custsupid
, first(act_code) accountcode
from ( select custsupid
, act_code
, chance_right
from mapcustsupcandidates@inmemorystorage
order
by custsupid
, chance_right desc
)
group
by custsupid
Please note that the ‘first()’ group function with an ‘order by’ could also have been replaced by using the Invantive SQL ‘union distinct’ set operator since there are at most two groups of counter parties.