Probability Matching Exact Globe Customers and Suppliers in Online

Summary

Combinations of a data sets can be ranked according to criteria establishing a probability of a combination referring to the same real-world object. An Invantive SQL statement is presented to calculate the match probability based upon multiple criteria.

Choosing the Right Data Set on a Real-world Object or Person

A certain group of users of Exact Globe value the ease of online web use of Exact Online above the superior features of Exact Globe. A continuous stream of users flows from Globe to Online and many remain on the Online platform.

Despite being engineered within the same company, most engineering of Exact Online was done years later and alternative choices were made. Therefor, Exact Online has a company-wide list of counter parties, named “Accounts”, whereas it seems that in Globe two sets exist, namely Customers and Suppliers, where the same number may refer to either a customer, a supplier or both.

The default conversion translates these numbers into company-wide unique Exact Online account numbers. However, when loading additional years of financial history from XML Audit Files, it is necessary to translate the customer and supplier number into their respective Exact Online account number.

By matching on multiple match probability aspects, a mapping can be made that even across large data sets correlates the parties across Globe and Exact Online.

Approach

The probability is assessed for a party in Exact Online and Globe to be the same based upon a combination of the following criteria:

  • Exact Online account name ends with the account name of Globe.
  • Exact Online number ends with the number of Globe.
  • Exact Online behavior as a customer or supplier matches the group in Globe.

The more of these criteria are matched, the higher the probability of being an identical company.

Probability are multiplied in this scenario, but non-linear approaches or correlated weights can also easily be introduced.

Query to Determine Probability

The following query determines for each combination between the Globe and Exact Online account whether they refer to the same counter party.

A value of match probability is assigned to each criteria. The reverse probabilities (so non-match probability) are all multiplied and then reversed again. A combination scoring on all criteria will therefore return a value closer to 100% than another combination.

create or replace table mapcustsupcandidates@inmemorystorage
as
select distinct
       cpy.custsupid
,      act.code act_code
,      1
       -
       ( --
         -- Chance of wrong account name.
         --
         ( 1
           -
           case
           when act.name like '%' || cpy.custsupname
           then 0.4
           else 0
           end
         )
         *
         --
         -- Chance of wrong account code.
         --
         ( 1
           -
           case
           when act.code like '%' || trim(cpy.custsupid)
           then 0.6
           when trim(cpy.custsupid) like '%' || act.code
           then 0.6
           else 0
           end
         )
         *
         --
         -- Chance of wrong account properties.
         --
         ( 1
           -
           case
           when trim(cpy.custsupid) like 'D%'
           then case
                when act.status = 'C'
                then 0.4
                else 0
                end
           when trim(cpy.custsupid) like 'C%'
           then case
                when act.issupplier
                then 0.4
                else 0
                end
           else 0.2
           end
         )
       )
       chance_right
from   CounterParties@xaf
join   exactonlinerest..accounts@eol act
--
-- When a Creditor and Debtors with the same number after
-- the initial C or D, they
-- are split into two different numbers in Exact Online.
-- Exact Online conversion then adds a '10' to one of them
-- but maintains the original number in the name of the account.
--
on     act.name like '%' || cpy.custsupname
and    case
       when cpy.custsupid like 'D%'
       then case
            when act.status = 'C'
            then true
            else false
            end
       when cpy.custsupid like 'C%'
       then act.issupplier
       else true
       end
       = true

List Most Probable Matches

A simple query reports the most likely match:

select custsupid
,      first(act_code) accountcode
from   ( select custsupid
         ,      act_code
         ,      chance_right
         from   mapcustsupcandidates@inmemorystorage
         order
         by     custsupid
         ,      chance_right desc
       )
group
by     custsupid

Please note that the ‘first()’ group function with an ‘order by’ could also have been replaced by using the Invantive SQL ‘union distinct’ set operator since there are at most two groups of counter parties.