Xmltable method broken with certain type of content inside a node

We have a XML-file containing at the top of the XML, inside a node, some content that makes xmltable to fail to parse the file (Invantive Query Tool v20.2.205).

In our case the simplified file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd">
    <BkToCstmrDbtCdtNtfctn>
        <GrpHdr>
            <MsgId>OIA20220402030407</MsgId>
            <CreDtTm>2022-04-02T03:04:07</CreDtTm>
            <MsgPgntn>
                <PgNb>1</PgNb>
                <LastPgInd>true</LastPgInd>
            </MsgPgntn>
        </GrpHdr>
    </BkToCstmrDbtCdtNtfctn>
</Document>

Here is the SQL-statement:

select *
from   files@Os('C:\', '*.xml', false) fle
join   read_file_text@Os(fle.file_path) rfe
join   xmltable
       ( '/Document/BkToCstmrDbtCdtNtfctn/GrpHdr'
         passing rfe.file_contents
         columns identification varchar2 path 'MsgId'
       ) xml

With the XML above, the statement will return no result.

But expected is: OIA20220402030407

Workaround

The outcome is correct when we remove the following content from the <Document> node

xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd

Result:

The XML in that case is:

<?xml version="1.0" encoding="UTF-8"?>
<Document>
    <BkToCstmrDbtCdtNtfctn>
        <GrpHdr>
            <MsgId>OIA20220402030407</MsgId>
            <CreDtTm>2022-04-02T03:04:07</CreDtTm>
            <MsgPgntn>
                <PgNb>1</PgNb>
                <LastPgInd>true</LastPgInd>
            </MsgPgntn>
        </GrpHdr>
    </BkToCstmrDbtCdtNtfctn>
</Document>

We have tried to remove the challenging content that makes xmltable miss the content by using regexp_replace, but the regexp_replace on the challenging content:

xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd

returns a also null result.

The use of XML-namespaces is by intent not yet supported in xmltable. XML namespace are a quite complex subject; in my personal opinion XML-namespaces despite it’s power is one the reasons JSON had a chance to take over the market for new innovations.

The problem can be reduced to the following query, which yields no results:

select *
from   xmltable
       ( '/Document/BkToCstmrDbtCdtNtfctn/GrpHdr'
         passing '<?xml version="1.0" encoding="UTF-8"?>'
                 || '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd">'
                 || '<BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>OIA20220402030407</MsgId></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>'
         columns id varchar2 path 'MsgId'
       )

The variant without namespaces works:

select *
from   xmltable
       ( '/Document/BkToCstmrDbtCdtNtfctn/GrpHdr'
         passing '<?xml version="1.0" encoding="UTF-8"?>'
                 || '<Document>'
                 || '<BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>OIA20220402030407</MsgId></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>'
         columns id varchar2 path 'MsgId'
       )

As a workaround, you can use regexp_replace or replace as shown below (replace is recommended when feasible since it is easier to use and faster).

Using regexp_replace:

select *
from   xmltable
       ( '/Document/BkToCstmrDbtCdtNtfctn/GrpHdr'
         passing regexp_replace
                 ( '<?xml version="1.0" encoding="UTF-8"?>'
                   || '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd">'
                   || '<BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>OIA20220402030407</MsgId></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>'
                 , ' xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd"'
                 , ''
                 )
         columns id varchar2 path 'MsgId'
       )

Using replace:

select *
from   xmltable
       ( '/Document/BkToCstmrDbtCdtNtfctn/GrpHdr'
         passing replace
                 ( '<?xml version="1.0" encoding="UTF-8"?>'
                   || '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd">'
                   || '<BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>OIA20220402030407</MsgId></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>'
                 , ' xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd"'
                 , ''
                 )
         columns id varchar2 path 'MsgId'
       )

Finally, it is also possible to work to a certain degree with namespaces. A simple solution is:

select *
from   xmltable
       ( '/*["Document"=local-name()]/*["BkToCstmrDbtCdtNtfctn"=local-name()]/*["GrpHdr"=local-name()]'
         passing '<?xml version="1.0" encoding="UTF-8"?>'
                   || '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02 camt.054.001.02.xsd">'
                   || '<BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>OIA20220402030407</MsgId></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>'
         columns id varchar2 path '*["MsgId"=local-name()]'
       )

with result:

In general, we recommend to try to avoid XML-namespaces as much as possible to reduce complexity and reduce required skills. The last sample just works around it. On the Internet, a search for “XML namespaces” and “XPath” may provide alternative approaches.

Thank you very much for your valuable help. Unfortunately, we can’t avoid namespaces because we don’t create the files we were using, we just get them.

We were finally able to solve our problem by using two replace functions nested.

Indeed, your method worked fine as long as we did not replace the xml part with rfe.file_contents. However, in order for our code to work on several files without having to modify it each time, we needed to use rfe.file_contents.

After several tries we succeeded with the following code:

Great! Yes, you also apply replace to a column with a CLOB (large text).