Processing very large text files in JSON, NDJSON, CSV and XML format in SQL

Go to Dutch version

Previously, Invantive SQL was incapable to process relational data from text files with a size larger than typically 1 GB. Starting BETA-release 22.1.56, a number of optimizations enable processing large files with text of over 1 GB in size in the following formats:

  • CSV (comma separated values), using csvtable
  • JSON, using jsontable
  • NDJSON, using ndjsontable
  • XML document, using xmltable

All four variants require using the passing file-syntax instead of the passing-syntax with a file retrieved in a previous step such as using read_file_contents_text@os. The use of passing file enables the large memory optimizations.

Both the exceltable and htmltable do not provide support for very large text files. However, data files in the associated file formats *.xlsx and *.html so far have never been seen in the wild by our team with a file size of over 1 GB.

Both consumption of CPU time internal memory needed can still be extremely large. For internal memory, please ensure presence of 8x the size of the file being processed. The actual memory consumption can vary wildly and even be very little, but for a first test make sure that sufficient memory is present.

Existing, but undiscovered barriers may surface. Please create a new topic when a problem occurs processing very large text files.

Rows with data can be returned streaming or batched, depending on the current architecture. Use a query with a low value for limit to test whether the large files can be processed streaming or batched in a specific format.

SQL-statements for large text files

The following SQL-statements can be used as a starting point for the required syntax:

select *
from   jsontable
       ( '[*]' 
         passing file 'c:\temp\large.json'
         columns c varchar2 path 'key'
       )
limit 10

select *
from   ndjsontable
       ( passing file 'c:\temp\large.ndjson'
         columns c varchar2 path 'key'
       )
limit 10

select *
from   csvtable
       (  passing file 'c:\temp\large.csv'
          columns c varchar2 position 1
       )
limit 10


select *
from   xmltable
       ( '/root' 
         passing file 'c:\temp\large.xml'
         columns c varchar2 path './name'
       )
limit 10