Previously, Invantive SQL was incapable to process relational data from text files with a size larger than typically 1 GB. Starting BETA-release 22.1.56, a number of optimizations enable processing large files with text of over 1 GB in size in the following formats:
- CSV (comma separated values), using
csvtable
- JSON, using
jsontable
- NDJSON, using
ndjsontable
- XML document, using
xmltable
All four variants require using the passing file
-syntax instead of the passing
-syntax with a file retrieved in a previous step such as using read_file_contents_text@os
. The use of passing file
enables the large memory optimizations.
Both the exceltable
and htmltable
do not provide support for very large text files. However, data files in the associated file formats *.xlsx and *.html so far have never been seen in the wild by our team with a file size of over 1 GB.
Both consumption of CPU time internal memory needed can still be extremely large. For internal memory, please ensure presence of 8x the size of the file being processed. The actual memory consumption can vary wildly and even be very little, but for a first test make sure that sufficient memory is present.
Existing, but undiscovered barriers may surface. Please create a new topic when a problem occurs processing very large text files.
Rows with data can be returned streaming or batched, depending on the current architecture. Use a query with a low value for limit
to test whether the large files can be processed streaming or batched in a specific format.
SQL-statements for large text files
The following SQL-statements can be used as a starting point for the required syntax:
select *
from jsontable
( '[*]'
passing file 'c:\temp\large.json'
columns c varchar2 path 'key'
)
limit 10
select *
from ndjsontable
( passing file 'c:\temp\large.ndjson'
columns c varchar2 path 'key'
)
limit 10
select *
from csvtable
( passing file 'c:\temp\large.csv'
columns c varchar2 position 1
)
limit 10
select *
from xmltable
( '/root'
passing file 'c:\temp\large.xml'
columns c varchar2 path './name'
)
limit 10