Heap Space Issue in POI while XLS conversion to CSV

RSReporter - an ELTL (Extract Load Transform Load) tool is designed to populate data from various source file types into a database. Validations are run here to give accurate data.

 

The part where a lot of research was required is the population of data from different types of files into the db with minimum processing time. The ancient method of reading a file line by line and entering into the db would take a lot of time for a file of huge volume and sometimes a error of heap space could occur

 

So, the better if not the best solution was to convert any file type to csv format and then run a simple command in sql for importing the csv file. Command available at

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

 

Among the various type the most crucial were xls, xlsx. It seems very obvious to use POI for xls and xlsx. Yes, we did try it. We used the file XLS2CSVmra.java - http://poi.apache.org/apidocs/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.html

OR

http://jcs.mobile-utopia.com/jcs/10579_XLS2CSVmra.java

which is an xls to csv processor.

 

The problem with the above was that we encountered heap space error for large files.

www.artofsolving.com came to our rescue.

 

All we needed to do is run the following command, in a command prompt, under the open office folder in program files:

"soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard "

Which creates an OpenOffice service on Windows. In Linux, change the port to 2002 and there you go. Not only xls but xlsx is converted to csv format...

 

N.B.  You need to have installed the latest version of Open Office. We used version 3.1 as the lower versions did not work.

Comments

Post new comment

The content of this field is kept private and will not be shown publicly.
CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
Image CAPTCHA
Copy the characters (respecting upper/lower case) from the image.