The Word Is POI

POI stands for Poor Obfuscation Implementation. The POI subproject HSSF (Horrible SpreadSheet Format) can access MS Excel files, and HDF (Horrible Document Format), you guessed it, can access Word documents. I am working on a project and researching what POI can do for me given a Word document. I didn't find many examples online, so I decided to look into the POI source code. Using the WordDocument class from the org.apache.poi.hdf.extractor package, found in the poi-scratchpad 2.5.1.jar file, I was able to write out the contents of a Word document as plain text.

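WordDocument doc = new WordDocument("word.doc");
Writer writer = new BufferedWriter(
     new FileWriter("text.txt"));
// Dump the document text to the writer, then flush and close the file.
doc.writeAllText(writer);
writer.close();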

You can also print all the content of the Word document as an XSL-FO file, except that you first have to fix a null pointer exception. The code for this is horrible to read, with few or no comments. In the end, what I want to do is generate a custom XML file from a Word document. I was able to hack together what I wanted, but I had to refactor the hell out of this code.
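
For anyone who only needs simple XML, here is a minimal sketch of that last step. It is not the refactored POI code. It assumes the text has already been extracted (for example into text.txt as above) and simply treats each non-blank line as one paragraph. The class name TextToXml and the element names document and paragraph are made up for illustration, and it uses only the XML APIs that ship with the JDK.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TextToXml {

    // Hypothetical element names ("document", "paragraph");
    // substitute whatever your custom XML format needs.
    static String toXml(String text) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = dom.createElement("document");
        dom.appendChild(root);

        // Assumption: every non-blank line of extracted text is one paragraph.
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            Element para = dom.createElement("paragraph");
            para.setTextContent(trimmed); // special characters are escaped on output
            root.appendChild(para);
        }

        // Serialize the DOM tree to a string without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("First paragraph\n\nTom & Jerry\n"));
    }
}
```

Building the tree with DOM instead of concatenating strings means characters such as & and < in the Word text get escaped for free. Anything richer than paragraphs (styles, tables, sections) would still need the structural information that only the POI classes expose.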
