Nov 16 2005

The Word Is POI

POI stands for Poor Obfuscation Implementation. The POI subproject HSSF (Horrible Spread Sheet Format) can access MS Excel files, and HDF (Horrible Document Format), you guessed it, can access Word documents. I am working on a project and am researching what POI can do for me given a Word document. I didn't find many examples online, so I decided to look into the POI source code. Using the WordDocument class from the org.apache.poi.hdf.extractor package, found in the poi-scratchpad 2.5.1 jar file, I was able to write out the contents of a Word document as plain text.

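import java.io.*;
import org.apache.poi.hdf.extractor.WordDocument;

// Parse word.doc and dump its text content to text.txt
WordDocument doc = new WordDocument("word.doc");
Writer writer = new BufferedWriter(
     new FileWriter("text.txt"));
try {
    doc.writeAllText(writer);
} finally {
    writer.close(); // close() also flushes the buffer
}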

You can also print all the content of the Word document as an XSL-FO file, except that you first have to fix a null pointer exception. The code for this is horrible to read, with few or no comments. In the end, what I want to do is generate a custom XML file from a Word document. I was able to hack together what I wanted, but I had to refactor the hell out of this code.
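The refactored POI code isn't shown here, so the following is only a hypothetical sketch of the final step: turning plain text, such as what writeAllText produces, into a custom XML file. It uses the JDK's StAX XMLStreamWriter rather than POI, the document and para element names are invented for illustration, and it assumes each non-blank line of the extracted text is one paragraph.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class TextToXml {
    // Hypothetical format: <document><para>...</para>...</document>,
    // one <para> per non-blank line of the extracted text.
    static String toXml(String plainText) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("document");
        for (String line : plainText.split("\\r?\\n|\\r")) {
            String text = line.trim();
            if (text.isEmpty()) {
                continue; // skip blank lines
            }
            xml.writeStartElement("para");
            xml.writeCharacters(text); // escapes special characters such as & and <
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.flush();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(toXml("Title\n\nFirst paragraph & more.\nSecond one."));
    }
}
```

Writing through a stream writer instead of concatenating strings means ampersands and angle brackets in the Word text cannot produce malformed XML.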


Nov 15 2005

Print HTML Using IE

I might have mentioned before that I am not well versed in Visual Basic. Here is a small VBScript snippet that took me an afternoon to figure out. You can use this code to print an HTML file using Internet Explorer. After the HTML file has been printed, Internet Explorer closes.

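' print_done must be declared at script level; a variable first
' assigned inside a Sub is local to that Sub only.
Dim print_done

Sub PrintHtml(fileName)
    Dim objIE
    ' The "ie_" prefix connects the ie_* event handlers below
    Set objIE = WScript.CreateObject( _
        "InternetExplorer.Application", "ie_")
    objIE.Visible = True
    objIE.Navigate fileName
    ' 4 = READYSTATE_COMPLETE: wait until the page has loaded
    Do Until objIE.ReadyState = 4 : WScript.Sleep 20 : Loop
    print_done = False
    ' 6 = PRINT, 2 = NO USER PROMPT
    objIE.ExecWB 6, 2
    ' Wait until printing is done.
    Do While Not print_done : WScript.Sleep 50 : Loop
    objIE.Quit
End Sub

' Listen to IE print events
Sub ie_PrintTemplateTeardown(pDisp)
    WScript.Sleep 200
    print_done = True
End Sub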
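The script polls a print_done flag that the ie_PrintTemplateTeardown event handler flips. For comparison, the same wait-for-an-event idea looks like this in Java with a java.util.concurrent.CountDownLatch, which blocks without polling and takes a timeout so a missed event cannot hang the caller forever. There is no Internet Explorer involved: fakePrintJob is a hypothetical stand-in for an asynchronous print job that fires a teardown callback.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class WaitForPrint {
    // Hypothetical async job: works on another thread, then fires the
    // "teardown" callback, much like IE's PrintTemplateTeardown event.
    static void fakePrintJob(Runnable onTeardown) {
        new Thread(() -> {
            try {
                Thread.sleep(100); // pretend to spool the document
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            onTeardown.run();
        }).start();
    }

    // Returns true if the teardown event arrived before the timeout.
    static boolean printAndWait(long timeoutMillis) throws InterruptedException {
        CountDownLatch printDone = new CountDownLatch(1);
        fakePrintJob(printDone::countDown); // the "event handler" releases the latch
        return printDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(printAndWait(5000) ? "printed" : "timed out");
    }
}
```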

Nov 12 2005

Search Engine Optimization

Want to improve your site’s page rank? Well, I do. What I have been trying to do is make sure I use the abstract, keywords, and description meta tags. To optimize your site, make sure every page defines the following tags in its head section.

<META name='abstract' content='...'>
<META name='description' content='...'>
<META name='keywords' content='...'>
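If your pages are generated by code, a tiny helper keeps these three tags consistent across the site. This is a hypothetical Java sketch (metaTags and escapeAttr are made-up names); the detail worth copying is escaping the content values, because a stray quote or ampersand in a description would otherwise break the tag.

```java
public class MetaTags {
    // Escape the characters that matter inside a quoted HTML attribute.
    // '&' goes first so the entities added afterwards are not double-escaped.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Builds the abstract, description and keywords tags for one page.
    static String metaTags(String abstractText, String description, String... keywords) {
        return "<meta name=\"abstract\" content=\"" + escapeAttr(abstractText) + "\">\n"
             + "<meta name=\"description\" content=\"" + escapeAttr(description) + "\">\n"
             + "<meta name=\"keywords\" content=\"" + escapeAttr(String.join(", ", keywords)) + "\">\n";
    }

    public static void main(String[] args) {
        System.out.print(metaTags("Notes on POI & Dom4J",
                "Short Java and VBScript tips", "java", "poi", "dom4j"));
    }
}
```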

In your content, add H1 and H2 headers that describe your keywords and content title, since they are weighted heavily. Make sure your headers are descriptive of your content. I also make sure to always use a title attribute on every anchor tag and an alt attribute on every image.
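To catch pages that forget the title and alt attributes, a quick audit can help. The following is a rough, hypothetical sketch using regular expressions: it is not a real HTML parser, so unusual markup (for example an attribute value containing " alt=") can fool it, but it is enough to flag the obvious omissions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagAudit {
    // Matches an opening <img ...> or <a ...> tag (but not <abbr> or <area>).
    private static final Pattern TAG =
            Pattern.compile("<(img|a)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns every <a> without a title and every <img> without an alt.
    static List<String> findMissingAttributes(String html) {
        List<String> problems = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String required = m.group(1).equalsIgnoreCase("img") ? "alt" : "title";
            // The attribute name must follow whitespace and precede '='
            Pattern attr = Pattern.compile("\\s" + required + "\\s*=", Pattern.CASE_INSENSITIVE);
            if (!attr.matcher(tag).find()) {
                problems.add(tag);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String html = "<a href='/'>Home</a> <a href='/blog' title='Blog'>Blog</a>"
                + " <img src='x.png'> <img src='y.png' alt='Y'>";
        for (String tag : findMissingAttributes(html)) {
            System.out.println("missing attribute: " + tag);
        }
    }
}
```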

I would also recommend that you define a sitemap page that links to every page in your site. In fact, Google has defined an XML format for describing your site’s pages so you can register it with Google Sitemaps. But perhaps the most important thing is to update your site regularly: daily if you can, but at a minimum three times a week.
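A sitemap file is simple to generate. The sketch below targets the sitemaps.org 0.9 format, the later search-engine-neutral successor of the Google format: a urlset element holding one url entry per page, each with a required loc and an optional lastmod. The buildSitemap name, the example.com URLs and the single shared lastmod date are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class Sitemap {
    // XML-escape a URL; '&' in query strings is the usual culprit.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // lastMod is a W3C date such as 2005-11-12, used here for every page.
    static String buildSitemap(List<String> urls, String lastMod) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            xml.append("  <url>\n");
            xml.append("    <loc>").append(escape(url)).append("</loc>\n");
            xml.append("    <lastmod>").append(lastMod).append("</lastmod>\n");
            xml.append("  </url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildSitemap(Arrays.asList(
                "http://www.example.com/",
                "http://www.example.com/archive?month=11&year=2005"), "2005-11-12"));
    }
}
```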


Nov 10 2005

Eclipse Tool Tip #4: Key Assist

Every once in a while you need an assist, and Eclipse gives you one. Go to Help > Key Assist to open a dialog listing the shortcuts for many Eclipse functions, from Content Assist (ctrl+space) to Word Completion (alt+/). The hot keys I use the most are ctrl+shift+r to open a resource, ctrl+shift+t to open a type, ctrl+t to display the type hierarchy, ctrl+shift+f to format code, and ctrl+space for content assist, amongst others. But if you forget every other hot key, just remember ctrl+shift+l to open Key Assist.


Nov 9 2005

More Ways To Open A Doc

As the Perl guys like to say, and as I tell my girlfriend, there is more than one way to do it. This is also true for a simple task such as opening a Word document using VBScript. The following code opens an MS Word document:

Sub OpenWord(fileName)
    Set Word = WScript.CreateObject("Word.Application")
    Word.Visible = True
    Set doc = Word.Documents.Open(fileName)
End Sub

I have seen problems with this code when the Word document has a mail merge data source associated with it and you try to execute the merge (see Word Mail Merge):

doc.MailMerge.Execute True

Trying to execute the mail merge fails with the error ‘This method or property is not available because the document is not a mail merge main document.’ But I know this is wrong, because double-clicking the file shows that a data source is associated with the document. Another way to open a Word document that sidesteps this issue is to use the Run command:

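Sub OpenWord(fileName)
    Set WshShell = WScript.CreateObject("WScript.Shell")
    ' Quote the path so names with spaces work; 8 = show the window
    ' in its current state, False = return without waiting
    WshShell.Run """" & fileName & """", 8, False
End Sub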

In fact, the file can be of any type; this code will open it with its default application. Just to compare, this is the code you can use to open an Excel document:

Sub OpenExcel(fileName)
    Set Excel = WScript.CreateObject("Excel.Application")
    Excel.Visible = True
    Excel.Workbooks.Open(fileName)
End Sub
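The Run trick above, which hands a file to whatever application is registered for it, has a JDK counterpart in java.awt.Desktop (Java 6 and later). Here is a minimal sketch; the class and method names are hypothetical, and it simply returns false when the file is missing or the desktop API is unavailable, for example on a headless server.

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;

public class OpenWithDefaultApp {
    // Returns true if the file was handed to its associated application.
    static boolean open(File file) throws IOException {
        if (!file.isFile()) {
            return false; // nothing to open
        }
        if (!Desktop.isDesktopSupported()
                || !Desktop.getDesktop().isSupported(Desktop.Action.OPEN)) {
            return false; // no desktop integration available
        }
        Desktop.getDesktop().open(file); // like double-clicking the file
        return true;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0 ? args[0] : "word.doc");
        System.out.println(open(file) ? "opened " + file : "could not open " + file);
    }
}
```

Like Run with its last argument set to False, Desktop.open returns without waiting for the application to close.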

Nov 8 2005

Using Dom4J: Reading An XML File

For almost every project I have worked on, I have had to work with XML files. I’ve used SAX and DOM parsers and have even written my own XML writers. But now, for most of my XML needs, I use Dom4J. Dom4J will get you started quickly. These four little lines will read in an XML file:

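import java.io.File;
import org.dom4j.*;            // Document, Element, DocumentException
import org.dom4j.io.SAXReader;

File xml = new File("simple.xml");
SAXReader reader = new SAXReader();
Document doc = reader.read(xml); // throws DocumentException
Element root = doc.getRootElement();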

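Of course, you will need to import SAXReader from org.dom4j.io and Document and Element from org.dom4j, and handle the DocumentException that read throws when the file cannot be parsed. The SAXReader read method is heavily overloaded, so you can read from a File, URL, InputStream, Reader, etc. Note that the String version treats its argument as a system ID (a URL or file path), not as XML text; to parse XML held in a String, wrap it in a StringReader or use DocumentHelper.parseText. Once you have an Element object you can get its name, attributes, and child elements. The following code iterates through the child elements: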

for (Iterator i = root.elements().iterator(); i.hasNext();) {
    Element elem = (Element) i.next(); // needs java.util.Iterator
    System.out.println(elem.getName());
}

This is all the code you need to start reading in an XML document.
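For comparison, here is roughly the same job using only the JDK's built-in DOM parser (javax.xml.parsers), which shows how much ceremony Dom4J saves you. It parses XML held in a String via an InputSource wrapped around a StringReader, then lists each child element of the root with its attributes. The describeChildren method and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReadXml {
    // Returns "name attr=value ..." for each direct child element of the root.
    static List<String> describeChildren(String xmlText) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xmlText)));
        Element root = doc.getDocumentElement();

        List<String> result = new ArrayList<>();
        NodeList children = root.getChildNodes(); // includes text nodes
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and text between elements
            }
            StringBuilder line = new StringBuilder(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                line.append(' ').append(attr.getNodeName()).append('=').append(attr.getNodeValue());
            }
            result.add(line.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><db host='localhost' port='5432'/>\n  <log level='debug'/></config>";
        for (String s : describeChildren(xml)) {
            System.out.println(s);
        }
    }
}
```

Unlike Dom4J's elements(), the W3C getChildNodes() returns text nodes too, which is why the loop has to filter on the node type.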