Jan 25 2006

Word To PDF

Why isn’t there a good and simple command line doc2pdf application? I just can’t find any good command line program that can faithfully produce a PDF document given a Word document. There are a lot of commercial and some open source applications that can create a PDF document, but I can’t find a simple command line tool that does this. For example, PDFCreator is an open source application that allows you to create a PDF document from Word by ‘printing’ the document to a virtual PDFCreator printer. Several commercially available Word to PDF solutions do the same thing: they install a ‘printer’ that prints a document as a PDF. This solution is really a hack that exploits the fact that documents sent to the printer need to be transformed to what is essentially PostScript. Once you have a document in PostScript format you can create a PDF using Adobe Acrobat Distiller or Ghostscript’s ps2pdf.cmd batch file.

PDFCreator does not provide a nice command line interface, but it is easy to get past that limitation with some simple Visual Basic. You can write Visual Basic script code that opens a Word document, sets the active printer to PDFCreator, and ‘prints’ the document, allowing PDFCreator to create a PDF for you. You might want to edit PDFCreator’s auto-save options, otherwise you will be prompted for where to save the new PDF. Here is some sample Visual Basic code that does just what I described above.

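' Word constants are not predefined in VBScript
Const wdDoNotSaveChanges = 0

' sMyDocumentFile must hold the full path of the Word document
Set word = CreateObject("Word.Application")
Set docs = word.Documents

' Remember current active printer (a string, so no Set)
sPrevPrinter = word.ActivePrinter

' Select the PDFCreator as your printer
word.ActivePrinter = "PDFCreator"

' Open the Word document
Set document = docs.Open(sMyDocumentFile)

' Print the document file to the PDFCreator
' (False = not in the background, so the job is spooled before closing)
document.PrintOut False

document.Close wdDoNotSaveChanges
word.ActivePrinter = sPrevPrinter
word.Quit wdDoNotSaveChanges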

For completeness’ sake, let me mention how to create a PDF document using the Apache POI project. Using POI you can create an XSL-FO version of your Word document, which can then be transformed into a PDF using Apache FOP. It has been my experience that the results generated by POI are not perfect, but here is some code to get you started. The POI scratchpad jar contains a WordDocument class that will create an XSL-FO version of the Word document. The WordDocument class might have been intended to be just a command line application, because it throws a NullPointerException if you try to use it in your code, so you will have to modify this class. Once you fix the exception you can use code like the following to produce an XSL-FO for a given Word document:

WordDocument file = new WordDocument(wordDocumentPath);

Once you have the XSL-FO version of your document you can transform it to a PDF using Apache FOP. One word of warning: the WordDocument class is in the scratchpad jar, so it might not be as stable as you would expect.

Dec 27 2005

TechKnow Year In Review 2005

It is that time of year when we reflect on the accomplishments of the passing year and look forward to the one to come. Here is a window into the past year in technology through past posts.

TechknowZenze: First Post – How it all started.
Import Script/CSS/PHP
Page Redirect – PHP, HTML, and JavaScript code to redirect an HTML page to another.
MySQL Admin – Quick tutorial for MySQL administrative tasks.
Put JavaScript To Sleep – Tutorial describing how to set JavaScript functions to timeout.
Word Mail Merge – Visual Basic Script code to manipulate Word’s Mail Merge functionality.
JavaScript FX – JavaScript code to hide/show HTML elements.
Style and Class – Working with style attributes on HTML tags using JavaScript.
Search Engine Optimization
The Word is POI – Java library for working with MS Office documents.

Season’s Greetings


Nov 25 2005

Visual Kill -9

Here is some Visual Basic script code which allows you to terminate a process given a process id number.

' Kills a program given its process id.
Function ProgKill(strProcessId)
   ' Declare used variables
   Dim strWQL
   Dim objProcess
   Dim objResult
   Dim intReturnCode
   Dim wmi

   Set wmi = GetObject("winmgmts:")
   ' Get Process by WMI
   strWQL = _
      "select * from win32_process where ProcessId='" _
      & strProcessId & "'"
   Set objResult = wmi.ExecQuery(strWQL)

   ' Kill all found processes
   For Each objProcess in objResult
      ' Try to kill the process
      intReturnCode = objProcess.Terminate(0)
   Next

   ' Return the code of the last Terminate call (0 = success)
   ProgKill = intReturnCode
End Function

You can use code like this to kill a process started in your script after a given event or set time.
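For readers working in Java rather than Windows Script Host, the same idea can be sketched with the standard ProcessHandle API. This is only a rough sketch, assuming Java 9 or later; the class name KillSketch, the method name progKill, and the five second timeout are made up for illustration and are not part of the script above.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;

public class KillSketch {

    /**
     * Forcibly terminates the process with the given process id.
     * Returns true if a process was found and termination was requested,
     * false if no process with that id exists.
     * Note: asking to kill the current JVM throws IllegalStateException.
     */
    public static boolean progKill(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.map(ProcessHandle::destroyForcibly).orElse(false);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java KillSketch <command> [args...]");
            return;
        }
        // Start the command given on the command line.
        Process child = new ProcessBuilder(args).start();
        // Give it a set time (5 seconds, an arbitrary choice) to finish.
        if (!child.waitFor(5, TimeUnit.SECONDS)) {
            // Still running after the timeout, so kill it by process id.
            progKill(child.pid());
        }
    }
}
```

Running the class with a long-running command as its arguments starts that command and kills it if it has not finished within the timeout.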

Nov 15 2005

Print HTML Using IE

I might have mentioned before that I am not well versed in Visual Basic. Here is a small Visual Basic script snippet that took me an afternoon to figure out. You can use this code to print an HTML file using Internet Explorer. After the HTML file has been printed, Internet Explorer will close.

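' Set by the event handler below once IE has finished printing
Dim print_done

Sub PrintHtml(fileName)
    Dim objIE
    print_done = False
    ' The "ie_" prefix connects IE events to the ie_* subs below
    Set objIE = WScript.CreateObject( _
        "InternetExplorer.Application", "ie_")
    objIE.Visible = True
    objIE.Navigate fileName
    ' 4 = READYSTATE_COMPLETE
    Do Until objIE.ReadyState = 4 : WScript.Sleep 20 : Loop
    ' 6 = PRINT, 2 = NO USER PROMPT
    objIE.ExecWB 6, 2
    ' Wait until printing is done.
    Do While Not print_done : WScript.Sleep 50 : Loop
    ' Close Internet Explorer
    objIE.Quit
End Sub

' Listen to ie print events
Sub ie_PrintTemplateTeardown(pDisp)
    WScript.Sleep 200
    print_done = True
End Sub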

Nov 2 2005

Word Mail Merge

“Learn something new every day.” That is my personal motto, and I really feel that every day I learn something new. Well, the other day I learned about MS Word’s Mail Merge capabilities. Word’s Mail Merge feature allows you to define a Word document template that is filled in once for every row in a data source file. Your data source can be an Excel file or a Word document with a single table. Once you have a data source file you can merge it with the template using the following Visual Basic script code:

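Sub OpenWord(fileName, datasource)
   Dim Word, doc
   Set Word = WScript.CreateObject("Word.Application")
   Word.Visible = True
   ' Open the mail merge template
   Set doc = Word.Documents.Open(fileName)
   ' Attach the data source file to the template
   doc.MailMerge.OpenDataSource datasource
   ' Run the merge (True = pause and report any merge errors)
   doc.MailMerge.Execute True
End Sub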

Oct 27 2005

Yes Comments

There is more than one way to say it. A rose by any other name is still a rose, or better yet, a comment in any other language is still a comment. Every C++ and Java programmer will tell you that end of line comments start with two forward slashes, such as:

// This is a C/C++/Java comment.

For some reason, many scripting languages such as Perl, Ruby, and Jython use the pound (hash) sign for end of line comments.

# This is a script comment.

C inspired languages, such as C++ and Java have comment blocks. Comment blocks start with /* and end with */ and everything in between will be treated as comments such as:

/*
 * This is a C/C++/Java
 *    comment
 *       block
 */

Of course, Java can produce online documentation if you use the /** variation of the comment block. You can also place comments in HTML/XML documents, for example:

<!-- This is an HTML/XML comment -->

If you work with XSLT files, which are just XML files, and want to produce a comment you need to place your comments inside xsl:comment tags, such as:

<xsl:comment>
   <!-- This is not a comment. -->
   This will produce an HTML/XML comment.
</xsl:comment>

And of course, you can comment a JSP page using the HTML comment construct but if that comment is JSP specific and is not supposed to be sent to the client you can use the following comment construct:

<%--
   This is a JSP comment,
      won’t be sent to the client.
--%>

And like everything else that Microsoft does, Visual Basic comments are not based on any of the constructs listed here. Visual Basic end of line comments start with the single quote character. The following is a Visual Basic comment:

' This is a VB comment.

Just remember that developers can speak in code, but not everyone who reads your code is a developer. So whatever your preferred language is, please comment your code; at a minimum, fill in the pre/post conditions.
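As a rough illustration of what filling in the pre/post conditions might look like, here is a small Java sketch using the /** documentation comment mentioned earlier. The class CommentSketch and the method average are invented for this example; the point is only the shape of the comment.

```java
public class CommentSketch {

    /**
     * Returns the average of the given values.
     *
     * Precondition:  values is not null and has at least one element.
     * Postcondition: the result lies between the smallest and largest
     *                element of values; values itself is not modified.
     *
     * @param values the numbers to average
     * @return the arithmetic mean of values
     * @throws IllegalArgumentException if the precondition is violated
     */
    public static double average(int[] values) {
        // Enforce the documented precondition instead of failing obscurely.
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must be non-empty");
        }
        long sum = 0; // long avoids int overflow while summing
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(average(new int[] {2, 4, 6})); // prints 4.0
    }
}
```

Someone who never reads the method body can still tell from the comment what the method expects and what it promises in return.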