PDF ... inside and outside

Monday, 29 March 2010

From PDF to SAP SmartForms ... Automatically

My dear readers!

The reason in short ...
We had planned a banking product for several eligible customers, built on extensive form management in SAP Smartforms. We faced problems like "only PDF forms are available" or "the old PDF forms should be completely redesigned". For a bank this can mean that hundreds of forms have to be converted... and time is always short ;-)

When we started this project we knew that creating the new forms in SAP Smartforms and bringing the existing PDF forms into SAP would be the biggest block of work, and one that is hard to estimate. We discussed the idea of building a converter to handle at least the simple conversion tasks automatically.

We had to consider two starting positions:
There were PDF forms which should be transferred to SAP.
There were PDF forms which should be completely redesigned before being transferred to SAP.

Our basic idea was to extract the PDF form-field data and properties, insert the data into an XML structure and, as the final step, use the XML upload function in Smartforms. Some forms had little data and a clear structure, but others were very detailed and overcrowded. So we kept in mind that it would sometimes be necessary to turn a few screws directly in the converter source.

The second part of the work was the newly designed forms. Here we started from scratch, creating doc prototypes with associated technical files containing the form-field properties. So there was no existing PDF form to work from. We decided to handle this case with a different version of the converter. Both converter versions were to be developed as .NET applications, written in C#.

Behind the converter GUI there are batch modules (developed in Delphi as command-line tools) doing three jobs for us:
• Extracting the main form properties such as the fonts used, the form dimensions, date and time of creation, and so on.
• Extracting all form fields with name, position values and field lengths.
• Converting the displayed form content into a TIFF file to serve as the background image for Smartforms, observing the SAP TIFF specifications and the required dpi value.
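
To give an idea of what happens to the extracted position values, here is a minimal sketch in Python (not the converter's actual code, which was written in Delphi and C#). It assumes that field rectangles come out of the PDF in points, measured from the bottom-left corner of the page, and that the target layout wants millimetres measured from the top-left corner. The class and function names are made up for illustration.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # default PDF user space: 1 unit = 1/72 inch
MM_PER_INCH = 25.4


@dataclass
class PdfField:
    """A form field as extracted from the PDF (hypothetical structure)."""
    name: str
    llx: float  # lower-left x in points
    lly: float  # lower-left y in points
    urx: float  # upper-right x in points
    ury: float  # upper-right y in points


@dataclass
class LayoutField:
    """The same field positioned from the top-left corner, in millimetres."""
    name: str
    left_mm: float
    top_mm: float
    width_mm: float
    height_mm: float


def points_to_mm(value: float) -> float:
    """Convert a length from PDF points to millimetres."""
    return value / POINTS_PER_INCH * MM_PER_INCH


def to_layout(field: PdfField, page_height_pt: float) -> LayoutField:
    """Flip the y axis and convert all values to millimetres."""
    # PDF measures y upwards from the bottom edge, so the distance from the
    # top edge is the page height minus the upper y coordinate of the field.
    return LayoutField(
        name=field.name,
        left_mm=round(points_to_mm(field.llx), 2),
        top_mm=round(points_to_mm(page_height_pt - field.ury), 2),
        width_mm=round(points_to_mm(field.urx - field.llx), 2),
        height_mm=round(points_to_mm(field.ury - field.lly), 2),
    )
```

Calling `to_layout(PdfField("NAME", 72, 770, 216, 790), page_height_pt=842)` places the field 25.4 mm from the left edge with a width of 50.8 mm. Which unit and origin the upload XML really expects has to be checked against a downloaded form.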

The next thing we needed was a valid XML structure to look inside. We got one by downloading an existing form from Smartforms as a local XML file. We analyzed it and determined which parts would always stay the same and which parts would be filled programmatically with variable values. We split the XML structure into constant and variable templates. In the templates we marked the significant positions with unique placeholders. As the final step, our converter would merge all of this (form properties, field data, the reference to the background image, constant and modified templates) into one new XML file for the Smartforms upload.
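
The template idea can be sketched in a few lines of Python. This is an illustration only: the element names, attribute names and the `@@...@@` placeholder syntax are invented for the example, and the real Smartforms XML looks quite different. The point is just the mechanism: constant text with unique placeholders, one variable template per field, and a well-formedness check at the end.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical templates; the structure of a real Smartforms download differs.
FORM_TEMPLATE = """<form name="@@FORM_NAME@@" background="@@BACKGROUND@@">
@@FIELDS@@
</form>"""

FIELD_TEMPLATE = (
    '  <field name="@@NAME@@" left="@@LEFT@@" top="@@TOP@@" length="@@LENGTH@@"/>'
)


def fill(template: str, values: dict) -> str:
    """Replace every @@KEY@@ placeholder with its XML-escaped value."""
    result = template
    for key, value in values.items():
        safe = escape(str(value), {'"': "&quot;"})
        result = result.replace(f"@@{key}@@", safe)
    return result


def build_form_xml(form_name: str, background: str, fields: list) -> str:
    """Merge the constant form template with one filled template per field."""
    field_xml = "\n".join(fill(FIELD_TEMPLATE, f) for f in fields)
    xml = fill(FORM_TEMPLATE, {"FORM_NAME": form_name, "BACKGROUND": background})
    # The field block is already escaped, so it is spliced in unchanged.
    xml = xml.replace("@@FIELDS@@", field_xml)
    ET.fromstring(xml)  # raises ParseError if the result is not well-formed
    return xml
```

A call such as `build_form_xml("ZLOAN", "ZLOAN_BG", [{"NAME": "CUSTOMER", "LEFT": 25.4, "TOP": 18.34, "LENGTH": 30}])` returns one XML string with all placeholders resolved. Escaping the values matters, because a single stray `&` in a field name would otherwise make the upload file invalid.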

To prepare Smartforms for the XML upload, we first have to create, one single time, a form style with all font styles that may occur in the uploaded forms. Another point is the background images. They are created automatically while the upload XML structure is generated, but the local TIFF files still need to be transferred into the SAP Form Graphics Administration (transaction SE78). At this point the reference to the image is already in the XML structure.

So the steps for existing pdf-forms are:
• Starting the converter.
• Selecting a pdf-form and moving through the converter-steps.
• Uploading the new tiff-file via transaction SE78 into SAP.
• Uploading the new xml-file into SAP Smartforms.
• Activating the new form in Smartforms. …That’s it!

Finally, the converter version for the non-existing forms… In this case the workflow is a bit different. The form properties do not need to be extracted, because we already have the ASCII files with all form and form-field properties, plus the bmp or doc prototypes.

The first step is to convert the bmp or doc file into the TIFF format according to the SAP specifications. We use the free graphics application "GIMP" for this job. "IrfanView" would also be a good candidate, but we should keep in mind that it is only free for personal use. These TIFF files are then transferred into the SAP Form Graphics Administration (transaction SE78), too. Instead of grabbing the form and field properties from the PDF form via command-line tools, the second version of the converter reads the needed data from the technical ASCII files that come along with the bmp prototype. From this stage on the flow is the same. The XML file is created … uploaded …

So the steps for completely new forms are:
• Converting the bmp-file into tiff-format
• Uploading the new tiff-file via transaction SE78 into SAP.
• Selecting the technical form-data file and moving through the converter-steps.
• Uploading the new xml-file into SAP Smartforms.
• Activating the new form in Smartforms. …That’s it!

There’s one restriction: the procedures described here concentrate on the main task, creating single-page forms. Sure, it’s possible to extend the converters to multi-page forms, but in our special case the cost-benefit ratio wouldn’t have been good.

Altogether we had to convert approximately 300 forms. Normally this work would have taken 100 days. With our converters we could do the job in less than 10 days!

Wednesday, 18 November 2009

"Fast Web View" ... with a twist!

My dear readers!

Do you offer PDF documents for download on your web pages?
Are they very large documents?
Does your web package include only a limited amount of traffic?

To avoid nasty surprises and to win new interested visitors, you should try my application PDF-Analyzer Pro or (for batch use) PDFIndexCut.
Let me tell you why ...

When you upload a new, extensive PDF document to your website, you should also think of readers with very limited bandwidth or a poor internet connection.
Even enabling options such as "Fast Web View" when creating the document cannot prevent the following: the first pages open quickly for the interested reader... but in the background more and more of the document is downloaded to local storage. Often enough the reader realises after a few pages that the document is of no help. A useless download for the reader and useless traffic for the website operator.

PDFIndexCut splits a document into two parts. We can call the first one the index part, while the second part contains the actual document. The index part should hold only the title page of the document, the table of contents and perhaps a few introductory pages. The index part contains freely positionable links that point to the main part.

Once both parts are online, the interested visitor can read the short index part and get a first impression of whether the whole document is useful and worth reading. If so, the visitor can follow the link in the index part to read or download the complete document. If the index part already fails to meet expectations, there is only a very small download instead of a useless PDF download of several megabytes cluttering up local storage, and you as the website operator keep your traffic low.

PDFIndexCut has parameters for the page number after which the document is split, for the position of the link to the main document, for the displayed link text, ... PDFIndexCut is very flexible and will meet your requirements.
If you have only a few documents to process, you can also use my application PDF-Analyzer Pro (for interactive use), which includes the functionality of PDFIndexCut.
Give it a try ... The trial version is available online.

Wednesday, 23 September 2009

Beneath the surface ...

When looking at a PDF file, have you ever wondered what it looks like "in there"? Given a certain technical curiosity, this can be quite interesting and may give you the odd "aha" moment.

What do you need for it? Nothing you don't already have ... Just try the "Notepad" editor that comes with your Windows system (you will find it under Programs -> Accessories -> Notepad). With a bit of luck you will find that the internal PDF code can be perfectly readable.

You get the first piece of information right at the start of the file. There you will see something like "%PDF-1.3%âãÏÓ".
We can ignore some of these characters, but the "PDF" tells us (as if we hadn't known) that this is a PDF file, and the "1.3" shows that the file was created to follow the (somewhat older) PDF specification 1.3.
Now you should use the search function of your editor:
Search for "FontName". You will probably come across matching entries in several places in the code and so find all fonts embedded in your document. An entry could look like this: "/FontName/Arial-BoldItalicMT/".
Other interesting tags to search for are "Creator", "CreationDate", "Producer", "ModDate", "Title", "Keywords" or "Subject". Not all tags have to be present. If no title has been maintained for the document, the corresponding "Title" tag is missing too. If the document is encrypted, the text behind the tags is unfortunately not readable so easily, but that in itself tells you that the document is encrypted.
Finally, there is the interesting tag "Count", or rather "/Count". It is followed by the number of pages of the PDF document. That could look like this: ".../Count 9/...".
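
If you would rather let a script do the searching, the same look under the surface can be sketched in Python. This is a rough heuristic, not a PDF parser: it assumes the interesting entries are stored uncompressed in the file, which is often not the case for newer PDFs. The function name `inspect_pdf` is made up for this example, and taking the largest "/Count" value as the page count is an assumption, because "/Count" can occur several times in one file.

```python
import re


def inspect_pdf(data: bytes) -> dict:
    """Collect a few facts from the raw bytes of a PDF file."""
    # The header at the very start of the file carries the version number.
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    # Font names follow the /FontName key as a PDF name such as /Arial-BoldMT.
    font_names = re.findall(rb"/FontName\s*/([^\s/<>\[\]()]+)", data)
    # /Count may appear more than once; the largest value is used as a guess.
    counts = [int(n) for n in re.findall(rb"/Count\s+(-?\d+)", data)]
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "fonts": sorted({name.decode("latin-1") for name in font_names}),
        "encrypted": b"/Encrypt" in data,
        "page_count": max(counts) if counts else None,
    }
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example `inspect_pdf(open("some.pdf", "rb").read())`, where `some.pdf` stands for any PDF you have at hand.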

Have I whetted your appetite for PDF? Go on a voyage of discovery ;-)

Saturday, 20 June 2009

PDF and forms

My dear readers!

Today I would like to talk about PDF and forms. A good topic for Adobe, since there is still good money to be made with it ;-)

How pleased we always were when we could fill in a PDF form cleanly and digitally (not by hand) on the PC and then ... well, then print it (because saving was unfortunately not possible) ... and had to fill it in all over again the next time :-(

Those were the days... when the free Adobe Reader was really the only program anyone knew for reading PDF documents.

Things are different now. For quite a while there has been Foxit Reader, which is generally much faster and "leaner". It can do pretty much everything the "normal user" in front of the monitor needs. ...And it allows you to fill in PDF forms, save the filled-in forms, open them again and change the values entered, ...! I can highly recommend Foxit Reader!

So what does the "great inventor of PDF" do? No... it does not follow suit and does not allow form fields to be processed with its Reader. Its products for creating forms embed certain markers and routines to check whether the original forms have been modified. And what happens then? ...Forms created with Adobe products are filled in with Foxit and, when subsequently opened in Adobe Reader, are robbed of their form-field functionality!

I just noticed this with a form that was created with the Adobe product "InDesign".

But there is nothing for which a solution cannot be devised ;-)
I am going to build a tool (as a DLL or a proper application ... we'll see ...) that extracts the form components and saves them as a new form. Any check routines are ignored this way, and the new form looks (almost) like the original but can be edited and processed by all known PDF readers.

[All trademarks and company names used are subject to the copyright of the respective companies.]

Best regards and have a nice day,
Ingo Schmökel