Tuesday, February 4, 2014

pdf2htmlEX for Windows

I had to make a few modifications to the program for it to work under Windows. Specifically, for FontConfig to work, I had to export an environment variable and embed fonts.conf in the data directory.
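For reference, a similar effect can be approximated from a shell without patching the program. This is only a sketch: FONTCONFIG_PATH and FONTCONFIG_FILE are the standard Fontconfig environment variables, but which variable the modified build actually exports is not stated here, and both the PDF2HTMLEX_DATA_DIR name and the default path are hypothetical placeholders.

```shell
# Hypothetical location of the pdf2htmlEX data directory; adjust to your install.
# PDF2HTMLEX_DATA_DIR is a made-up override name used only in this sketch.
DATA_DIR="${PDF2HTMLEX_DATA_DIR:-$PWD/data}"

# Point Fontconfig at the directory that holds fonts.conf ...
export FONTCONFIG_PATH="$DATA_DIR"
# ... and at the configuration file itself.
export FONTCONFIG_FILE="$DATA_DIR/fonts.conf"

echo "Fontconfig will read: $FONTCONFIG_FILE"
```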

You can find the modified code on GitHub.

Here is the link to the compiled executable for Windows.

Compiling pdf2htmlEX under MinGW

I was looking for a good PDF to HTML conversion program and found pdf2htmlEX by Lu Wang. The conversion results are really good, and the options available for tuning them are quite good as well.

I wanted to compile it under Windows using MinGW. I found instructions for this, in Japanese, right here. I had to make small modifications to those instructions, so here is my version:

Building on Windows using MinGW
Install MinGW & G++
Install pkg-config by extracting in your /MinGW directory


  1. Create the file fstab in /msys/1.0/etc to point to the right path
  2. There is a bug with MinGW: the function _set_invalid_parameter_handler is declared in stdlib.h but is not available in msvcrt.dll. I had to wrap the declaration in #if 0:
    #if 0
    typedef void
    (* _invalid_parameter_handler) (
       const wchar_t *,
       const wchar_t *,
       const wchar_t *,
       unsigned int,
       uintptr_t);
    _invalid_parameter_handler _set_invalid_parameter_handler
    (_invalid_parameter_handler);
    #endif
  3. Create a .profile file in your home directory with:
    export PATH="$PATH:/c/apps/utilities/cmake/bin"
    unset LIB
    unset INCLUDE
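
As an illustration of step 1, an MSYS fstab maps a Windows directory to a mount point, one pair per line. The sketch below assumes MinGW is installed in c:/MinGW (an assumption, not something stated above). It writes to a scratch directory unless MSYS_ETC_DIR, a variable name made up for this example, points at the real etc directory.

```shell
# MSYS_ETC_DIR is a hypothetical override; by default a temporary directory
# is used so that nothing on the real system is touched.
etc_dir="${MSYS_ETC_DIR:-$(mktemp -d)}"

# One mapping per line: <Windows path> <mount point>.
# c:/MinGW is an assumed install location.
printf '%s\t%s\n' 'c:/MinGW' '/mingw' > "$etc_dir/fstab"

cat "$etc_dir/fstab"
```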


Building Poppler on Windows using MinGW (taken from here)

These instructions include building Poppler with full Kanji support.

ZLib

  • tar xvf zlib-1.2.8.tar.xz
  • pushd zlib-1.2.8
  • make -f win32/Makefile.gcc BINARY_PATH=/mingw/bin INCLUDE_PATH=/mingw/include LIBRARY_PATH=/mingw/lib SHARED_MODE=1 install
  • popd

libffi

GLib

 for (i = 0; i < G_N_ELEMENTS (seed); i++)
   //rand_s (&seed[i]);
   seed[i] = rand();
  • CFLAGS="-Wall -Ofast" ./configure --prefix=/mingw
  • make; make install; popd
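
Whether configure actually sees CFLAGS depends on how the variable is set: a plain assignment followed by `;` only reaches child processes if the variable is already exported, while an assignment placed directly in front of the command is always passed to it. A small sketch of the difference, using DEMO_FLAGS as a stand-in variable name so that a real CFLAGS is left alone:

```shell
# Case 1: plain assignment, then a separate command.
unset DEMO_FLAGS
DEMO_FLAGS="-Wall -Ofast"; plain=$(sh -c 'echo "${DEMO_FLAGS:-unset}"')

# Case 2: assignment as a prefix of the command itself.
unset DEMO_FLAGS
prefixed=$(DEMO_FLAGS="-Wall -Ofast" sh -c 'echo "${DEMO_FLAGS:-unset}"')

echo "plain assignment seen by child:  $plain"
echo "prefix assignment seen by child: $prefixed"
```

Running `export CFLAGS` once beforehand makes the plain form work as well.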

Libpng

Libjpeg

XZ Utils

LibTIFF

Libtiff is now configured for i686-pc-mingw32

 Installation directory:             /mingw
 Documentation directory:            ${prefix}/share/doc/tiff-4.0.3
 C compiler:                         gcc -g -O2 -Wall -W
 C++ compiler:                       g++ -g -O2
 Enable runtime linker paths:        no
 Enable linker symbol versioning:    no
 Support Microsoft Document Imaging: yes
 Use win32 IO:                       yes

Support for internal codecs:
 CCITT Group 3 & 4 algorithms:       yes
 Macintosh PackBits algorithm:       yes
 LZW algorithm:                      yes
 ThunderScan 4-bit RLE algorithm:    yes
 NeXT 2-bit RLE algorithm:           yes
 LogLuv high dynamic range encoding: yes

Support for external codecs:
 ZLIB support:                       yes
 Pixar log-format algorithm:         yes
 JPEG support:                       yes
 Old JPEG support:                   yes
 JPEG 8/12 bit dual mode:            no
 ISO JBIG support:                   no
 LZMA2 support:                      yes
 C++ support:                        yes
 OpenGL support:                     no
  • make; make install; popd

Little CMS

OpenJPEG 1.5.1

 Debug...............: no


 Optional support:
   libpng............: yes
   libtiff...........: yes
   libcms............: lcms version 2.x


 Documentation.......: no
   Build.............: make doc


 mj2.................: yes
 jpwl................: yes
 jpip................: no
 jpip server.........: no
  • make; make install; popd

Freetype

  • tar xvf freetype-2.5.0.1.tar.bz2
  • pushd freetype-2.5.0.1
  • I had to change off64_t and off_t to _off64_t and _off_t in io.h, unistd.h and zconf.h
  • ./configure --prefix=/mingw
  • make; make install; popd
  • Modify bin/freetype-config to fix the path
    • prefix="/mingw"
    • exec_prefix="/mingw"
    • exec_prefix_set="no"
    • includedir="/mingw/include"
    • libdir="/mingw/lib"
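
The freetype-config edits listed above could also be scripted with sed. This is a sketch only: it works on a scratch file whose contents are a made-up stand-in for the real script, so the original lines shown here are assumptions; only the replacement values come from the list above.

```shell
# Scratch copy standing in for /mingw/bin/freetype-config (contents hypothetical).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
prefix="/usr/local"
exec_prefix="${prefix}"
exec_prefix_set="yes"
includedir="${prefix}/include"
libdir="${exec_prefix}/lib"
EOF

# Rewrite each variable to the value listed above. The patterns are anchored
# at the start of the line, so prefix= does not touch exec_prefix=.
sed -e 's|^prefix=.*|prefix="/mingw"|' \
    -e 's|^exec_prefix=.*|exec_prefix="/mingw"|' \
    -e 's|^exec_prefix_set=.*|exec_prefix_set="no"|' \
    -e 's|^includedir=.*|includedir="/mingw/include"|' \
    -e 's|^libdir=.*|libdir="/mingw/lib"|' \
    "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"

cat "$cfg"
```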

nkf (Network Kanji Filter)

libxml2

Add some more variables

  • export LIBXML2_CFLAGS=-I/mingw/include/libxml2
  • export LIBXML2_LIBS=-lxml2
  • export FREETYPE_CFLAGS=-I/mingw/include/freetype2
  • export FREETYPE_LIBS=-lfreetype


fontconfig

Pixman

cairo

  • tar xvf cairo-1.12.16.tar.xz
  • pushd cairo-1.12.16
  • ./configure --prefix=/mingw
    cairo (version 1.12.16 [release]) will be compiled with:

    The following surface backends:
     Image:         yes (always builtin)
     Recording:     yes (always builtin)
     Observer:      yes (always builtin)
     Mime:          yes (always builtin)
     Tee:           no (disabled, use --enable-tee to enable)
     XML:           no (disabled, use --enable-xml to enable)
     Skia:          no (disabled, use --enable-skia to enable)
     Xlib:          no (requires X development libraries)
     Xlib Xrender:  no (requires --enable-xlib)
     Qt:            no (disabled, use --enable-qt to enable)
     Quartz:        no (requires CoreGraphics framework)
     Quartz-image:  no (disabled, use --enable-quartz-image to enable)
     XCB:           no (requires xcb >= 1.6 xcb-render >= 1.6 http://xcb.freedesktop.org)
     Win32:         yes
     OS2:           no (disabled, use --enable-os2 to enable)
     CairoScript:   yes
     PostScript:    yes
     PDF:           yes
     SVG:           yes
     OpenGL:        no (disabled, use --enable-gl to enable)
     OpenGL ES 2.0: no (disabled, use --enable-glesv2 to enable)
     BeOS:          no (disabled, use --enable-beos to enable)
     DirectFB:      no (disabled, use --enable-directfb to enable)
     OpenVG:        no (disabled, use --enable-vg to enable)
     DRM:           no (disabled, use --enable-drm to enable)
     Cogl:          no (disabled, use --enable-cogl to enable)

    The following font backends:
     User:          yes (always builtin)
     FreeType:      yes
     Fontconfig:    yes
     Win32:         yes
     Quartz:        no (requires CoreGraphics framework)

    The following functions:
     PNG functions:   yes
     GLX functions:   no (not required by any backend)
     WGL functions:   no (not required by any backend)
     EGL functions:   no (not required by any backend)
     X11-xcb functions: no (disabled, use --enable-xlib-xcb to enable)
     XCB-shm functions: no (requires --enable-xcb)

    The following features and utilities:
     cairo-trace:                no (requires dynamic linker and zlib and real pthreads)
     cairo-script-interpreter:   yes

    And the following internal features:
     pthread:       yes
     gtk-doc:       no
     gcov support:  no
     symbol-lookup: no (requires bfd)
     test surfaces: no (disabled, use --enable-test-surfaces to enable)
     ps testing:    no (requires libspectre)
     pdf testing:   no (requires poppler-glib >= 0.17.4)
     svg testing:   no (requires librsvg-2.0 >= 2.15.0)
     win32 printing testing:    no (requires ghostscript)
  • in util/cairo-missing/cairo-missing.h
    • Comment out typedef SSIZE_T ssize_t;
  • make; make install; popd
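
Most packages in this post follow the same extract, configure, make, install cycle as cairo above, so a small helper can save some typing. The function below is a hypothetical convenience, not part of the original instructions. It assumes the archive unpacks into a directory named after the archive, and it does not cover packages that need source patches between the steps. Setting DRY_RUN=1 (a name invented for this sketch) prints the commands instead of running them.

```shell
# build_pkg ARCHIVE [EXTRA_CONFIGURE_ARGS...]
build_pkg() {
    archive=$1
    shift
    # freetype-2.5.0.1.tar.bz2 -> freetype-2.5.0.1
    src_dir=${archive%.tar.*}

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "tar xvf $archive"
        echo "cd $src_dir"
        echo "./configure --prefix=/mingw" "$@"
        echo "make && make install"
        return 0
    fi

    tar xvf "$archive" || return 1
    (
        # Subshell, so the caller's working directory is left unchanged.
        cd "$src_dir" || exit 1
        ./configure --prefix=/mingw "$@" && make && make install
    )
}

# Preview what would be run for one of the archives used in this post.
DRY_RUN=1
plan=$(build_pkg freetype-2.5.0.1.tar.bz2)
echo "$plan"
```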


OpenSSL

libssh2

version:          1.4.3
 Host type:        i686-pc-mingw32
 Install prefix:   /mingw
 Compiler:         gcc
 Compiler flags:    -DLIBSSH2_WIN32
 Library types:    Shared=yes, Static=yes
 Crypto library:   openssl: yes (AES-CTR: no) libgcrypt: no
 Debug build:      no
 Build examples:   yes
 Path to sshd:      (only for self-tests)
 libz compression: yes
  • make; make install; popd

curl

curl version:     7.33.0
 Host setup:       i686-pc-mingw32
 Install prefix:   /mingw
 Compiler:         gcc
 SSL support:      enabled (OpenSSL)
 SSH support:      enabled (libSSH2)
 zlib support:     enabled
 GSSAPI support:   no      (--with-gssapi)
 SPNEGO support:   no      (--with-spnego)
 TLS-SRP support:  enabled
 resolver:         default (--enable-ares / --enable-threaded-resolver)
 ipv6 support:     no      (--enable-ipv6)
 IDN support:      no      (--with-{libidn,winidn})
 Build libcurl:    Shared=yes, Static=no
 Built-in manual:  no      (--enable-manual)
 --libcurl option: enabled (--disable-libcurl-option)
 Verbose errors:   enabled (--disable-verbose)
 SSPI support:     no      (--enable-sspi)
 ca cert bundle:   no
 ca cert path:     no
 LDAP support:     enabled (winldap)
 LDAPS support:    enabled
 RTSP support:     enabled
 RTMP support:     no      (--with-librtmp)
 metalink support: no      (--with-libmetalink)
 HTTP2 support:    disabled (--with-nghttp2)
 Protocols:        DICT FILE FTP FTPS GOPHER HTTP HTTPS IMAP IMAPS LDAP LDAPS POP3 POP3S RTSP SCP SFTP SMTP SMTPS TELNET TFTP


  • make; make install; popd


gettext

pkgconfig

poppler

  • tar xvf poppler-0.24.3.tar.xz
  • pushd poppler-0.24.3
  • Had to change the order of inclusion in
    • CurlPDFDocBuilder.cc
    • CurlCachedFile.cc
    • Move #include "CurlCachedFile.h" so that it comes after <config.h>
  • Had to change sleep into _sleep in qt5/tests/stress-threads-qt5.cpp
  • Make sure that /d/Apps/Qt/5.1.1/mingw48_32/bin is in PATH
  • POPPLER_QT5_CFLAGS="-I/d/Apps/Qt/5.1.1/mingw48_32/include" POPPLER_QT5_LIBS="-L/d/Apps/Qt/5.1.1/mingw48_32/lib -lQt5Core -lQt5Gui -lQt5Xml -lQt5Widgets -lQt5Test" POPPLER_QT5_TEST_CFLAGS="-I/d/Apps/Qt/5.1.1/mingw48_32/include" POPPLER_QT5_TEST_LIBS="-L/d/Apps/Qt/5.1.1/mingw48_32/bin" LIBOPENJPEG_CFLAGS="`pkg-config --cflags libopenjpeg`" LIBOPENJPEG_LIBS="`pkg-config --libs libopenjpeg`" ./configure --prefix=/mingw --with-font-configuration=fontconfig --enable-xpdf-headers --enable-zlib --enable-libcurl --enable-poppler-glib --disable-gtk-test
    Building poppler with support for:
     font configuration: fontconfig
     splash output:      yes
     cairo output:       yes
     qt4 wrapper:        no
     qt5 wrapper:        yes
     glib wrapper:       yes
       introspection:    no
     cpp wrapper:        yes
     use gtk-doc:        no
     use libjpeg:        yes
     use libpng:         yes
     use libtiff:        yes
     use zlib:           yes
     use libcurl:        yes
     use libopenjpeg:    yes
     use cms:            yes
         with lcms2
     command line utils: yes
     test data dir:

     Warning: Using zlib is not totally safe
  • make; make install; popd

Poppler data

Building pdf2htmlEX on Windows using MinGW (taken from here)

Pango

PLibC

Fontforge

  • curl --insecure -R -L -o fontforge-fontforge.tar.gz https://github.com/fontforge/fontforge/tarball/master
  • tar xvf fontforge-fontforge.tar.gz
  • pushd fontforge-fontforge*
  • export CPPFLAGS="$CPPFLAGS -DWINDOWS -D_WIN32_IE=0x0501"
  • ./autogen.sh
    Preparing the fontforge build system...please wait
    Found GNU Autoconf version 2.68
    Found GNU Automake version 1.11.1
    Found GNU Libtool version 2.4

    Automatically preparing build …
  • ./configure --prefix=/mingw
    Summary of optional features:

     real (floating pt) double
     programs           yes
     native scripting   yes
     python scripting   no
     python extension   no
     freetype debugger  no
     capslock for alt   no
     raw points mode    no
     tile path          no
     gb12345 encoding   no

    Summary of optional dependencies:

     cairo              yes        http://www.cairographics.org/
     giflib             no         http://giflib.sourceforge.net/
     libjpeg            yes        http://en.wikipedia.org/wiki/Libjpeg
     libpng             yes        http://www.libpng.org/
     libtiff            yes        http://en.wikipedia.org/wiki/Libtiff
     libxml             yes        http://www.xmlsoft.org/
     libspiro           no         http://libspiro.sourceforge.net/
     libuninameslist    no         https://github.com/fontforge/libuninameslist
     libunicodenames    no         https://bitbucket.org/sortsmill/libunicodenames
     zeromq             no         http://www.zeromq.org/
     libreadline        no         http://www.gnu.org/software/readline
     X Window System    no
  • make; make install; popd

ttfautohint

  • tar xvf ttfautohint-0.97.tar.gz
  • pushd ttfautohint-0.97
  • export CPPFLAGS="$CPPFLAGS -I/d/Apps/Qt/5.1.1/mingw48_32/include"
  • export LDFLAGS="$LDFLAGS -L/d/Apps/Qt/5.1.1/mingw48_32/lib"
  • export QT_LIBS="-lglu32 -lopengl32 -lgdi32 -luser32 -lmingw32 -lqtmain -L/d/Apps/Qt/5.1.1/mingw48_32/lib -lQt5Gui -lQt5Core -lQt5Widgets"
  • ./configure --prefix=/mingw
  • pushd lib
  • make; make install; popd; popd

pdf2htmlEX

  • curl --insecure -R -L -o coolwanglu-pdf2htmlEX.tar.gz https://github.com/coolwanglu/pdf2htmlEX/tarball/master
  • tar xvf coolwanglu-pdf2htmlEX.tar.gz
  • cd coolwanglu-pdf2htmlEX*
  • mkdir build; cd build
  • Remove CDECL from /mingw/include/poppler/error.h
  • Add the following lines to CMakeLists.txt, just before the line "# debug build flags (overwrite default cmake debug flags)":
    # Add additional dependencies
    set(PDF2HTMLEX_LIBS ${PDF2HTMLEX_LIBS} intl iconv gettextlib gettextpo gutils png jpeg openjpeg glib-2.0.dll z xml2 tiff gio-2.0.dll ltdl plibc.dll)
  • cmake .. -G "MSYS Makefiles" -DCMAKE_INSTALL_PREFIX=/mingw -DENABLE_SVG=ON
  • make; make install
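
If the cmake step fails to find a dependency, a quick check like the following can show which of the libraries built earlier are visible to pkg-config. The module list is an illustrative selection, not taken from the pdf2htmlEX build files, and check_modules is a helper name invented for this sketch.

```shell
# Print one status line per pkg-config module name given as an argument.
check_modules() {
    for mod in "$@"; do
        if pkg-config --exists "$mod" 2>/dev/null; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
        fi
    done
}

report=$(check_modules poppler cairo fontconfig freetype2 libpng libxml-2.0 glib-2.0)
echo "$report"
```

A module reported as MISSING usually means its .pc file is not in a directory that pkg-config searches, which can be adjusted through PKG_CONFIG_PATH.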