I'm using the Java epubcheck tool hosted on Google Code to validate my EPUB files. Unfortunately, so far I haven't managed to create an EPUB book that passes 100%. Most of the work to get around these errors has to be done by creating valid XHTML files, which in itself is rather tough already when you're working on educational material (consequently: lots of tables, figures, lists, footnotes, example boxes, remark boxes, notification boxes, and the like).

But even XHTML that validates runs into errors when it is converted into EPUB. The best I got was the tool finishing with a few 'unfinished element' messages. I think these have to do with Calibre splitting the XHTML document into multiple HTML files before compressing them into EPUB format. Defining TOC levels instead of splitting may be a solution, but I haven't found a way yet to *not* have Calibre split my XHTML into multiple files, and just keep one big HTML file with multiple TOC elements.

Back in 2012 the KB conducted a first investigation of the suitability of the EPUB format for long-term preservation. The KB will soon start receiving publications in this format, and in anticipation of this, our Collection Care department has formulated a policy on the minimum requirements an EPUB must meet to ensure long-term accessibility. The policy largely follows the recommendations from the 2012 report. This blog explores to what extent it is possible to automatically assess the EPUBs that we receive against our policy, using a combination of the Epubcheck tool and Schematron rules.

The KB's policy on EPUB is made up of the following objectives:

1. File must be valid EPUB (either version 2 or 3). Rationale: this minimises the risk of interoperability problems.

2. File must not contain encryption or DRM. Rationale: this minimises the risk that files become inaccessible. An edge case here is font obfuscation, which mangles some leading bytes in embedded fonts. This technology is merely meant as a stumbling block to discourage third parties from re-using embedded fonts, and it doesn't pose a serious threat to long-term accessibility.

3. File must not contain foreign resources. Rationale: the Core Media Types define a set of file formats that must be supported by all conforming EPUB readers. Foreign resources are resources that are not part of this set, and the KB's policy is to not accept them. This requirement minimises the risk of accepting files that contain content that may not be rendered correctly by some readers.

4. File must not contain DTBook content. Rationale: EPUB 2 offered the option to use the DTBook (DAISY Digital Talking Book) format as an alternative to XHTML 1.1. Support for DTBook was dropped in EPUB 3. Support is already limited in current EPUB reading software: both the popular Calibre and Readium viewers are unable to process EPUBs with DTBook content (although my Sony Reader device handles them without problems).

To check if an EPUB conforms to the above policy, we need to:

1. test for validity against the format's standard;
2. extract technical information that tells us something about DRM and file resources inside the EPUB;
3. assess the results of steps 1 and 2 against our policy.

The Epubcheck validator is the obvious candidate for steps 1 and 2. Since Epubcheck is capable of reporting its results in XML format, we can use Schematron rules for the final assessment step. The general approach is similar to earlier work on the JP2 and PDF formats, as well as the British Library's Flint tool.

Test data

For testing, we first need a corpus of files that are known to violate one or more objectives of our policy.
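For a test corpus of files that each violate one specific policy objective, one option (an assumption on my part, not necessarily how the KB built its corpus) is to generate small synthetic EPUBs. The sketch below uses Python's standard zipfile module to write a minimal EPUB 2 container: the mimetype entry goes first and uncompressed, as the container format requires, followed by container.xml, a package document (OPF), an NCX and one XHTML page. The helpers `make_epub` and `build_corpus`, the file names and the variants are all hypothetical. The baseline is only meant to approximate a minimal EPUB 2 file, so run it through Epubcheck before relying on it; the DRM variant merely *declares* encryption in META-INF/encryption.xml without encrypting anything, and the DTBook stub is not a complete DTBook document, so Epubcheck may report extra errors for those.

```python
import io
import os
import zipfile

# Hypothetical templates for a minimal EPUB 2 publication.
CONTAINER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    '<rootfiles><rootfile full-path="OEBPS/content.opf" '
    'media-type="application/oebps-package+xml"/></rootfiles></container>'
)
UID = "urn:uuid:00000000-0000-0000-0000-000000000000"
# {extra} receives additional manifest items for the violating variants.
OPF = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>Policy test file</dc:title>'
    '<dc:identifier id="uid">' + UID + '</dc:identifier>'
    '<dc:language>en</dc:language></metadata><manifest>'
    '<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>'
    '<item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>'
    '{extra}</manifest><spine toc="ncx"><itemref idref="text"/></spine></package>'
)
NCX = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">'
    '<head><meta name="dtb:uid" content="' + UID + '"/></head>'
    '<docTitle><text>Policy test file</text></docTitle><navMap>'
    '<navPoint id="p1" playOrder="1"><navLabel><text>Start</text></navLabel>'
    '<content src="text.xhtml"/></navPoint></navMap></ncx>'
)
XHTML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Policy test file</title>'
    '</head><body><p>Test content.</p></body></html>'
)
# Declares (but does not perform) AES encryption of the content document.
ENCRYPTION = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container" '
    'xmlns:enc="http://www.w3.org/2001/04/xmlenc#"><enc:EncryptedData>'
    '<enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc"/>'
    '<enc:CipherData><enc:CipherReference URI="OEBPS/text.xhtml"/></enc:CipherData>'
    '</enc:EncryptedData></encryption>'
)

# Variant name -> (extra manifest items, extra files in the container).
VARIANTS = {
    "baseline.epub": ([], {}),
    "drm.epub": ([], {"META-INF/encryption.xml": ENCRYPTION}),
    "foreign-resource.epub": (
        [("pdf", "doc.pdf", "application/pdf")],
        {"OEBPS/doc.pdf": b"%PDF-1.4\n%%EOF\n"},
    ),
    "dtbook.epub": (
        [("dtb", "book.xml", "application/x-dtbook+xml")],
        {"OEBPS/book.xml": '<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/" version="2005-3"/>'},
    ),
}


def make_epub(path=None, extra_items=(), extra_files=None):
    """Write a minimal EPUB to `path`; return the raw bytes if no path is given."""
    manifest = "".join('<item id="%s" href="%s" media-type="%s"/>' % item for item in extra_items)
    target = io.BytesIO() if path is None else path
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry must be the first entry and stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER)
        z.writestr("OEBPS/content.opf", OPF.format(extra=manifest))
        z.writestr("OEBPS/toc.ncx", NCX)
        z.writestr("OEBPS/text.xhtml", XHTML)
        for name, data in (extra_files or {}).items():
            z.writestr(name, data)
    return target.getvalue() if path is None else path


def build_corpus(directory):
    """Write one file per variant into `directory` and return their paths."""
    os.makedirs(directory, exist_ok=True)
    return [make_epub(os.path.join(directory, name), items, files)
            for name, (items, files) in VARIANTS.items()]
```

Calling `build_corpus("testdata")` would write the four files; each non-baseline file should then fail exactly one of the DRM, foreign-resource or DTBook objectives (plus whatever Epubcheck reports about the deliberately crude stubs).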
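The assessment described here runs Schematron rules over Epubcheck's XML output. As a much simpler, stand-alone illustration of what the DRM, foreign-resource and DTBook checks look at inside the container, the Python sketch below opens an EPUB as a ZIP file, reads the encryption declarations in META-INF/encryption.xml, and compares the media types in the OPF manifest against a Core Media Types list. This is not the Epubcheck/Schematron pipeline and not the KB's actual rules: `assess_epub` is a hypothetical helper, the Core Media Types set is illustrative and incomplete (the exact list differs between EPUB 2 and EPUB 3), and validity itself still has to come from Epubcheck. The two font obfuscation algorithm URIs (IDPF and Adobe) are treated as acceptable, matching the edge case in the policy.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespaces of the OCF container, the OPF package and W3C XML Encryption.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "enc": "http://www.w3.org/2001/04/xmlenc#",
}

# Font obfuscation is allowed by the policy, so these are not reported as DRM.
FONT_OBFUSCATION = {
    "http://www.idpf.org/2008/embedding",  # IDPF font obfuscation
    "http://ns.adobe.com/pdf/enc#RC",      # Adobe font mangling
}

# Assumption: illustrative, incomplete Core Media Types list. DTBook is left out
# on purpose (it is an EPUB 2 core type, but the policy rejects it).
CORE_MEDIA_TYPES = {
    "application/xhtml+xml", "application/x-dtbncx+xml", "text/css",
    "image/gif", "image/jpeg", "image/png", "image/svg+xml",
    "application/vnd.ms-opentype", "application/font-woff", "font/otf", "font/woff",
    "audio/mpeg", "audio/mp4", "application/smil+xml", "application/pls+xml",
    "application/javascript", "text/javascript",
}
DTBOOK = "application/x-dtbook+xml"


def assess_epub(source):
    """Return policy violations for an EPUB given as a path, file object or bytes."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    problems = []
    with zipfile.ZipFile(source) as z:
        names = set(z.namelist())

        # Encryption/DRM: any declared encryption method other than font obfuscation.
        if "META-INF/encryption.xml" in names:
            enc = ET.fromstring(z.read("META-INF/encryption.xml"))
            for method in enc.iter("{%s}EncryptionMethod" % NS["enc"]):
                algorithm = method.get("Algorithm", "")
                if algorithm not in FONT_OBFUSCATION:
                    problems.append("encryption/DRM: " + algorithm)

        # Locate the package document through container.xml.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf = ET.fromstring(z.read(rootfile.get("full-path")))

        # Manifest: flag DTBook content and resources outside the core set.
        for item in opf.iterfind("opf:manifest/opf:item", NS):
            media_type = item.get("media-type", "")
            href = item.get("href", "")
            if media_type == DTBOOK:
                problems.append("DTBook content: " + href)
            elif media_type not in CORE_MEDIA_TYPES:
                problems.append("foreign resource: %s (%s)" % (href, media_type))
    return problems
```

An empty list means none of these three checks fired; it says nothing about validity. In the actual workflow the same logic would live in Schematron rules, so that policy changes only require editing the rules rather than the extraction step.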