Archiving the EPUB

The old days

ChatGPT is becoming my go-to resource for technical issues. I consider myself to be a creator. I tend not to dive any deeper into a technical issue than I need to in order to solve a problem I’ve run into. Some people want to know every minute detail about everything. While I can certainly appreciate that passion, I’d rather put that time and effort into other things.

I first started with computers in the late 1980s. I used PCs at college to type up my papers. I learned 3D CADD on an Intergraph drawing terminal that was networked into a VAX mini-mainframe. I was on work study, and my professor made me the admin of the CAD lab. He said, “Back up the system every day onto these platters.” He handed me a handwritten set of notes and pointed me to where the manuals were if I wanted to understand more. That was my first experience with the CLI (command-line interface).

I learned how to write a little BASIC and Fortran using the terminals in the computer lab. Everything was CLI at the time. Graphical user interfaces weren’t part of the picture, at least not for accessing the system, moving around directories, copying and moving files, or opening programs. If you couldn’t grasp the CLI, you were dead in the water. Necessity demanded I learn my way around.

EPUB

Fast forward many years, say 39, to 2024, and I’m learning how to create EPUB files. I’m a web designer. I taught web coding for 18 years as an Assistant Professor. I retired in 2023. That means I understand HTML, CSS, and JavaScript well enough to do what I need to do.

EPUB files are created with the same languages that websites are. I also know that authoring packages such as Vellum, Atticus, and InDesign are each going to code the files in their own particular way. From having worked with FrontPage, Macromedia, and Dreamweaver in the past, I know that authoring programs invariably bloat their output code. “Real” coders hand-code their creations. Lean and to the point is the name of the game. Don’t use two nested div tags when one will do. WordPress’s motto is “code is poetry.”

ChatGPT

I’ve been playing around with ChatGPT. I have a paid version. As I’m learning to code my own EPUBs for sale, I turned to ChatGPT to guide me through the process. It helped me to understand the document structure, required elements, and best practices for what I’m trying to accomplish, which is to create an EPUB comic book. I already have the pages written and drawn. Now it’s time to assemble them into an EPUB archive and see if an eBook reader application will render them.

Archive

An EPUB document is a ZIP archive that contains a few very specific files in addition to the HTML, CSS, and XML content. The extension is then changed from .zip to .epub. I created the files as required and today decided to see if they were done correctly. I’m not done with the project. I just wanted to see if what I’ve done so far works. Better to know now before I put too much time into it. Right?

I zipped my folder using the built-in compress command (I’m on a Mac). I changed the extension from .zip to .epub. Did it open in my eBook reader?

Nope.

EPUB Check

As a coder, one of the first things you learn to do is let the robot check your code for errors. There are validators for HTML, CSS, and JavaScript. There are compilers for lower-level programming languages. There is an EPUB validator for EPUB archives. So, if the reader is not reading the contents of my file, I wanted to know if I did something wrong.

There are EPUB checkers online. The W3C has an open source EPUB validator. I downloaded the open source project. It is written in Java.

Archiving Files

I won’t go into all the details on what I needed to do to finally get the EPUB validator to work. If you want to know, email me. I did learn something interesting.

Once I had installed Java and the EPUB checker and learned how to formulate the command statement (it’s not that hard), I got an error.

The mimetype file has an extra field of length 32. The use of the extra field feature of the ZIP format is not permitted for the mimetype file.

What the heck does that mean? I plugged that error message into ChatGPT and asked it what it meant. The answer was very interesting.

“The error you’re encountering indicates that the mimetype file within your EPUB file is not being stored in the proper format. The EPUB file is essentially a ZIP archive with a specific structure, and the mimetype file must be stored in a very precise way:

1. Uncompressed: The mimetype file must be stored without any compression.

2. First in the ZIP archive: The mimetype file must be the very first file in the archive.

3. No Extra Fields: The file must not have any extra fields in its ZIP header, which is the specific error you’re encountering.”
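For anyone who wants to test those three rules without installing the Java validator, here is a minimal sketch in Python using only the standard library. It is an illustration, not a replacement for the EPUB checker, and the function name `check_mimetype_entry` is made up for this example. It reads the extra-field length straight from the ZIP local file header, which is where the "extra field of length 32" complaint comes from.

```python
import struct
import zipfile

MIMETYPE = b"application/epub+zip"


def check_mimetype_entry(fileobj):
    """Return a list of problems with the mimetype entry (empty list = looks fine).

    fileobj is a seekable binary file object, e.g. open("book.epub", "rb").
    Hypothetical helper for illustration only.
    """
    problems = []
    with zipfile.ZipFile(fileobj) as zf:
        entries = zf.infolist()
        # Rule 2: mimetype must be the very first file in the archive.
        if not entries or entries[0].filename != "mimetype":
            return ["mimetype is not the first entry in the archive"]
        info = entries[0]
        if info.header_offset != 0:
            problems.append("mimetype entry does not start at byte 0")
        # Rule 1: mimetype must be stored, not compressed.
        if info.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if zf.read(info) != MIMETYPE:
            problems.append("mimetype content is not application/epub+zip")

    # Rule 3: no extra field. The ZIP local file header has 30 fixed bytes;
    # its last two fields are the file name length and the extra field length.
    fileobj.seek(info.header_offset)
    header = fileobj.read(30)
    extra_len = struct.unpack("<4s5H3L2H", header)[-1]
    if extra_len:
        problems.append(f"mimetype has an extra field of length {extra_len}")
    return problems
```

Calling it looks like `with open("book.epub", "rb") as f: print(check_mimetype_entry(f))`, where `book.epub` stands in for your own file.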

It then recommended I zip (compress) the folder using a command-line interface and a lower-level zip utility. I had compressed the folder into a zip archive using the built-in Mac utility. You simply right-click on the folder and tell it to compress. You get a zip file.

So I asked ChatGPT, “I compressed the directory using Mac’s built-in compression utility. Does that add artifacts?”

Apparently it does. Here is what I found out.

From ChatGPT:

“Yes, using macOS’s built-in compression utility (the “Compress” option in Finder) to create a ZIP file may inadvertently add extra metadata or compress the mimetype file, which can result in the exact issue you’re facing. Specifically:

  1. Compression: The macOS utility typically compresses all files in the archive, including the mimetype file, which needs to remain uncompressed in an EPUB file.
  2. Extra Metadata: macOS might include additional metadata or extended attributes in the ZIP file that are not compliant with the EPUB standard. This can result in the “extra field of length 32” error you’re seeing.

To avoid these issues, it’s important to use a tool or method that ensures the mimetype file is stored correctly in the ZIP archive, without compression and without extra fields. The command-line approach I provided earlier is specifically designed to handle this requirement. Using the zip command with the appropriate flags ensures the mimetype file is added in a compliant way, which macOS’s built-in compression tool does not guarantee.”

It then proceeded to tell me exactly how to zip the file in such a way that the mimetype file does not get compressed. I followed its instructions exactly, and what do you know, it worked like a charm.
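As a rough illustration only (this is not the command from the chat), the same idea can be sketched in Python with the standard zipfile module: write the mimetype entry first, stored and with no extra field, then add everything else compressed. The names `build_epub`, `source_dir`, and `epub_path` are hypothetical, and the sketch assumes the output file is saved outside the source folder.

```python
import os
import zipfile


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with a compliant mimetype entry.

    Hypothetical helper for illustration; assumes epub_path is NOT inside
    source_dir, otherwise the walk below would pick up the output file.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # 1. mimetype goes in first, uncompressed. A bare ZipInfo carries
        #    no extra field, so nothing is added to the ZIP header.
        info = zipfile.ZipInfo("mimetype")
        zf.writestr(info, b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)

        # 2. Everything else follows, compressed as usual.
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # stable, predictable order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir)
                arcname = arcname.replace(os.sep, "/")
                # Skip the mimetype file already written and macOS clutter.
                if arcname == "mimetype" or name == ".DS_Store":
                    continue
                zf.write(full_path, arcname,
                         compress_type=zipfile.ZIP_DEFLATED)
```

A call such as `build_epub("my-comic", "my-comic.epub")` would produce the archive; running the result through the EPUB checker is still the real test.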

Again, I’m not going into the technical details because your situation may be different and ChatGPT is available to you as well. If I need the information, my chat is saved and I can get to it later. Perhaps I’ll write another post on the process.

My takeaway

Being a creator, and using digital tools as my palette, never ceases to amaze and challenge me. The guitar has its challenges. Singing has its own. Every drawing medium, be it watercolor, ink, oils, or digital tools like Photoshop or Clip Studio Ex, comes with its own strengths and limitations. It’s the designer’s job to work within the limitations of the medium to get what they want.

Design is about problem-solving. My undergraduate degree is in a design discipline. My graduate degree is in a different design discipline. Both disciplines taught me to look at the entire project. The same is true with zipping a file for an EPUB archive. It’s an element in making an eBook. It’s part of the design process.

ChatGPT read my issue and provided a detailed explanation of what the issue is and how to fix it. As tools go, it’s pretty amazing and I find myself using it more and more. It saved me hours, maybe days, of work reading endless website resources and old Reddit posts trying to figure out what is going on and how to fix it. It is a tool I intend to continue using.