ISCC - EPUB Processing

EPUB handling module.

epub_thumbnail(fp)

Create thumbnail from EPUB document cover image.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fp | | Filepath to EPUB document. | required |

Returns:

| Type | Description |
| --- | --- |
| | Thumbnail image as PIL Image object |

epub_meta_embed(fp, meta)

Embed metadata into a copy of the EPUB file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fp | | Filepath to source EPUB file | required |
| meta | IsccMeta | Metadata to embed into EPUB | required |

Returns:

| Type | Description |
| --- | --- |
| | Filepath to the new EPUB file with updated metadata |
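
The copy-then-modify idea can be illustrated with the standard library alone. The sketch below is not how epub_meta_embed is implemented, and it handles a single plain title string rather than an IsccMeta object; the helper name embed_title and the choice of dc:title as the target field are assumptions made for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
CONTAINER = "urn:oasis:names:tc:opendocument:xmlns:container"


def embed_title(src, dst, title):
    """Copy the EPUB at src to dst, replacing (or adding) dc:title in the OPF.

    Hypothetical helper for illustration; src and dst may be paths or file objects.
    """
    # Use readable prefixes on output; the XML stays namespace-equivalent
    # even though the prefixes may differ from the source file.
    ET.register_namespace("opf", OPF)
    ET.register_namespace("dc", DC)
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        # container.xml names the OPF package document.
        container = ET.fromstring(zin.read("META-INF/container.xml"))
        opf_path = container.find(f".//{{{CONTAINER}}}rootfile").get("full-path")
        for info in zin.infolist():
            data = zin.read(info)
            if info.filename == opf_path:
                root = ET.fromstring(data)
                metadata = root.find(f"{{{OPF}}}metadata")
                node = metadata.find(f"{{{DC}}}title")
                if node is None:
                    node = ET.SubElement(metadata, f"{{{DC}}}title")
                node.text = title
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # The "mimetype" entry must remain uncompressed in an EPUB container.
            method = zipfile.ZIP_STORED if info.filename == "mimetype" else zipfile.ZIP_DEFLATED
            zout.writestr(info.filename, data, compress_type=method)
    return dst
```

Entries are written in their original order, so a source file that starts with the mimetype entry keeps it first in the copy. The source file itself is never modified.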

epub_cover(fp)

Extract the cover image bytes from an EPUB file.

This function attempts to locate the cover image by:

1. Checking the EPUB2 metadata cover reference
2. Checking the EPUB3 cover-image manifest property
3. Scanning for image files with 'cover' in the name

URL-encoded paths in the OPF manifest are decoded before the zip archive lookup. Entry names are also recovered when the EPUB stores UTF-8 filenames without the ZIP UTF-8 flag set. If no cover is found, IsccThumbExtractionError is raised.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fp | | Filepath to EPUB file | required |

Returns:

| Type | Description |
| --- | --- |
| | Raw bytes of the cover image |

Raises:

| Type | Description |
| --- | --- |
| IsccThumbExtractionError | If no cover image can be found or the declared cover is absent from the archive. |
| IsccExtractionError | If the EPUB structure is invalid or corrupt. |
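
The three lookup strategies and the URL-decoding step described for epub_cover can be sketched with the standard library. This is an illustrative approximation, not the module's code: the helper name find_cover_name is hypothetical, it returns None instead of raising, and the list of image extensions in the fallback scan is an assumption.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def find_cover_name(archive):
    """Return the archive entry name of the cover image, or None (hypothetical helper)."""
    # container.xml points to the OPF package document.
    container = ET.fromstring(archive.read("META-INF/container.xml"))
    opf_path = container.find(".//c:rootfile", NS).get("full-path")
    opf = ET.fromstring(archive.read(opf_path))
    items = opf.findall(".//opf:manifest/opf:item", NS)

    href = None
    # 1. EPUB2: <meta name="cover" content="ITEM_ID"/> refers to a manifest item id.
    meta = opf.find(".//opf:metadata/opf:meta[@name='cover']", NS)
    if meta is not None:
        by_id = {item.get("id"): item.get("href") for item in items}
        href = by_id.get(meta.get("content"))
    # 2. EPUB3: a manifest item carrying the "cover-image" property.
    if href is None:
        for item in items:
            if "cover-image" in (item.get("properties") or "").split():
                href = item.get("href")
                break
    if href is not None:
        # Manifest hrefs are relative to the OPF file and may be URL-encoded.
        base = posixpath.dirname(opf_path)
        return posixpath.normpath(posixpath.join(base, unquote(href)))
    # 3. Fallback: any image entry with "cover" in its name (extension list assumed).
    for name in archive.namelist():
        lower = name.lower()
        if "cover" in lower and lower.endswith((".jpg", ".jpeg", ".png", ".gif", ".webp")):
            return name
    return None
```

A caller would open the file with zipfile.ZipFile(fp), pass the archive to the helper, and read the returned entry to obtain the image bytes. The sketch does not cover recovery of mis-encoded entry names.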

epub_process_container(fp, **options)

Extract and process images from EPUB file. Skips processing for fixed layout EPUBs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fp | | Filepath to EPUB file | required |
| options | | Processing options | {} |

Returns:

| Type | Description |
| --- | --- |
| | List of IsccMeta objects for embedded images |

epub_extract_images(fp, output_dir)

Extract images from an EPUB file to the output directory. Only images whose width and height are both >= min_image_size are extracted.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fp | | Filepath to EPUB file | required |
| output_dir | | Directory to extract images to | required |

Returns:

| Type | Description |
| --- | --- |
| | List of paths to extracted images |
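
The size filter can be illustrated without an imaging library by reading dimensions straight from the PNG header. The sketch below is a simplification under stated assumptions: it handles PNG entries only, takes min_image_size as an explicit argument, and uses the hypothetical names png_size and extract_large_pngs. The module itself presumably supports more formats and reads the threshold from its own options.

```python
import io
import struct
import zipfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from a PNG header, or None if data is not a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    # IHDR stores width and height as big-endian 32-bit integers at bytes 16..24.
    return struct.unpack(">II", data[16:24])


def extract_large_pngs(fp, output_dir, min_image_size):
    """Write PNG entries whose width and height are both >= min_image_size."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(fp) as archive:
        for index, info in enumerate(archive.infolist()):
            data = archive.read(info)
            size = png_size(data)
            # min(size) < threshold means at least one dimension is too small.
            if size is None or min(size) < min_image_size:
                continue
            # Prefix with the entry index so equal basenames do not collide.
            dest = out / f"{index:04d}_{Path(info.filename).name}"
            dest.write_bytes(data)
            extracted.append(dest)
    return extracted
```

Checking min(size) against the threshold enforces the "both dimensions" rule, so a 16 x 640 image is rejected even though its height is large enough.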

is_fixed_layout_epub(fp)

Check if an EPUB is a fixed layout publication.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fp | | Path to EPUB file | required |

Returns:

| Type | Description |
| --- | --- |
| | True if the EPUB is fixed layout, False otherwise |
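
The page does not say which markers is_fixed_layout_epub inspects. In EPUB 3, a fixed layout publication is declared in the package document with a rendition:layout meta property set to pre-paginated, so a minimal check could look like the sketch below. The helper name looks_fixed_layout is hypothetical, and the real function may consult additional signals.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def looks_fixed_layout(fp):
    """Return True if the OPF declares rendition:layout as pre-paginated."""
    with zipfile.ZipFile(fp) as archive:
        # container.xml names the OPF package document.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        opf = ET.fromstring(archive.read(rootfile.get("full-path")))
    for meta in opf.iter(f"{OPF_NS}meta"):
        if meta.get("property") == "rendition:layout":
            return (meta.text or "").strip() == "pre-paginated"
    # No declaration present: treat the publication as reflowable.
    return False
```

The function accepts a file path or a binary file object, since zipfile.ZipFile handles both.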

resolve_archive_path(archive, target)

Resolve a UTF-8 path to an actual zip entry name.

Some EPUBs store UTF-8 filename bytes without setting the ZIP UTF-8 flag (bit 11). Python's zipfile then decodes those names as CP437, producing mojibake. Encoding the mojibake name back to CP437 and decoding the resulting bytes as UTF-8 recovers the original name.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| archive | | Open zipfile.ZipFile instance | required |
| target | | Target path (POSIX, UTF-8 string) | required |

Returns:

| Type | Description |
| --- | --- |
| | Actual archive entry name, or None if no match |
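
The CP437 round trip can be shown in a few lines. The sketch below is one possible way to perform the lookup, not necessarily how resolve_archive_path does it, and the name resolve_entry_name is hypothetical. It runs the conversion in the forward direction: the UTF-8 target is turned into the mojibake form that zipfile would report and then looked up in the archive.

```python
import io
import zipfile


def resolve_entry_name(archive, target):
    """Map a UTF-8 target path to the name zipfile reports, or None (hypothetical helper)."""
    names = set(archive.namelist())
    # Correctly flagged (or pure ASCII) names match directly.
    if target in names:
        return target
    # Reproduce what zipfile does when bit 11 is unset: the raw UTF-8 bytes
    # are decoded as CP437, which is the form namelist() then contains.
    mojibake = target.encode("utf-8").decode("cp437")
    if mojibake in names:
        return mojibake
    return None


# The inverse direction recovers the readable name from a garbled entry.
garbled = "B\u00fccher.png".encode("utf-8").decode("cp437")
recovered = garbled.encode("cp437").decode("utf-8")
```

Python's CP437 codec maps all 256 byte values, so the forward conversion cannot fail for any UTF-8 input. The inverse conversion can raise UnicodeError for names that were never mojibake in the first place, which is one reason to prefer the forward lookup.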