ISCC - EPUB Processing
EPUB handling module.
epub_thumbnail(fp)
Create a thumbnail from the cover image of an EPUB document.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fp | | Filepath to EPUB document. | required |
Returns:
| Type | Description |
|---|---|
| | Thumbnail image as PIL Image object |
epub_meta_embed(fp, meta)
Embed metadata into a copy of the EPUB file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fp | | Filepath to source EPUB file | required |
| meta | IsccMeta | Metadata to embed into the EPUB | required |
Returns:
| Type | Description |
|---|---|
| | Filepath to the new EPUB file with updated metadata |
epub_cover(fp)
Extract the cover image bytes from an EPUB file.
This function attempts to locate the cover image by:

1. Checking the EPUB2 metadata cover reference
2. Checking the EPUB3 cover-image manifest property
3. Scanning for image files with 'cover' in the name

URL-encoded paths in the OPF manifest are decoded before the ZIP archive lookup. Entry names are also recovered when the EPUB stores UTF-8 filenames without the ZIP UTF-8 flag set. If no cover is found, IsccThumbExtractionError is raised.
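The lookup order above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the SDK's implementation: `find_cover_path` is a hypothetical helper, the package document is assumed to be located via `META-INF/container.xml`, the name-based fallback is assumed to consider only common image extensions, and the sketch returns `None` instead of raising IsccThumbExtractionError. It also omits the mojibake recovery that resolve_archive_path performs.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
# Assumption: the name-based fallback only considers these extensions.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def find_cover_path(archive: zipfile.ZipFile) -> Optional[str]:
    """Hypothetical helper: return the archive path of the cover image, or None."""
    container = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = container.find(".//c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("META-INF/container.xml declares no rootfile")
    opf_path = rootfile.attrib["full-path"]
    opf = ET.fromstring(archive.read(opf_path))
    opf_dir = posixpath.dirname(opf_path)
    items = opf.findall(".//opf:manifest/opf:item", OPF_NS)

    def to_archive_path(href: str) -> str:
        # Manifest hrefs are URL-encoded and relative to the OPF file's directory.
        return posixpath.normpath(posixpath.join(opf_dir, unquote(href)))

    # 1. EPUB2: <meta name="cover" content="ID"/> points at a manifest item id.
    meta = opf.find(".//opf:metadata/opf:meta[@name='cover']", OPF_NS)
    if meta is not None:
        for item in items:
            if item.get("id") == meta.get("content") and item.get("href"):
                return to_archive_path(item.get("href"))

    # 2. EPUB3: a manifest item whose space-separated properties include "cover-image".
    for item in items:
        if "cover-image" in (item.get("properties") or "").split() and item.get("href"):
            return to_archive_path(item.get("href"))

    # 3. Fallback: any image entry whose file name contains "cover".
    for name in archive.namelist():
        base = posixpath.basename(name).lower()
        if "cover" in base and base.endswith(IMAGE_EXTS):
            return name
    return None
```

Returning the archive path rather than the bytes keeps the lookup separate from reading, so a caller can check that the declared cover actually exists in the archive before calling `archive.read`.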
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fp | | Filepath to EPUB file | required |
Returns:
| Type | Description |
|---|---|
| | Raw bytes of the cover image |
Raises:
| Type | Description |
|---|---|
| IsccThumbExtractionError | If no cover image can be found or the declared cover is absent from the archive. |
| IsccExtractionError | If the EPUB structure is invalid or corrupt. |
epub_process_container(fp, **options)
Extract and process images from an EPUB file. Processing is skipped for fixed-layout EPUBs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fp | | Filepath to EPUB file | required |
| options | | Processing options | {} |
Returns:
| Type | Description |
|---|---|
| | List of IsccMeta objects for embedded images |
epub_extract_images(fp, output_dir)
Extract images from an EPUB file into the output directory, keeping only those whose width and height are both at least min_image_size.
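The size filter can be illustrated without an imaging library by reading dimensions straight from the PNG header. This is a simplified sketch, not the SDK's implementation: it handles PNG only (a real implementation would more likely use a full image decoder), it selects entries rather than writing them to output_dir, and both `png_size` and `large_png_entries` are hypothetical helpers. The description above refers to min_image_size without listing it as a parameter; here it is simply passed in explicitly.

```python
import struct
import zipfile
from typing import List, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Read (width, height) from the IHDR chunk that must follow the PNG signature."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def large_png_entries(archive: zipfile.ZipFile, min_image_size: int) -> List[str]:
    """List PNG entries whose width AND height are both >= min_image_size."""
    selected = []
    for name in archive.namelist():
        if not name.lower().endswith(".png"):
            continue
        size = png_size(archive.read(name))
        if size and size[0] >= min_image_size and size[1] >= min_image_size:
            selected.append(name)
    return selected
```

Requiring both dimensions to pass, rather than just one, drops thin decorative images such as rules and borders that are wide but only a few pixels tall.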
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fp | | Filepath to EPUB file | required |
| output_dir | | Directory to extract images to | required |
Returns:
| Type | Description |
|---|---|
| | List of paths to extracted images |
is_fixed_layout_epub(fp)
Check whether an EPUB is a fixed-layout publication.
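One common signal of a fixed-layout EPUB, assumed here, is the EPUB 3 `rendition:layout` metadata property set to `pre-paginated` in the package document. The sketch below checks only that property; the SDK may also consider other signals (for example vendor-specific display options), and `opf_path` is a hypothetical helper.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def opf_path(archive: zipfile.ZipFile) -> str:
    """Hypothetical helper: package document path declared in META-INF/container.xml."""
    root = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = root.find(".//c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml declares no rootfile")
    return rootfile.attrib["full-path"]


def is_fixed_layout_epub(fp) -> bool:
    """Sketch: True if the OPF declares rendition:layout = pre-paginated."""
    with zipfile.ZipFile(fp) as archive:  # fp may be a path or a binary file object
        opf = ET.fromstring(archive.read(opf_path(archive)))
    for meta in opf.iterfind(".//opf:metadata/opf:meta", OPF_NS):
        if meta.get("property") == "rendition:layout":
            return (meta.text or "").strip() == "pre-paginated"
    # No layout declaration: EPUB 3 treats the publication as reflowable.
    return False
```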
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fp | | Path to EPUB file | required |
Returns:
| Type | Description |
|---|---|
| | True if the EPUB is fixed layout, False otherwise |
resolve_archive_path(archive, target)
Resolve a UTF-8 path to an actual zip entry name.
Some EPUBs store UTF-8 filename bytes without setting the ZIP UTF-8 flag (bit 11). Python's zipfile then decodes those names as CP437, producing mojibake. Encoding the mojibake name back to CP437 bytes and decoding those bytes as UTF-8 recovers the original name.
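The recovery step can be sketched as follows: try an exact match first, then round-trip each entry name through CP437 and UTF-8 and compare it with the target. This illustrates the technique described above under the same signature; it is not necessarily the SDK's exact code.

```python
import zipfile
from typing import Optional


def resolve_archive_path(archive: zipfile.ZipFile, target: str) -> Optional[str]:
    """Map a UTF-8 POSIX path to the matching entry name in the archive."""
    names = archive.namelist()
    # Fast path: the name was decoded correctly (ASCII, or UTF-8 flag set).
    if target in names:
        return target
    for name in names:
        try:
            # Undo zipfile's CP437 decoding to get the raw bytes, then read them as UTF-8.
            recovered = name.encode("cp437").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not CP437-encodable, or the raw bytes are not valid UTF-8.
            continue
        if recovered == target:
            return name
    return None
```

The function returns the archive's own (possibly garbled) entry name rather than the recovered one, because that is the key `archive.read` expects.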
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| archive | | Open zipfile.ZipFile instance | required |
| target | | Target path (POSIX, UTF-8 string) | required |
Returns:
| Type | Description |
|---|---|
| | Actual archive entry name, or None if no match |