ISCC - PDF Processing
PDF handling module.
pdf_text_extract(fp)
Extract text from a PDF document with pypdfium2, including reading-order reconstruction and hyphen removal.
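The hyphen-removal step can be illustrated with a minimal sketch. It is not the ISCC SDK implementation. It assumes the pypdfium2 4.x API (`PdfDocument`, `get_page`, `get_textpage`, `get_text_range`) and uses a plain regular expression for dehyphenation. The helper names `extract_raw_text` and `remove_hyphens` are hypothetical, and reading-order reconstruction is not shown.

```python
import re

# Hypothetical pattern: a word character, a hyphen, a line break, then a word character.
_HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*\r?\n[ \t]*(\w)")


def remove_hyphens(text: str) -> str:
    """Join words split across lines by a trailing hyphen; other line breaks are kept."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


def extract_raw_text(fp: str) -> str:
    """Hypothetical helper: concatenate page text in pdfium's native order.

    Assumes the pypdfium2 4.x API. No reading-order reconstruction is done here.
    """
    import pypdfium2 as pdfium  # lazy import so remove_hyphens works without pypdfium2

    pdf = pdfium.PdfDocument(fp)
    try:
        pages = []
        for index in range(len(pdf)):
            page = pdf.get_page(index)
            textpage = page.get_textpage()
            pages.append(textpage.get_text_range())  # full text of the page
            textpage.close()
            page.close()
        return "\n".join(pages)
    finally:
        pdf.close()
```

Typical use would be `remove_hyphens(extract_raw_text("document.pdf"))`. Note that this naive approach also joins compounds that are legitimately hyphenated and happen to break at a line end (for example "well-" followed by "known").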
pdf_thumbnail(fp)
Create a thumbnail image from a PDF document.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fp | | Filepath to PDF document. | required |
Returns:
| Type | Description |
|---|---|
| | Thumbnail image as PIL Image object. |
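Rendering a thumbnail mostly comes down to choosing a render scale so the page fits a target size. The following is a minimal sketch, not the ISCC SDK code. It assumes the pypdfium2 4.x API (`get_size`, `render(scale=...)`, `to_pil`) with Pillow installed. The 128-pixel default and the helper names `thumbnail_scale` and `pdf_thumbnail_sketch` are hypothetical.

```python
def thumbnail_scale(width_pt: float, height_pt: float, max_px: int = 128) -> float:
    """Return a render scale that makes the longer page side max_px pixels.

    PDF page sizes are in points (1/72 inch); at scale=1 pdfium renders 1 pt as 1 px.
    """
    longest = max(width_pt, height_pt)
    if longest <= 0:
        raise ValueError("page dimensions must be positive")
    return max_px / longest


def pdf_thumbnail_sketch(fp: str, max_px: int = 128):
    """Hypothetical helper: render the first page of a PDF as a PIL Image."""
    import pypdfium2 as pdfium  # assumes pypdfium2 4.x and Pillow are installed

    pdf = pdfium.PdfDocument(fp)
    try:
        page = pdf.get_page(0)
        width, height = page.get_size()  # page size in PDF points
        bitmap = page.render(scale=thumbnail_scale(width, height, max_px))
        # Copy so the returned image owns its pixel data after the document closes.
        image = bitmap.to_pil().copy()
        page.close()
        return image
    finally:
        pdf.close()
```

Scaling by the longer side keeps the aspect ratio, so a US Letter page (612 x 792 pt) becomes roughly 99 x 128 pixels at the default size.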
pdf_meta_embed(fp, meta)
Embed metadata into a copy of the PDF file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fp | | Filepath to source PDF file. | required |
| meta | | Metadata to embed into PDF. | required |
Returns:
| Type | Description |
|---|---|
| | Filepath to the new PDF file with updated metadata. |
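The copy-then-embed pattern leaves the source file untouched and returns a new path. A minimal sketch follows. It uses pypdf as a stand-in PDF library, which is an assumption because the source does not say which library backs this function. The plain-dict `meta`, the key mapping in `_INFO_KEYS`, and the helper names are hypothetical.

```python
import os
import tempfile
from typing import Dict, Optional

# Hypothetical mapping from plain metadata keys to PDF document-info keys.
_INFO_KEYS = {"name": "/Title", "description": "/Subject"}


def to_info_dict(meta: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Translate known, non-empty metadata fields into a PDF info dictionary."""
    return {
        _INFO_KEYS[key]: value
        for key, value in meta.items()
        if key in _INFO_KEYS and value
    }


def pdf_meta_embed_sketch(fp: str, meta: Dict[str, Optional[str]]) -> str:
    """Write metadata into a copy of fp and return the path of the copy.

    Assumes pypdf >= 3 (PdfReader, PdfWriter.append, PdfWriter.add_metadata).
    The source file is only read, never modified.
    """
    from pypdf import PdfReader, PdfWriter  # lazy import: optional dependency

    reader = PdfReader(fp)
    writer = PdfWriter()
    writer.append(reader)  # copy all pages into the new document
    if reader.metadata:
        writer.add_metadata(reader.metadata)  # keep existing document info
    writer.add_metadata(to_info_dict(meta))  # new values override old ones

    out_path = os.path.join(tempfile.mkdtemp(), os.path.basename(fp))
    with open(out_path, "wb") as outf:
        writer.write(outf)
    return out_path
```

Writing to a fresh temporary directory keeps the original filename while avoiding any collision with the source file.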