Welcome to python-pdf-analytics-client API’s documentation!
PDFAnalytics is a web service that lets you verify PDF content for free.
The python-pdf-analytics-client library lets you automate the most common PDFAnalytics operations using Python 2 or Python 3.
python-pdf-analytics-client can be installed with pip or downloaded from PyPI: https://pypi.python.org/pypi/python-pdf-analytics-client
The source is available on: https://github.com/pdf-analytics/python-pdf-analytics-client
Introduction
Purpose
The purpose of python-pdf-analytics-client is to provide a library that helps you automate the most common PDFAnalytics operations through its REST API.
python-pdf-analytics-client can verify:
- textual content, such as the text itself, its font style and its location (using coordinates and page number)
- image content, based on a locally stored image (pixel-by-pixel comparison), including its actual size and location in the PDF
- PDF-to-PDF comparison: an uploaded PDF is compared with a local one pixel by pixel, page by page
Examples
This example asserts that the figure.png image appears on page 4 of the demo.pdf file.
>>> from pdf_analytics_client import APIClient
>>> server = APIClient(token='my_token')
>>> pdf_job = server.create_job(local_file='/Users/tester/demo.pdf')
>>> pdf_job.verify_image(path='/Users/tester/figure.png', top=24, left=64, page=4)
Dependencies
python-pdf-analytics-client has only one dependency, python-requests. It is installed automatically when you install the python-pdf-analytics-client module with pip.
Examples
You can find the examples in the GitHub repository: https://github.com/pdf-analytics/python-pdf-analytics-client/tree/master/examples
Before running the examples, register on the pdf-analytics site and obtain your token.
To run the examples:
$ cd examples
# Install the dependencies
$ pip install -r requirements_examples
# Run the examples
$ behave -D token=<your_token_id>
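Values passed with -D are exposed by behave through context.config.userdata. The sketch below shows one way an environment.py file could pick up the token. It is an assumption about how the token might be consumed, not a copy of the repository's examples, and token_from_userdata is a hypothetical helper name.

```python
def token_from_userdata(userdata):
    """Return the token passed on the command line with -D token=...

    `userdata` is any dict-like object, such as behave's
    context.config.userdata. This helper is hypothetical and is not part
    of python-pdf-analytics-client.
    """
    token = userdata.get("token")
    if not token:
        raise ValueError("pass the token with: behave -D token=<your_token_id>")
    return token


def before_all(context):
    # before_all is the behave hook that runs once before any feature.
    # The token is stored on the context so that later steps can read it.
    context.token = token_from_userdata(context.config.userdata)
```

With this in place, a step could build the client from context.token instead of hard-coding the value.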
Installation
Python 3
To install python-pdf-analytics-client, install the package and its dependencies from PyPI.
On Windows, this is:
C:\Python36\Scripts\pip.exe install python-pdf-analytics-client
(Adjust the path if your installed Python version is not 3.6.)
On OS X, this is:
pip3 install python-pdf-analytics-client
On Linux, this is:
pip install python-pdf-analytics-client
Python 2
To install python-pdf-analytics-client, install the package and its dependencies from PyPI.
On Windows, this is:
C:\Python27\Scripts\pip.exe install python-pdf-analytics-client
(Adjust the path if your installed Python version is not 2.7.)
On OS X, this is:
pip install python-pdf-analytics-client
On Linux, this is:
pip install python-pdf-analytics-client
When pip installs python-pdf-analytics-client, it also tries to install its only dependency, the python-requests library.
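To confirm that the installation worked, you can check whether the modules can be found on the import path. This is a minimal Python 3 sketch. The module name pdf_analytics_client is taken from the import used in the example in this documentation, and is_installed is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name="pdf_analytics_client"):
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when no such top-level module is found.
    return importlib.util.find_spec(module_name) is not None
```

Calling is_installed() checks the client itself, and is_installed("requests") checks its dependency.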
References
This is a quick-start reference for using PyPDFAnalyticsClient.
PDF Analytics Client
The PDF Analytics Client is a high-level module that enables verification of the images and text in a local PDF file.
class api_client.APIClient(token, url=u'https://pdf-analytics.com/api/')
Main API client class
create_job(local_file, wait_to_complete=True)
Create a PDF analysis job
Parameters:
- local_file – the path of the local PDF file that needs to be uploaded to the server for the analysis
- wait_to_complete – wait for the PDF analysis to complete. Default value is True.
Returns: The JobClass object
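Since create_job uploads a local file, it can help to check the path before contacting the server. The following is a minimal Python 3 sketch. create_checked_job is a hypothetical helper, not part of the library, and client is assumed to be an APIClient instance.

```python
import os


def create_checked_job(client, pdf_path):
    """Create an analysis job after checking that `pdf_path` exists locally.

    `client` is expected to offer create_job(local_file, wait_to_complete)
    as documented for APIClient. This wrapper is only an illustration.
    """
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError("PDF not found: %s" % pdf_path)
    # wait_to_complete=True is the documented default; it is spelled out here.
    return client.create_job(local_file=pdf_path, wait_to_complete=True)
```

A missing file then fails fast with a local error rather than during the upload.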
class api_client.JobClass(id, client)
Basic PDF analysis Job class
get_item(left, top, page, type=u'any')
Get any item from the PDF (TODO: get figure)
Parameters:
- left – Distance from the left of the page in points. Accepts a single integer, e.g. 150
- top – Distance from the top of the page in points. Accepts a single integer, e.g. 200
- page – Page number, e.g. 4
- type – Type of the item.
Returns: A JSON object with the item’s information
get_metadata()
Get the metadata of the PDF
Returns: A JSON object with the metadata of the PDF
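When debugging a failing check, it can be useful to see what the service found at a location together with the PDF metadata. The sketch below combines get_item and get_metadata into one readable report. It assumes both calls return plain JSON-serializable objects such as dictionaries, which this reference does not spell out. describe_location is a hypothetical helper.

```python
import json


def describe_location(job, left, top, page):
    """Return a JSON string with the PDF metadata and the item at a location.

    Assumption: job.get_metadata() and job.get_item() return objects that
    json.dumps can serialize (for example dictionaries).
    """
    report = {
        "metadata": job.get_metadata(),
        "item": job.get_item(left=left, top=top, page=page),
    }
    return json.dumps(report, indent=2, sort_keys=True)
```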
get_status()
Get the status of the PDF analysis
Returns: The analysis status as a string. The string can be “In progress”, “Error” or “Complete”
Return type: str
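For large PDFs you may want to wait longer than the 20 seconds that wait_analysis_to_complete allows. One option is to poll get_status yourself. This is a minimal Python 3 sketch built on the three documented status strings. wait_for_job, its default timeout and its polling interval are assumptions for illustration.

```python
import time


def wait_for_job(job, timeout=120.0, interval=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll job.get_status() until the analysis finishes or `timeout` passes.

    Returns True on "Complete" and False on timeout. Raises RuntimeError
    when the status is "Error". Any other status is treated as still running.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        status = job.get_status()
        if status == "Complete":
            return True
        if status == "Error":
            raise RuntimeError("PDF analysis failed")
        if clock() >= deadline:
            return False
        sleep(interval)
```

A typical use would be to create the job with wait_to_complete=False and then call wait_for_job(pdf_job, timeout=300) before running any verification.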
verify_image(path, left, top, page, compare_method=u'pbp', tolerance=0.0)
Verify that a local image file exists in the PDF
Parameters:
- path – The absolute or relative path of the locally stored image, e.g. ‘/User/tester/apple.png’
- left – Distance from the left of the page in points. Accepts a single integer, e.g. 150
- top – Distance from the top of the page in points. Accepts a single integer, e.g. 200
- page – Page number, e.g. the integer 4 or one of the strings ‘all’, ‘last’, ‘1-4’
- compare_method – Image comparison method. Default value u'pbp'.
- tolerance – Comparison tolerance. Default value 0.0. Example: 0.02
Returns: 200 if the request is successful; otherwise the error message.
Return type: JSON
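The tolerance parameter lets a check pass when the image differs slightly, for example because of compression. The sketch below tries increasing tolerances and reports the smallest one that passes. It assumes a successful check is reported as the value 200, as the Returns description suggests. The exact shape of the return value is not confirmed here, and smallest_passing_tolerance is a hypothetical helper.

```python
def smallest_passing_tolerance(job, image_path, left, top, page,
                               tolerances=(0.0, 0.01, 0.02)):
    """Return the smallest tolerance at which verify_image succeeds, or None.

    Each candidate tolerance costs one verify_image request, so keep the
    list of candidates short.
    """
    for tolerance in sorted(tolerances):
        result = job.verify_image(path=image_path, left=left, top=top,
                                  page=page, tolerance=tolerance)
        # Assumption: success is signalled by a return value equal to 200.
        if result == 200:
            return tolerance
    return None
```

A result of 0.0 means the image matched exactly, and None means it did not match even at the largest tolerance tried.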
verify_pdf(path, excluded_areas=u'', tolerance=0.0)
Verify a local PDF file against the uploaded job’s PDF
Parameters:
- path – The absolute or relative path of the locally stored PDF file, e.g. ‘/User/tester/report.pdf’
- excluded_areas – Excluded areas. List field. Example: [{'left': 146, 'top': 452, 'width': 97, 'height': 13, 'page': 2}, {'left': 414, 'top': 747, 'width': 45, 'height': 16, 'page': 'all'}]
- tolerance – Comparison tolerance. Default value 0.0. Example: 0.02
Returns: 200 if the request is successful; otherwise the error message.
Return type: JSON
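Excluded areas are useful for regions that legitimately change between runs, such as dates or page footers. The sketch below builds the area dictionaries in the documented shape and passes them to verify_pdf. excluded_area and compare_ignoring are hypothetical helpers, and the default of 'all' for page is a choice made for this example.

```python
def excluded_area(left, top, width, height, page="all"):
    """Build one excluded-area entry using the keys shown in the reference."""
    return {"left": left, "top": top, "width": width,
            "height": height, "page": page}


def compare_ignoring(job, reference_pdf, areas, tolerance=0.0):
    """Compare the job's PDF with `reference_pdf`, skipping the given areas."""
    return job.verify_pdf(path=reference_pdf, excluded_areas=areas,
                          tolerance=tolerance)
```

For example, the two areas from the parameter description could be written as excluded_area(146, 452, 97, 13, page=2) and excluded_area(414, 747, 45, 16).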
verify_text(text, left, top, page, method=u'contains')
Verify that a text exists in the PDF
Parameters:
- text – The expected textual content. Accepts a string, e.g. ‘This is the expected text’
- left – Distance from the left of the page in points. Accepts a single integer, e.g. 150
- top – Distance from the top of the page in points. Accepts a single integer, e.g. 200
- page – Page number, e.g. the integer 4 or one of the strings ‘all’, ‘last’, ‘1-4’
- method – Text comparison method. Default value u'contains'.
Returns: 200 if the request is successful; otherwise the error message.
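A report usually contains several pieces of text worth checking. The sketch below runs a batch of verify_text checks and collects the ones that fail, so that a single run reports every problem. It assumes success is reported as the value 200, and failed_text_checks is a hypothetical helper.

```python
def failed_text_checks(job, expectations):
    """Run verify_text for each (text, left, top, page) tuple.

    Returns a list of (text, result) pairs for every check whose result is
    not 200. An empty list means all checks passed.
    """
    failures = []
    for text, left, top, page in expectations:
        result = job.verify_text(text=text, left=left, top=top, page=page)
        # Assumption: any value other than 200 is an error message.
        if result != 200:
            failures.append((text, result))
    return failures
```

In a test you could then assert that the returned list is empty and print it when it is not.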
wait_analysis_to_complete()
Wait for the PDF analysis to complete
After you submit the PDF to the PDF Analytics website, the analysis takes a few seconds before the PDF is ready to be used for verification.
Returns: True if the analysis completes; False if the job is still not complete after 20 seconds
Return type: bool
Changelog
This document will track major changes in the project.
1.0.5, December 11, 2017
- Fix ModuleNotFoundError
1.0.4, November 25, 2017
- Fix the documentation
- Add logo to the documentation
1.0.3, November 24, 2017
- Add Python 3 support
1.0.2
- Add Categories and keywords in pip / setup
1.0.1
- Fix the SSL cert verification
- Fix the documentation
- Cosmetic changes to the python-behave examples
1.0.0
- First release