Incompatibility - When using action = extracttext with cfpdf

Description

I tested the example below. With the cfpdf tag and action = extracttext, the result does not return the PDF's content in the correct order.

<cfdocument format="pdf" filename="#expandpath('lucee.pdf')#" overwrite="true">
    <p>This is <strong> PDF </strong> example <b> document <small> for </small> the <strike> test with CSS styles. </strike> </b></p>
    <p>This is PDF example document.</p>
</cfdocument>
<cfpdf action="extracttext" type="xml" name="result" source="lucee.pdf">
<cfdump var="#result#" />

The result should be (in XML format):

This is PDF example document for the test with CSS styles. This is PDF example document.

But Lucee returns (in XML format):

This is example PDF document the for test with CSS styles. This is PDF example document.

Environment

None

Activity

Brad Wood 20 May 2022 at 15:44

Awesome, thanks!

Pothys - MitrahSoft 20 May 2022 at 13:20
Edited

FYI, cfpdf action=extracttext now returns the expected result 🙂

Pothys - MitrahSoft 12 May 2022 at 11:27

I added a fix for this ticket.

Pull Request: https://github.com/lucee/extension-pdf/pull/40

Brad Wood 25 March 2022 at 14:28

I'm looking to use this on external PDFs that users provide, and I have no control over how my users create their PDFs. It sounds like the function is basically unreliable and no fix has been found. As Zac pointed out, we're on an old version of PDFBox. I wonder how much work it would be to update the library.

Fixed

Created 20 August 2020 at 12:40
Updated 20 May 2022 at 15:44
Resolved 20 May 2022 at 13:21
