PDF uploading to trash


(Tim Mcqueen) #1

Hello all,

 

Unfortunately I have another, different issue. It seems possibly similar to:

http://forums.squizsuite.net/index.php?showtopic=11428&hl=%2Bupload+%2Bpdf+%2Berror

 

Some other editors were trying to upload a PDF in Edit+, and it says that it has failed to upload. Screenshot attached.

We have been able to upload the PDF in the backend only. Another Sys Admin and I have both tried in Edit+ and cannot upload it there either.

 

I have tried saving it in a variety of locations, and also saving it from a variety of pages. I have tried uploading a variety of different PDFs. Nothing has worked.

 

None of the PDFs are password protected.

 

These are the error messages I receive in the log:

 

Raw Entry: [25-Jul-2013 08:18:49]
Raw Entry: 0: js_api_error_handler (/var/www/efs4103/packages/web_services/js_api/js_api_error_handler.inc line 31):
Raw Entry: ERROR: Error occured while getting contents for pdf File Asset #286863 : Error: Document has not the mandatory ending %EOF. Error Number:512

8:19:06 - 25 Jul
User: Emily xxx
Level: MySource Warning
(/core/assets/files/pdf_file/pdf_file.inc:111) - Error occured while getting contents for pdf File Asset #286863 : Error: Illegal entry in bfchar block in ToUnicode CMap

 

 

I note it says "Error: Document has not the mandatory ending %EOF. Error Number:512"

 

I have looked at the raw contents of the PDF in Notepad++, and it does end with %%EOF.
I believe this file was created in Adobe InDesign, and packed/exported to PDF.
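
For anyone wanting to repeat the %%EOF check without a text editor, here is a minimal Python sketch. It only reports where the last %%EOF marker sits in the raw bytes. The function name `pdf_eof_report` and the 1024-byte window are my own assumptions for illustration (some PDF parsers only search near the end of the file); this is not how Matrix or its PDF library performs the check.

```python
def pdf_eof_report(data: bytes, tail_size: int = 1024) -> dict:
    """Describe where the last %%EOF marker sits in the raw PDF bytes.

    tail_size is an assumed search window, not a value taken from Matrix.
    """
    marker = b"%%EOF"
    idx = data.rfind(marker)  # position of the last marker, or -1 if absent
    return {
        "has_eof_marker": idx != -1,
        # True when the marker falls inside the final tail_size bytes
        "eof_in_tail": marker in data[-tail_size:],
        # Number of bytes that follow the last marker (None if no marker)
        "trailing_bytes": len(data) - (idx + len(marker)) if idx != -1 else None,
    }

# Usage (the path is hypothetical):
#   with open("/path/to/file.pdf", "rb") as fh:
#       print(pdf_eof_report(fh.read()))
```

If the marker is present but followed by a lot of trailing bytes, that might explain a parser complaining even though the file looks fine in an editor. That is a guess, not a confirmed cause.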

 

 

I note there is a second set of warnings that seem to come up in conjunction with each attempt at uploading:

Error: Illegal entry in bfchar block in ToUnicode CMap

This is caused by the pdftohtml process, from what I have read.

 

Does any of that explain the error screen in Edit+, and the fact that although it says "Failed", the file has actually been uploaded and then moved to the trash?

I have checked our triggers, and we have none relating to moving PDF files, or any relating to moving assets to trash.

 

 

Our non-backend editors need to be able to upload their PDFs using Edit+.

I am not sure exactly when this began. I have only just been notified of this case, which the user reported on 19/07/2013.

pdftohtml is a useful tool, and we do not want to disable it.

 

This is our External Tools config setup:

Enable HTML Tidy: Yes
HTML Tidy Accessibility Check Level: 0
HTML Standard to use for HTML Tidy: HTML 4.01 Transitional
Enable pdftohtml: Yes
Enable Antiword: Yes
Enable Photoshop Metadata Extraction: No
Enable Markdown: No
Enable Markdownify: No
Enable getID3 Extraction: No
Enable OGG Metadata Extraction: No
Enable Virus Checker: No

Keyword Extraction Tools
Path to pdftohtml: /usr/bin/pdftohtml
Path to Antiword: /usr/bin/antiword
Path to Antiword Mappings: /usr/share/antiword

 

 

 

Any advice on how to solve this would be greatly appreciated.

 

Kind regards,

Emily.

Attachment: uploading_pdf.jpg (58.8 KB)


(Nic Hubbard) #2

I wonder if this post will help: http://www.eprints.org/tech.php/16939/attachment/message.html

 

It talks about upgrading xpdf, which will allow more characters in the CMap.


(Edison Wang) #3

The error is cropping up from pdftohtml. It's a tool Matrix uses to extract content for search indexing.

 

The command Matrix uses is:

/path/to/pdftohtml -i -nomerge -noframes -stdout -opw password /path/to/pdf

 

So you can use this command to replicate the error that you saw in Matrix. The tool itself needs to be fixed.
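
If it helps to run the command above against several PDFs, a rough Python wrapper could look like the sketch below. The flags are copied from the command as quoted. The helper names (`build_pdftohtml_command`, `run_pdftohtml`) are hypothetical, and the default binary path is taken from the External Tools config in the first post. This is not Matrix's own code, only a way to capture the exit code and error output outside of Matrix.

```python
import subprocess

def build_pdftohtml_command(pdf_path, binary="/usr/bin/pdftohtml",
                            owner_password="password"):
    """Build the argument list for the command quoted above.

    binary and owner_password defaults are assumptions for illustration.
    """
    return [binary, "-i", "-nomerge", "-noframes", "-stdout",
            "-opw", owner_password, pdf_path]

def run_pdftohtml(pdf_path, binary="/usr/bin/pdftohtml"):
    """Run pdftohtml and return (exit code, stderr text)."""
    result = subprocess.run(
        build_pdftohtml_command(pdf_path, binary),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,
        errors="replace",     # tolerate non-UTF-8 bytes in the tool's output
    )
    return result.returncode, result.stderr

# Usage (the path is hypothetical):
#   code, err = run_pdftohtml("/path/to/pdf")
#   print(code, err)
```

Any messages such as the ToUnicode CMap error should show up in the stderr text if the tool emits them for that file.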

 

I have no idea how to resolve this issue. You might need to search around and probably try upgrading the tool. That's all I can advise.