Comments
-
@helloworld tried it, doesn't work.
Everything in that PDF is an image.
I also tried OCR by converting it to individual images and running them through Cognitive Services; that didn't work out either.
I don't understand why people make such malformed PDFs
-
@skprog that's probably the best solution, but it's been 2 hrs and 5 pages are complete
The pages are full of specs about hardware :3
-
Ah, I see. The artwork has been done in the shittiest way, probably in Photoshop, then rasterised and saved as a PDF. It would be impossible to extract text, as no text exists. Can you not simply go back to him and ask for the original artwork for the brochure? Or tell him that all text from the brochure will have to be re-keyed at a cost of xxx. I guess you will need the images without the text, so you’ll have to get the artwork somehow. What a fucking pain in the arse. I feel for you, dude.
-
@helloworld the point is the client had made the brochure years ago :3
And he is a typical businessman, who doesn't care how it's done :3
Related Rants
So, there is this one client who wants a website made for his hardware shop, with an inventory display. He has given me their brochure's PDF, and that fucking PDF contains images and no text, and he fucking expects me to write that shit down >:(
Tried all techniques to get text from the brochure: parser, OCR, everything.
None worked.
And the PDF is 100 pages long and I'm in dire need of money.
FML :(
rant
clients
stupid people