[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vp / vr / w / wg / wsg / wsr / x / y ] [Home]
You are currently reading a thread in /g/ - Technology

Thread replies: 13
Thread images: 2
I have an online textbook inside a Flash player wrapper, which I want to extract into a PDF or Word doc. The search functionality is shit. I've already paid £450 for the physical version, so I'm not willing to pay again for something I already own just so I can Ctrl+F. Any idea how I can proceed, lads?
>>
File: sam-griner-success-kid.jpg (20 KB, 635x397)
Super interested in this too.
>>
>>53625719
>I've already paid £450 for the physical version
It better be fucking gold plated
>>
Sorry OP, can't help. I am also interested though.
>>
Just screenshot each page.
>>
>>53625873
Three hundred and fifty pages. I'm sure there's a better solution.
>>
Open the browser inspector, refresh the page, and look at what gets downloaded.
Werks every time on my university virtual library
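
A rough sketch of that idea, assuming you export the inspector's network log as a HAR file (most browser dev tools can save one). The function names and the `book.har` filename are made up for illustration; the only thing relied on is the standard HAR layout of `log.entries[].request.url` and `response.content.mimeType`.

```python
import json


def load_har(path):
    """Read a HAR file exported from the browser's network tab."""
    # utf-8-sig also copes with exports that start with a byte-order mark.
    with open(path, encoding="utf-8-sig") as fh:
        return json.load(fh)


def image_urls_from_har(har):
    """Return the URLs of all image responses, in the order they were requested."""
    urls = []
    for entry in har.get("log", {}).get("entries", []):
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        if mime.startswith("image/"):
            urls.append(entry["request"]["url"])
    return urls
```

Something like `image_urls_from_har(load_har("book.har"))` would then list whatever images the viewer pulled while you flipped through pages.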
>>
>>53625883
Script it and run it through OCR?

If it's web-based, use Selenium to time the clicks.
>>
>>53625889
I can see from the source that each page is stored as a JPEG at a publicly available URL, but that's not the real source, as it's saved as text within the Flash player.
>>
>>53625928
Just use that, or find the text source (I'm guessing an XML file with the same name).

Any more assistance will require you to post a sharpie in the pooper with a timestamp.

Cheers
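
For what it's worth, here is a minimal sketch of the "same name, different extension" guess. The URL pattern is purely hypothetical (the real one has to be read off the page source), and whether an XML file actually sits next to each JPEG is just the guess from the post above, not something confirmed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL pattern, for illustration only.
TEMPLATE = "https://example.com/book/pages/{page:03d}.jpg"


def page_urls(template, n_pages):
    """Return the image URL for every page, numbered from 1."""
    return [template.format(page=i) for i in range(1, n_pages + 1)]


def guess_text_url(image_url, ext=".xml"):
    """Swap the image extension for `ext`, keeping the same base name."""
    parts = urlsplit(image_url)
    new_path = str(PurePosixPath(parts.path).with_suffix(ext))
    return urlunsplit(parts._replace(path=new_path))
```

`page_urls(TEMPLATE, 350)` would give one URL per page for a 350-page book, and `guess_text_url` turns each into the candidate text URL to try in the browser.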
>>
You need to write a scraper in whatever language you're most comfortable with. The scraper needs to be able to drive Selenium in order to support Flash under a headless browser.

Python works pretty well with this type of project.
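
A bare-bones sketch of the download half of such a scraper in Python. Note that this does not drive Selenium; it only covers the simpler case mentioned earlier in the thread, where each page image is directly reachable by URL. The function names, the `page_0001.jpg` naming scheme, and the one-second delay are all assumptions for illustration.

```python
import time
from pathlib import Path
from urllib.request import urlopen


def http_get(url, timeout=30):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_pages(urls, out_dir, fetch=http_get, delay=1.0):
    """Save each URL as page_0001.jpg, page_0002.jpg, ... in out_dir.

    Files that already exist are skipped, so an interrupted run can be
    resumed. Returns the list of paths written during this call.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for number, url in enumerate(urls, start=1):
        target = out / f"page_{number:04d}.jpg"
        if target.exists():
            continue  # already fetched on an earlier run
        target.write_bytes(fetch(url))
        written.append(target)
        time.sleep(delay)  # pause between requests to go easy on the server
    return written
```

The `fetch` parameter is there so the loop can be tried out with a fake fetcher before pointing it at a real server. The saved images could then be fed to whatever OCR tool you prefer.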
>>
Give us the URL or tell us the book title faggot.
>>
>>53627017

Do this already.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.