Help with Python script

You are currently reading a thread in /g/ - Technology

Thread replies: 11
Thread images: 1
Need some help with implementing certain logic in a .py script.

The subject is a command-line app for downloading videos from the FC2 portal. The problem is that most recent videos fail to download. I've found a possible solution.

Original file: https://github.com/h-collector/youtube-dl/blob/master/youtube_dl/extractor/fc2.py#L79
What needs to be done: when requesting "info_url", send an additional parameter named gk (e.g. &gk=n27cXGgTDW). Its value is computed and scrambled in each webpage inside the cass() JavaScript function. You can get the value by running alert(cass()) in the browser console. However, this logic needs to be included in the .py script itself. I'm too dumb to do this myself.
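A minimal sketch of the "send an additional gk parameter" step, assuming the gk value has already been obtained. The helper name add_gk_param and the example.com URL are hypothetical; the real info_url is built inside the extractor and its exact shape is not shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_gk_param(info_url, gk):
    """Return info_url with gk=<value> appended to its query string (hypothetical helper)."""
    parts = urlsplit(info_url)
    # Keep any existing query parameters and append gk at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("gk", gk))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL; the gk value is the example from the post above.
print(add_gk_param("http://example.com/info?upid=123", "n27cXGgTDW"))
```

urlencode re-encodes the existing parameters; if the server turns out to be picky about that, plain string concatenation (info_url + "&gk=" + gk) is the simpler alternative.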

The file is from an outdated youtube-dl fork and works better than the original. The original authors never bothered to fix the FC2 extractor properly, and it still has other problems in addition to this one.
>>
cass() isn't showing up as a function
>>
>>53863908
If you don't mind evaluating the JavaScript source, it's easily doable with SpiderMonkey or js2py, but it probably won't be accepted if you open a pull request. Have you seen that cass calls another function, ca***, that is scrambled on each page?
>>
>>53863908
Just update to the latest youtube-dl, mate.
>>
>>53863908
>>53866536
However, I strongly suspect there is a better way to get the video URL. Some time ago I watched a video tutorial on how to reverse-engineer it using the Network panel in the Firefox/Chrome developer tools.
>>
>>53866536
Including any additional parser would probably do no good. You can read off the sequence just by looking at the source of cass(): it's an array of (char_number, char) pairs. Also, I'm not going to open a pull request; both projects look as good as dead in terms of FC2 support.
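Reading off the sequence amounts to sorting the (char_number, char) pairs by position and joining the characters. A minimal sketch, assuming the pairs have already been pulled out of the page; decode_pairs is a hypothetical name, and the input is just the example gk value from the thread shuffled for illustration, not real page data.

```python
def decode_pairs(pairs):
    """Rebuild a string from (char_number, char) pairs given in any order."""
    # Sort numerically by position, so a position of 10 lands after 9, not after 1.
    return "".join(char for _, char in sorted(pairs, key=lambda p: int(p[0])))


# Illustrative input only: the example gk value, shuffled.
pairs = [(3, "c"), (0, "n"), (9, "W"), (1, "2"), (5, "G"),
         (2, "7"), (8, "D"), (4, "X"), (7, "T"), (6, "g")]
print(decode_pairs(pairs))  # → n27cXGgTDW
```

The int() key matters once positions reach two digits: plain string sorting would put "10" before "2".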

>>53866552
The latest youtube-dl has the same bug and crashes on the same videos. Example: http://video.fc2.com/en/a/content/20160404BTueH9Sx/
It also fails to save non-ASCII characters in the filename, while this two-year-old fork does this just fine.
>>
>>53866884
I was trying to write my solution with SpiderMonkey when I stumbled upon https://github.com/fent/node-youtube-dl
Then it's just "sudo npm install youtube-dl -g" if you already have Node and npm.
The bad news is that it now gets called instead of my Python youtube-dl on my Linux machine.
The good news is that it works, so no Python practice for today.
Can we call it solved for now?
>>
>>53866884
>>53867101
>http://video.fc2.com/en/a/content/20160404BTueH9Sx/
nvm, it only works for some videos.
>>
>>53863908
>>53866884
Ok, you were right.
I used a string named html_source containing the page source, because right now I have no time to check what the "webpage" variable in youtube-dl contains.
And sorry for the ugliness.

<code>
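html_source = " ,,, "  # placeholder: paste the page source here

import re

# Keep only the lines that declare the c0..c9 arrays (leading indentation stripped).
our_lines = [l.strip() for l in html_source.split("\n")
             if re.match(r"c[0-9]= new Array\(", l.strip())]

# Removing superfluous: keep just the "index,'char" part of each array.
new_lines = []
for l in our_lines:
    m = re.match(r"c.= new Array\((.,'.)'\);", l)
    if m:
        new_lines.append(m.group(1))

# Split "5,'X" into ["5", "X"], sort by the numeric index and join the chars.
splitted = [s.split(",'") for s in new_lines]
our_sequence = "".join(char for _, char in sorted(splitted, key=lambda p: int(p[0])))

print(our_sequence)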
</code>
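Since the post mentions youtube-dl's "webpage" variable, here is a sketch of the same idea as a single function that could be called on that string, using one re.findall over the whole page instead of splitting it into lines. It assumes the declarations look like c3= new Array(5,'X'); (inferred from the regex above, not verified against current FC2 pages); gk_from_webpage and the sample fragment are hypothetical. Its result would be the value to send as the gk parameter.

```python
import re

# Assumed declaration shape, inferred from the thread's regex: c3= new Array(5,'X');
ARRAY_RE = re.compile(r"\bc\d+\s*=\s*new Array\(\s*(\d+)\s*,\s*'([^'])'\s*\)")


def gk_from_webpage(webpage):
    """Return the string hidden in the page's cN arrays, or None if none are found."""
    pairs = ARRAY_RE.findall(webpage)  # list of (index, char) string tuples
    if not pairs:
        return None
    # Sort by numeric index, then join the characters in order.
    return "".join(char for _, char in sorted(pairs, key=lambda p: int(p[0])))


# Hypothetical page fragment, indented as it might be inside a <script> tag.
sample = """
    c0= new Array(2,'c');
    c1= new Array(0,'a');
    var unrelated = new Array(7,'z');
    c2= new Array(1,'b');
"""
print(gk_from_webpage(sample))  # → abc
```

Returning None instead of raising lets the caller fall back to the old request without gk when a page doesn't contain the arrays.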
>>
>>53867925
<.<

>>
>>53867964
Bah. I shouldn't reply when I'm in a hurry.
import re
our_lines = [l for l in html_source.split("\n") if re.match(r"c[0-9]= new Array\(", l)]
## removing superfluous
new_lines = []
for l in our_lines:
    m = re.match(r"c.= new Array\((.,'.)'\);", l)
    if m:
        new_lines.append(m.groups()[0])

splitted = [s.split(",'") for s in new_lines]
our_sequence = "".join([l[1] for l in sorted(splitted)])
print(our_sequence)

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.