Web scraping, wtf?!

Thread replies: 9
Thread images: 5

Anonymous 2016-03-27 16:13:36 Post No. 53715160

File: I_Want_THIS.png (301 KB, 1679x1008)

I want to extract a single number from a website into a .csv file every second throughout the day. I have done that using iMacros, but that means keeping a browser open all day, so I want to do it with R/Python/C++/C#/JScript/Java instead. I watched a lot of tutorials and read examples, but every time I try applying them to my desired website, I just get "- -" as the value, so no value at all. I figure it has something to do with the value being dynamically generated (.aspx website?). Here's the URL:

http://www.zertifikate.commerzbank.de/MarketOverview/MarketOverviewDetails.aspx?pc=42&c=2193946&ar=.GDAXI&a=15000&isin=DEM_DAX_CASH&mkt=CBUL&pname=DAX&pdp=2

See also pic related: I extracted the whole HTML code, but there are still no values in it, and some XPath approaches in R didn't work either. Please help, any solutions?
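
The logging half of this (one timestamped row per second, no browser macro involved) can be sketched in Python with only the standard library. This is a minimal sketch under assumptions: `fetch_value` is a placeholder for whatever ends up returning the number, `log_values` is a made-up helper name, and the file name in the usage note is just an example.

```python
import csv
import time
from datetime import datetime, timezone


def log_values(fetch_value, csv_path, interval=1.0, samples=None):
    """Append one (UTC timestamp, value) row to csv_path every `interval` seconds.

    fetch_value is a hypothetical zero-argument callable returning the current
    number; samples=None means "run until interrupted".
    Returns the number of rows written.
    """
    written = 0
    next_tick = time.monotonic()
    with open(csv_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        while samples is None or written < samples:
            value = fetch_value()
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            writer.writerow([stamp, value])
            handle.flush()  # keep the file current if the script is killed
            written += 1
            # Schedule against the monotonic clock so a slow fetch does not
            # make the one-second rhythm drift.
            next_tick += interval
            time.sleep(max(0.0, next_tick - time.monotonic()))
    return written
```

Called as `log_values(my_fetch, "dax.csv")`, it would run until interrupted; `my_fetch` still has to be written once the real data source is known.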

>>

Anonymous 2016-03-27 16:19:03 Post No.53715236

It's possible that the number is generated dynamically after the page is loaded. I think Selenium WebDriver might be one way to approach it.
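
A rough sketch of the Selenium route in Python, assuming the `selenium` package and a matching browser driver are installed. The CSS selector is a guess that has to be checked against the real markup in the dev tools, and `read_rendered_value` and `main` are hypothetical helper names. The helper polls the live DOM until the "- -" placeholder has been replaced by something else.

```python
import time


def read_rendered_value(driver, css_selector, placeholder="- -",
                        timeout=10.0, poll=0.25):
    """Return the element's text once scripts have replaced the placeholder.

    `driver` is any Selenium WebDriver. Raises TimeoutError if the text is
    still empty or equal to the placeholder after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        text = driver.find_element("css selector", css_selector).text.strip()
        if text and text != placeholder:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no value in {css_selector!r} after {timeout}s")
        time.sleep(poll)


def main(url, css_selector):
    # Imported here so the helper above can be exercised without Selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()  # or webdriver.Chrome()
    try:
        driver.get(url)
        print(read_rendered_value(driver, css_selector))
    finally:
        driver.quit()
```

Note that this still drives a real browser, so it replaces iMacros rather than removing the browser from the picture.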

>>

Anonymous 2016-03-27 16:19:46 Post No.53715248

File: iMacros.png (81 KB, 838x1005)

It's as if the website protects all of its information; only through iMacros extraction have I managed to get the info out (pic related), but that's not a long-term solution.

>>

Anonymous 2016-03-27 16:20:07 Post No.53715253

>data-field
The numbers are rendered using JS, but don't lose hope yet: there's obviously an API endpoint which serves the data to the page.
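
To see why a plain download plus XPath only ever yields "- -", here is a small Python sketch that runs the standard library's HTML parser over made-up markup. The `data-field` attribute is the one quoted above; the tag, the attribute value `last` and the sample HTML are assumptions for illustration only, not the site's real markup.

```python
from html.parser import HTMLParser


class DataFieldParser(HTMLParser):
    """Collect the text of every element carrying a data-field attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # data-field name -> text found in the static HTML
        self._current = None  # name of the data-field element being read

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("data-field")
        if name is not None:
            self._current = name
            self.fields[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes no nested tags inside a data-field element.
        self._current = None


# Hypothetical static markup, as a server might send it before any script runs.
STATIC_HTML = '<td><span data-field="last">- -</span></td>'

parser = DataFieldParser()
parser.feed(STATIC_HTML)
print(parser.fields)  # {'last': '- -'}
```

The parser only sees what the server sent. Whatever a script writes into that element later never shows up in the downloaded source, which is why the value has to come from the endpoint the script itself talks to.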

>>

Anonymous 2016-03-27 16:33:14 Post No.53715415

>>53715236
thank you for the hint, will look into that

>>53715253
thank you, currently looking up 'Scraping JS-generated data with R/Python', hopefully this will finally solve it.

>>

Anonymous 2016-03-27 16:46:08 Post No.53715589

>>53715415
I haven't tried Selenium before, but I hear it helps.

Also, I was taking a look at the HTML and found this: http://www.zertifikate.commerzbank.de/Products/ProductGraphPopoutPage.aspx?isin=DEM_DAX_CASH&mkt=CBUL&pname=DAX

This should help you out, since it cuts down the clutter and you still get the number you want. Take a look at the browser dev tools, like the console. It seems that the cbcm object makes a connection to a Lightstreamer server like http://warrantspushserver.commerzbank.de/

>>

Anonymous 2016-03-27 17:03:45 Post No.53715855

File: feelbonacci.png (107 KB, 841x797)

Apparently the data is being live-streamed with Lightstreamer, much like in an instant messaging service. There are some demos on their website that would be helpful to reverse engineer. Good luck.

http://demos.lightstreamer.com/?p=lightstreamer&t=client&f=finance

>>

Anonymous 2016-03-27 17:33:16 Post No.53716240

File: CBK.png (255 KB, 839x1005)

>>53715589
thanks a lot for that, it really removes some clutter. If I go into the HTML source, however, the value still can't be extracted, and you're right about the communication with the Lightstreamer push service.

>>

Anonymous 2016-03-27 17:35:55 Post No.53716269

File: network.png (291 KB, 1706x1006)

>>53715589
However, by changing the timeframes of the graph I managed to pin down where the data for the chart is coming from: in the Network section of the dev tools I discovered that it draws from a simple plain-text page with historical data which is updated throughout the day (pic related). Seems like my problem is solved.

Thanks a lot! Now I will just have to wait and see if that page is actually filled with data as frequently (every second) as the webpage is displaying new data.
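
The layout of that plain-text page is not shown in the thread, so the following Python sketch assumes one `timestamp;value` pair per line with ISO timestamps; the separator, the column order and the helper names would all need adjusting to the real page. It parses the text and reports the gaps between consecutive points, which is one way to check the every-second question.

```python
from datetime import datetime


def parse_history(text, sep=";"):
    """Parse lines of 'ISO-timestamp<sep>value' into (datetime, float) pairs.

    The line format is an assumption; blank or malformed lines are skipped.
    """
    points = []
    for line in text.splitlines():
        parts = line.strip().split(sep)
        if len(parts) != 2:
            continue
        try:
            points.append((datetime.fromisoformat(parts[0]), float(parts[1])))
        except ValueError:
            continue
    return points


def update_gaps(points):
    """Seconds between consecutive data points, in file order."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(points, points[1:])]


# Made-up sample in the assumed format, not real data from the site.
SAMPLE = """\
2016-01-04T09:00:00;100.0
2016-01-04T09:00:01;100.5
2016-01-04T09:00:06;101.0
"""

points = parse_history(SAMPLE)
print(points[-1][1])        # latest value: 101.0
print(update_gaps(points))  # [1.0, 5.0]
```

The text itself could be fetched with `urllib.request.urlopen(url).read().decode()`, using the URL seen in the Network tab. If the gaps come out larger than one second, the chart feed is coarser than the live number on the page.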