I want to extract a single number from a website into a .csv file every second throughout the day. I have done that using iMacros, but that requires keeping a browser open all day, so I want to do it with R/Python/C++/C#/JScript/Java instead. I watched a lot of tutorials and read examples, but every time I try applying them to my desired website, I just get "- -" as the value, so no value at all. I figure it has something to do with the value being dynamically generated or something (.aspx website?). Here is the URL:
http://www.zertifikate.commerzbank.de/MarketOverview/MarketOverviewDetails.aspx?pc=42&c=2193946&ar=.GDAXI&a=15000&isin=DEM_DAX_CASH&mkt=CBUL&pname=DAX&pdp=2
See also pic related. I extracted the whole HTML code, but there are still no values in it, and some XPath approaches in R didn't work either. Please help, any solutions?
It's possible that the number is generated dynamically after the page is loaded. I think the selenium webdriver might be one way to approach it.
It's as if the website protects all of its information. Only through iMacros extraction have I managed to get the info (pic related), but that's not a long-term solution.
>data-field
The numbers are rendered using JS, but don't lose hope yet: there's obviously an API endpoint which serves the data to the page.
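To see why a plain HTTP fetch only ever gives you "- -", here is a rough Python sketch using just the standard library. The HTML snippet and the data-field names ("last", "time") are made up for illustration; the real page's markup will differ. It only shows that the static source carries placeholders until the script fills them in.

```python
from html.parser import HTMLParser

# Hypothetical static markup, similar in spirit to what a plain HTTP fetch
# returns: the elements exist, but no script has filled in the numbers.
SAMPLE_HTML = '<span data-field="last">- -</span><span data-field="time">- -</span>'


class DataFieldParser(HTMLParser):
    """Collect the text of every element that carries a data-field attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # data-field name -> text content
        self._current = None  # data-field element currently being read

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("data-field")
        if name is not None:
            self._current = name
            self.fields[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the capture; good enough for flat markup.
        self._current = None


def extract_fields(html):
    """Return {data-field name: stripped text} for the given HTML string."""
    parser = DataFieldParser()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.fields.items()}


def is_placeholder(value):
    """True if the field still holds "- -" (or nothing) instead of a number."""
    return value.replace(" ", "") in ("", "--")
```

If is_placeholder() is true for the field you want, the value is not in the HTML at all and you have to find the request that delivers it (dev tools, network tab).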
>>53715236
thank you for the hint, will look into that
>>53715253
thank you, currently looking up 'Scraping JS generated data with R/Python', hopefully this will solve it finally.
>>53715415
I haven't tried selenium before, but I hear it helps.
Also, I was taking a look at the html and found this: http://www.zertifikate.commerzbank.de/Products/ProductGraphPopoutPage.aspx?isin=DEM_DAX_CASH&mkt=CBUL&pname=DAX
This should help you out since it lowers the amount of clutter and you still get the number you want. Take a look at the browser dev tools, such as the console. It seems that the cbcm object makes a connection to a Lightstreamer server like http://warrantspushserver.commerzbank.de/
Apparently the data is being live-streamed with Lightstreamer, as in an instant messaging service. There are some demos on their website which would be helpful for reverse engineering. Good luck.
http://demos.lightstreamer.com/?p=lightstreamer&t=client&f=finance
>>53715589
thanks a lot for that, it really removes some clutter. If I go into the HTML source, however, the value still remains unextractable, and you're right about the communication with the Lightstreamer push service.
>>53715589
However, by changing the timeframe of the graph I managed to pin down where the data for the chart is coming from. Under the network section of the dev tools I discovered that it draws from a simple plain-text webpage with historical data which is being updated throughout the day (pic related). Seems like my problem is solved.
Thanks a lot! Now I will just have to wait and see if that page is actually filled with data as frequently (every second) as the webpage is displaying new data.
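For reference, a minimal Python sketch of polling such a plain-text page once a second and appending the newest value to a .csv file. Assumptions, since the real URL and format are only in the pic: DATA_URL is a placeholder, and the page is assumed to hold one data point per line with the value in the last semicolon-separated column. Both need adjusting to whatever the network tab actually shows.

```python
import csv
import time
import urllib.request
from datetime import datetime

# Placeholder: substitute the plain-text URL found in the dev tools network tab.
DATA_URL = "http://example.invalid/history.txt"


def fetch_text(url):
    """Download the page body as text."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_latest(text, sep=";"):
    """Return the last column of the last non-empty line, or None if empty.

    Assumed layout: one data point per line, value in the last column.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    return lines[-1].split(sep)[-1].strip()


def record_once(csv_path, fetch=fetch_text, url=DATA_URL, sep=";"):
    """Fetch once and append a (local timestamp, value) row to the CSV file."""
    value = parse_latest(fetch(url), sep=sep)
    if value is None:
        return None
    with open(csv_path, "a", newline="") as handle:
        stamp = datetime.now().isoformat(timespec="seconds")
        csv.writer(handle).writerow([stamp, value])
    return value


def run(csv_path, interval=1.0):
    """Poll forever, roughly once per `interval` seconds."""
    while True:
        started = time.monotonic()
        try:
            record_once(csv_path)
        except OSError as error:  # network or file errors: report and keep going
            print("fetch failed:", error)
        # Sleep only for what is left of the interval after the fetch.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))


# To start polling: run("dax.csv")
```

This writes a row on every poll even if the page has not changed, so comparing consecutive rows in the file would show how often the source is really updated.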