C# - Scrape JavaScript array data with CsQuery
Some of the info I want to scrape is contained within the page's JavaScript. It looks similar to this pattern:
<script type="text/javascript">
arrayname["field1"] = 12;
arrayname["field2"] = 42;
arrayname["field3"] = 1442;
</script>
<script type="text/javascript">
arrayname["field4"] = 62;
arrayname["field5"] = 3;
arrayname["field6"] = 542;
</script>
It's mixed in with a hell of a lot of other JavaScript. I only need these values.
I started like so:
var dom = CQ.CreateFromUrl("http://somesite.xxx");
CQ script = dom["script[type='text/javascript']"];
But I cannot think how to grab the data. Is the only way to create a regex and loop over the matches, or is there a way with better performance?
I can't see how to use CSS selectors on the actual JavaScript code. Should I try a different approach?
It seems like you are looking for a server-side JavaScript engine. CsQuery can get the contents of the script tags easily enough, but you would need to run the script and be able to refer to the entities it creates. While in theory one could create some kind of query language to parse out lines of script, the reality is that doing so amounts to running it. If you only need to pull out particular lines containing simple assignments, and the context isn't important, then you're looking at simple regular expressions (or grep) to filter out what you need.
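As a rough sketch of the regex route, assuming the values are always integer literals assigned to a string key on arrayname exactly as in the question: the class name ScriptValueScraper and the method Extract are made up for illustration, and the pattern would need adjusting for floats, strings or other variable names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of CsQuery.
public static class ScriptValueScraper
{
    // Matches simple assignments such as: arrayname["field1"] = 12;
    // Assumption: keys are quoted strings and values are integer literals.
    private static readonly Regex Assignment = new Regex(
        @"arrayname\[\s*[""'](?<key>[^""']+)[""']\s*\]\s*=\s*(?<value>-?\d+)\s*;",
        RegexOptions.Compiled);

    // Takes the raw text of each script block and collects key/value pairs.
    // If a key is assigned more than once, the last assignment wins.
    public static Dictionary<string, int> Extract(IEnumerable<string> scriptBodies)
    {
        var values = new Dictionary<string, int>();
        foreach (var body in scriptBodies)
        {
            foreach (Match m in Assignment.Matches(body))
            {
                values[m.Groups["key"].Value] =
                    int.Parse(m.Groups["value"].Value, CultureInfo.InvariantCulture);
            }
        }
        return values;
    }
}
```

You would pass in the text of each script element that CsQuery selected. One compiled regex run once over each script body is usually cheap compared with fetching the page, so performance is unlikely to be the deciding factor here.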
I have used the Noesis V8 wrapper (http://javascriptdotnet.codeplex.com/), which is on NuGet as Noesis.Javascript.
It's very fast (since it uses Google's V8 engine under the hood); the only real downside is that it's not a pure .NET solution, but once it's set up it's pretty painless. An example of using it in a project is https://github.com/jamietre/sharplinter, which uses it to run JSHint.
There are also a variety of 100% .NET JavaScript engines such as Jint, IronJS and Jurassic. I have used Jurassic before, and it's probably the fastest of them because it compiles to bytecode. It's surprisingly complete, but it's not being actively developed, and it's hard to get much support. All of them are much, much slower than V8 and offer no real advantages other than having no non-.NET references.
Unless you really, really need 100% .NET, use JavaScript.NET.
c# csquery