Download User Agent Switcher Xml File


I've confirmed that R calls to XML functions such as htmlParse and readHTMLTable send a blank user agent string to the server.
?XML::htmlParse tells me, under isURL, that 'The libxml parser handles the connection to servers, not the R facilities'. Does that mean there is no way to set the user agent? (I did try options(HTTPUserAgent='test'), but that is not being applied.)

2 Answers
XML::htmlParse uses the libxml facilities (i.e. NanoHTTP) to fetch HTTP content using the GET method. By default, NanoHTTP does not send a User-Agent header. There is no libxml API to pass a User-Agent string to NanoHTTP, although one can pass arbitrary header strings to lower-level NanoHTTP functions, like xmlNanoHTTPMethod. Hence, making this possible in the XML package would require significant source code modification.
Alternatively, options(HTTPUserAgent='test') sets the User-Agent header for functions that use the R facilities for HTTP requests. For example, one could fetch the page with download.file after setting that option.
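A minimal sketch of that approach, under a few assumptions: the URL http://localhost/ is only a placeholder for a server whose access log you can read, and parsing the downloaded copy with XML::htmlParse is my own addition to tie it back to the question. Whether the option is honoured can depend on the download method your R build uses, so treat this as illustrative.

```r
# Set the User-Agent used by R's own HTTP facilities; keep the old value
# so it can be restored afterwards.
old <- options(HTTPUserAgent = "test")

# Placeholder target (assumption): replace with a server you control.
url  <- "http://localhost/"
dest <- tempfile(fileext = ".html")

# download.file() goes through R's HTTP facilities, not libxml's NanoHTTP,
# so it can pick up the HTTPUserAgent option. It returns 0 on success;
# a failed connection is turned into NA here instead of stopping the script.
status <- tryCatch(
  download.file(url, destfile = dest, quiet = TRUE),
  error = function(e) NA_integer_
)

# If the download worked, parse the local copy instead of the URL, so
# libxml never has to open the connection itself.
if (identical(status, 0L) && requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::htmlParse(dest)
}

# Restore the previous option value.
options(old)
```

The point of the detour through a temporary file is that htmlParse only ever sees a local path, so the blank User-Agent from NanoHTTP never comes into play.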
The (Apache-style) access log entry for such a request then shows test as the user agent.
Matt's answer is entirely correct. As for downloading to a string/character vector, you can use RCurl and getURLContent() (or getForm() or postForm() as appropriate). With these functions, you have immense control over the HTTP request, including being able to set the user-agent and any field in the header. So passing the user agent as an option in the getURLContent() call does the job.
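Something along these lines should work. It is only a sketch: fetch_with_agent is a hypothetical helper name, http://localhost/ is a placeholder URL, and the Accept header is just there to show that other header fields can be set the same way. The useragent and httpheader arguments are libcurl options that RCurl forwards to curl.

```r
# Hypothetical helper (not part of RCurl): fetch a page as a character
# string while sending a chosen User-Agent header.
fetch_with_agent <- function(url, agent = "test") {
  RCurl::getURLContent(
    url,
    useragent  = agent,                    # libcurl's User-Agent option
    httpheader = c(Accept = "text/html")   # any other header fields go here
  )
}

# Usage (placeholder URL, assumption): fetch, then parse the string.
# asText = TRUE tells htmlParse the input is HTML content, not a file or URL.
# html <- fetch_with_agent("http://localhost/")
# doc  <- XML::htmlParse(html, asText = TRUE)
```

Because the content arrives as a string, htmlParse never opens the connection itself, which sidesteps the NanoHTTP limitation described in the first answer.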