
# Why can the C# HttpClient not call this URL (always times out)?

Dan Diplo
1#
Dan Diplo, published 2018-02-14 15:13:59Z
I've been developing an application that determines information about web pages. One of its components makes an HTTP GET request to a URL, grabs the HTML and analyses it. This has worked fine with every URL I've thrown at it, apart from one...

The culprit is the .NET HttpClient, which always seems to time out when requesting any URL within the problem domain. However, the same URL requested with a browser returns content within milliseconds. Nothing about the headers seems unusual.

Upping the timeout simply results in it taking longer to bomb out. I've tried minutes with the same result. I've tried various things, such as setting the User-Agent string to that of Chrome, but to no avail.

The domain in question is: http://careers.adidas-group.com

Note the same site also runs on HTTPS at https://careers.adidas-group.com (it has a valid cert). Using either protocol results in the same error.

I can show the problem with a simple C# console app, shown below:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            string url = "http://careers.adidas-group.com";
            // Short timeout so the failure shows up quickly.
            var client = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };
            using (var message = new HttpRequestMessage(HttpMethod.Get, url))
            {
                // Task.Run(...).Result blocks so the async call can run from a synchronous Main.
                using (var httpResponse = Task.Run(() => client.SendAsync(message)).Result)
                {
                    Console.WriteLine("{0}: {1}", httpResponse.StatusCode, httpResponse.ReasonPhrase);
                }
            }
            Console.ReadLine();
        }
    }

Note in the above example I set the timeout to 10 seconds, simply to expedite the problem - however, increasing the timeout makes no difference. The same code with a different URL (such as https://stackoverflow.com/) runs fine.

Also note the code above is simplified to run as a console app. My actual code runs properly asynchronously (using await) in an async MVC controller method - I'm just using Task.Run(() => ) to make it work within the context of a synchronous Main method in the example. But it makes no difference to the outcome.
(The actual exception is a "Task was cancelled", but that appears to be a symptom of the time-out rather than the actual issue.) Can anyone explain to me why this is happening (is it something about the server configuration?) and what, if anything, I can do to make HttpClient fulfil the request? Thanks.
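The link between the time-out and the "Task was cancelled" exception can be reproduced without touching the network. The following is a minimal sketch, not code from the original post: `NeverRespondingHandler`, `TimeoutDemo` and `ProbeAsync` are hypothetical names, and the stub handler merely stands in for a server that accepts a request but never answers. It relies on the fact that `HttpClient` enforces its `Timeout` by cancelling the request, which surfaces as a `TaskCanceledException`.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a server that never sends a response.
sealed class NeverRespondingHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Waits until HttpClient cancels the token, which happens when Timeout elapses.
        await Task.Delay(Timeout.Infinite, cancellationToken);
        return new HttpResponseMessage(HttpStatusCode.OK); // never reached
    }
}

static class TimeoutDemo
{
    // Returns a short description of what happened to the request.
    public static async Task<string> ProbeAsync(HttpMessageHandler handler, TimeSpan timeout)
    {
        using (var client = new HttpClient(handler) { Timeout = timeout })
        {
            try
            {
                // The URL is a placeholder; the stub handler never resolves it.
                using (var response = await client.GetAsync("http://example.invalid/"))
                {
                    return "responded: " + (int)response.StatusCode;
                }
            }
            catch (TaskCanceledException)
            {
                // HttpClient reports an elapsed Timeout as a cancelled task.
                return "timed out";
            }
        }
    }

    static void Main()
    {
        var result = ProbeAsync(new NeverRespondingHandler(), TimeSpan.FromMilliseconds(200))
            .GetAwaiter().GetResult();
        Console.WriteLine(result);
    }
}
```

Catching `TaskCanceledException` around the call is one way to tell "the server never answered" apart from other failures such as DNS or TLS errors, which arrive as `HttpRequestException` instead.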
Dan Diplo
2#
Dan Diplo, replied 2018-02-16 09:07:43Z
OK, after a lot of investigation I decided it must be down to the server looking for specific headers in the request. So I checked what most browsers send, replicated those and then finally whittled it down to the server requiring both of the following headers to be present:

    client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate");
    client.DefaultRequestHeaders.Add("Accept-Language", "en-GB,en;q=0.9,en-US;q=0.8");

Remove either one of those and the server won't respond. Very odd! Thanks to everyone who looked at this and I hope this answer might help someone in future :)

EDIT - More Weirdness

OK, the weirdness now continues, because even though this fixes the issue running locally (in VS 2017 with IIS Express) it still doesn't work when deployed into the live environment (running in IIS 7.5 / Windows Server). Same with the console app version - works on local PC, doesn't work on server. I tried 3 Windows servers with the same code, and it worked on one and not on the other two. Bizarre.

Further Edit - A resolution?

So after further reading it appears certain web servers, such as Akamai GHost (which hosts the domain in question), have some fairly sophisticated "bot" detection which rejects connections from unknown clients. Measures include checking the order of HTTP request headers so that they match what the user agent normally sends (i.e. if you fake the user-agent string to be Chrome, you had best act exactly like Chrome: send headers in the order Chrome does, accept the same content types, etc.).

Having tried faking numerous browser user-agent strings, I eventually found that "pretending" to be the Google PageSpeed bot worked, i.e. setting the user-agent string to be:

"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/27.0.1453 Safari/537.36"

This seems to work regardless of what version of Windows Server or .NET Framework is being used.

The headers I eventually came up with are:

    this.Client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/apng,*/*;q=0.8");
    this.Client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
    this.Client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("deflate"));
    this.Client.DefaultRequestHeaders.Add("Accept-Language", "en-GB,en;q=0.9,en-US;q=0.8");
    this.Client.DefaultRequestHeaders.Add("Connection", "keep-alive");
    this.Client.DefaultRequestHeaders.Add("Cache-Control", "no-cache");
    this.Client.DefaultRequestHeaders.Add("Pragma", "no-cache");
    this.Client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/27.0.1453 Safari/537.36");
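One practical consequence of advertising `gzip` and `deflate` is that the response body may come back compressed, so the HTML has to be decompressed before it can be analysed. The sketch below is an assumption-laden variation on the headers above, not the poster's code: `CrawlerClient` and `Create` are hypothetical names, and it lets `HttpClientHandler.AutomaticDecompression` both send the `Accept-Encoding` header and unpack the body. Because the handler adds that header itself, its position in the request may differ from the manual version, so whether the server in question accepts it is untested.

```csharp
using System;
using System.Net;
using System.Net.Http;

static class CrawlerClient
{
    // User-agent string quoted from the post above.
    public const string PageSpeedUserAgent =
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko; Google Page Speed Insights) Chrome/27.0.1453 Safari/537.36";

    public static HttpClient Create(TimeSpan timeout)
    {
        var handler = new HttpClientHandler
        {
            // Advertises gzip/deflate and transparently decompresses the response body.
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        var client = new HttpClient(handler) { Timeout = timeout };
        var headers = client.DefaultRequestHeaders;
        headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/apng,*/*;q=0.8");
        headers.Add("Accept-Language", "en-GB,en;q=0.9,en-US;q=0.8");
        headers.Add("Cache-Control", "no-cache");
        headers.Add("Pragma", "no-cache");
        // Skips header parsing so the string is sent exactly as written.
        headers.TryAddWithoutValidation("User-Agent", PageSpeedUserAgent);
        return client;
    }

    static void Main()
    {
        using (var client = Create(TimeSpan.FromSeconds(10)))
        {
            // Prints the default headers that will accompany every request.
            Console.WriteLine(client.DefaultRequestHeaders);
        }
    }
}
```

With a client built this way, `await client.GetStringAsync(url)` should return decoded HTML rather than raw compressed bytes. Creating the client once and reusing it across requests also avoids rebuilding the header set each time.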