Saturday, October 8, 2011

Screen Scraping (AKA Web Fetching) using ASP.NET

Screen scraping, in programming terms, means fetching data from a website into your application. More technically, it is a way for your application to extract data from the output of another program. The technique boils down to making a request and parsing the response.



This can help you tremendously. You can scrape all the products from a website and put them in your application or save them in a spreadsheet, or you can scrape data from multiple sites to compare them and do research or analysis.
To perform screen scraping in ASP.NET, we will use the WebRequest and WebResponse objects. For this you will need to import the System.Net namespace (and System.IO for the StreamReader used to read the response).
I am attaching the code; you can download the example Screen Scraping Visual Studio 2005 project.
The start page (startpage.aspx) looks as shown in the figure below:
When you click the "Click to view" button, the data from my HTML page is fetched into the second page of the application (WebForm1.aspx), as shown in the figure below:
In this application I have screen scraped one of the HTML pages I designed, and its path is hardcoded, so you can simply go to these lines:
//you need to replace this string with any web site url that you require
string str = "C:/Documents and Settings/Sairam/Desktop/agro_prod.htm";
home.Text = screenscrape(str);

and put in the URL you wish to fetch content from. If you wish to run the example as it is, you will need to change the path to match the location where you saved the HTML file. I have attached both the .NET project and "agro_prod.html". Go ahead and try it.
Here, "screenscrape" is a method I defined, which does the main work:
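// Requires: using System.IO; using System.Net;
private string screenscrape(string url)
{
  string r;
  // WebRequest.Create picks the request type from the URL scheme
  WebRequest request = WebRequest.Create(url);
  // The using blocks dispose the response and the reader when done
  using (WebResponse response = request.GetResponse())
  using (StreamReader sr = new StreamReader(response.GetResponseStream()))
  {
    // Read the whole response body into a string
    r = sr.ReadToEnd();
  }
  return r;
}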
Once you have the whole content of a page, you can parse it: extract a table from it, for example, or whatever else your requirements call for.
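
As an illustration of the parsing step, here is a minimal sketch that pulls the cell text out of table rows in a scraped HTML string. The class and method names (TableScraper, ExtractTableRows) are hypothetical and not part of the attached project. It assumes simple, well-formed markup without nested tables; regular expressions are fragile on real-world HTML, so treat this as a starting point rather than a robust parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableScraper
{
    // Hypothetical helper: returns the text of every cell, grouped by row.
    // Assumes no nested tables and does not decode HTML entities.
    public static List<List<string>> ExtractTableRows(string html)
    {
        List<List<string>> rows = new List<List<string>>();
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // Each <tr>...</tr> pair is treated as one row.
        foreach (Match row in Regex.Matches(html, @"<tr\b[^>]*>(.*?)</tr>", options))
        {
            List<string> cells = new List<string>();

            // Each <td> or <th> element inside the row is one cell.
            foreach (Match cell in Regex.Matches(row.Groups[1].Value,
                         @"<t[dh]\b[^>]*>(.*?)</t[dh]>", options))
            {
                // Strip any tags nested inside the cell and trim whitespace.
                string text = Regex.Replace(cell.Groups[1].Value, "<[^>]+>", "").Trim();
                cells.Add(text);
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

You could call it on the string returned by screenscrape, for example `TableScraper.ExtractTableRows(screenscrape(str))`, and then loop over the rows to fill a grid or write a spreadsheet.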

In this demo I have only concentrated on one way of fetching data. There are other methods too. Two of them are listed below:

  • WebClient: This class is part of the System.Net namespace. The main work of scraping is done by its DownloadData method. You can use it with the following lines of code:
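// Requires: using System.Net; using System.Text;
WebClient obj = new WebClient();
// Download the raw bytes of the page
byte[] result = obj.DownloadData("http://myssiteToScrape.com");
// Decode the bytes as text; this assumes the page is UTF-8 encoded
UTF8Encoding encoding = new UTF8Encoding();
string strResult = encoding.GetString(result);
Label1.Text = strResult;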
  • HttpServerUtility: This is a seldom-used technique. You can use it if you intend to extract data from another page in your own application, with the following lines of code:
TextWriter writer = new StringWriter();
Server.Execute ("startpage.aspx",writer);
Response.Output.Write(writer.ToString());
    Note that TextWriter is an abstract class and cannot be instantiated directly, which is why the writer is created as a StringWriter.