Saturday, October 8, 2011

Screen Scraping (AKA Web Fetching) using ASP.NET

Screen scraping, in programmer's terms, means fetching data from a website into your application. More technically, it is a way for your application to extract data from the output of some other program. The technique basically consists of making a request and parsing the response.



This can help you in a tremendous way. You can scrape all the products from a website and put them in your application or save them in a spreadsheet, or you can scrape data from multiple sites to do comparisons, research or analysis.
To perform screen scraping in ASP.NET, we will be using the WebRequest and WebResponse objects. For this you will need to import the System.Net namespace (and System.IO for the StreamReader used below).
I am attaching the code; you can download the example Screen Scraping project for Visual Studio 2005.
When you click the "Click to view" button on the start page (startpage.aspx), the data from my HTML page is fetched into the second page of the application (WebForm1.aspx).
In this application I have screen scraped one of the HTML pages designed by me, and its location is hardcoded, so you can simply go to these lines:
//you need to replace this string with any web site url that you require
string str = "C:/Documents and Settings/Sairam/Desktop/agro_prod.htm";
home.Text = screenscrape(str);

and put in the URL you wish to fetch the content from. If you wish to run the example as it is, you will still need to change the path, depending on the location where you save the HTML file. I have attached both the .NET project and the "agro_prod.html" file. Just go and try it.
Here "screenscrape" is a method defined by me, which performs the major functionality.
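private string screenscrape(string url)
{
  string r;
  // Create the request for the given URL (or local file path)
  WebRequest obj1 = WebRequest.Create(url);
  // Send the request; the using blocks release the response and reader when done
  using (WebResponse obj = obj1.GetResponse())
  using (StreamReader sr = new StreamReader(obj.GetResponseStream()))
  {
    // Read the whole response body as text
    r = sr.ReadToEnd();
  }
  return r;
}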
Once you have the whole content of a page, you can parse the data in it, for example to extract a table or anything else your requirements call for.
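As a rough illustration of the parsing step, here is a minimal sketch that pulls the text of every table cell out of the fetched HTML with a regular expression. The names ScrapeParser and ExtractTableCells are my own hypothetical names and are not part of the attached project. A regular expression is only adequate for simple, well-formed pages; for anything more complex a real HTML parser is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the attached project.
public static class ScrapeParser
{
    // Returns the plain text of every <td> cell found in the given HTML.
    public static List<string> ExtractTableCells(string html)
    {
        List<string> cells = new List<string>();

        // Match each <td ...>...</td> pair, across line breaks, ignoring case.
        MatchCollection matches = Regex.Matches(
            html,
            @"<td\b[^>]*>(.*?)</td>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        foreach (Match m in matches)
        {
            // Remove any tags nested inside the cell and trim whitespace.
            string text = Regex.Replace(m.Groups[1].Value, "<[^>]+>", "").Trim();
            cells.Add(text);
        }
        return cells;
    }
}
```

You could then pass the string returned by screenscrape to ScrapeParser.ExtractTableCells and bind the resulting list to a control of your choice.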

In this demo I have only concentrated on one way of fetching data. There are other methods too. Two of them are listed below:

  • WebClient: This class is part of the System.Net namespace. The main work of scraping is performed by its DownloadData method. You can do this with the following lines of code:
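WebClient obj = new WebClient();
// DownloadData returns the raw bytes of the response
byte[] result = obj.DownloadData("http://myssiteToScrape.com");
// Decode the bytes, assuming the page is UTF-8 encoded (UTF8Encoding is in System.Text)
UTF8Encoding encoding = new UTF8Encoding();
string strResult = encoding.GetString(result);
Label1.Text = strResult;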
  • HttpServerUtility: This is a seldom-used technique. You can use it if you intend to extract data from some other page in your own application, with the following lines of code:
TextWriter writer = new StringWriter();
Server.Execute ("startpage.aspx",writer);
Response.Output.Write(writer.ToString());
           Note that TextWriter is an abstract class and cannot be instantiated directly, which is why a StringWriter is created here.

Sunday, September 25, 2011

Display alert before redirect in asp.net

 ScriptManager.RegisterStartupScript(Me, Me.GetType(), "message", "alert('Thank you for posting your comment. You will now be redirected to our home page.');location.href = 'EntryMaster.aspx';", True)

Saturday, September 24, 2011

How To Block F5(Refresh) Key In IE and Firefox


<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
    <title>Block F5 Key In IE & Mozilla</title>

    <script language="JavaScript">

        var version = navigator.appVersion;

        function showKeyCode(e) {
            var keycode = (window.event) ? event.keyCode : e.keyCode;

            if ((version.indexOf('MSIE') != -1)) {
                if (keycode == 116) {
                    event.keyCode = 0;
                    event.returnValue = false;
                    return false;
                }
            }
            else {
                if (keycode == 116) {
                    return false;
                }
            }
        }

    </script>

</head>
<body onload="JavaScript:document.body.focus();" onkeydown="return showKeyCode(event)">
</body>
</html>

How To Stop Your User From Right-Clicking


Want to prevent your user from performing any of the other commands available by right-clicking on a Web page in Internet Explorer? It’s not foolproof, but this neat little HTML edit usually does the trick.
Just alter the opening <body> tag of your HTML to the following:


<body oncontextmenu="return false">

When the menu is requested, the oncontextmenu event runs, and we instantly cancel it using JavaScript. This is especially potent as a method for stopping the user from viewing your source, when used in conjunction with a menu-less browser window. Great stuff!

Get list of all active session variables in ASP.NET


In this post I am going to discuss how you can get a list of all active session variables in an ASP.NET application. The easiest way to see the details of session variables is to use “Tracing”: if you enable tracing for your application, you get a list of all active session variables. The alternative is to read the list of session variables programmatically using “Session.Contents”.
To illustrate, let me store some dummy data in a few session variables.
Now, if you enable tracing on your page and inspect the Session State section of the trace output, you will get the list of all session variables along with their types and values.
If you want to read the same session variables programmatically, you have to use “Session.Contents”. Session.Contents returns the current System.Web.SessionState.HttpSessionState object, whose keys you can loop over.

Once done, you will get the name and value of each session variable (here sessionItems is a multiline TextBox).
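Since the original screenshots of the code are not available here, the following is a minimal sketch of what such a page could look like. The page class name SessionList and the sample values are my own assumptions; sessionItems is assumed to be a multiline TextBox declared in the .aspx markup.

```csharp
using System;
using System.Text;
using System.Web.UI;

// Hypothetical code-behind; "sessionItems" is a TextBox (TextMode="MultiLine")
// declared in the .aspx markup of this page.
public partial class SessionList : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Store some dummy data in session variables (sample values only).
        Session["UserName"] = "Sairam";
        Session["VisitCount"] = 3;

        StringBuilder sb = new StringBuilder();

        // Session.Contents returns the current HttpSessionState object;
        // its Keys collection holds the name of every session variable.
        foreach (string key in Session.Contents.Keys)
        {
            object value = Session[key];
            string typeName = (value == null) ? "null" : value.GetType().Name;
            sb.AppendLine(key + " = " + value + " (" + typeName + ")");
        }

        sessionItems.Text = sb.ToString();
    }
}
```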
Hope this will help you!

Maintain Scroll Position After Postback in Asp.Net 2.0 3.5

Method 1.

Add the MaintainScrollPositionOnPostback attribute to the Page directive in the HTML source of the aspx page to maintain the scroll position for only one page or for selected pages rather than for the whole web application.
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs"  
         MaintainScrollPositionOnPostback="true"  Inherits="_Default" %>



Method 2.

To maintain the scroll position programmatically, set the property on the current page in the code-behind (for example in Page_Load), as shown below.
Page.MaintainScrollPositionOnPostBack = true;


Method 3.

To maintain the scroll position application-wide, that is for all pages of the web application, we can set the attribute below on the pages element inside the system.web section of the web.config file, so that we don't need to add the page directive to each and every page.

<pages maintainScrollPositionOnPostBack="true" />

Friday, August 12, 2011

Beginner’s Guide: How IIS Process ASP.NET Request


Introduction
When a request comes from a client to the server, a lot of operations are performed before the response is sent back to the client. This article is all about how IIS processes the request. I am not going to describe the page life cycle and its events here; this article covers only the operations at the IIS level. Before we get to the actual details, let's start from the beginning so that everyone can follow easily. Please provide your valuable feedback and suggestions to improve this article.

What is a Web Server?
When we run our ASP.NET web application from the Visual Studio IDE, the ASP.NET engine integrated into Visual Studio is responsible for executing all ASP.NET requests and responses. The process name is "WebDev.WebServer.exe", and it takes care of all requests and responses of a web application that is running from the Visual Studio IDE.
Now, the term “web server” comes into the picture when we want to host the application in a centralized location and access it from many locations. A web server is responsible for handling all the requests that come from clients, processing them and providing the responses.
What is IIS?
IIS (Internet Information Services) is one of the most powerful web servers from Microsoft and is used to host your ASP.NET web application. IIS has its own ASP.NET process engine to handle ASP.NET requests. So, when a request comes from a client to the server, IIS takes that request, processes it and sends the response back to the client.
Request Processing:

I hope it is clear by now what a web server and IIS are and what they are used for. Now let’s have a look at how they do things internally. Before we move ahead, you have to know about two main concepts:
1.    Worker Process
2.    Application Pool

Worker Process:  The worker process (w3wp.exe) runs the ASP.NET application in IIS. This process is responsible for managing all the requests and responses that come from client systems. All ASP.NET functionality runs within the scope of the worker process. When a request comes to the server from a client, the worker process is responsible for handling the request and generating the response. In a word, the worker process is the heart of an ASP.NET web application running on IIS.

Application Pool:  An application pool is the container of worker processes. Application pools are used to separate sets of IIS worker processes that share the same configuration. Application pools enable better security, reliability and availability for any web application. The worker process serves as the process boundary that separates each application pool, so that when one worker process or application has an issue or recycles, other applications or worker processes are not affected. This makes sure that a particular web application does not impact other web applications, as they are configured into different application pools.

An application pool with multiple worker processes is called a “Web Garden”.

Now that I have covered the basics (web server, application pool and worker process), let’s look at how IIS processes a new request coming from a client.

If we look at the IIS 6.0 architecture, we can divide it into two layers:

1.    Kernel Mode
2.    User Mode
Kernel mode was introduced with IIS 6.0 and contains HTTP.SYS. So whenever a request comes from a client to the server, it hits HTTP.SYS first.

HTTP.SYS is responsible for passing the request to the particular application pool. Now here is one question: how does HTTP.SYS know where to send the request? This is not a random pick. Whenever we create a new application pool, the ID of the application pool is generated and registered with HTTP.SYS. So whenever HTTP.SYS receives a request for any web application, it checks for the application pool and sends the request on that basis.

So, this was the first step of IIS request processing.

Up to now, the client has requested some information and the request has arrived at the kernel level of IIS, that is at HTTP.SYS, which has identified the application pool the request should be sent to. Now, let’s see how this request moves from HTTP.SYS to the application pool.
At the user level of IIS, we have the Web Administration Service (WAS), which takes the request from HTTP.SYS and passes it to the respective application pool.

When the application pool receives the request, it simply passes the request to the worker process (w3wp.exe). The worker process looks up the URL of the request in order to load the correct ISAPI extension. ISAPI extensions are the IIS way of handling requests for different resources. When ASP.NET is installed, it installs its own ISAPI extension (aspnet_isapi.dll) and adds the mapping into IIS.

Note: If we install IIS after installing ASP.NET, we need to register the extension with IIS using the aspnet_regiis command.

When the worker process loads aspnet_isapi.dll, it starts an HttpRuntime, which is the entry point of an application. HttpRuntime is a class that calls the ProcessRequest method to start processing.


When this method is called, a new instance of HttpContext is created, which is accessible through the HttpContext.Current property. This object remains alive for the lifetime of the request. Using HttpContext.Current we can access other objects such as Request, Response and Session.

After that, HttpRuntime loads an HttpApplication object with the help of the HttpApplicationFactory class. Every request passes through the corresponding HttpModules to reach the HttpHandler; this list of modules is configured by the HttpApplication.

Now comes the concept called the “HTTP pipeline”. It is called a pipeline because it contains a set of HttpModules (configured at both the Web.config and Machine.config levels) that intercept the request on its way to the HttpHandler. HttpModules are classes that have access to the incoming request. We can also create our own HttpModule if we need to handle anything during the incoming request and the outgoing response.
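To make the idea of a custom HttpModule more concrete, here is a minimal sketch that measures how long each request takes. The class name RequestTimingModule and the "RequestStart" key are hypothetical names of my own; the module would also have to be registered in the httpModules section of Web.config before IIS 6.0 would load it.

```csharp
using System;
using System.Diagnostics;
using System.Web;

// Hypothetical module: records the start time of each request and
// writes the elapsed time to the trace listeners when the request ends.
public class RequestTimingModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        // Subscribe to pipeline events raised for every request.
        application.BeginRequest += OnBeginRequest;
        application.EndRequest += OnEndRequest;
    }

    private void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        // Items is a per-request dictionary, so the value lives only
        // as long as this request does.
        context.Items["RequestStart"] = DateTime.UtcNow;
    }

    private void OnEndRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        object start = context.Items["RequestStart"];
        if (start != null)
        {
            TimeSpan elapsed = DateTime.UtcNow - (DateTime)start;
            Trace.WriteLine(context.Request.RawUrl + " took " +
                            elapsed.TotalMilliseconds.ToString("F0") + " ms");
        }
    }

    public void Dispose()
    {
        // Nothing to clean up in this sketch.
    }
}
```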

HTTP handlers are the endpoints of the HTTP pipeline. Every request that passes through the HttpModules must reach an HttpHandler, which then generates the output for the requested resource. So, when we request an aspx web page, the handler returns the corresponding HTML output.
Each request thus passes from the HttpModules to its respective HttpHandler, and the ASP.NET page life cycle starts. This ends the IIS request processing and begins the ASP.NET page life cycle.
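For comparison with the page handler, a hand-written handler can be very small. The sketch below is only an illustration: the class name ServerTimeHandler is hypothetical, and the handler would need to be mapped to a URL or file extension in Web.config before any request could reach it.

```csharp
using System;
using System.Web;

// Hypothetical handler: the endpoint that produces the response body.
public class ServerTimeHandler : IHttpHandler
{
    // True means one instance may be reused for several requests,
    // which is safe here because the handler keeps no state.
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        // Generate the output for the requested resource.
        context.Response.ContentType = "text/plain";
        context.Response.Write("Server time: " + DateTime.Now.ToString("u"));
    }
}
```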
Conclusion
When a client requests some information from a web server, the request first reaches HTTP.SYS in IIS. HTTP.SYS then sends the request to the respective application pool. The application pool forwards the request to the worker process, which loads the ISAPI extension; this creates an HttpRuntime object to process the request via the HttpModules and the HttpHandler. After that, the ASP.NET page life cycle events start.
This was just an overview of IIS request processing to let beginners know how a request gets processed in the back end. If you want to learn the details, please check the links in the Reference and Further Study section.

Difference Between Web Farm and Web Garden


Introduction
Web farm and web garden are very common terms in any production deployment. Though the terms look similar, they mean totally different things, and many beginners confuse the two. Here I am giving the basic difference between a web farm and a web garden.
Web Farm
After developing our ASP.NET web application we host it on an IIS server. One standalone server is sufficient to process ASP.NET requests and responses for a small web site, but when the site belongs to a big organization with millions of daily user hits, we need to host the site on multiple servers. This is called a web farm: a single site is hosted on multiple IIS servers, and they run behind a load balancer.
Fig : General Web Farm Architecture
This is the most common scenario for any web-based production environment. The client hits a virtual IP (vIP), which is the IP address of the load balancer. When the load balancer receives the request, it redirects the request to a particular server based on the server load.
Web Garden
All IIS requests are processed by a worker process (w3wp.exe). By default every application pool contains a single worker process, but an application pool with multiple worker processes is called a web garden. Several worker processes in the same application pool can sometimes provide better throughput and application response time. Each worker process has its own threads and its own memory space.
There are certain restrictions on using a web garden with your web application. If we use the "InProc" session mode, our application will not work correctly, because the session data lives in the memory of one worker process while later requests may be handled by a different worker process. To avoid this type of problem we should use an out-of-process session mode, either "StateServer" or "SQLServer".
How To Configure Web Garden?
Right-click the application pool > Properties > go to the Performance tab.
In the bottom group section, increase the worker process count.
Further Study

When to use InProc session? When to use OutProc session?


In the case of InProc session mode:
1) Session data is stored in the current application domain, so it consumes memory on the server machine (which can hamper performance).
2) If the server restarts, all session data is lost.
In the case of out-of-process mode (which is generally State Server):
1) Session data is stored on the state server, so the web server's memory is not consumed.
2) If the web server restarts, session data is preserved.
Considering the above points, the choice of session mode depends on load, number of users, performance requirements, etc.
For a small web site with a limited number of users, InProc mode is best suited.
For large applications with a huge number of users, State Server or SQL Server session mode should be used.

The basic necessity for out-of-process session arises when you have deployed your application in a web farm or web garden. Out-of-process session should also be used when you expect events that restart the application, such as modifying the web.config file or the bin folder at runtime, or an IIS restart on the server, because State Server mode has the advantage that it does not lose the data even if your application crashes. InProc is the default session mode in ASP.NET applications.
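One practical consequence of moving to State Server or SQL Server mode is that the objects you put in session must be serializable, because they leave the worker process. The sketch below shows one way this could be checked before storing a value; CartItem and SessionGuard are hypothetical names of my own, not framework types.

```csharp
using System;

// Hypothetical session object. The [Serializable] attribute is what
// allows it to be stored by an out-of-process session provider.
[Serializable]
public class CartItem
{
    public string ProductName;
    public int Quantity;
}

// Hypothetical helper that checks a value before it is put in session.
public static class SessionGuard
{
    // Returns true when the value could be stored out of process:
    // either it is null or its type is marked as serializable.
    public static bool CanStoreOutOfProcess(object value)
    {
        return value == null || value.GetType().IsSerializable;
    }
}
```

In a page you might call SessionGuard.CanStoreOutOfProcess(item) before assigning the item to a session variable, and inspect Session.Mode to see which mode the application is actually running in.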

Wednesday, August 10, 2011

What’s a bubbled event?

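A bubbled event is an event raised by a child control that is passed up to its parent (container) control and handled there. For example, when a button inside a row of a DataGrid is clicked, the event bubbles up to the DataGrid, which raises its own ItemCommand event, so one handler on the grid can serve all the buttons it contains.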

Get the IP Address and Referer of a visitor

Here is the code snippet to obtain the IP address and referer of a visitor to your website:

Dim IPAddress As String = Request.ServerVariables("REMOTE_ADDR")
Dim Referer As String = Request.ServerVariables("HTTP_REFERER")
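The same values can also be read through typed properties of the Request object. Here is a minimal C# sketch, assuming a code-behind page class that I have named VisitorInfo for illustration. Note that the referer is optional, so the code allows for it being absent.

```csharp
using System;
using System.Web.UI;

// Hypothetical code-behind page used only for illustration.
public partial class VisitorInfo : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Address of the client (or of the proxy it connects through).
        string ipAddress = Request.UserHostAddress;

        // UrlReferrer is null when the browser sends no Referer header.
        string referer = (Request.UrlReferrer != null)
            ? Request.UrlReferrer.ToString()
            : "(none)";

        // Encode before writing, since the referer is supplied by the client.
        Response.Write(Server.HtmlEncode("IP: " + ipAddress + ", Referer: " + referer));
    }
}
```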

Disable Copy,Cut and Paste of the Aspx textbox


In ASP.NET, if you want to disable options like Copy, Paste and Cut on a textbox, you can do this by writing simple markup in the aspx page as below.
<table>
<tr>
<td><asp:TextBox ID="TxtNameEmp" runat ="server" oncut="return false" ></asp:TextBox></td>
<td><asp:TextBox ID="TxtId" runat ="server" oncopy="return false"></asp:TextBox></td>
<td><asp:TextBox ID="TxtAddress" runat ="server" onpaste="return false"></asp:TextBox></td>
</tr>
</table>

Tuesday, August 9, 2011

Session timeout


The procedure is as below.
Change the following time-outs in Internet Services Manager. Choose a value greater than the default of 20.
    1. Select Default Web Site > Properties > Home Directory > Application Settings > Configuration > Options.
    2. Enable the session state time-out and set the Session timeout for 60 minutes.
    3. Select Application Pools > DefaultAppPool > Properties.
    4. From the Performance tab under Idle timeout, set Shutdown worker processes after being idle for a value higher than 20.
The default session time-out setting on IIS is 20 minutes but it can be increased to a maximum of 24 hours or 1440 minutes. See Microsoft article Q233477 for details about increasing the timeout in IIS.
Symptom
When returning to the logon page for Web Interface, users often encounter an Error: Your session with the web-server expired. You have been logged out.
Cause
Web Interface 2.0 picks up the session timeout setting from IIS.
Resolution
1. Start Internet Services Manager 5.0.
2. For explicit authentication, right-click the /Citrix/MetaFrameXP/default virtual directory and view the Properties.
    For desktop credentials pass-through users, edit the /Citrix/MetaFrameXP/integrated virtual directory.
    For Smart Card users, edit the /Citrix/MetaFrameXP/certificate virtual directory.
3. In the Application Settings section, click Configuration.
4. Select the App Options tab.
5. Ensure Enable session state is selected.
NFuse Classic 1.71 and earlier
When Internet Explorer is configured never to check for newer versions of Web pages, NFuse Classic will not launch an application after a session is allowed to expire. To reproduce this problem:
    1. Configure your browser settings for Temporary Internet files and set 'Check for newer versions of stored pages' to Never.
    2. Log on to Citrix NFuse Classic and view your published applications.
    3. Leave the system idle for 20 minutes or longer so that the Web server session expires.
    4. Click an application icon. The system returns you to the Logon page and a message appears stating that your session expired.
    5. Log on again and click the same application as before. You are again returned to the Logon page even though your session should not have expired this time.
Solutions
      • Clear the Temporary Internet Items folder
      • Upgrade to Web Interface 2.0 or later
      • Set Check for Newer versions of stored pages to Automatically
NFuse 1.6
This error is caused by the expiration of a Web server session, not a MetaFrame ICA session. Web servers maintain session state for a fixed time period to preserve Web server resources and as a security precaution. The default setting for IIS is 20 minutes. Do not remove this timeout. However, the timeout can be modified in the following ways (assuming your Web server is Microsoft IIS):
    1. Open Internet Services Manager.
    2. From the Web site Properties, click the Home Directory tab, then click the Configuration... button in the Application Settings section.
    3. On the configuration panel, click the App Options tab and set the session timeout there.
    4. In NFuse 1.6, add this line to the end of redirect.asp:
    <% Session.Timeout = 20%>
    where 20 is the number of minutes after which an idle session will time out.
Increasing this value improves usability, preventing users from having to enter their credentials too frequently.
Decreasing this value improves security; if a user leaves his desk unattended with the NFuse application list showing, other users can launch applications using the account. The session time out acts as an idle time out and prevents this type of abuse.
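In an ASP.NET application the same idle timeout can also be set from code rather than through IIS. The following is a minimal sketch, assuming a Global.asax code-behind class named Global; the 60-minute value is only an example.

```csharp
using System;
using System.Web;

// Hypothetical Global.asax code-behind class.
public class Global : HttpApplication
{
    // Runs once for every new session.
    protected void Session_Start(object sender, EventArgs e)
    {
        // Idle time, in minutes, after which this session expires.
        Session.Timeout = 60;
    }
}
```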