HOW-TO Write a CGI in C/C++
1. Introduction to CGI Principles

The Common Gateway Interface (CGI) is a standardised method of passing data to and from a web server. CGI allows web pages to include dynamic content rather than consisting merely of a static HTML file. The CGI framework defines standards that allow a web server to call a second, separate application and to pass data to it and receive data from it.

Data is produced by the CGI program and output to a virtual 'screen'; this screen is then echoed by the web server back to the requesting client. This simple and standardised output method, combined with a number of powerful yet easily used input methods detailed later, allows relatively easy creation of powerful web-based applications. Developing a CGI rather than a standard application makes multi-user environments easy and removes a vast amount of I/O work from the developer. CGI programming is no more or less complex than other forms of programming, and the added benefits that it provides may be outweighed by the lack of native real-time control in the user interface.

This document is intended to give programmers a feel for CGI techniques and, hopefully, to discuss some best practice for the specific design and development methods required. Although the examples throughout are provided in C, the basic I/O system is equally applicable to any CGI-able language (which is most, but probably not Visual Basic).
2. Differences Between a CGI and Other Applications

Unlike a traditional application, a CGI technically runs in a non-interactive environment. A CGI is usually run as a single-use program (there are always advanced cases, such as self-referential code, that spoil the generalisation). When it is called by the web server, all the input data will have been pre-assigned and packaged according to the input method. Your program deals (in this execution at least) only with this data, or with other automatically gathered data, and receives no further user input before it produces its output and finishes execution. There are ways of persisting data, such as cookies, explained in the Advanced section.

This means considerable thought must go into the one-pass design of the system, and a traditional interactive system may not transfer easily into a CGI environment. It is possible, however, by sticking to functional design, to build a CGI application that differs little from a traditional system in overall design, implementation and interface. Leaving aside the specifics of data I/O, which are covered later on, let us consider a simple application.

Example: a booking system for something or other. A standard set of data is gathered for all customers. Depending on their requirements, certain other data will be gathered before the booking is confirmed and a range of output options chosen. In a traditional application we would most likely use a series of GUI-based forms or data input screens. We would order the input so that the questions asked depend on previous information. It is likely that we would have some form of function to determine which questions still needed to be asked, based on a data structure that was constantly queried and updated.
The limitations we must consider for deploying this application via CGI are
primarily two-fold:
In our example, the CGI application would have to dump all the available data on each execution so that the next execution could decide which questions to ask, before outputting those questions along, again, with all the current data. As well as being a design consideration, this causes an execution overhead, because the code is executed afresh every time rather than running interactively. This primarily means that more complex applications, involving a high degree of object orientation, can be cumbersomely slow as they are initialised, loaded with data, executed and then destroyed on every execution.
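One way to picture "dumping all the current data" for the next execution is to write it back into the page as hidden form fields, so the browser returns it with the next request. The text above does not say which mechanism is intended, so treat this as an assumption; print_hidden_field and print_escaped are hypothetical helper names used only for this sketch.

```c
#include <stdio.h>

/* Write text to out, escaping the characters that are special in HTML. */
static void print_escaped(FILE *out, const char *text)
{
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '&': fputs("&amp;", out);  break;
        case '<': fputs("&lt;", out);   break;
        case '>': fputs("&gt;", out);   break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*text, out);    break;
        }
    }
}

/* Hypothetical helper: emit one already-gathered value as a hidden form
   field so that it comes back as input on the next execution. */
void print_hidden_field(FILE *out, const char *name, const char *value)
{
    fputs("<input type=\"hidden\" name=\"", out);
    print_escaped(out, name);
    fputs("\" value=\"", out);
    print_escaped(out, value);
    fputs("\">\n", out);
}
```

In a real CGI, out would be stdout and the calls would sit between the opening and closing form tags, one call per value gathered so far.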
3. Basic Input and Output

Output from your CGI is sent via stdout to the web server, which has in effect system'd your program. Other than the header lines, this data can be in binary format (for example, your CGI can generate a binary image file). More complicated header topics such as cookies and redirections are dealt with later in this document; for basic output all you require is the Content-type header. This is a requirement for web servers and clients and part of the CGI standard. The header must be followed by TWO newlines in order to have effect; the second newline (a blank line) ends the header stage of the output. The normal Content-type header is:

    Content-type: text/html

which, bizarre as it may seem, denotes textual HTML output from this CGI. Following this line and another newline you may start your normal HTML document and print it to stdout using your preferred method.

Input to your CGI comes from two sources: environment variables and standard input (stdin). Lots of information will always be provided by the webserver as environment variables (a list of some of the more common ones is provided later in this document), accessible with getenv(). The majority of these variables contain connection or environment related data such as the remote IP address or the URL being requested; in addition, they can be used to pass dynamic data from the user to the CGI.

As mentioned previously, CGIs are called using a method, as are all documents from a webserver. These methods are GET, POST and PUT. GET is the most common and is a simple request to GET a document, with any data encoded in the URL. POST sends its data separately from the URL, for example to pass form data to a CGI. PUT is used with file uploads and is not documented here.

Using the GET method, any data passed to a CGI is encoded in the URL; you may have seen this before with URLs such as /cgi-bin/cgi?Variable=value. The question mark in a URL marks the beginning of the data, or query string. This is recovered as the environment variable QUERY_STRING. The POST method requires the web server to pass data to your CGI via stdin. This can be recovered using standard file read functions on the stdin stream.

All data passed to a CGI is URL-encoded. Encoding strips out any characters that can be problematic, such as spaces or question marks (as a question mark denotes the start of a query string), and replaces each with a % sign followed by two hexadecimal digits giving the character's ASCII code. Spaces are a special case, denoted sometimes with %20 (hex 20 is decimal 32, the ASCII space) and sometimes with a plus sign, depending on the whim of the browser and the like. Dealing with this data is explained in further detail later in this document.
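The two input routes described above could be sketched roughly as follows. read_request_data is a hypothetical helper name, the 64 KB cap is arbitrary, and the sketch relies on the standard CGI variable CONTENT_LENGTH (not otherwise discussed here) to know how many bytes of POST data to read. The values are passed in as parameters so the function can also be exercised from the command line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_POST_BYTES 65536L  /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: return a malloc'd copy of the raw (still URL-encoded)
   request data, or NULL on failure. The caller frees the result. */
char *read_request_data(const char *method, const char *query,
                        const char *content_length, FILE *in)
{
    if (method != NULL && strcmp(method, "POST") == 0) {
        /* POST: the data arrives on the input stream; content_length
           says how many bytes to expect. */
        long len = (content_length != NULL)
                       ? strtol(content_length, NULL, 10) : 0;
        if (len < 0 || len > MAX_POST_BYTES)
            return NULL;

        char *body = malloc((size_t)len + 1);
        if (body == NULL)
            return NULL;
        size_t got = fread(body, 1, (size_t)len, in);
        body[got] = '\0';
        return body;
    }

    /* GET (or anything else): the data is the query string. */
    if (query == NULL)
        query = "";
    char *copy = malloc(strlen(query) + 1);
    if (copy != NULL)
        strcpy(copy, query);
    return copy;
}
```

In a real CGI the call would look something like read_request_data(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"), getenv("CONTENT_LENGTH"), stdin).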
4. First Simple CGI

OK, the "Goodbye Cruel World" program. Very, very simple:
    #include <stdio.h>

    int main(void)
    {
        /* Header line, then a blank line to end the header stage. */
        printf("Content-type: text/html\n\n");

        /* The HTML document itself. */
        printf("<html><title>Hello</title><body>\n");
        printf("Goodbye Cruel World\n");
        printf("</body></html>");

        return 0;   /* a non-zero exit status signals failure */
    }

And there you go. Compile with something like:

    gcc first.c -o first.cgi

and copy the result into your web server's CGI directory. You should now be able to run your CGI via your webserver and see the output in your web browser. This is a nice simple test and should work. If it has, drink a celebratory Irn-Bru and skip ahead to the next section. Move to the top of the class. If you are having problems, however, read on.

If you are a Windoze user you may have taken the compilation literally. You will need some form of Windoze compiler (I recommend Borland's BCC, as it's easier to set up under W32 than GCC). You will also need to output the file as something.exe because Windows claims not to rely upon the three-letter extension but, oh well, I shan't carry on about it here.

A 500 Internal Server Error isn't good. Is the CGI compiled correctly and in place on your webserver? Can you run the CGI from the command line OK and get the output you should? Are there TWO NEWLINES AFTER THE CONTENT-TYPE HEADER?? If you get prompted to download the program, then the webserver is not executing it as a CGI and you need to enable this on your webserver.
5. Dealing With User Input

As stated, user input arrives via the GET or POST methods (server variables are available as environment variables regardless of the method). The data is always passed in a URL-encoded string, either tacked onto the URL with GET or received from stdin with POST.
A URL-encoded string is one with certain special characters encoded as a percent sign (%) followed by the two-digit hexadecimal ASCII code (spaces are also sometimes represented by the + sign). Each variable is named, followed by an equals sign (=) and its value. Variables are separated with the ampersand (&) character. With GET, this string is tacked onto the URL after a question mark (?) and forms a URL like /cgi-bin/cgi?Variable=value.

When GET is used to request a CGI, the query string is available from the environment variable QUERY_STRING (or you can get the URL with the query string attached from the environment variable REQUEST_URI). POST uses stdin to pass the URL-encoded string, which can be read using any standard file input function such as fgets (or even fscanf, shudder). The method used is provided by the server as the environment variable REQUEST_METHOD.

I normally parse this input into a linked list or some such wizardry. For your convenience I have made available my CGI Variable Wrapper for C++, which provides IMHO a simple and easy way of getting hold of user input. The code to decode URL strings is provided as part of this package and I shan't bother to list it separately here (it's in the StringLoad() method in the cgi_interface.cpp file).
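For readers who want to see the decoding spelled out, here is a minimal sketch in plain C. It is not the code from the CGI Variable Wrapper mentioned above; url_decode, hex_value and cgi_value are hypothetical names, and variable names are matched as they appear in the raw string (without decoding), which is a simplification.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode a URL-encoded string in place: '+' becomes a space and %XX
   becomes the byte with that hex code. Malformed % sequences are kept. */
void url_decode(char *s)
{
    char *out = s;
    while (*s != '\0') {
        int hi, lo;
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%'
                   && (hi = hex_value((unsigned char)s[1])) >= 0
                   && (lo = hex_value((unsigned char)s[2])) >= 0) {
            *out++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Find name in a raw query string ("a=1&b=2") and return a malloc'd,
   decoded copy of its value, or NULL if it is absent. Caller frees. */
char *cgi_value(const char *query, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = query;

    while (*p != '\0') {
        const char *end = strchr(p, '&');
        size_t pair_len = end ? (size_t)(end - p) : strlen(p);

        if (pair_len > name_len && strncmp(p, name, name_len) == 0
            && p[name_len] == '=') {
            size_t value_len = pair_len - name_len - 1;
            char *value = malloc(value_len + 1);
            if (value == NULL)
                return NULL;
            memcpy(value, p + name_len + 1, value_len);
            value[value_len] = '\0';
            url_decode(value);
            return value;
        }
        p += pair_len;
        if (*p == '&')
            p++;
    }
    return NULL;
}
```

The same two functions work for GET and POST, since both deliver the same kind of string; only where the string comes from differs.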
6. More Advanced Topics

Redirection

With CGI it is possible to pass the browser a redirect as part of the header (as an alternative to using meta http-equiv methods). This will automatically redirect the browser to the alternative page and display nothing of the source page (most browsers provide the option to prompt when a redirect is encountered, but most have this disabled by default). To cause the browser to redirect, simply send the header:

    Location: http://www.fullurl.here

and then exit your program. If you want the users to see a message first (for example, to inform them that the URL they're using is out of date), a delayed http-equiv refresh is a much better option.
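As a small sketch of the redirect, the header can be written in the same way as the Content-type header in the first example. send_redirect is a hypothetical helper name, and the URL is the placeholder used above.

```c
#include <stdio.h>

/* Hypothetical helper: write a redirect header to out (stdout in a real
   CGI). The blank line ends the header stage, as with Content-type. */
void send_redirect(FILE *out, const char *url)
{
    fprintf(out, "Location: %s\n\n", url);
}
```

A CGI would call send_redirect(stdout, "http://www.fullurl.here") and then return 0 from main without printing anything else. If the URL comes from user input, it should be checked for embedded newlines first, since those would let the input add header lines of its own.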
URL Encoding a String
    hello "there" world

would URL encode into:

    hello%20%22there%22%20world

(or with a + in place of each %20). Please note that I provide free of charge my most cantankerous and Heath Robinson-inspired CGI Wrapper, which will handle this and much more for you, details of which can be found here. A full list of ASCII characters and their decimal and hex codes is provided in the reference section of this site.
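Encoding by hand could be sketched as below. This is not the wrapper's code; url_encode is a hypothetical name, and the choice of which characters pass through untouched (letters, digits, and - _ . ~) is an assumption of this sketch. It always writes spaces as %20 rather than +.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd URL-encoded copy of s, or NULL if allocation fails.
   Letters, digits and - _ . ~ are copied as-is; every other byte becomes
   %XX, where XX is its two-digit hex code. The caller frees the result. */
char *url_encode(const char *s)
{
    static const char hex[] = "0123456789ABCDEF";
    /* Worst case: every byte expands to three characters. */
    char *result = malloc(strlen(s) * 3 + 1);
    char *out = result;

    if (result == NULL)
        return NULL;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            *out++ = (char)c;
        } else {
            *out++ = '%';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 15];
        }
    }
    *out = '\0';
    return result;
}
```

This is useful when a CGI builds links or Location headers that carry data on to another request.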
Real Time User Interaction/Validation

You have basically four choices: a full-blown Java Applet, JavaScript, JScript (Microsoft's bastard interpretation of JavaScript) or VBScript (Microsoft's bastard implementation of a wannabe JavaScript). My preference would be to use JavaScript (even modern MSIE browsers fully support it).
Cookies

© Copyright 2004-2007 purplepixie.org, all rights reserved.