
The Common Gateway Interface (CGI) is a standard defining how external programs can provide information to web servers. CGI provides a mechanism for web servers like Apache to exchange data with programming languages such as Perl.

CGI is one of the oldest components of internet infrastructure. It’s still in use today, despite having been superseded by newer alternatives.

Web server software was traditionally limited to serving static webpages. CGI scripts enabled the production of dynamic responses, created when a request is received.

Standardising HTTP Servers

CGI was designed to provide a standardised way for programming languages to access HTTP server information. Any HTTP server can be paired with any programming language, provided they both adhere to the CGI spec.

CGI-enabled servers will handle requests using a process similar to the following:

  1. A new request is received: /
  2. The web server recognises the requested path as an executable CGI script, so it invokes the script.
  3. The CGI script (written in Perl, for example) receives all the data about the request, such as its URL and HTTP headers.
  4. The script runs; its output will be passed back to the web server for emission as an HTTP response.

The flow outlined above stands in stark contrast to a web server’s regular operation. A basic request for / would return the content of the file that path maps to. If the file didn’t exist, you’d receive a 404 response instead.

When using CGI, a request doesn’t need to map to a real file on-disk. Instead, a user-defined program is run. The program has responsibility for generating the output to send to the client. The web server is no longer concerned with the actual content of the response.
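The difference between the two modes can be sketched as a small routing decision. This is an illustration only: the function name choose_action, the set of static files and the /cgi-bin/ prefix are assumptions made for the example (the prefix is a common convention, not something the CGI spec requires).

```python
def choose_action(path, static_files, cgi_prefix="/cgi-bin/"):
    """Decide how a hypothetical server would handle a request path."""
    if path.startswith(cgi_prefix):
        # The path names a program: run it and relay whatever it outputs.
        return "run-cgi-program"
    if path in static_files:
        # Regular operation: the path maps to a file on disk.
        return "serve-static-file"
    # No matching file and not a CGI path.
    return "respond-404"


# Hypothetical site content, used only for this demonstration.
static_files = {"/index.html", "/about.html"}

for path in ("/index.html", "/missing.html", "/cgi-bin/report"):
    print(path, "->", choose_action(path, static_files))
```

Note that the CGI path never has to appear in the set of static files: the server hands the request to a program rather than looking up content itself.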

Information Exchanged Via CGI

The program executed via CGI can access various data about the incoming HTTP request. This includes the URL, headers, query string and HTTP method, as well as the remote client’s IP address.
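As a rough sketch of what this looks like from the script's side, the Python function below collects a few of the standard CGI variables (REQUEST_METHOD, SCRIPT_NAME, QUERY_STRING, REMOTE_ADDR and HTTP_USER_AGENT). The function name describe_request and the sample values are invented for illustration; a real script would simply read os.environ.

```python
import os
from urllib.parse import parse_qs


def describe_request(environ=None):
    """Collect a few standard CGI variables into a dictionary."""
    if environ is None:
        environ = os.environ
    query = environ.get("QUERY_STRING", "")
    return {
        "method": environ.get("REQUEST_METHOD", ""),
        "script": environ.get("SCRIPT_NAME", ""),
        "query": query,
        "params": parse_qs(query),  # e.g. "a=1" becomes {"a": ["1"]}
        "client_ip": environ.get("REMOTE_ADDR", ""),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
    }


# A hand-built environment stands in for a real server here.
sample_environ = {
    "REQUEST_METHOD": "GET",
    "SCRIPT_NAME": "/cgi-bin/hello.py",
    "QUERY_STRING": "name=world",
    "REMOTE_ADDR": "203.0.113.7",
    "HTTP_USER_AGENT": "ExampleBrowser/1.0",
}

info = describe_request(sample_environ)
print(info["method"], info["params"], info["client_ip"])
```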

Server software isn’t required to provide all data verbatim. The CGI specification permits servers to exclude headers from environment variables. This may be to omit sensitive information – such as the value of the Authorization header – or to avoid redundancy when the same information could be accessed using a dedicated variable.
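On the server side, the translation from headers to variables might look something like the sketch below. The naming rule (upper-case the header name, replace hyphens with underscores, add an HTTP_ prefix) comes from the CGI specification; the function name and the exact set of withheld headers are assumptions chosen for the example, not the behaviour of any particular server.

```python
# Headers this hypothetical server chooses not to expose to scripts.
WITHHELD_HEADERS = {"AUTHORIZATION", "PROXY_AUTHORIZATION"}

# Headers that already have a dedicated CGI variable of their own.
DEDICATED_VARIABLES = {"CONTENT_TYPE", "CONTENT_LENGTH"}


def headers_to_variables(headers):
    """Translate request headers into CGI-style environment variables."""
    variables = {}
    for name, value in headers.items():
        key = name.upper().replace("-", "_")
        if key in WITHHELD_HEADERS:
            continue  # sensitive: never handed to the script
        if key in DEDICATED_VARIABLES:
            variables[key] = value  # no HTTP_ duplicate needed
        else:
            variables["HTTP_" + key] = value
    return variables


example = headers_to_variables({
    "User-Agent": "ExampleBrowser/1.0",
    "Authorization": "Basic c2VjcmV0",
    "Content-Type": "text/plain",
})
print(example)
```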

In addition to data about the request, CGI-compatible servers must also indicate various details about themselves. This includes the name and version of the host server software. Scripts may use these details as they see fit.

Information is passed from the server to the CGI program as environment variables. The program accesses them in the same way as any other environment variable. The server will run the program as a child process of itself, setting the environment variables before calling the executable.
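The sketch below imitates that hand-off using Python's subprocess module. It is a simplification rather than how any real server is implemented: the child program is an inline one-liner, the helper name run_cgi_program is made up, and the parent's own environment is copied into the child purely so the example runs on any platform.

```python
import os
import subprocess
import sys

# Stand-in for a CGI program: it prints two of its environment variables.
CHILD_SOURCE = (
    "import os; "
    "print(os.environ['REQUEST_METHOD'], os.environ['QUERY_STRING'])"
)


def run_cgi_program(argv, variables):
    """Run a program as a child process with the CGI variables set."""
    # Assumption for portability: start from the parent's environment.
    # A real server would typically pass a much smaller, curated set.
    child_env = {**os.environ, **variables}
    completed = subprocess.run(
        argv,
        env=child_env,
        capture_output=True,
        check=True,
    )
    return completed.stdout


output = run_cgi_program(
    [sys.executable, "-c", CHILD_SOURCE],
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "page=2"},
)
print(output.decode().strip())
```

The child needs no special library to see the request data. It reads ordinary environment variables that the parent set before launching it.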

There is one piece of data which won’t be passed as an environment variable. The request body gets special treatment, as it could be extremely long. This will be piped into the script on its standard input stream. Scripts are informed how much data is available via the CONTENT_LENGTH environment variable.
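A script might read the body along the lines of the sketch below. The function name read_request_body is hypothetical, and an in-memory stream stands in for standard input so the example can run on its own; a real script would pass os.environ and sys.stdin.buffer.

```python
import io


def read_request_body(environ, stdin):
    """Read at most CONTENT_LENGTH bytes from the given input stream."""
    raw_length = environ.get("CONTENT_LENGTH", "")
    # A missing or malformed length is treated as "no body".
    length = int(raw_length) if raw_length.isdigit() else 0
    if length == 0:
        return b""
    return stdin.read(length)


# The stream holds more bytes than CONTENT_LENGTH announces; the function
# stops once it has read the announced amount.
fake_stdin = io.BytesIO(b"name=world&ignored=1")
body = read_request_body({"CONTENT_LENGTH": "10"}, fake_stdin)
print(body)
```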

Once processing completes, the CGI script returns its response to the server by writing it to its standard output stream. The response consists of header fields, such as Content-Type, followed by a blank line and an optional body. The server uses this output to build the final HTTP response, adding the status line and any headers of its own, then sends it back to the client over the HTTP connection.
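The output a script writes could be assembled as in the sketch below, which follows the layout described in the CGI specification (RFC 3875): header fields, a blank line, then the body. The helper name build_cgi_response and its default content type are assumptions for the example.

```python
import sys


def build_cgi_response(body, content_type="text/html; charset=utf-8", status=None):
    """Assemble script output: header fields, a blank line, then the body."""
    header_lines = [f"Content-Type: {content_type}"]
    if status is not None:
        # Optional Status field, e.g. "404 Not Found".
        header_lines.append(f"Status: {status}")
    return "\n".join(header_lines) + "\n\n" + body


# A real script would finish by writing the result to standard output.
sys.stdout.write(build_cgi_response("<h1>Hello from CGI</h1>\n"))
```

The Status field lets the script ask for a response code other than the default, which the server then applies when it sends the reply to the client.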

Where’s CGI Today?

CGI helped bring about the modern web. It provided an exceedingly simple way to build dynamic server-side scripts using the technologies of the mid-90s. No longer was a webpage a static HTML file.

CGI’s simplicity has helped it to endure in the decades since. CGI scripts remain in use, particularly within legacy applications based on older languages. Technology has not stood still though; CGI has been superseded by more modern alternatives that are better suited to today’s web.

Traditional CGI creates an overhead which becomes problematic at scale. The CGI script is reloaded on every request, spawning a new process which can exhaust resources on high-traffic sites.

CGI is also limited in the control it provides to scripts. Scripts are only able to determine the response content sent back to the client. They’re unable to influence any other part of the HTTP exchange, such as authentication or session management.

Finally, there are security concerns. CGI scripts are generally executed as a child process of the server. This means the server must be protected from script interference. Misconfiguration could give a script undesirable access to other resources managed by the server, such as its configuration and log files.

Many of CGI’s issues have been addressed by newer interface technologies. FastCGI was created to reduce the CGI overhead issue. It works similarly to CGI but does not spawn a new process for each request. Instead, the FastCGI server works independently of the web server, maintaining its own set of persistent processes used to host the CGI scripts.

Elsewhere, individual programming languages have implemented their own server interfaces. These are directly integrated into web servers, usually through optional modules. Examples include Apache’s mod_php and mod_perl, which offer native support for PHP and Perl without using CGI (even though both languages can also be used via CGI).

Despite the emergence of these mechanisms, CGI does remain relevant. The simplicity at the core of its design has informed most subsequent efforts to improve the overall architecture. While you’re unlikely to encounter CGI day-to-day in modern web systems, major web servers continue to support it and that looks unlikely to change anytime soon.

James Walker
James Walker is a contributor to How-To Geek DevOps. He is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs. He has experience managing complete end-to-end web development workflows, using technologies including Linux, GitLab, Docker, and Kubernetes.