Apache Server – Redirect – mod_rewrite

  • URL redirection

    • Simple Redirection

      • Redirection with an html file

      • Redirect with the redirect statement

      • Redirect with the RedirectMatch statement

    • Mod_rewrite to rewrite URLs

      • Rewriting gif images to png

      • Establishment of rewriting condition

        • Setting up a rewrite condition according to the browser

        • Establishment of rewriting condition according to the origin

        • Setting up redirection on the PATH of the URL

        • When not to use mod_rewrite

URL redirection

Manipulating URLs is now very common practice in the industry, for several reasons:

  • Hide the actual file structure on the system

  • Redirect all requests to a main file (.php, …)

  • Redirect errors to a page that looks nicer than the default Apache error page

  • Migrate a website while retaining its original structure, in order to satisfy customer requirements and search engines (Google, Yahoo, AltaVista :P, …)

  • Redirect requests for the unencrypted site (http) to the encrypted version (https)

  • Redirect requests from mobile clients to a site better suited to the device, or even block impure operating systems :P (it’s a joke)

  • Process the arguments passed in the URL for a particular behavior.

We will look at several ways of setting up redirection; some configurations are simple to set up, others are complex. We will also cover mod_rewrite, which makes very complex configurations possible. I advise you to have a test environment rather than trying these configurations directly in production: a bad configuration can make your website inaccessible.

Simple Redirection

Before talking about mod_rewrite we will look at other redirection methods. Although mod_rewrite can handle every example covered below, it is easier to use the simpler methods we are more comfortable with. What you do afterwards is up to you :P.

Redirection with an html file

This one strays a little from Apache itself, but I like this method because it is simple and does not require access to the Apache configuration. Just create an index.html file in the desired directory, for example the root of the site, and put the redirection code in it.

Take the site we created previously: if we access it now we simply get the message “Site C”. We will redirect visitors to the Google site when they access it.

To do this I will modify the index.html file with the following statement:

<meta http-equiv="refresh" content="5; url=http://google.com/">

 

You will be redirected in 5 seconds to google.com

This works very well, but only when we access the root of the website. If we create an info.html file in the DocumentRoot and access it directly, we will not be redirected. Demonstration!

$ sudo vim info.html
$ cat info.html
a bit of text to provide information

If you access http://www.sitec.com/info.html you will, as mentioned, get the contents of the file without any redirection. That may be exactly what you want: to each problem its solution!

Reference:

  • http://webmaster.iu.edu/tools-and-guides/maintenance/redirect-meta-refresh.phtml

Redirect with the redirect statement

Before we get to the mod_rewrite module we will look at the Redirect statement, which is part of the mod_alias module.

As you can see in the documentation, the statement can be used in several configuration contexts:

  • Global configuration of the server

  • A virtual server (Virtual host)

  • A directory

  • An .htaccess file

Here is a simple and very classic example of the Redirect statement: sending all traffic from the http site to the https (SSL) site.

We will therefore modify the configuration of the siteA virtual server so that every request received on the unencrypted channel is redirected to the SSL site.

We will edit the file /etc/apache2/sites-available/siteA.conf:

$ sudo vim /etc/apache2/sites-available/siteA.conf
$ cat siteA.conf
ServerAdmin webmaster@localhost
ServerName www.linux202-siteA.com
ServerAlias linux202-siteA.com
ServerAlias toto.linux202-siteA.com
DocumentRoot /data/vhosts/siteA/docroot/
Redirect / https://www.linux202-siteA.com/
# Log configuration
ErrorLog /data/vhosts/siteA/logs/error.log
CustomLog /data/vhosts/siteA/logs/access.log combined

If you look at the original configuration file you will notice that I deleted all the directives related to directory permissions. I removed them to simplify the file as much as possible, because every request will be redirected to the SSL configuration anyway.

If you run a few tests:

  • http://www.linux202-sitea.com/ : is indeed redirected to the https site, https://www.linux202-sitea.com/

  • http://www.linux202-sitea.com/info.php : is also redirected to the https site, keeping the requested file.

It is also possible to redirect only one section to an external site. In the following example, the /search section is redirected to the duckduckgo.com site.

$ cat /etc/apache2/sites-available/siteA.conf
[ ... OUTPUT TRUNCATED ... ]
DocumentRoot /data/vhosts/siteA/docroot/
Redirect /search https://www.duckduckgo.com
Redirect / https://www.linux202-siteA.com/
[ ... OUTPUT TRUNCATED ... ]

This can be very useful when the organization of the website changes. Corporate image is very important nowadays, and the website of a service or company is the showcase of our ability to manage change. Another example from my own experience is moving multimedia content to a different server. Originally all the material was on the server providing the service, but over time that server ran short of disk space or performance. So we had to modify the server so that the imgs directories were redirected to http://img.corposite.com.
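To make the matching order concrete, here is a minimal Python sketch of how the two Redirect lines above are chosen. It is only a simplified model of mod_alias prefix matching, not Apache code: the function name redirect_target and the RULES list are made up for the illustration, and it assumes the documented behavior that rules in the same context are tried in order, the first match wins, and whatever follows the matched prefix is appended to the target.

```python
# Simplified model of how a Redirect rule is selected (illustration only).
# The rules are listed in the same order as in the configuration file.
RULES = [
    ("/search", "https://www.duckduckgo.com"),
    ("/", "https://www.linux202-siteA.com/"),
]


def redirect_target(path, rules=RULES):
    """Return the redirect URL for a request path, or None if no rule applies."""
    for prefix, target in rules:
        if prefix.endswith("/"):
            matched = path.startswith(prefix)
        else:
            # A prefix without a trailing slash must end on a path-segment boundary.
            matched = path == prefix or path.startswith(prefix + "/")
        if matched:
            # Whatever follows the matched prefix is appended to the target.
            return target + path[len(prefix):]
    return None


print(redirect_target("/search"))
print(redirect_target("/info.php"))
```

If the two rules were swapped, the / rule would catch every request first, which is why the more specific /search line is written before the catch-all.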

Redirect with the RedirectMatch statement

Another example is the conversion of gif (or bmp :P) files to the png or jpg format. For this we have the RedirectMatch statement, which works like the Redirect statement but lets us use regular expressions and variables. Here is an example that redirects gif files to png.

RedirectMatch "(.*)\.gif$" "https://www.linux202-sitea.com$1.png"

Let’s set up this configuration:

$ cat /etc/apache2/sites-enabled/siteA-ssl.conf
ServerAdmin webmaster@localhost
ServerName www.linux202-siteA.com
ServerAlias linux202-siteA.com
ServerAlias toto.linux202-siteA.com
DocumentRoot /data/vhosts/siteA/docroot/
Alias "/cm-images" "/data/vhosts/common/images"
Options none
AllowOverride ALL
Require all granted
Options none
AllowOverride None
Require all granted
RedirectMatch "(.*)\.gif$" "https://www.linux202-sitea.com$1.png"
[ ... OUTPUT TRUNCATED ... ]

We will redirect requests for gif files to png files. Before enabling this configuration, we will put a .gif file in place and access it to view it. Here is the logo of the FSF:

  • Version Gif

  • Version Png

Copy both files into the DocumentRoot of the site and go to the URL, in my case:

$ sudo cp Free_Software_Foundation_logo.gif /data/vhosts/siteA/docroot/
$ sudo cp Free_Software_Foundation_logo.png /data/vhosts/siteA/docroot/

URL : https://www.linux202-sitea.com/Free_Software_Foundation_logo.gif

Let’s activate the redirection configuration now:

$ sudo apachectl configtest && sudo /etc/init.d/apache2 restart

URL : https://www.linux202-sitea.com/Free_Software_Foundation_logo.png

As you can see, the redirection takes place. With this configuration it occurs regardless of the directory where the file is located. If we create the directory toto/blabla/ and put the png file in it, the redirection takes place there too. Demonstration:

$ sudo mkdir -p /data/vhosts/siteA/docroot/toto/blabla/
$ sudo cp Free_Software_Foundation_logo.png /data/vhosts/siteA/docroot/toto/blabla/

URL : https://www.linux202-sitea.com/toto/blabla/Free_Software_Foundation_logo.png

If we look at the logs we see the redirection:

$ tail /data/vhosts/siteA/logs/ssl_access.log
172.17.42.1 - - [12/May/2016:08:38:01 -0400] "GET /toto/blabla/Free_Software_Foundation_logo.gif HTTP/1.1" 302 2373 "-" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"
172.17.42.1 - - [12/May/2016:08:38:01 -0400] "GET /toto/blabla/Free_Software_Foundation_logo.png HTTP/1.1" 304 209 "-" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"

Let’s go back to the line that performs this magic :P and take a few minutes to understand it:

RedirectMatch "(.*)\.gif$" "https://www.linux202-sitea.com$1.png"

  • RedirectMatch: directive provided by the mod_alias module

  • "(.*)\.gif$": this is certainly less clear. It is a regular expression that is applied to each request to decide whether any action should be taken; it is the condition. Regular expressions are very important because they let you process strings conditionally. Let’s analyze what it means piece by piece; I will come back to the parentheses:

    • . == any character: a space, a number, a letter, …

    • * == the star is an operator that applies to what precedes it, repeated zero or more times. Taken together, (.*) therefore means any character, zero or more times.

    • \. == as we saw, the dot (.) represents any character, so to match a literal dot we must precede it with a backslash (\).

    • gif == the regular expression then expects to see the characters gif

    • $ == this anchor does not match any character; it requires that what precedes it be at the end of the line (that is, at the end of the input text or just before a line break).

    • Summary: when the URL is processed, the engine consumes any characters, zero or more times (.*), and checks that the URL ends with .gif. The parentheses () capture a portion of the text for later reuse through the variables $1, $2, $3, numbered according to the order of the parentheses. We will see their use in the next argument.

  • "https://www.linux202-sitea.com$1.png": the last argument is the URL to which the redirection is made. As mentioned above, the text captured by the parentheses is substituted for $1. As you probably noticed, the .gif extension is not inside the parentheses; that is why it is dropped and we can append the .png extension when redirecting.
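To see why the backslash in front of the dot matters, here is a small sketch using Python's re module. It is not Apache itself, but for a pattern this simple Python and PCRE behave the same way. The helper name captured and the sample paths are made up for the illustration.

```python
import re

UNESCAPED = re.compile(r"(.*).gif$")   # here the dot matches any character
ESCAPED = re.compile(r"(.*)\.gif$")    # here the dot must be a literal "."


def captured(pattern, path):
    """Return what the parentheses capture, or None when the path does not match."""
    match = pattern.search(path)
    return match.group(1) if match else None


# A real gif file: both patterns match and capture everything before the extension.
print(captured(ESCAPED, "/logo.gif"))
print(captured(UNESCAPED, "/logo.gif"))

# A path that merely ends in "gif": only the unescaped pattern matches it.
print(captured(ESCAPED, "/photo-gif"))
print(captured(UNESCAPED, "/photo-gif"))
```

With the unescaped dot, a request such as /photo-gif would also be redirected, which is rarely what we want.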

Now that we have covered each argument, let’s walk through an example; this will surely make things easier to understand.

Take the request https://www.linux202-sitea.com/Free_Software_Foundation_logo.gif and look at the entry in the logs:

172.17.42.1 - - [12/May/2016:08:27:36 -0400] "GET /Free_Software_Foundation_logo.gif HTTP/1.1" 302 681 "-" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"

From Apache’s point of view, the RedirectMatch statement receives: /Free_Software_Foundation_logo.gif.

  1. Apache receives in the VirtualHost: /Free_Software_Foundation_logo.gif
  2. Processing of the redirection rule:

    RedirectMatch "(.*)\.gif$" "https://www.linux202-sitea.com$1.png"
  3. The request matches the criterion. Breaking down the regex (.*)\.gif$ gives:

    • (.*) == /Free_Software_Foundation_logo

    • \.gif$ == .gif

  4. The redirection is performed by replacing $1 with the captured text:

    • "https://www.linux202-sitea.com$1.png" == "https://www.linux202-sitea.com/Free_Software_Foundation_logo.png"

We can see that the rule works just as well when the file is in a directory rather than at the root of the site.

If we access the URL: https://www.linux202-sitea.com/toto/blabla/Free_Software_Foundation_logo.gif

172.17.42.1 - - [12/May/2016:08:28:58 -0400] "GET /toto/blabla/Free_Software_Foundation_logo.gif HTTP/1.1" 302 705 "-" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"

From Apache’s point of view, the RedirectMatch statement receives: /toto/blabla/Free_Software_Foundation_logo.gif.

  1. Let’s break down the regex (.*)\.gif$:

    • (.*) == /toto/blabla/Free_Software_Foundation_logo

    • \.gif$ == .gif

  2. The redirection is performed by replacing $1 with the captured text:

    • "https://www.linux202-sitea.com$1.png" == "https://www.linux202-sitea.com/toto/blabla/Free_Software_Foundation_logo.png"
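The two walkthroughs above can be reproduced with a short Python sketch. It only imitates the match-and-substitute step of RedirectMatch: the function name redirect_match is made up for the illustration, and Python writes the back-reference as \1 where Apache writes $1.

```python
import re

RULE = re.compile(r"(.*)\.gif$")
# \1 plays the role of $1 in the Apache directive.
TARGET = r"https://www.linux202-sitea.com\1.png"


def redirect_match(path):
    """Return the redirect URL for a request path, or None when the rule does not match."""
    match = RULE.search(path)
    if match is None:
        return None
    return match.expand(TARGET)


print(redirect_match("/Free_Software_Foundation_logo.gif"))
print(redirect_match("/toto/blabla/Free_Software_Foundation_logo.gif"))
print(redirect_match("/index.html"))
```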

Of course, regular expressions allow many more operations; this is only one example, and it is up to you to adapt it to your needs.

Reference:

  • https://httpd.apache.org/docs/current/fr/mod/mod_alias.html#redirect

  • https://fr.wikipedia.org/wiki/Graphics_Interchange_Format

  • https://fr.wikipedia.org/wiki/Portable_Network_Graphics

Mod_rewrite to rewrite URLs

After covering the Redirect and RedirectMatch statements, you are already able to do a lot of things. With mod_rewrite your possibilities increase, but so does the level of complexity, significantly.

The mod_rewrite module uses a rule-based rewriting engine, built on a PCRE regular-expression parser, to rewrite requested URLs on the fly. By default, mod_rewrite maps a URL to a filesystem path. However, it can also be used to redirect one URL to another URL, or to invoke an internal proxy fetch.

mod_rewrite provides a flexible and powerful way to manipulate URLs using an unlimited number of rules. Each rule can have an unlimited number of attached conditions, allowing you to rewrite URLs based on server variables, environment variables, HTTP headers, or timestamps.

mod_rewrite operates on the full URL path, including the path-info section. A rewrite rule can be invoked in httpd.conf or in a .htaccess file. The path generated by a rewrite rule can include a query string, or can lead to internal sub-processing, an external request redirection, or internal proxy throughput.

More details, discussions, and examples can be found in the detailed mod_rewrite documentation.

Rewriting gif images to png

We saw the RedirectMatch statement, part of the mod_alias module, used to redirect files from the .gif format to .png. Since we already understand how that works, we will redo the same configuration with mod_rewrite, simply to get familiar with the syntax.

First we need to enable the mod_rewrite module in order to use its directives.

$ cd /etc/apache2/mods-enabled
$ sudo ln -s ../mods-available/rewrite.load .

If you skip this step, the syntax validation of the configuration (sudo apachectl configtest) returns the following message:

AH00526: Syntax error on line 26 of /etc/apache2/sites-enabled/siteA-ssl.conf:
Invalid command 'RewriteEngine', perhaps misspelled or defined by a module not included in the server configuration
Action 'configtest' failed.
The Apache error log may have more information.

As a reminder, the original configuration:

RedirectMatch "(.*)\.gif$" "https://www.linux202-sitea.com$1.png"

Here is the instruction with RewriteRule:

RewriteEngine on
RewriteRule "(.*)\.gif$" "$1.png" [R,L]

We validate and reload the configuration:

$ sudo apachectl configtest && sudo /etc/init.d/apache2 restart
Syntax OK
 * Restarting web server apache2
   ...done.

Now let’s go to the URL of the image:

  • https://www.linux202-sitea.com/toto/blabla/Free_Software_Foundation_logo.gif

  • https://www.linux202-sitea.com/Free_Software_Foundation_logo.gif

In both cases the request is redirected to the png file and the URL is rewritten in the address bar of your browser:

  • https://www.linux202-sitea.com/toto/blabla/Free_Software_Foundation_logo.png

  • https://www.linux202-sitea.com/Free_Software_Foundation_logo.png

Let’s analyze the instructions one by one:

  1. Enabling the rewrite engine for the VirtualHost:

    RewriteEngine on
  2. The redirection rule itself:

    RewriteRule "(.*)\.gif$" "$1.png" [R,L]
  3. The directive is now RewriteRule instead of RedirectMatch.

  4. The regular expression is the same, so I will skip its explanation; review the RedirectMatch section if needed.

  5. We also have flags, which define the behavior of the rule. You will find the list of available flags in the documentation.

    [R,L]
    • R: the [R] flag causes a redirect to be sent to the browser. If a fully qualified URL is specified (that is, including http://servername/), the redirect is issued to that location. Otherwise, the current protocol, server name, and port number are used to generate the URL sent with the redirect.

    • L: when the [L] flag is present, mod_rewrite stops processing the rule set. In most contexts this means that if the rule matches, no further rules are processed. This flag corresponds to the last command in Perl, or to the break command in C. Use it to indicate that the current rule should be applied immediately, without considering further rules.

Now let’s change the flags: we remove the R option so that we no longer get a redirection but simply an internal rewrite. WARNING: this only works if the target is local, in other words a page or resource served by the same server. Here is the new configuration:

RewriteEngine on
RewriteRule "(.*)\.gif$" "$1.png" [L]

The only difference in the configuration is that we removed the R flag. If you access the URL https://www.linux202-sitea.com/Free_Software_Foundation_logo.gif, it still works, and you can tell that you are seeing the png file because the black line is present. Now pay attention to the address bar: the URL still ends in gif, not png. The rewrite is done but is not shown to the user :D.

As mentioned, this only works when the server manages both the source and the destination. If you set an external URL, for example the URL of the Google logo, the behavior is a redirect:

RewriteEngine on
RewriteRule "(.*)\.gif$" "https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png"
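The three behaviors described in this section (redirect with R, internal rewrite without R, and forced redirect for an external URL) can be summarized in a small Python sketch. This is only a model of the decision, not of mod_rewrite's real implementation: the function apply_rule and the constants are made up for the illustration, and the sketch assumes the site is always served over https on the host named in the examples.

```python
import re
from urllib.parse import urlsplit

# Assumption for the sketch: the host and scheme serving the request.
CURRENT_HOST = "www.linux202-sitea.com"
GOOGLE_LOGO = "https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png"


def apply_rule(path, pattern, substitution, flags=()):
    """Return ("redirect", url), ("internal", path) or ("unchanged", path)."""
    match = re.search(pattern, path)
    if match is None:
        return ("unchanged", path)
    # Python writes the back-reference as \1 where Apache writes $1.
    result = match.expand(substitution.replace("$1", r"\1"))
    parts = urlsplit(result)
    if parts.netloc and parts.netloc != CURRENT_HOST:
        # A target on another host always ends up as an external redirect.
        return ("redirect", result)
    if "R" in flags:
        if not parts.netloc:
            result = "https://" + CURRENT_HOST + result
        return ("redirect", result)
    # No R flag and a local target: the browser never sees the new path.
    return ("internal", parts.path if parts.netloc else result)


print(apply_rule("/logo.gif", r"(.*)\.gif$", "$1.png", ("R", "L")))
print(apply_rule("/logo.gif", r"(.*)\.gif$", "$1.png", ("L",)))
print(apply_rule("/logo.gif", r"(.*)\.gif$", GOOGLE_LOGO))
```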

Establishment of rewriting condition

The real advantage of mod_rewrite is the handling of multiple conditions, which is impossible with the redirection directives of the mod_alias module. RewriteRule itself expresses one condition: in the example above, only that the request targets files with the gif extension. With the RewriteCond statement it is possible to define more conditions, not only on the requested URL but also on the client making the request.

Setting up a rewrite condition according to the browser

A small demonstration to understand the concept: we will write a redirection rule that serves an information page to Chrome browsers, because Chrome is not free software and people must be informed :-).

So we will add a rule to intercept connections from Chrome browsers; to do this we will use the User-Agent identifier. When a user visits a web page, a text string is usually sent to the server to identify the user agent. It is included in the HTTP request in the “User-Agent” header and gives information such as the name of the application, the version, the operating system, the language, etc.

I will come back shortly to the list of variables available for conditions, once the demonstration is done. Here is the instruction:

RewriteCond "%{HTTP_USER_AGENT}" "(Chrome)"
RewriteRule "(.*)" "/chrome-info.php?link=$1" [R,L]

So we take the browser information and, if the browser is Chrome, we redirect to the chrome-info.php file, passing the original URL to PHP. The goal is to display a message to the user and then send them on to the original page, but we will get there step by step :P.

Create the file chrome-info.php: /data/vhosts/siteA/docroot/chrome-info.php

Your browser is not free ... Chrome is proprietary, I suggest Chromium: https://www.chromium.org/ !!

This is only for your information! You will be redirected to the right page in 5 seconds, to the URL

We validate the configuration and restart the Apache service, then access a URL, for example info.php.

$ sudo apachectl configtest && sudo /etc/init.d/apache2 restart

If we access https://www.linux202-sitea.com/info.php, the browser displays an error instead of the page.

The message is clear: there is a redirection loop… You can easily confirm it in the logs; run tail -f on the access log and you will see it clearly.

$ tail -f /data/vhosts/siteA/logs/ssl_access.log
172.17.42.1 - - [19/May/2016:08:35:35 -0400] "GET /info.php HTTP/1.1" 302 813 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
172.17.42.1 - - [19/May/2016:08:35:35 -0400] "GET /chrome-info.php?link=/info.php HTTP/1.1" 302 689 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
172.17.42.1 - - [19/May/2016:08:35:35 -0400] "GET /chrome-info.php?link=/chrome-info.php HTTP/1.1" 302 689 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
172.17.42.1 - - [19/May/2016:08:35:35 -0400] "GET /chrome-info.php?link=/chrome-info.php HTTP/1.1" 302 689 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
....
....

Let’s go through the logs:

  1. "GET /info.php HTTP/1.1" 302 813 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/…": this is the first request, for the URL info.php; note that the user agent contains the word Chrome.

  2. "GET /chrome-info.php?link=/info.php HTTP/1.1" 302 689 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/…": we are redirected to chrome-info.php with the original page passed as a parameter, ?link=/info.php. So far all is well; note that the agent still contains Chrome.

  3. "GET /chrome-info.php?link=/chrome-info.php HTTP/1.1" 302 689 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/…": oops, things start spinning. When the client requests the chrome-info.php page, the RewriteCond condition still applies :P: the agent is still Chrome, so the server redirects again, passing the original URL as the argument, ?link=/chrome-info.php

  4. "GET /chrome-info.php?link=/chrome-info.php HTTP/1.1" 302 689 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/…": we are in the loop…

This is a classic case when working with URL redirection. It is what makes mod_rewrite complicated to work with, and why I always suggest having a test environment to validate your configurations.

How do we correct this problem?

We will add a condition stating that the redirection must not apply when the browser requests the chrome-info.php page. Here is the configuration with the fix that avoids the infinite loop:

RewriteCond "%{HTTP_USER_AGENT}" "(Chrome)"
RewriteCond "%{REQUEST_URI}" !/chrome-info.php
RewriteRule "(.*)" "/chrome-info.php?link=$1" [R,L]

We added a check on the requested URL (REQUEST_URI). It is a negative condition: for the rewrite to apply, the requested URL must NOT be /chrome-info.php. The exclamation point (!) is what negates a condition.
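To make the loop and its fix tangible, here is a Python sketch that follows the redirects the way a browser would. It is a simplified model, not Apache code: the function names, the exclude_info_page switch, and the limit of 20 hops (a stand-in for the point where a browser gives up) are assumptions made for the illustration.

```python
import re

CHROME_UA = ("Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36")
FIREFOX_UA = "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"


def next_location(path, user_agent, exclude_info_page):
    """Return the redirect target for a request path, or None if the page is served."""
    if not re.search(r"(Chrome)", user_agent):
        return None
    if exclude_info_page and re.search(r"/chrome-info.php", path):
        # Second RewriteCond: do not redirect the information page itself.
        return None
    return "/chrome-info.php?link=" + path


def count_redirects(path, user_agent, exclude_info_page, limit=20):
    """Follow redirects like a browser, giving up after `limit` hops."""
    hops = 0
    while hops < limit:
        target = next_location(path, user_agent, exclude_info_page)
        if target is None:
            return hops
        # The conditions only look at the path, not at the query string.
        path = target.split("?", 1)[0]
        hops += 1
    return hops


print(count_redirects("/info.php", CHROME_UA, exclude_info_page=False))
print(count_redirects("/info.php", CHROME_UA, exclude_info_page=True))
print(count_redirects("/info.php", FIREFOX_UA, exclude_info_page=False))
```

Without the extra condition the walk only stops because of the hop limit; with it, a single redirect is enough.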

After the change, restart the Apache service and look at the logs when we request the info.php page:

172.17.42.1 - - [19/May/2016:08:49:45 -0400] "GET /info.php HTTP/1.1" 302 2343 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
172.17.42.1 - - [19/May/2016:08:49:45 -0400] "GET /chrome-info.php?link=/info.php HTTP/1.1" 200 575 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
172.17.42.1 - - [19/May/2016:08:49:45 -0400] "GET /favicon.ico HTTP/1.1" 302 681 "https://www.linux202-sitea.com/chrome-info.php?link=/info.php" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
172.17.42.1 - - [19/May/2016:08:49:45 -0400] "GET /chrome-info.php?link=/favicon.ico HTTP/1.1" 200 575 "https://www.linux202-sitea.com/chrome-info.php?link=/info.php" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"

  1. "GET /info.php HTTP/1.1" 302 2343 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/…": no change here; this is the first request for info.php, still with the Chrome agent.

  2. "GET /chrome-info.php?link=/info.php HTTP/1.1" 200 575 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/…": so far so good; we are redirected to the chrome-info.php page with the original URL as argument, and this time the page is served (200).

  3. GET /favicon.ico and GET /chrome-info.php?link=/favicon.ico: the browser tries to fetch the icon of the page, and we see that the redirection applies even to that :P. The rule could be improved, but let’s leave it there for the moment :D.

  4. There are NO further requests and no more loop, because when the chrome-info.php page is requested the RewriteCond conditions are no longer satisfied.

So you should now see the information page.

Good, now we just have to do the redirection :D and we are done… Well, we are going to hit another problem :). But one problem at a time: let’s set up the redirection :D.

To do this I will modify the file chrome-info.php; here is the new version:

Your browser is not free ... Chrome is proprietary, I suggest Chromium: https://www.chromium.org/ !!

This is only for your information! You will be redirected to the right page in 5 seconds, to the URL

The only addition is the header line, which sets up the HTML redirection. Unfortunately, if you access the URL https://www.linux202-sitea.com/info.php, you are redirected to https://www.linux202-sitea.com/chrome-info.php?link=/info.php as expected, but when the client is then sent back to info.php it is redirected to chrome-info.php again, and so on forever. You can see it by watching the time information displayed on the page (time: 1463746918). The logs show it clearly (I removed the favicon.ico entries to make them easier to read):

172.17.42.1 - - [20/May/2016:08:19:17 -0400] "GET /info.php HTTP/1.1" 302 675 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
172.17.42.1 - - [20/May/2016:08:19:17 -0400] "GET /chrome-info.php?link=/info.php HTTP/1.1" 200 600 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
172.17.42.1 - - [20/May/2016:08:19:22 -0400] "GET /info.php HTTP/1.1" 302 675 "https://www.linux202-sitea.com/chrome-info.php?link=/info.php" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
172.17.42.1 - - [20/May/2016:08:19:22 -0400] "GET /chrome-info.php?link=/info.php HTTP/1.1" 200 600 "https://www.linux202-sitea.com/chrome-info.php?link=/info.php" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"

So we are still in a loop, but now an application loop rather than a configuration loop in mod_rewrite.

How do we solve this? We want to inform the client, not block them (well, maybe if they are on Internet Explorer :P, but not Chrome :P). There are several solutions; a simple one is to set a cookie, and I chose it because it also demonstrates the use of a cookie with mod_rewrite, which I think is very relevant! So we will modify the php file to set the cookie when the client accesses the page:

<?php setcookie("bad_browser", "chrome", time() + 60); ?>

 

Here I set a cookie named bad_browser, give it a string value with the name of the browser (chrome), and set a 60-second expiration. That expiration is too short for production use, because the visitor would be sent back to the information page after one minute on the site, but for testing it is very practical :P. More information is available on the php setcookie documentation page.

Now that the cookie is set by the application, I will modify the mod_rewrite rules so that the redirection does not apply when the cookie named bad_browser is present.

RewriteCond "%{HTTP_USER_AGENT}" "(Chrome)"
RewriteCond "%{REQUEST_URI}" !/chrome-info.php
RewriteCond "%{HTTP_COOKIE}" !(bad_browser)
RewriteRule "(.*)" "/chrome-info.php?link=$1" [R,L]
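Consecutive RewriteCond lines are combined with a logical AND by default, so the rule fires only when all three hold. Here is a Python sketch of that combined test; the function name must_inform and the sample header values are made up for the illustration, and the Cookie header is treated as a plain string, which is how the condition sees it.

```python
import re

CHROME_UA = ("Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36")
FIREFOX_UA = "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"


def must_inform(user_agent, request_uri, cookie_header):
    """True when all three conditions hold, i.e. the visitor is sent to chrome-info.php."""
    conditions = [
        re.search(r"(Chrome)", user_agent) is not None,          # browser is Chrome
        re.search(r"/chrome-info.php", request_uri) is None,     # not the info page itself
        re.search(r"(bad_browser)", cookie_header) is None,      # not informed yet
    ]
    return all(conditions)


# First visit with Chrome: no cookie yet, so the visitor is redirected.
print(must_inform(CHROME_UA, "/info.php", ""))
# Second visit within the cookie lifetime: the page is served directly.
print(must_inform(CHROME_UA, "/info.php", "bad_browser=chrome"))
```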

Establishment of rewriting condition according to the origin

Let’s look at another example of rewriting, to see another method. I will not take the time to cover every available variable because there are many:

  • https://httpd.apache.org/docs/2.4/fr/mod/mod_rewrite.html#rewritecond

We will see how to “protect” the images on our website; by protect I mean blocking the inclusion of our images in a remote website.

We will modify the index.html page of siteA to include an image:

$ cat /data/vhosts/siteA/docroot/index.html

Site A

If you access the URL https://www.linux202-sitea.com/ you will see the Free Software Foundation image.

Let’s look at the logs that are generated when accessing the site:

$ tail -f /data/vhosts/siteA/logs/ssl_access.log
172.17.42.1 - - [25/May/2016:08:30:07 -0400] "GET / HTTP/1.1" 200 668 "-" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"
172.17.42.1 - - [25/May/2016:08:30:07 -0400] "GET /Free_Software_Foundation_logo.png HTTP/1.1" 200 38268 "https://www.linux202-sitea.com/" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"

As you can see, the request for the image carries its origin in the referer field: 200 38268 "https://www.linux202-sitea.com/" "Mozilla/5.0.

Now we will change siteC so that its index page includes the image hosted on siteA.

$ cat /data/vhosts/siteC/docroot/index.html

Site C

Go to site C to see the page, then look at the logs: http://sitec.com/

$ tail -f /data/vhosts/siteA/logs/ssl_access.log
172.17.42.1 - - [25/May/2016:08:36:23 -0400] "GET /Free_Software_Foundation_logo.png HTTP/1.1" 200 38406 "http://sitec.com/" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"

Since I am watching siteA’s logs, I do not see the request for the site C index page; however, I do see the request for the image, with the referer identifying the origin: 200 38406 "http://sitec.com/" "Mozilla/5.0.

Now that we have seen how another site includes one of our images, we can start working with mod_rewrite. We will make sure that when someone includes an image from our site, they do not receive the original image but another file :D.

 

I put the replacement file in the DocumentRoot of siteA.

Let’s move on to rewriting the URL: if png files are requested from outside the site, we will serve the Lego image instead. To do this we edit the Apache configuration for siteA:

$ cat /etc/apache2/sites-enabled/siteA-ssl.conf
[ ... OUTPUT TRUNCATED ... ]
RewriteCond "%{HTTP_REFERER}" !https://www.linux202-sitea.com/
RewriteRule "(.*)\.png$" "/access-denied.jpg" [R,L]
[ ... OUTPUT TRUNCATED ... ]

We follow the same concept with a negative condition using the symbol! , reload the apache configuration and look at the logs:

$ sudo apachectl configtest && sudo /etc/init.d/apache2 restart Syntax OK Restarting web server apache2 * ...done. $ tail -f /data/vhosts/siteA/logs/ssl_access.log /data/vhosts/siteC/logs/access.log ==> /data/vhosts/siteA/logs/ssl_access.log <== 172.17.42.1 - - [25/May/2016:08:57:58 -0400] "GET / HTTP/1.1" 200 2198 "-" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0" 172.17.42.1 - - [25/May/2016:08:57:59 -0400] "GET /Free_Software_Foundation_logo.png HTTP/1.1" 200 38268 "https://www.linux202-sitea.com/" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0" ==> /data/vhosts/siteC/logs/access.log <== 172.17.42.1 - - [25/May/2016:08:58:41 -0400] "GET / HTTP/1.1" 200 470 "-" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0" ==> /data/vhosts/siteA/logs/ssl_access.log <== 172.17.42.1 - - [25/May/2016:08:58:41 -0400] "GET /Free_Software_Foundation_logo.png HTTP/1.1" 302 787 "http://sitec.com/" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0" 172.17.42.1 - - [25/May/2016:08:58:41 -0400] "GET /access-denied.jpg HTTP/1.1" 200 354733 "http://sitec.com/" "Mozilla/5.0 (X11; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0"

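The redirection is clearly visible in the logs: the request for the png coming from http://sitec.com/ receives a 302 and is followed by a request for /access-denied.jpg, because the source of the access is NOT siteA. The same image requested from a page of siteA is served normally with a 200.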
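To make the decision Apache takes here easier to follow, here is a minimal Python sketch that imitates the two directives. It is only an approximation: the helper name hotlink_redirect is hypothetical, and Python's re module stands in for the regular expression engine used by mod_rewrite. The two patterns are copied from the configuration of siteA.

```python
import re

# Patterns copied from the RewriteCond / RewriteRule of siteA.
ALLOWED_REFERER = re.compile(r"https://www.linux202-sitea.com/")
PNG_REQUEST = re.compile(r"(.*).png$")


def hotlink_redirect(path, referer):
    """Return the redirect target, or None when the request is served as-is.

    Hypothetical helper: it imitates the two directives, not Apache itself.
    """
    # The RewriteRule pattern is tested first; the condition is only
    # evaluated when the pattern matches.
    if not PNG_REQUEST.search(path):
        return None
    # RewriteCond with "!": the rule applies when the referer does NOT match.
    if ALLOWED_REFERER.search(referer):
        return None
    return "/access-denied.jpg"


if __name__ == "__main__":
    image = "/Free_Software_Foundation_logo.png"
    print(hotlink_redirect(image, "https://www.linux202-sitea.com/"))
    print(hotlink_redirect(image, "http://sitec.com/"))
```

One thing this sketch makes visible: a request without any Referer header (shown as "-" in the logs) does not match the allowed pattern either, so a png opened directly in the browser would also be redirected.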

Setting up redirection on the PATH of the URL

Finally, let’s look at the redirection of a URL based on its path. If your website uses a language such as PHP, Python, or Ruby, it very likely has a database backend that stores the content. We then often end up with URLs containing parameters, for example:

  • https://www.linux202-sitea.com/articles.php?year=2016&title=le_super_article

This is not critical. However, apart from the aesthetic aspect, it has a cost for indexing in Google, since a URL path is worth more than an argument. Moreover, users prefer a “hard” URL: it is easier to remember and avoids argument errors with characters such as &, ?, and =.

Once our redirection is set up, here is the URL we will have:


  • https://www.linux202-sitea.com/articles/2016/le_super_article
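The correspondence between the two forms can be sketched in a few lines of Python, for instance to generate the new links from the old ones. This is only an illustration: the function name pretty_article_url is hypothetical, and it assumes that every old URL carries exactly the year and title parameters shown in the example.

```python
from urllib.parse import parse_qs, quote, urlsplit


def pretty_article_url(url):
    """Turn .../articles.php?year=Y&title=T into .../articles/Y/T.

    Hypothetical helper; assumes both query parameters are present.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    year = query["year"][0]
    title = query["title"][0]
    # quote() keeps each value safe to use as a single path segment.
    return "{}://{}/articles/{}/{}".format(
        parts.scheme, parts.netloc, quote(year, safe=""), quote(title, safe="")
    )


if __name__ == "__main__":
    old = "https://www.linux202-sitea.com/articles.php?year=2016&title=le_super_article"
    print(pretty_article_url(old))
```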

That is much nicer. For the purposes of the demonstration, I will define the file articles.php so that you have a point of reference for your tests:

$ cat /data/vhosts/siteA/docroot/articles.php

 

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed elementum, lacus sed egestas faucibus, justo magna placerat ante, quis feugiat diam dui eu metus. Nam sit amet turpis arcu. Curabitur vel lobortis dui, ac consectetur lacus. Donec felis lectus, malesuada nec convallis in, hendrerit ut nisi. Phasellus sagittis, est sit amet dignissim iaculis, eros ex viverra nunc, vel porta neque dolor et justo. Etiam dignissim lacinia sollicitudin. Sed eget erat quam. Maecenas non molestie dui, quis dictum felis. Phasellus vel facilisis sapien. Curabitur non mollis leo, vel auctor nisi. Fusce pretium arcu dui, et efficitur mauris vulputate eget.

 



Before setting up the redirection let’s validate that the php file works as expected. Let’s go to the URL: https://www.linux202-sitea.com/articles.php?year=2016&title=le_super_article

The page displays the year and the title passed as parameters; below (not visible on the screenshot) it also shows the SQL query that would be used.

Let’s set up URL redirection now: /etc/apache2/sites-enabled/siteA-ssl.conf

[ ... OUTPUT TRUNCATED ... ]
RewriteEngine on
RewriteRule "^/articles/([^/]*)/([^/]*)" "/articles.php?year=$1&title=$2" [PT]
RewriteRule "(.*).gif$" "$1.png" [L]
RewriteCond "%{HTTP_USER_AGENT}" "(Chrome)"
RewriteCond "%{REQUEST_URI}" !/chrome-info.php
RewriteCond "%{HTTP_COOKIE}" !(bad_browser)
RewriteRule "(.*)" "/chrome-info.php?link=$1" [R,L]
[ ... OUTPUT TRUNCATED ... ]

Let’s analyze the rule: RewriteRule "^/articles/([^/]*)/([^/]*)" "/articles.php?year=$1&title=$2" [PT]

  • ^/articles/([^/]*)/([^/]*): the URL is processed if it starts with /articles/; the first “directory” is captured in the variable $1 and the second “directory” in the variable $2.

  • /articles.php?year=$1&title=$2: I pass the previously extracted values $1 and $2 under the correct parameter names.

  • [PT]: so far we have mainly used the R and L flags. By default, the target (or substitution string) of a RewriteRule is assumed to be a file path. With the [PT] flag it is treated as a URI instead. In other words, the result of the RewriteRule is passed back through URL mapping, so that location-based mappings such as Alias, Redirect, or ScriptAlias have a chance to take effect.
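To check what the pattern captures before touching the Apache configuration, the rule can be reproduced in a small Python sketch. The function name rewrite_article is hypothetical, and Python's re module is used here as a stand-in for the engine of mod_rewrite, which should behave the same for a pattern this simple.

```python
import re

# Same pattern as the RewriteRule: two path segments after /articles/.
ARTICLE_RULE = re.compile(r"^/articles/([^/]*)/([^/]*)")


def rewrite_article(path):
    """Return the rewritten path, or the original path when the rule does not match."""
    match = ARTICLE_RULE.match(path)
    if match is None:
        return path
    # As in mod_rewrite, the substitution replaces the whole path,
    # with \1 and \2 playing the role of $1 and $2.
    return match.expand(r"/articles.php?year=\1&title=\2")


if __name__ == "__main__":
    print(rewrite_article("/articles/2016/le_super_article"))
    print(rewrite_article("/index.html"))
```

Note that the pattern is not anchored at the end, so a longer path such as /articles/2016/le_super_article/extra still matches and the trailing part is simply dropped.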

Let’s add one more constraint, since I am trying to cover the classic cases we run into: imagine that you already have articles that are not stored in the database. For example, suppose we have the following structure:

$ cat /data/vhosts/siteA/docroot/articles/1999/le_vieux_articles.html
Un super Vieux Articles mais toujours super pertinents

So we have a 1999 article stored in the articles directory. The problem is that with the redirection rule in place, this file is no longer reachable: the rule intercepts the request before the file is served. We will therefore add a condition so that the redirection does not apply when the requested file actually exists on the file system.

RewriteEngine on
RewriteCond "%{DOCUMENT_ROOT}/%{REQUEST_FILENAME}" !-f
RewriteRule "^/articles/([^/]*)/([^/]*)" "/articles.php?year=$1&title=$2" [PT]

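  • %{DOCUMENT_ROOT}: the root directory of the website; using a variable makes the solution applicable whatever the site.

  • %{REQUEST_FILENAME}: the path requested in the URL. In virtual host context, which is presumably where this rule lives, the request has not yet been mapped to the file system, so this variable holds the same value as %{REQUEST_URI}.

  • !-f: the condition is true if the tested path is NOT (!) an existing regular file.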
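The effect of this condition can be illustrated with a short Python sketch that builds a temporary document root containing the old 1999 article and then applies the same logic. The names rewrite_unless_file and demo are hypothetical, and the temporary directory only stands in for /data/vhosts/siteA/docroot.

```python
import os
import re
import tempfile

ARTICLE_RULE = re.compile(r"^/articles/([^/]*)/([^/]*)")


def rewrite_unless_file(docroot, path):
    """Apply the articles rule unless docroot + path is an existing regular file."""
    match = ARTICLE_RULE.match(path)
    if match is None:
        return path
    # Equivalent of: RewriteCond "%{DOCUMENT_ROOT}/%{REQUEST_FILENAME}" !-f
    candidate = os.path.join(docroot, path.lstrip("/"))
    if os.path.isfile(candidate):
        return path  # the static file exists, so it is served untouched
    return match.expand(r"/articles.php?year=\1&title=\2")


def demo():
    """Build a throwaway docroot with one static article and test both cases."""
    with tempfile.TemporaryDirectory() as docroot:
        old_dir = os.path.join(docroot, "articles", "1999")
        os.makedirs(old_dir)
        with open(os.path.join(old_dir, "le_vieux_articles.html"), "w") as handle:
            handle.write("old article\n")
        static = rewrite_unless_file(docroot, "/articles/1999/le_vieux_articles.html")
        dynamic = rewrite_unless_file(docroot, "/articles/2016/le_super_article")
        return static, dynamic


if __name__ == "__main__":
    print(demo())
```

The static article keeps its original path, while the article that only exists in the database is still rewritten to articles.php.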

When not to use mod_rewrite

I invite you to read https://httpd.apache.org/docs/2.4/en/rewrite/avoid.html to understand when it is not advisable to use mod_rewrite.

Reference:

  • https://httpd.apache.org/docs/2.4/fr/rewrite/

  • https://httpd.apache.org/docs/2.4/fr/rewrite/remapping.html

  • https://httpd.apache.org/docs/2.4/fr/rewrite/avoid.html

  • https://httpd.apache.org/docs/current/fr/rewrite/flags.html

  • http://www.useragentstring.com/pages/useragentstring.php