Is it possible to redirect external garbage URLs to a search routine instead of a 404 page?
#1

[eluser]John_Betong_002[/eluser]
 
Google Webmaster Tools is complaining of numerous links returning with "An Error Was Encountered [400]".

Edit - start:

Just noticed that my original post was truncated :(

What I would like to do is to somehow trap the URL before it fails the routing tests, etc.

I would like to use the following code:
Code:
$bad_chars = array('width','height','=',':','<','>','alt','//', 'etc');

  $good_words  = str_replace($bad_chars, '/', $_SERVER['REDIRECT_URL']); // REQUEST_URL

  header('Location: /my_search_routine/' . $good_words);
  exit;

Edit - end:
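
One way to get from a mangled URL to search words is to cut the URL at the first character that cannot belong to a clean path and then split the file name into words. This is only a minimal sketch: it assumes the junk always starts at a quote, an angle bracket or whitespace, and the function name garbage_url_to_search_path is made up for illustration. The /my_search_routine/ prefix in the usage comment is the one from the snippet above.

```php
<?php
// Hypothetical helper, not part of CodeIgniter.
function garbage_url_to_search_path(string $rawUrl): string
{
    // Decode %22, %3C, %20 etc. so the junk shows up as plain characters.
    $decoded = rawurldecode($rawUrl);

    // Assumption: the junk starts at the first quote, angle bracket or whitespace.
    $clean = preg_split('/["\'<>\s]/', $decoded, 2)[0];

    // Keep only the path, then the file name without directory or extension.
    $path = parse_url($clean, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    $name = pathinfo($path, PATHINFO_FILENAME);

    // Each run of letters or digits becomes one URI segment.
    $words = preg_split('/[^A-Za-z0-9]+/', $name, -1, PREG_SPLIT_NO_EMPTY);
    return implode('/', $words);
}

// Possible usage (commented out so the sketch has no side effects):
// $terms = garbage_url_to_search_path($_SERVER['REQUEST_URI']);
// header('Location: /my_search_routine/' . $terms, true, 301);
// exit;
```

With an input such as /afiles/images/santa-email.jpg" width="100" the function returns santa/email, so the redirect target would be /my_search_routine/santa/email.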


The following is an example which results in application/errors/error_general.php

http://website.com/afiles/images/santa-email.jpg" width="100" height="50" alt="image"/></a> </div> <div class="c0 r"><a
#2

[eluser]WanWizard[/eluser]
Make sure your images exist?
#3

[eluser]John_Betong_002[/eluser]
The images all exist; the problem is the trailing junk.

Try appending the junk onto a known URL image on your site and see what happens.


I have just tried appending the junk onto your avatar and the response I get is "Oops! This link appears to be broken."


http://ellislab.com/images/avatars/uploa...55.jpg
Oops! This link appears to be broken.

#4

[eluser]InsiteFX[/eluser]
The problem comes from the href content which starts with a single quote but is erroneously closed with a double quote - so it's not actually closed until another single quote is found further down. So all of http://www.snapshotjourneys.com/uploads/...ersity.jpg" width="81" height="50" alt="image"/></a> </div> <div class="c0 r"><a
#5

[eluser]John_Betong_002[/eluser]
Just updated my original post to include the requirements which were truncated.

#6

[eluser]WanWizard[/eluser]
I still don't see how you can get into this situation other than invalid HTML or invalid links.
Which is your problem as a developer, and you should fix that, not work around it.
#7

[eluser]John_Betong_002[/eluser]
I got into this situation by other webmasters using incorrect hotlinks. I have no control over these other sites but it appears I am being penalised by Google for not having corresponding landing pages for the bad URLs.

Here are the first two of the eighteen web sites that Google Webmaster Tools reports as having invalid links:

http://ezentials.com/eqk-7-days-before-santa-rfc.html

http://fivestarsmarketplace.com/lov-diag...ables.html

Search the source code for "afiles/images" and you will see that the first part of each image URL is correct but the complete URL is invalid.

I was hoping to find a way to test the URL before CI routed the URL to an error page. This would also be ideal for filtering all the other hotlinked images.
#8

[eluser]WanWizard[/eluser]
Ok. So this is about other sites linking to your site?

Then instead of a standard CI 404, route to a 404 controller that returns a 200 status and displays a 404 page with links to important parts of your application.
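
A framework-free sketch of such a page, to show the idea only: the function build_not_found_response and the link list are hypothetical, and how it gets wired into CodeIgniter (for example through the 404_override route or the error templates) is left open here.

```php
<?php
// Hypothetical helper, independent of any framework.
// Builds a "not found" page that is sent with a 200 status by default.
function build_not_found_response(string $requestedUri, array $links, int $status = 200): array
{
    $items = '';
    foreach ($links as $label => $href) {
        // Escape both parts so junk in a label or URL cannot break the markup.
        $items .= sprintf(
            '<li><a href="%s">%s</a></li>',
            htmlspecialchars($href, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
        );
    }

    $body = '<h1>Page not found</h1>'
          . '<p>Nothing matches ' . htmlspecialchars($requestedUri, ENT_QUOTES, 'UTF-8') . '</p>'
          . '<ul>' . $items . '</ul>';

    return ['status' => $status, 'body' => $body];
}

// Possible usage (commented out so the sketch has no side effects):
// $page = build_not_found_response($_SERVER['REQUEST_URI'], ['Home' => '/', 'Search' => '/joke/search/']);
// http_response_code($page['status']);
// echo $page['body'];
```

Returning the status and body as data keeps the decision about the status code in one place, so switching back to a real 404 is a one-argument change.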
#9

[eluser]John_Betong_002[/eluser]
[quote author="WanWizard" date="1302561152"]

Ok. So this is about other sites linking to your site?

Then instead of a standard CI 404, route to a 404 controller that returns a 200 status and displays a 404 page with links to important parts of your application.

[/quote]
Ah the "penny has dropped".

I was curious to know why my code in /application/errors/error_general.php was being ignored. I will try commenting out the header("HTTP/1.1 404 Not Found"); line and report back tomorrow... now it is way past my bedtime :)

Many thanks.
#10

[eluser]John_Betong_002[/eluser]
Nearly there but cannot get both conditions to work together.

What I would like to do is to somehow trap the external URL before it fails the routing tests, etc.

The following .htaccess in the images folder is supposed to:
1. accept image links from my own site
2. intercept all external links and divert to an ./images/index.php
(where URL is parsed and routed to a search routine).

.htaccess
Code:
RewriteEngine on

  # this line redirects everything to index.php including links from my own site
  # RewriteRule (.*) index.php

  RewriteCond %{HTTP_REFERER} !^$
  RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?johns-jokes.com [NC]

  # RewriteRule \.$ ./index.php
  # RewriteRule (.*) index.php/$1 [R,NC,L]
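
If the two rewrite conditions prove hard to combine, the same referer test could also be made in PHP. The following is only a sketch under the assumption that every image request reaches a PHP script first; is_external_referer is a made-up name, and the host johns-jokes.com is taken from the conditions above.

```php
<?php
// Hypothetical helper: true when the request should be diverted to the search routine.
function is_external_referer(?string $referer, string $ownHost): bool
{
    // No Referer at all (direct request, privacy proxy): treat as internal,
    // mirroring the "!^$" condition.
    if ($referer === null || $referer === '') {
        return false;
    }

    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return true; // unparsable referer: assume it is not ours
    }

    // Compare case-insensitively and ignore an optional leading "www.".
    $host = strtolower($host);
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }

    return $host !== strtolower($ownHost);
}

// Possible usage (commented out so the sketch has no side effects):
// $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : null;
// if (is_external_referer($referer, 'johns-jokes.com')) {
//     // parse the URI and redirect to the search routine
// }
```

Comparing the parsed host instead of matching a prefix also rejects look-alike hosts that merely start with the site's name.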


./images/index.php
Code:
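<?php
  // this works fine
  // 1. parses the URI
  // 2. formats the results
  // 3. redirects the results to my search routine with parameters

  $x = $_SERVER['REQUEST_URI'];
  if (strpos($x, '.') !== false)
  {
    // bad link used for testing
    // $x = http://johns-jokes.com//afiles/images/days-before-christmas.png" width="39" alt="image"/>

    $x  = substr($x, 15);      // drop the leading directory part (assumed to be "/afiles/images/", 15 characters)
    $i2 = strpos($x, '.');     // position of the file extension
    $x  = substr($x, 0, $i2);  // keep only the file name, which also cuts off the trailing junk

    $x = str_replace('-', '/', $x);  // "days-before-christmas" becomes "days/before/christmas"

    // the third argument sets the 301 status, so no separate status header is needed
    header('Location: http://johns-jokes.com/joke/search/' . $x, TRUE, 301);
    exit;
  }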