Duplicate content
#1
Question 

Hello,

Using the CI Routes system, I realize that this can generate content duplication.

Here is an example of a route

PHP Code:
$route['my-page'] = "controllerNumber1/index";
$route['cat/other-page'] = "controllerNumber2/index";

The URL "my-page.html" is accessible from:
- mywebsite.com/my-page
- mywebsite.com/controllerNumber1/ 

In robots.txt I would have to put:
User-agent: *
Disallow: /controllerNumber1/
Disallow: /controllerNumber2/
(and so on for every controller)

But this is very constraining, especially for multi-language sites.

What is the best solution for this problem?

Ideally, I would like to return a 404 code on urls such as: mywebsite.com/controllerNumber1/

Sincerely

H&T
#2

(This post was last modified: 01-30-2020, 01:15 PM by jreklund.)

The only way for Google or people to find that re-mapped controller is if you have used its URL before.

If you want to do it anyway, you can grab what you need with:
https://codeigniter.com/user_guide/helpe...uri_string
https://codeigniter.com/user_guide/libraries/uri.html

And the proper way to tell Google the real website URL:
https://support.google.com/webmasters/an...9066?hl=en
#3

Thank you for your response

This could also be used in a negative SEO attack.

Concerning the URI class or the URL helper, is it possible to automate the task?

For example:

if (uri_string() == "ControllerName.php") :

Where "Controller Name" would be retrieved automatically



Sincerely
#4

I have never heard or read anything about a negative SEO attack. How can a URL to your website be negative in terms of getting people to visit your website?

For getting the filename: basename(__FILE__).
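
To automate the check itself, here is a minimal sketch. It assumes CodeIgniter 3, where uri_string() from the URL helper returns the URI as requested (before routing) and $this->router->class holds the resolved controller name. The function name is_direct_controller_access is hypothetical, not part of the framework.

```php
<?php
/**
 * Returns true when the first segment of the requested URI is the
 * controller's own class name, i.e. the page was reached directly
 * instead of through a custom route.
 * Hypothetical helper, not a CodeIgniter function.
 */
function is_direct_controller_access(string $uriString, string $controllerClass): bool
{
    // "controllerNumber1/index" -> ["controllerNumber1", "index"]
    $segments = explode('/', trim($uriString, '/'));

    // Case-insensitive comparison of the first segment with the class name.
    return strcasecmp($segments[0], $controllerClass) === 0;
}

// Assumed usage inside a CodeIgniter 3 controller constructor:
// if (is_direct_controller_access(uri_string(), $this->router->class)) {
//     show_404();
// }
```

Note that this would also return a 404 for a legitimate route whose first segment is identical to the controller name, so it only fits sites where every public URL is re-mapped.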
#5

For example, on the CodeIgniter website, your pages are duplicated.

Ex :
- https://codeigniter.com/Help/index
- https://codeigniter.com/en/help

Other examples :
- https://codeigniter.com
- https://codeigniter.com/en/home
- https://codeigniter.com/Home/index

If I link from my site to https://codeigniter.com/Help/index, this page will be indexed and duplicated in the Google index.

And Google may penalize the duplication of internal and/or external content.

It is indeed possible to put a Canonical URL, but this is still a patch.

If you are interested, here are some articles on duplication
- https://support.google.com/webmasters/an...6359?hl=en
- https://moz.com/learn/seo/duplicate-content

Sincerely
#6

Thanks, I have personally never had a problem with multiple URLs (negative SEO), as no one has tried targeting my controllers. But it may just be that I have always used this "patch" recommended by Google. :-) It's on the page you just linked.

Quote:Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Search Console.

So instead of writing multiple 301 redirects in every controller for every possible invalid URL, output the URL you want them to see in rel="canonical" and they will know what to do with it.
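
A minimal sketch of building that tag in plain PHP. The function name canonical_link_tag and the mywebsite.com base URL (taken from the first post) are illustrative assumptions. In CodeIgniter you would probably pass site_url('my-page') or similar instead of joining the parts by hand.

```php
<?php
/**
 * Builds a <link rel="canonical"> element for the preferred (routed) URL.
 * Hypothetical helper, not part of CodeIgniter.
 */
function canonical_link_tag(string $baseUrl, string $routedPath): string
{
    // Join base and path with exactly one slash between them.
    $url = rtrim($baseUrl, '/') . '/' . ltrim($routedPath, '/');

    // Escape the URL so it is safe inside an HTML attribute.
    return '<link rel="canonical" href="'
        . htmlspecialchars($url, ENT_QUOTES, 'UTF-8')
        . '">';
}

echo canonical_link_tag('https://mywebsite.com', 'my-page'), PHP_EOL;
// prints: <link rel="canonical" href="https://mywebsite.com/my-page">
```

The tag goes in the <head> of the view, and it is the same whether the page was reached through /my-page or /controllerNumber1/.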
#7

Thank you for your message.

It was all for your information.

If you analyze Googlebot's activity in your server logs in detail, you'll see that it's better to block the crawl via robots.txt or return a 404 code rather than use a canonical.

A canonical uses more crawl budget than a Disallow or a 404 page.

Sincerely
#8

Thanks for your thoughts on this. Do you have any proof for those statements? It goes against the recommendations of ALL search engines. Here is Bing as another example: https://www.bing.com/webmaster/help/webm...s-30fba23a

Do a 301 if you know the real URL, for example when a page has been moved or when you prefer a certain URL. Use canonical for the rest: URLs that have the same content but that you keep in order to track where people are coming from.
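
For the 301 case, a rough sketch of a lookup from unwanted paths to the preferred ones, using the routes from the first post. The function name canonical_target and the lowercase map keys are my own assumptions. PHP 7.1+ is assumed for the nullable return type.

```php
<?php
/**
 * Returns the preferred path for a requested path, or null when the
 * request is already fine. Hypothetical helper, not part of CodeIgniter.
 */
function canonical_target(string $requestPath, array $canonicalMap): ?string
{
    // Normalise: drop surrounding slashes, compare in lowercase.
    $key = strtolower(trim($requestPath, '/'));

    return $canonicalMap[$key] ?? null;
}

// Unwanted path (lowercase) => preferred path.
$map = [
    'controllernumber1'       => 'my-page',
    'controllernumber1/index' => 'my-page',
    'controllernumber2'       => 'cat/other-page',
    'controllernumber2/index' => 'cat/other-page',
];

// Assumed usage in CodeIgniter 3, with the URL helper loaded:
// $target = canonical_target(uri_string(), $map);
// if ($target !== null) {
//     redirect($target, 'location', 301);
// }
```

Keeping the map in one place avoids repeating redirects in every controller. Any URL not in the map falls through and can still carry a canonical tag.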



