
05-17-2008, 08:03 AM

codex
I have this function for normalizing tags before writing them to the database. It works (that's not the issue), but I was wondering whether it's possible to shorten or simplify the sucker.

In case it isn't clear: it strips characters like '!@#$%^&*(' and replaces characters like 'éèëê' with 'e', 'ÀÁÂÃ' with 'A', etc.

Code:
function normalize_tag($s) {
    $s = htmlentities($s);

    $s = str_replace('http://', '', $s); // Strip http://

    $s = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|ring);/', '$1', $s); // Replace accented chars with regular chars

    $s = html_entity_decode($s);

    $s = preg_replace('/ +/', '_', $s); // Replace multiple spaces with underscore
    $s = preg_replace('/-+/', '_', $s); // Replace multiple divisions with underscore
    $s = preg_replace('/_+/', '_', $s); // Replace multiple underscores with just one

    $s = preg_replace('/[^A-Za-z0-9_-]/', '', $s); // Strip everything that isn't a letter, digit, underscore or dash

    $s = strtolower($s);

    return $s;
}
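The accent handling is the clever part of this function: htmlentities() rewrites an accented letter as a named entity whose first letter is the base character ('é' becomes '&eacute;'), the regex keeps just that letter, and html_entity_decode() turns whatever is left back into text. A minimal sketch of only that step, pulled into a hypothetical strip_accents() helper and assuming UTF-8 input:

```php
<?php
// Hypothetical helper isolating the accent-stripping trick from normalize_tag().
// Assumes the input string is UTF-8.
function strip_accents(string $s): string
{
    // 1. Non-ASCII letters become named entities, e.g. 'é' -> '&eacute;'.
    $s = htmlentities($s, ENT_QUOTES, 'UTF-8');

    // 2. Keep only the base letter of entities such as &eacute; or &Uuml;.
    $s = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|ring);/', '$1', $s);

    // 3. Any entity the pattern didn't match is turned back into a character.
    return html_entity_decode($s, ENT_QUOTES, 'UTF-8');
}

echo htmlentities('é', ENT_QUOTES, 'UTF-8'), "\n"; // &eacute;
echo strip_accents('Café Crème Brûlée'), "\n";    // Cafe Creme Brulee
echo strip_accents('Ärger über Öl'), "\n";        // Arger uber Ol
```

A letter like 'ß' comes through this step unchanged, because &szlig; doesn't fit the pattern; in normalize_tag() the final character filter then drops it.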

05-17-2008, 08:56 AM

TheLoops
If by "Replace multiple spaces/divisions/underscores" you mean "only when more than one is found", try this combination:

Code:
$s = preg_replace('/( {2,}|-{2,}|_{2,})/', '_', $s); // Replace multiple spaces/divisions/underscores with a single underscore

Otherwise, try this:

Code:
$s = preg_replace('/[ \-_]+/', '_', $s); // Replace multiple spaces/divisions/underscores with a single underscore
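The difference between the two patterns is easiest to see on a single input. A quick sketch (the sample strings are made up for illustration):

```php
<?php
$input = 'foo  bar--baz_qux-quux';

// First pattern: only runs of two or more of the SAME separator are collapsed.
$onlyRuns = preg_replace('/( {2,}|-{2,}|_{2,})/', '_', $input);

// Second pattern: every run of one or more spaces, dashes or underscores,
// mixed or not, becomes a single underscore.
$allRuns = preg_replace('/[ \-_]+/', '_', $input);

echo $onlyRuns, "\n"; // foo_bar_baz_qux-quux
echo $allRuns, "\n";  // foo_bar_baz_qux_quux
```

The first pattern also leaves mixed runs such as ' - ' untouched, since each alternative only matches one repeated character.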

05-17-2008, 09:18 AM

Pascal Kriete
Here's my take on the thing. It's a little shorter and deals with some exceptions. Not tested, though, so no guarantees :)

Code:
function normalize_tag($s) {
    // This way we don't have to mess with uppercase the rest of the time
    $s = strtolower(htmlentities($s));

    // We don't need to capture the second group, so I made it non-capturing; also added slash for &oslash;
    $s = preg_replace('/&([a-z])(?:uml|acute|grave|circ|tilde|cedil|ring|slash);/', '$1', $s);

    // Weird characters that don't get caught above. This also includes &eth; and &thorn;, but I don't know what the best replacement for those would be.
    // While we're at it, we'll also get the http
    $s = str_replace(array('&szlig;', '&aelig;', 'http://'), array('ss', 'ae', ''), $s);

    $s = html_entity_decode($s);

    // Normalize multiple spaces, dashes, and underscores
    $s = preg_replace(array('/\s+/', '/-+/', '/_+/'), '_', $s);

    // I've added a space here
    $s = preg_replace('/[^a-z 0-9_-]/', '', $s);

    return $s;
}
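The str_replace() step is needed because some letters map to entities whose names don't fit the &<letter><accent>; shape the regex expects. A sketch that makes this visible, with the accent-handling lines pulled into a hypothetical reduce_accents() helper (UTF-8 input assumed; the 'http://' replacement is left out):

```php
<?php
// Same pattern as in the version above.
const ACCENT_PATTERN = '/&([a-z])(?:uml|acute|grave|circ|tilde|cedil|ring|slash);/';

// Hypothetical helper: only the accent-related steps of the function above.
function reduce_accents(string $s): string
{
    $s = strtolower(htmlentities($s, ENT_QUOTES, 'UTF-8'));
    $s = preg_replace(ACCENT_PATTERN, '$1', $s);
    // Entities the pattern can't reduce get explicit replacements.
    $s = str_replace(array('&szlig;', '&aelig;'), array('ss', 'ae'), $s);
    return html_entity_decode($s, ENT_QUOTES, 'UTF-8');
}

// Show which entities the pattern reduces on its own.
foreach (['ø', 'ß', 'æ'] as $ch) {
    $entity = strtolower(htmlentities($ch, ENT_QUOTES, 'UTF-8'));
    echo $ch, ' -> ', $entity, ' -> ', preg_replace(ACCENT_PATTERN, '$1', $entity), "\n";
}
// Prints:
// ø -> &oslash; -> o
// ß -> &szlig; -> &szlig;
// æ -> &aelig; -> &aelig;

echo reduce_accents('Øresund Straße Æble'), "\n"; // oresund strasse aeble
```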

05-17-2008, 05:59 PM

codex

That works indeed. It's not as dramatically shorter as I'd hoped, but you shaved off a few bits, which is always good. Plus I learned something new in the process. So thank you!

05-18-2008, 12:06 AM

xwero
Nice use of htmlentities/html_entity_decode to catch accented chars. I think those cover German, French, Dutch, Spanish and the Scandinavian languages.
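That list holds up well, with a couple of caveats that a quick probe makes visible. htmlentities() uses the HTML 4.01 entity table by default, so letters outside it (Polish 'ł' or 'ź', for example) never become entities and are simply dropped by the final filter; entities like &oelig; exist but don't fit the accent pattern, so 'œ' is dropped too. A sketch that wraps the second version above in a function called normalize_tag() (the name from the first post; UTF-8 input assumed):

```php
<?php
// The second version from this thread, wrapped in a function for probing.
// Assumes UTF-8 input and htmlentities()' default HTML 4.01 entity table.
function normalize_tag(string $s): string
{
    $s = strtolower(htmlentities($s, ENT_QUOTES, 'UTF-8'));
    $s = preg_replace('/&([a-z])(?:uml|acute|grave|circ|tilde|cedil|ring|slash);/', '$1', $s);
    $s = str_replace(array('&szlig;', '&aelig;', 'http://'), array('ss', 'ae', ''), $s);
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    $s = preg_replace(array('/\s+/', '/-+/', '/_+/'), '_', $s);
    return preg_replace('/[^a-z 0-9_-]/', '', $s);
}

foreach (['Crème brûlée', 'Œuvre', 'Łódź'] as $tag) {
    echo $tag, ' -> ', normalize_tag($tag), "\n";
}
// Prints:
// Crème brûlée -> creme_brulee   (French accents are covered)
// Œuvre -> uvre                  (&oelig; isn't reduced, so the letter is stripped)
// Łódź -> od                     (Ł and ź have no HTML 4.01 entity at all)
```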