Can you simplify this function?
#1

[eluser]codex[/eluser]
I have this function for normalizing tags before writing them to the database. It works, that's not the issue, but I was just wondering if it's possible to shorten or simplify the sucker.

In case it's not clear: it strips characters like '!@#$%^&*(' and replaces characters like 'éèëê' with e, 'ÀÁÂÃ' with A, etc.

Code:
function normalize_tag($s) {
    $s = htmlentities($s);
    $s = str_replace('http://', '', $s); // Strip http://
    $s = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|ring);/', '$1', $s); // Replace accented chars with regular chars
    $s = html_entity_decode($s);
    $s = preg_replace('/ +/', '_', $s); // Replace runs of spaces with an underscore
    $s = preg_replace('/-+/', '_', $s); // Replace runs of dashes with an underscore
    $s = preg_replace('/_+/', '_', $s); // Replace multiple underscores with just one
    $s = preg_replace('/[^A-Za-z0-9_-]/', '', $s); // Strip everything else (preg_replace, since ereg_replace was removed in PHP 7)
    $s = strtolower($s);

    return $s;
}
#2

[eluser]TheLoops[/eluser]
If by "Replace multiple spaces/divisions/underscores" you mean "only if > 1 are found"…
… then try this combination:
Code:
$s = preg_replace('/( {2,}|-{2,}|_{2,})/', '_', $s); // Replace multiple spaces/divisions/underscores with a single underscore

… else try this:
Code:
$s = preg_replace('/[ \-_]+/', '_', $s); // Replace multiple spaces/divisions/underscores with a single underscore
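To make the difference between the two patterns concrete, here is a small sketch. The sample string is made up for illustration and is not from the original post.

```php
<?php
// Hypothetical sample input: a double space, a single dash, a double underscore.
$s = 'foo  bar-baz__qux';

// First pattern: only runs of two or more identical characters are replaced.
$strict = preg_replace('/( {2,}|-{2,}|_{2,})/', '_', $s);

// Second pattern: any run of one or more spaces, dashes, or underscores is replaced.
$loose = preg_replace('/[ \-_]+/', '_', $s);

echo $strict, "\n"; // foo_bar-baz_qux (the single dash survives)
echo $loose, "\n";  // foo_bar_baz_qux
```

The second pattern also merges mixed runs such as ' - ' into a single underscore, which the first one does not.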
#3

[eluser]Pascal Kriete[/eluser]
Here's my take on the thing. It's a little shorter and deals with some exceptions. Not tested though, so no guarantees :)
Code:
// This way we don't have to mess with uppercase the rest of the time
$s = strtolower(htmlentities($s));

// We don't need to capture the second group, so I made it non-capturing || added slash for oslash
$s = preg_replace ('/&([a-z])(?:uml|acute|grave|circ|tilde|cedil|ring|slash);/', '$1', $s);

// Weird characters that don't get caught above - also includes ð and þ, but I don't know what the best replacement for those would be.
// htmlentities() has already turned ß and æ into entities at this point, so match the entity forms.
// While we're at it, we'll also get the http
$s = str_replace( array('&szlig;', '&aelig;', 'http://'), array('ss', 'ae', ''), $s);

$s = html_entity_decode($s);

// Normalize multiple spaces, dashes, and underscores
$s = preg_replace( array('/\s+/', '/-+/', '/_+/'), '_', $s);

// I've added a space here
$s = preg_replace('/[^a-z 0-9_-]/', '', $s);

return $s;
#4

[eluser]codex[/eluser]

That works indeed. It's not spectacularly shorter (which I hoped was possible), but you shaved off a few bits, which is always good. Plus I learned something new in the process. So thank you!
#5

[eluser]xwero[/eluser]
Nice use of htmlentities/html_entity_decode to catch accented chars. I think those cover German, French, Dutch, Spanish and the Scandinavian languages.
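For reference, the suggestions in this thread could be combined roughly as follows. This is only a sketch under a few assumptions: the name normalize_tag_sketch() is hypothetical, the input is assumed to be UTF-8 (hence the explicit encoding arguments), and the three separator replacements are merged into one character class, as suggested in post #2.

```php
<?php
// Sketch only: normalize_tag_sketch() is a hypothetical name; UTF-8 input is assumed.
function normalize_tag_sketch($s)
{
    // Encode to entities first, then lowercase so the entity names are lowercase too.
    $s = strtolower(htmlentities($s, ENT_QUOTES, 'UTF-8'));

    // &eacute; -> e, &ucirc; -> u, &oslash; -> o, and so on.
    $s = preg_replace('/&([a-z])(?:uml|acute|grave|circ|tilde|cedil|ring|slash);/', '$1', $s);

    // Entities with no single-letter base, plus the URL scheme.
    $s = str_replace(array('&szlig;', '&aelig;', 'http://'), array('ss', 'ae', ''), $s);

    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');

    // Collapse any run of whitespace, dashes, or underscores into one underscore.
    $s = preg_replace('/[\s_-]+/', '_', $s);

    // Drop everything that is not a lowercase letter, a digit, or an underscore.
    return preg_replace('/[^a-z0-9_]/', '', $s);
}

echo normalize_tag_sketch('Crème  Brûlée!'), "\n";      // creme_brulee
echo normalize_tag_sketch('http://Straße-Cafés'), "\n"; // strasse_cafes
```

Characters outside the listed entity suffixes (ð and þ, for example) are simply stripped by the last step, so whether that is acceptable depends on the tags you expect.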



