Charset, CI or MySQL problem ??? ( or dumb coder problem ??? )
#1

[eluser]charlie spider[/eluser]
So in the administrative backend for a boot company site i'm building, i have a text area in a form where they can write a description for each boot, and then click save to put everything into the MySQL database.

it's a standard text area and the code i'm using to save it is like this:

Code:
if ( $this->Boot_model->update_boot_text(
            $this->input->post('ID'),
            $this->input->post('number'),
            $this->input->post('name'),
            $this->input->post('type'),
            $this->input->post('desc1'),
            $this->input->post('desc2')
        ))

then i use standard Active Record method to insert it into the db.

Code:
$this->db->update('boot_table', $data_array)

To redisplay the info in the admin area, or on the public side of the site, i pass the entire object to my controller ( instead of converting to an array ):

Code:
$query = $this->db->get();
if ($query->num_rows() > 0)
{
    foreach ($query->result() as $row)
    {
        $content = $row;
    }
    return $content;
}

and then refer to the data object directly:

Code:
echo '<p class="desc">' . $boot->desc1 . '</p>';
echo '<p class="desc">' . $boot->desc2 . '</p>';

this imho is way easier than dealing with the extra layers of arrays that you get when you convert the query row results to arrays


ANYWAYS...

the problem is that i am trying to populate the database with info from a PDF version of this company's catalogue, and if i copy a bunch of text from the PDF, such as:

Code:
A 10” lace-to-toe Stitch down boot with
Vibram® Sierra oil resistant lug sole with
urethane cushioned heel. The best easy entry
style to fit high instep and special widths.
CSA grade 1 steel toe and plate.

paste it into the text area and save it...

it comes back looking like this:

Code:
A 10

or it will break on other characters such as the registered symbol:

Code:
Vibram®


It's not even that these characters are getting stripped from the paragraph, the entire paragraph just ends whenever any kind of special character comes along.

Is this because of CI ?

I have never had the time to go through every line of code in the CI libraries ( nor do i want to at this moment ), so does anybody know specifically which library / function i should be looking for if i need to do some customization ?

Or is this a charset problem ? Or is it a MySQL problem ?

i can add the same paragraph directly into the db with phpMyAdmin and it doesn't break the paragraphs like above, but it also doesn't convert the characters properly, and adds all kinds of strangeness. So whatever other problems i have ( not including my own personal issues ;) ) i also seem to have a charset problem.

Any and all help is greatly appreciated

Thank you
#2

[eluser]Randy Casburn[/eluser]
this likely has to do with the textarea. Are you using an editor in the textarea like Xinha or something similar? If so, what happens when you use the "plain text" mode of the editor and paste it directly into the text box?
#3

[eluser]Randy Casburn[/eluser]
I just realized the text was breaking on newline characters and quotes. You are sanitizing your inputs before sending them to your database, and again after retrieving them, right?

If you're using the Active Record class, this is done for you. Otherwise, you must make sure to escape your input strings to account for these types of characters.

Randy
#4

[eluser]charlie spider[/eluser]
just using plain old text areas. nothing fancy.

the text is breaking when it hits quotes, but with other blocks of text without quotes it is adding funny characters such as
Code:
With 9 “ tops, Vibram®

as far as i know i am using the Active Record method ie:
Code:
$this->db->update();

but am not doing anything else to the data because i thought, as you have stated, that the AR method is supposed to escape the data for me.

would this have anything to do with the web pages charset encoding ?
#5

[eluser]Randy Casburn[/eluser]
Hey charlie -- I can't help but be caught up on this "if i copy a bunch of text from the PDF".

What happens if the text is "typed" by hand? Can we eliminate the cut and paste from the PDF thing? Can you send me the PDF? I have Acrobat and would like to look at the file.

I guess the _last_ place I want to look is CI <grin>

Randy
#6

[eluser]charlie spider[/eluser]
i'm pretty set on it being a charset / collation problem

even when i copy the text from the PDF to notepad, and then from notepad to my form, it breaks on certain characters.

i also noticed that the double quotes used in the PDF are more slanted than the ones that appear when i just use regular double quotes ( the ones beside Enter on the keyboard )

so i think i just need to research different charsets and pick the most appropriate one.

both my database and CI database.php file are set to utf8 charset and utf8_general_ci for collation (as are the various tables' collations)

but what confuses me and made me think it has something to do with CI is that i can cut n paste the same text from the PDF directly into phpMyAdmin and it doesn't break. All characters are copied correctly. It's only if i use my form that it breaks.

:\
#7

[eluser]Randy Casburn[/eluser]
OK -- Post up your Boot_model->update_boot_text in code tags pls.
#8

[eluser]charlie spider[/eluser]
from boot_model.php:
Code:
function update_boot_text($bootID, $number, $name, $type, $desc1, $desc2)
{
    $data = array('catID' => $type, 'Nmbr' => $number, 'Name' => $name, 'desc1' => $desc1, 'desc2' => $desc2);
    $this->db->where('bootID', $bootID);
    if ( $this->db->update('boot_table', $data))
    { return true; }
}





from controller:
Code:
$rules['number'] = "trim|required|min_length[2]|max_length[12]";
$rules['name'] = "trim|required|min_length[4]|max_length[24]";
$rules['desc1'] = "trim|required|max_length[2048]";
$rules['desc2'] = "trim|required|max_length[4096]";                    

if ( $this->Boot_model->update_boot_text(    
    $this->input->post('bootID'),
    $this->input->post('nmbr'),
    $this->input->post('name'),
    $this->input->post('boot_type'),
    $this->input->post('shortDesc'),
    $this->input->post('longDesc')
    ))
{
// go back to form
}




from admin section form:
Code:
echo heading ('Short Description (bold text)', 5) . '<br />';
$value = ($this->validation->desc1) != NULL ? $this->validation->desc1: $boot->desc1;
echo form_textarea(array('name' => 'desc1', 'class' => 'boot', 'value' => $value, 'cols'=>'12', 'rows'=>'4'));  

echo '<br />';

echo heading ('Long Description', 5) . '<br />';
$value = ($this->validation->desc2) != NULL ? $this->validation->desc2: $boot->desc2;
echo form_textarea(array('name' => 'desc2', 'class' => 'boot', 'value' => $value, 'cols'=>'24', 'rows'=>'24'));


from PDF:
( this is the desc1 text )
Code:
A 10” lace-to-toe Stitch down boot with Vibram® Sierra oil resistant lug sole with urethane cushioned heel. The best easy entry style to fit high instep and special widths. CSA grade 1 steel toe and plate.
#9

[eluser]Randy Casburn[/eluser]
Alrighty then -- let's cut to the chase.

Before the line "$data = array('catID' =>..." (the first line of function update_boot_text()), let's take a look at the encoding of the string before it gets stuffed into the table (and then again after it comes out)...

Prior to what goes into the DB:
Code:
$boolResult = mb_check_encoding($desc1, 'UTF-8' );

Then check the result of that function. Obviously you've got to have the mbstring extension loaded for this to work. True means the string is valid UTF-8 as it goes to the DB.

Do the same thing as you retrieve the record. Test the data again. Important! Make sure you're testing the raw data and not something massaged by CI first.

I suspect both will return 'true' and the charsets will match. If not, you'll know where it is getting mangled.
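A minimal sketch of that before-and-after check, for illustration only. describe_encoding() is a made-up helper (not a CI function), and the two sample strings are assumptions standing in for $desc1: one holds the curly quote as UTF-8, the other holds it as a single Windows-1252 byte. It needs the mbstring extension.

Code:
```php
<?php
// Hypothetical helper (not part of CodeIgniter): reports whether a string is
// valid UTF-8 and dumps its raw bytes so a mangled character is easy to spot.
function describe_encoding($label, $text)
{
    $valid = mb_check_encoding($text, 'UTF-8');
    return sprintf(
        '%s: %s, bytes=%s',
        $label,
        $valid ? 'valid UTF-8' : 'NOT valid UTF-8',
        bin2hex($text)
    );
}

// Assumed sample data: "\xE2\x80\x9D" is the curly quote encoded as UTF-8.
$utf8_quote = "A 10\xE2\x80\x9D boot";
// Assumed sample data: "\x94" is the same quote as one Windows-1252 byte.
$cp1252_quote = "A 10\x94 boot";

echo describe_encoding('utf-8 sample', $utf8_quote), "\n";
echo describe_encoding('cp1252 sample', $cp1252_quote), "\n";
```

In the model you would presumably call something like this on $desc1 right before $this->db->update(), and again on the raw value from the query result, then compare the two byte dumps.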

If they both test true, the next thing I really think you should try is manually replacing every single quote and double quote with a plain one. This prevents "smart quotes" software (like you've observed in the PDF) from changing the character code to something odd that may not exist in your font set. Make certain you change anything that looks like a backtick or a slanted single quote into a normal single quote.
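If doing that by hand gets tedious, the same replacement could be sketched in PHP roughly like this. plain_quotes() is a hypothetical helper, not something CI provides, and it assumes the incoming text is already valid UTF-8.

Code:
```php
<?php
// Hypothetical helper: maps typographic ("smart") quotes, given as their
// UTF-8 byte sequences, to plain ASCII quotes. Assumes UTF-8 input.
function plain_quotes($text)
{
    $map = array(
        "\xE2\x80\x9C" => '"', // left double quotation mark
        "\xE2\x80\x9D" => '"', // right double quotation mark
        "\xE2\x80\x98" => "'", // left single quotation mark
        "\xE2\x80\x99" => "'", // right single quotation mark
        '`'            => "'", // backtick
    );
    // strtr() with an array replaces each key once and never rescans
    // its own output, so the replacements cannot interfere with each other.
    return strtr($text, $map);
}

echo plain_quotes("A 10\xE2\x80\x9D lace-to-toe Stitch down boot"), "\n";
```

Other characters, such as the registered symbol, pass through untouched, so this only addresses the quotes.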

Try all that and let's keep going.

Randy
#10

[eluser]charlie spider[/eluser]
true and true

charsets are good

one dumb dumb thing i overlooked was the encoding of a template view file i use that all of my content gets dumped into. somewhere along the line, it somehow got changed to iso-8859-1 but i've changed it back to utf-8

that fixes the problem with the registered symbol but it still chokes on those non-standard double quotes.
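For anyone wondering why the template encoding mattered for the registered symbol, here is a small sketch of the effect. It only simulates a page decoding UTF-8 bytes as iso-8859-1; the variable names are made up for the example, and it needs the mbstring extension.

Code:
```php
<?php
// The registered sign stored as UTF-8 is the two bytes C2 AE.
$registered_utf8 = "\xC2\xAE";

// Simulate a page that decodes those bytes as ISO-8859-1: each byte is
// treated as its own character, so one symbol turns into two.
$as_seen_in_latin1_page = mb_convert_encoding($registered_utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($registered_utf8), "\n";        // raw bytes of the stored value
echo bin2hex($as_seen_in_latin1_page), "\n"; // bytes of what the page would show
echo mb_strlen($registered_utf8, 'UTF-8'), ' character vs ',
     mb_strlen($as_seen_in_latin1_page, 'UTF-8'), " characters\n";
```

With the view file back on utf-8 the bytes are decoded as one character again, which would match what was seen for the registered symbol.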

i will just edit them manually and run with that.

i wish there was an obvious solution to this

thanks for all of your help btw
it is greatly appreciated



