I know there is a better way...

What I'm trying to do is parse an HTML table and turn it into a CSV document (each <tr> gets its own line, and each <td> value is separated by a ",").

I'm trying to turn:
Into this:

The PHP I have written is for a specific format, but I want a more general parser, so that I can just use this form quickly in the future with any other table no matter what the dimensions are.

Here is what I have (please save the coding-practices preachiness for later):


<?php
if (isset($_POST['data'])) {
    $data = $_POST['data'];
    // Split on opening tags; element 0 is whatever precedes the first tag.
    $tables = preg_split("/<table.*?>/i", $data);

    $rows = preg_split("/<tr.*?>/i", $tables[1]);
    $cells = array();
    for ($x = 0; $x < count($rows); $x++) {
        $cells[$x] = preg_split("/<\/td.*?>/i", $rows[$x]);
    }
    $s = array("\n", "\r");
    $r = array("", "");
    echo "<pre>\n";
    // Start at 1: $rows[0] is the text before the first <tr>.
    for ($i = 1; $i < count($cells); $i++) {
        // Columns 0, 1 and 3 are hard-coded for one specific table.
        $line = array($cells[$i][0], $cells[$i][1], str_replace(", ", "\, ", $cells[$i][3]));
        $subject = strip_tags(implode(",", $line));
        echo str_replace($s, $r, $subject) . "\n";
    }
    echo "\n</pre>";
} else {
    echo "This form takes tabular (HTML Tables) data and converts it to CSV format.";
    echo "<form method=\"post\">\n";
    echo "<textarea name=\"data\" cols=100 rows=15></textarea>\n";
    echo "<br /><input type=\"submit\" value=\"Convert\" />\n";
    echo "</form>\n";
}


This works for a specific situation (as you can see I manually specified the cells to use in $line). But, like I said, I want this to be more flexible in the respect that the table could be any dimension.
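
A minimal sketch of that generalization, staying with regular expressions, might look like the following. The function name html_table_to_csv_regex is made up for illustration, and the sketch assumes every cell has a closing </td> or </th> tag and that tables are not nested. It applies the "\, " escaping to every column rather than only the last one.

```php
<?php
// Hypothetical helper: one CSV line per <tr> found in $html, any number of cells.
function html_table_to_csv_regex($html)
{
    $lines = array();
    // Capture the inner HTML of every <tr>...</tr>; the "s" modifier lets "." span line breaks.
    preg_match_all('/<tr\b[^>]*>(.*?)<\/tr>/is', $html, $rowMatches);
    foreach ($rowMatches[1] as $rowHtml) {
        // Capture every <td> or <th> cell in this row, however many there are.
        preg_match_all('/<t[dh]\b[^>]*>(.*?)<\/t[dh]>/is', $rowHtml, $cellMatches);
        $cells = array();
        foreach ($cellMatches[1] as $cellHtml) {
            // Drop nested tags and line breaks, as the original script does.
            $text = trim(str_replace(array("\r", "\n"), '', strip_tags($cellHtml)));
            // Same escaping convention as the original script.
            $cells[] = str_replace(', ', '\, ', $text);
        }
        if (count($cells) > 0) {
            $lines[] = implode(',', $cells);
        }
    }
    return implode("\n", $lines);
}

// Rows of different widths are fine; nothing is hard-coded.
$html = "<table>\n<tr><td>a</td>\n<td>b, c</td></tr>\n<tr><td>1</td><td>2</td><td>3</td></tr>\n</table>";
echo html_table_to_csv_regex($html), "\n";
```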

I know I will figure this out eventually, but I've been thinking myself into a corner for a few hours and I figure maybe someone can smack some sense into me.

Thanks in advance!

I would treat the HTML table as an XML document.

Well that's the thing... the html spacing/lines might differ. That's gonna be a problem, isn't it?
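
For what it's worth, differing spacing and line breaks need not be a problem once a real parser is involved. Here is a rough sketch, assuming PHP 5 or later with the DOM extension available (not the PHP4 setup discussed in this thread). The function name html_table_to_csv_dom is hypothetical. It uses fputcsv() for quoting instead of the "\, " convention, which is a deliberate swap and not what the original script does.

```php
<?php
// Hypothetical helper: one CSV line per <tr>, using the DOM extension (PHP 5+).
function html_table_to_csv_dom($html)
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Write CSV into an in-memory stream so fputcsv() can do the quoting.
    $out = fopen('php://memory', 'w+');
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = array();
        foreach ($tr->childNodes as $node) {
            // Whitespace between tags shows up as text nodes; keep only cells.
            if ($node->nodeName === 'td' || $node->nodeName === 'th') {
                // Collapse runs of whitespace (including newlines) inside a cell.
                $cells[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
            }
        }
        if (count($cells) > 0) {
            fputcsv($out, $cells, ',', '"', '\\');
        }
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}

// Indentation and line breaks in the markup do not affect the result.
echo html_table_to_csv_dom("<table>\n <tr>\n  <td>a</td>\n  <td>b, c</td>\n </tr>\n</table>");
```

One caveat: loadHTML() guesses the character encoding when the input does not declare one, so non-ASCII input may need extra care.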

Change the input data. It's not smart to send chunks of HTML to your scripts. If you can, write JavaScript code that uses the browser's DOM structure to convert that HTML table to something like JSON.

I want to use PHP to handle the whole thing. I know it can be done in fewer steps than I am using, and that's what I'm asking for help with (along with making it so I don't have to choose the columns).

I do not see any problem at all, besides the fact that XML handling under PHP4 is not very convenient. But for a (comparatively) simple structure like this you should get through it quite quickly.

What do you mean by "the html spacing/lines"? The whitespace nodes?

Some xml functions for PHP4
$table; // the variable containing the HTML table
if ($dom = domxml_open_mem($table)) {
    $root = $dom->document_element();
    foreach ($root->child_nodes() as $child_node) {
        // get the nodes by name
        if ($child_node->node_name() == 'tr') {
            foreach ($child_node->child_nodes() as $td_node) {
                if ($td_node->node_name() == 'td') {
                    $my_td_node = $td_node;
                    // here you can extract whatever you want from this node
                }
            }
        }
    }
}

// some other functions (see php manual DOM XML functions and DOM functions)
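
The domxml functions above exist only in PHP4. As a sketch of the same walk for anyone on PHP 5 or later, the DOM extension offers close equivalents. The function name table_cells_from_xml is hypothetical, and the sketch assumes the table is well-formed XML with <tr> elements directly under <table> (no <tbody>).

```php
<?php
// Hypothetical helper: walk <table> -> <tr> -> <td> like the domxml loop,
// but with the DOM extension (PHP 5+). Returns an array of rows.
function table_cells_from_xml($table)
{
    $rows = array();
    $dom = new DOMDocument();
    // loadXML() needs well-formed markup; it returns false otherwise.
    if (!$dom->loadXML($table)) {
        return $rows;
    }
    $root = $dom->documentElement;              // was document_element()
    foreach ($root->childNodes as $child) {     // was child_nodes()
        if ($child->nodeName !== 'tr') {        // was node_name()
            continue; // this also skips whitespace text nodes
        }
        $row = array();
        foreach ($child->childNodes as $td) {
            if ($td->nodeName === 'td') {
                $row[] = trim($td->textContent);
            }
        }
        $rows[] = $row;
    }
    return $rows;
}

print_r(table_cells_from_xml("<table>\n<tr><td>a</td> <td>b</td></tr>\n</table>"));
```

Each inner array can then be joined with implode(",", $row) or passed to fputcsv() to produce the CSV lines.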
