I hate Regex
#1

[eluser]Iverson[/eluser]
I consider myself a decent programmer but I have yet to master regexes. I'm taking over a journalist's website and the guys who designed the original site actually used FLAT FILES to store his columns. Not only that, but the flatfiles contain multiple columns! UGH! This guy has 100+ columns so I figure the best way is to write a parser to simply take the title and content from the files. Any help with the regex code is MORE than welcomed! Below is the format:


<B><A NAME="999999999"><FONT COLOR="#00000" SIZE="2" FACE="arial">Column Title</FONT></B>

<FONT COLOR="#000000" SIZE="2" FACE="arial"> <A HREF="mailto:[email protected]"></A> <br>
<FONT COLOR="#000000" SIZE="2" FACE="arial"><br><img src='http://example.com/image.jpg' width='110' height='117' hspace='5' align='left'>Column Content<br><br>###


I realize this might not be the best place for this, so Admins feel free to relocate to wherever you think it should go...
#2

[eluser]GSV Sleeper Service[/eluser]
eek, that's a nasty looking system right there.
I've found that this is a great help when dealing with regular expressions.
http://regex.powertoy.org/
#3

[eluser]Eric Cope[/eluser]
Can you be a little more clear about what you are trying to do?
On a side note, I used to hate regular expressions, but this site helped:
http://www.regular-expressions.info/
I use this tool to test my regular expressions:
http://kodos.sourceforge.net/
#4

[eluser]Iverson[/eluser]
[quote author="Eric Cope" date="1231455838"]Can you be a little more clear about what you are trying to do?[/quote]

I would like to parse through the HTML above and have an array where say, $array_titles[0] = 'Column Title' and $array_content[0] = 'Column Content'
#5

[eluser]meigwilym[/eluser]
I'm no expert either, but I don't think you need much regexing for this.

Take the first line, for example: you need to match a string that starts with '<B><A NAME="999999999"><FONT COLOR="#00000" SIZE="2" FACE="arial">' and ends with '</FONT></B>'. Then simply remove them from the string, and what you have left is the column title.

Code:
<?php

# string
$s = 'klhklghfdjskg <B><A NAME="999999999"><FONT COLOR="#00000" SIZE="2" FACE="arial">ColumnTitle</FONT></B> fdsfsdf dfsdf ';

# expression
$p = '/<B><A NAME="999999999"><FONT COLOR="#00000" SIZE="2" FACE="arial">([a-zA-Z0-9 ]*)<\/FONT><\/B>/';


preg_match($p, $s, $m);

var_dump($m);

The above code will return

Code:
array(2) {
  [0]=>
  string(88) "<B><A NAME="999999999"><FONT COLOR="#00000" SIZE="2" FACE="arial">ColumnTitle</FONT></B>"
  [1]=>
  string(11) "ColumnTitle"
}

I'm sure you can pick the rest of it up.
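As a starting point for the rest, here is a minimal sketch of the same idea with a looser pattern, so the anchor name and the FONT attributes don't have to be hardcoded. It assumes the NAME value is always digits and that a title never contains a `<`, neither of which the sample confirms. `$html`, `$titlePattern` and `$title` are names made up for this example.

```php
<?php

// Sample line in the format from the first post.
$html = '<B><A NAME="999999999"><FONT COLOR="#00000" SIZE="2" FACE="arial">Column Title</FONT></B>';

// \d+ accepts any numeric anchor name (assumption), [^>]* skips the FONT
// attributes, and ([^<]*) captures everything up to the closing tag.
// The ~ delimiter avoids escaping the slashes; i makes the tags case-insensitive.
$titlePattern = '~<B><A NAME="\d+"><FONT[^>]*>([^<]*)</FONT></B>~i';

$title = null;
if (preg_match($titlePattern, $html, $m)) {
    // $m[1] holds the first capture group, i.e. the title text.
    $title = trim($m[1]);
}

var_dump($title);
```

If the real titles turn out to contain inline tags, `([^<]*)` would need to become a non-greedy `(.*?)`.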

Here's an excellent tutorial and cheatsheet

http://neverfear.org/blog/view/Regex_tut...___Part_1/

http://www.addedbytes.com/cheat-sheets/r...eat-sheet/

Best,

Mei
#6

[eluser]Iverson[/eluser]
Thanks. This is definitely a good start. What about the fact that each file has multiple columns in it?
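One possible way to handle several columns per file, as a rough sketch: `preg_match_all()` can collect every title/content pair in one pass. This assumes each column ends with `###` (as in the sample), that `###` never appears inside a column, and that plain text with the tags stripped is good enough for the content. The two sample columns and the pattern are made up for illustration.

```php
<?php

// Two made-up columns in the format from the first post, each ending in ###.
// In real use this string would come from file_get_contents() on a flat file.
$file = <<<'HTML'
<B><A NAME="111"><FONT COLOR="#00000" SIZE="2" FACE="arial">First Title</FONT></B>

<FONT COLOR="#000000" SIZE="2" FACE="arial"> <A HREF="mailto:someone@example.com"></A> <br>
<FONT COLOR="#000000" SIZE="2" FACE="arial"><br><img src='http://example.com/image.jpg' width='110'>First Content<br><br>###
<B><A NAME="222"><FONT COLOR="#00000" SIZE="2" FACE="arial">Second Title</FONT></B>

<FONT COLOR="#000000" SIZE="2" FACE="arial"> <A HREF="mailto:someone@example.com"></A> <br>
<FONT COLOR="#000000" SIZE="2" FACE="arial"><br>Second Content<br><br>###
HTML;

// Group 1: the title. Group 2: everything up to the next ### marker.
// s lets . match newlines, i ignores tag case, .*? stops at the first ###.
$pattern = '~<B><A NAME="\d+"><FONT[^>]*>([^<]*)</FONT></B>(.*?)###~is';

$array_titles  = [];
$array_content = [];

// PREG_SET_ORDER yields one array per column: [full match, title, body].
preg_match_all($pattern, $file, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    $array_titles[]  = trim($m[1]);
    // strip_tags() drops the FONT/A/br/img markup, leaving plain text.
    $array_content[] = trim(strip_tags($m[2]));
}

print_r($array_titles);
print_r($array_content);
```

Note that `strip_tags()` also throws away the `<br>` paragraph breaks, so if those matter the body in `$m[2]` would need gentler cleanup.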
#7

[eluser]Eric Cope[/eluser]
This is thinking outside the box... but if you need to get the data from each column, can you copy and paste special into Excel? That will strip the formatting, and then you can export to csv for importing into a database.
#8

[eluser]Iverson[/eluser]
[quote author="Eric Cope" date="1231457814"]This is thinking outside the box... but if you need to get the data from each column, can you copy and paste special into Excel? That will strip the formatting, and then you can export to csv for importing into a database.[/quote]

Love the idea. But what exactly is pasting special? I'll admit, I'm not the Microsoft Office guru, lol. I just tried pasting the plain HTML as "Paste Special" and it didn't seem to be anything legible to export. I definitely like that idea though!
#9

[eluser]Nick Husher[/eluser]
If you're using perl-compatible regexes (preg_match, preg_replace, etc.), you can use the MochiRegExp utility.

http://mochikit.com/examples/mochiregexp/index.html

It gives you instant feedback on whether a given regexp matches a given string.
#10

[eluser]Eric Cope[/eluser]
In the edit menu, you can Copy, Paste, and Paste Special. If you select Paste Special, it gives you several options (html, unicode, plain text, etc). One of them pastes each HTML td cell into an Excel cell.
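If the export does produce a usable CSV, reading it back in PHP could look roughly like the sketch below. The two-field layout (title, then content) is an assumption about what the export would contain, and the in-memory stream just stands in for the real file so the example runs on its own.

```php
<?php

// Stand-in for the exported file; a real run would fopen() the CSV on disk.
$fh = fopen('php://memory', 'w+');
fwrite($fh, "\"First Title\",\"First Content, with a comma\"\n\"Second Title\",\"Second Content\"\n");
rewind($fh);

$rows = [];
// fgetcsv() understands quoted fields, so commas inside a column survive.
while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
    if ($row === [null]) {
        continue; // fgetcsv() returns [null] for a blank line
    }
    $rows[] = ['title' => $row[0], 'content' => $row[1]];
}
fclose($fh);

print_r($rows);
```

Each entry in `$rows` can then be fed to whatever insert query the database layer uses.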



