Security risks with parsing a feed into a DB?
#1

[eluser]kyleect[/eluser]
I'm writing a simple app that runs through a list of feeds, looks for new or updated feed items, and inserts each item's title, description and timestamp into a database. This works great, but I noticed that some of the feeds I'm putting into it have HTML and JavaScript inside the item's description. I need the HTML intact because some of the posts have both text and images. However, having HTML and JavaScript output from the database and run on a page is a very scary thought to me. Is there a way to sanitize the feed item's content without changing the post's text and images?

Here is an example of one of the feeds I am working with:

http://www.facebook.com/feeds/share_post...rmat=rss20
#2

[eluser]kgill[/eluser]
Why is it scary? The db is just outputting exactly what you'd see if you loaded the page in your browser. If you don't trust loading the feed in your browser, then you've got bigger problems than your app. :P That said, you're going to have to do some parsing work if you want to sanitize things. Either you search for the text and any images and strip all the HTML, replacing it with your own version, or you try to strip out anything you think is bad, like JS. Either way you're stuck writing some sort of parser.
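For what it's worth, here is a rough sketch of the "strip what you don't trust" route. It is written in Python with only the standard library, purely as an illustration and not tied to whatever the app is actually built in. It takes the allowlist variant of the idea: keep only tags and attributes you have explicitly approved, and drop everything else. The names `ALLOWED`, `FeedSanitizer` and `sanitize` are made up for this example, and the tag list is an assumption you would tune to your feeds.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED = {
    "p": set(), "br": set(), "b": set(), "i": set(), "em": set(),
    "strong": set(), "ul": set(), "ol": set(), "li": set(),
    "a": {"href"}, "img": {"src", "alt"},
}
VOID = {"br", "img"}               # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # tags whose inner text is discarded too
SAFE_SCHEMES = {"http", "https"}    # anything else (javascript:, data:) is dropped


def is_safe_url(value):
    """Accept only absolute http(s) URLs; reject anything unparseable."""
    try:
        return urlparse(value.strip()).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


class FeedSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # drops onclick, onerror, style, and so on
            if name in ("href", "src") and not is_safe_url(value):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED or tag in VOID:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so decoded entities cannot turn back into markup.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = FeedSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi<script>alert(1)</script></p>'))
```

An allowlist tends to be safer than trying to enumerate "bad" markup, because anything you did not anticipate is dropped by default. The trade-off in this sketch is that relative image and link URLs are dropped as well, since they have no scheme. For anything going into production, a maintained sanitizing library is probably a better bet than a hand-rolled parser.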



