Is it actually possible to:
1) Parse the HTML output of BBCode tags
2) Scan it
3) Accurately recreate the original tags?
I see a lot of tags like [indent][/indent] that look like the following in HTML:
HTML Code:
<div style="margin-left:40px">indent_text</div>
I don't know, does indent _always_ produce that HTML? It just looks so unsafe to me.
The reason I'm asking is that I'm trying to parse individual posts on the site and recreate the BBCode tags. I need to know what type of BBCode tag I'm looking at because, depending on the tag, I either want to get only the text inside it (including the text of its descendants) or get nothing at all. For instance, if the tag is an image tag or a quote tag, I ignore it.
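To make the "text only, or nothing" rule concrete, here is a minimal Python sketch using the standard library's html.parser. It assumes that images are plain <img> elements and that quotes are wrapped in <blockquote>; the second is only a guess at the forum's markup, and the names TextOnly, extract_text and looks_like_quote are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID = {"img", "br", "hr", "input", "meta", "link"}


class TextOnly(HTMLParser):
    """Collect text, skipping everything inside 'ignored' elements."""

    def __init__(self, is_ignored):
        super().__init__()
        self.is_ignored = is_ignored  # callable(tag, attrs_dict) -> bool
        self.skip_depth = 0           # > 0 while inside an ignored subtree
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # images and line breaks carry no text of their own
        if self.skip_depth or self.is_ignored(tag, dict(attrs)):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def extract_text(html, is_ignored):
    parser = TextOnly(is_ignored)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Assumption: quotes are rendered as <blockquote>; adjust to the real markup.
def looks_like_quote(tag, attrs):
    return tag == "blockquote"
```

The depth counter relies on every non-void start tag having a matching end tag, so badly nested HTML would throw it off.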
At the moment, I look for certain characteristics that distinguish tags from one another. For instance, [center][/center] is:
HTML Code:
<div style = "text-align: center">TEXT</div>
I then recreate the tag (I have specific types for each tag) as an object consisting of its options plus the list of tags contained within it. So something like:
[color=#ffd421]
[b]
Text, text
[/b]
[/color]
Would become:
PHP Code:
new Color("#ffd421", {
    new Bold({
        new Text("Text, text")
    })
});
btw the code isn't actually PHP; that's just pseudocode.
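For anyone who wants something runnable, here is a rough Python sketch of the same idea: walk the HTML, map each recognised element to a BBCode node, and nest the nodes. The two div rules come from the HTML shown in this post. The rules for [b] and [color] (a <b> element and <font color="...">) are assumptions about what the forum emits, not something I have confirmed. Tag, classify and TreeBuilder are names invented for the example.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional

VOID = {"img", "br", "hr", "input", "meta", "link"}  # no closing tag


@dataclass
class Tag:
    name: str                     # BBCode name, e.g. "color"
    option: Optional[str] = None  # value after "=", if any
    children: list = field(default_factory=list)  # Tag or str items

    def to_bbcode(self):
        inner = "".join(
            c if isinstance(c, str) else c.to_bbcode() for c in self.children
        )
        if self.name == "root":
            return inner
        opening = f"[{self.name}={self.option}]" if self.option else f"[{self.name}]"
        return f"{opening}{inner}[/{self.name}]"


def style_of(attrs):
    """Normalise 'Text-Align : center;' style strings into a dict."""
    decls = {}
    for part in (attrs.get("style") or "").split(";"):
        prop, sep, value = part.partition(":")
        if sep:
            decls[prop.strip().lower()] = value.strip().lower()
    return decls


def classify(tag, attrs):
    """Return (bbcode_name, option) or None. The b/font rules are assumptions."""
    style = style_of(attrs)
    if tag == "div" and style.get("margin-left") == "40px":
        return ("indent", None)
    if tag == "div" and style.get("text-align") == "center":
        return ("center", None)
    if tag == "b":
        return ("b", None)
    if tag == "font" and attrs.get("color"):
        return ("color", attrs["color"])
    return None


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Tag("root")
        self.stack = [self.root]  # node that receives children at each depth

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        parent = self.stack[-1]
        match = classify(tag, dict(attrs))
        if match is None:
            node = parent  # unknown element: its content flows into the parent
        else:
            node = Tag(*match)
            parent.children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def html_to_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Comparing parsed style properties instead of the raw attribute string means "margin-left:40px" and "margin-left: 40px" match the same rule. It still cannot tell [indent] apart from any other feature that happens to emit the same div, which is the "unsafe" part of the question.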