I need help writing a regex function that converts an HTML string to a valid XML tag name. It takes a string and does the following:
- If a letter or underscore occurs in the string, it is kept.
- If any other character occurs, it is removed from the output string.
- If any other character occurs between words or letters, it is replaced with an underscore.
Examples:
Input: Date Created
Output: Date_Created
Input: Date
Created
Output: Date_Created
Input: Date\nCreated
Output: Date_Created
Input: Date 1 2 3 Created
Output: Date_Created
Basically the regex function should convert the HTML string to a valid XML tag.
Answer
A bit of regex and a bit of standard functions:
function mystrip($s)
{
    // add spaces around angle brackets to separate tag-like parts
    // e.g. "<br>" becomes " <br> "
    // then let strip_tags take care of removing html tags
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

    // any sequence of characters that are not letters or underscores
    // gets replaced by a single underscore
    return preg_replace('/[^a-z_]+/i', '_', $s);
}
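One thing to watch: when the input starts or ends with a tag, digit, or punctuation, the function above leaves an underscore at that edge (for example "<b>Date</b>" comes out as "_Date_"). That is still a legal XML name, but the question says such characters should simply be removed. Below is a minimal sketch of a variant that trims the edges first. The name mystrip_trimmed and the extra trimming step are my own assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical variant of mystrip(); the function name and the edge
// trimming are assumptions added for illustration.
function mystrip_trimmed(string $s): string
{
    // Pad angle brackets so adjacent words stay separated once tags are gone.
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

    // Drop runs of disallowed characters at the very start or end.
    $s = preg_replace('/^[^a-z_]+|[^a-z_]+$/i', '', $s);

    // Collapse each remaining run of disallowed characters into one underscore.
    return preg_replace('/[^a-z_]+/i', '_', $s);
}

// Double quotes, so "\n" here is a real newline character.
$samples = array(
    "Date Created",
    "Date\nCreated",
    "Date 1 2 3 Created",
    "<b>Date</b><i>Created</i>",
);
foreach ($samples as $sample) {
    echo mystrip_trimmed($sample), "\n"; // each line prints Date_Created
}
```

If the input contains no letters or underscores at all (say "123"), the result is an empty string, which is not a valid tag name, so the caller would need to handle that case separately.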