Saturday 13 July 2019

php - Regex - Convert HTML to valid XML tag






I need help writing a regex function that converts an HTML string to a valid XML tag name. It takes a string and does the following:




  • If a letter or an underscore occurs in the string, it is kept.

  • If any other character occurs at the start or end of the string, it is removed from the output string.

  • If any other character (or a run of them) occurs between words or letters, it is replaced with a single underscore.
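The three rules can be sketched in PHP for plain text, before any HTML is involved. This is a minimal sketch, not the answer's code: the function name `toTagName` is hypothetical, and it assumes "letter" means an ASCII letter (a-z, A-Z), so accented characters are treated like any other character. Characters at either end are dropped first, then each remaining run is collapsed into one underscore.

```php
<?php
// Hypothetical helper: applies the three rules to plain text (no HTML tags).
// Assumes only ASCII letters count as letters.
function toTagName(string $s): string
{
    // Rule 2: characters other than letters/underscores at either end are dropped.
    $s = preg_replace('/^[^a-z_]+|[^a-z_]+$/i', '', $s);

    // Rule 3: each run of such characters between letters becomes one underscore.
    return preg_replace('/[^a-z_]+/i', '_', $s);
}

echo toTagName("Date Created"), "\n";       // Date_Created
echo toTagName("Date\nCreated"), "\n";      // Date_Created
echo toTagName("Date 1 2 3 Created"), "\n"; // Date_Created
echo toTagName("  (Date) "), "\n";          // Date
```

Anchoring the edge pattern with `^` and `$` keeps underscores that belong to the original text, such as `_id`, which a plain `trim($s, '_')` after the replacement would strip.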





Examples:
Input: Date Created
Output: Date_Created

Input: Date
Created
Output: Date_Created

Input: Date\nCreated
Output: Date_Created

Input: Date 1 2 3 Created
Output: Date_Created



In short, the function should turn the HTML string into a valid XML tag name.
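Applying these rules does not by itself guarantee a valid name: an input with no letters at all (for example "123") can be reduced to an empty string, which is not a valid XML name. A final check can catch that. This is a sketch under a stated assumption: `isSimpleXmlName` is a hypothetical helper that accepts only a conservative ASCII subset of the XML 1.0 Name rules. The spec also allows many non-ASCII letters and the colon (which has namespace meaning), and this check rejects those on purpose.

```php
<?php
// Hypothetical helper: conservative, ASCII-only check for an XML element name.
// Real XML names may also contain non-ASCII letters and ':'; those are rejected here.
function isSimpleXmlName(string $name): bool
{
    // First character: letter or underscore; then letters, digits, '_', '.', '-'.
    // The D modifier makes '$' match only at the very end (no trailing newline).
    return preg_match('/^[A-Za-z_][A-Za-z0-9_.-]*$/D', $name) === 1;
}

var_dump(isSimpleXmlName('Date_Created')); // bool(true)
var_dump(isSimpleXmlName('1Date'));        // bool(false)
var_dump(isSimpleXmlName(''));             // bool(false)
```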


Answer



A bit of regex and a bit of standard functions:




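function mystrip($s)
{
    // Add spaces around angle brackets so text on either side of a tag stays
    // separated, e.g. "Date<br>Created" becomes "Date <br> Created";
    // strip_tags() then removes the HTML tags themselves.
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

    // Drop characters that are not letters or underscores at the start and end,
    // so the result has no leading or trailing underscore.
    $s = preg_replace('/^[^a-z_]+|[^a-z_]+$/i', '', $s);

    // Any remaining sequence of characters that are not letters or underscores
    // gets replaced by a single underscore.
    return preg_replace('/[^a-z_]+/i', '_', $s);
}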
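One case the function above does not cover is HTML entities: `strip_tags()` leaves them in place, so "Date&nbsp;Created" would come out as "Date_nbsp_Created". A possible variant decodes entities after the tags are removed. This is a sketch assuming the input is UTF-8 and may contain entities; `mystripDecoded` is a hypothetical name, and decoding after `strip_tags()` (rather than before) is a design choice so that escaped text such as `&lt;b&gt;` is not mistaken for markup.

```php
<?php
// Hypothetical variant of mystrip() that also decodes HTML entities (assumes UTF-8).
function mystripDecoded(string $s): string
{
    // Pad angle brackets so adjacent words stay separated, then drop the tags.
    $s = strip_tags(str_replace(['<', '>'], [' <', '> '], $s));

    // Turn entities such as &nbsp; or &amp; into real characters; their UTF-8
    // bytes fall outside a-z and are treated like any other separator below.
    $s = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Drop separators at either end, then collapse inner runs to one underscore.
    $s = preg_replace('/^[^a-z_]+|[^a-z_]+$/i', '', $s);
    return preg_replace('/[^a-z_]+/i', '_', $s);
}

echo mystripDecoded('Date&nbsp;Created'), "\n";      // Date_Created
echo mystripDecoded('<b>Date</b><br>Created'), "\n"; // Date_Created
```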
