Saturday, 11 August 2018

bash - Can't remove first two encoded characters using text editors in Linux




When I open the file a.csv in a text editor, it shows:



aaa bbb ccc ddd eee fff ggg hhh iii jjj kkk


But when I cat it, I get:



��aaa   bbb ccc ddd eee fff ggg hhh iii jjj kkk



When I try to remove the first two characters (��), it doesn't work. For example:



cat a.csv | sed 's/\(.\{2\}\)//'


The result is:



��aa    bbb ccc ddd eee fff ggg hhh iii jjj kkk


Answer



This looks like a byte order mark that's prepended to your text.
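One way to check this guess is to look at the raw bytes at the start of the file. The sketch below is only an illustration: first_bytes is a hypothetical helper name, and it relies on the standard od and tr utilities being available.

```shell
# Hypothetical helper: print the first four bytes of a file as hex.
# Well-known BOM prefixes to look for in the output:
#   efbbbf -> UTF-8 BOM
#   fffe   -> UTF-16, little-endian
#   feff   -> UTF-16, big-endian
first_bytes() {
    # -An: no offsets, -tx1: one hex byte per unit, -N4: read only 4 bytes
    od -An -tx1 -N4 "$1" | tr -d ' \n'
    echo
}
```

For example, first_bytes a.csv prints a string starting with fffe if the file begins with a little-endian UTF-16 BOM.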



If so, you can fix it by converting the file to an encoding that doesn't use a byte order mark (for example, plain UTF-8 without a BOM); the two characters should then be gone.



How you change the encoding of a file depends on the editor you use; in vim, the command is :set nobomb (then save the file).
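The same conversion can also be done from the shell. The following is a minimal sketch, not a definitive tool: strip_bom is a hypothetical function name, it assumes od, tail and iconv are installed, and it only recognises UTF-8 and UTF-16 BOMs (UTF-32 is not handled).

```shell
# Hypothetical helper: write a BOM-free UTF-8 copy of a file to stdout.
strip_bom() {
    # Read the first three bytes as a hex string, e.g. "efbbbf" or "fffe61".
    prefix=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case "$prefix" in
        efbbbf*)
            # UTF-8 BOM: skip the first three bytes, keep the rest as-is.
            tail -c +4 "$1"
            ;;
        fffe*|feff*)
            # UTF-16 BOM (assumption: not UTF-32): let iconv use the BOM
            # to pick the byte order and re-encode the text as UTF-8.
            iconv -f UTF-16 -t UTF-8 "$1"
            ;;
        *)
            # No recognised BOM: pass the file through unchanged.
            cat "$1"
            ;;
    esac
}
```

A possible use would be strip_bom a.csv > a.nobom.csv, where a.nobom.csv is just an example output name; writing to a new file avoids truncating the input while it is still being read.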

