Friday, 25 October 2019

utf-8 - Setting the default Java character encoding



How do I properly set the default character encoding used by the JVM (1.5.x) programmatically?




I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that luxury, for reasons I won't get into.



I have tried:



System.setProperty("file.encoding", "UTF-8");


The property does get set, but it doesn't seem to make the final getBytes call below use UTF-8:




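System.setProperty("file.encoding", "UTF-8");

byte[] inbytes = new byte[1024];

FileInputStream fis = new FileInputStream("response.txt");
int n = fis.read(inbytes); // number of bytes actually read, or -1 at end of file
fis.close();
FileOutputStream fos = new FileOutputStream("response-2.txt");
String in = new String(inbytes, 0, Math.max(n, 0), "UTF8");
fos.write(in.getBytes()); // no charset given: uses the JVM's default encoding
fos.close();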


Answer



Unfortunately, the file.encoding property has to be specified as the JVM starts up; by the time your main method is entered, the character encoding used by String.getBytes() and the default constructors of InputStreamReader and OutputStreamWriter has been permanently cached.



As Edward Grech points out, in a special case like this, the environment variable JAVA_TOOL_OPTIONS can be used to specify this property, but it's normally done like this:



java -Dfile.encoding=UTF-8 … com.x.Main


Charset.defaultCharset() will reflect changes to the file.encoding property, but most of the code in the core Java libraries that needs to determine the default character encoding does not use this mechanism.
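To check which default a particular JVM actually ended up with, a tiny diagnostic class can print both values. This is only a sketch; the class name `EncodingCheck` and its `report()` helper are hypothetical, not part of any library.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic: prints the file.encoding property and the
// default charset as this JVM reports them.
public class EncodingCheck {

    // Builds a one-line summary of the two values.
    public static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it as `java -Dfile.encoding=UTF-8 EncodingCheck`, or with JAVA_TOOL_OPTIONS set to `-Dfile.encoding=UTF-8` in the environment, should show whether the flag was picked up at startup.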




When you are encoding or decoding, you can query the file.encoding property or Charset.defaultCharset() to find the current default encoding, and use the appropriate method or constructor overload to specify it.
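Applied to the code in the question, that advice might look like the sketch below. It assumes Java 7 or later (for try-with-resources and `StandardCharsets`) and, as in the question, that the input file is UTF-8; the class and method names `ExplicitEncoding`, `configuredCharset`, `encode` and `recode` are hypothetical. On a 1.5 JVM the same idea works with the String-named overloads, such as `getBytes("UTF-8")` and `new String(bytes, "UTF-8")`, plus explicit `close()` calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Sketch: choose the encoding explicitly instead of relying on the
// no-argument String.getBytes(), whose default is fixed at JVM startup.
public class ExplicitEncoding {

    // Reads file.encoding as it is now (so a System.setProperty call counts)
    // and falls back to Charset.defaultCharset() if it is missing or unknown.
    public static Charset configuredCharset() {
        String name = System.getProperty("file.encoding");
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // unrecognised name: use the JVM default below
            }
        }
        return Charset.defaultCharset();
    }

    // Encodes text with the configured charset via the explicit overload.
    public static byte[] encode(String text) {
        return text.getBytes(configuredCharset());
    }

    // Mirrors the question: decode the input file as UTF-8, then re-encode it
    // with the configured charset. Reads until EOF so no stale buffer bytes leak in.
    public static void recode(String inPath, String outPath) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = new FileInputStream(inPath)) {
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n); // copy only the bytes actually read
            }
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        try (OutputStream out = new FileOutputStream(outPath)) {
            out.write(encode(text));
        }
    }
}
```

With this in place, calling `System.setProperty("file.encoding", "UTF-8")` and then `ExplicitEncoding.recode("response.txt", "response-2.txt")` should write UTF-8 regardless of which default the JVM cached at startup, because the charset is looked up and passed explicitly.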

