Thursday, 2 November 2017

Why do some Unicode characters display in matrices, but not data frames in R?

In at least some cases, Asian characters are printable if they are contained in a matrix or a vector, but not in a data.frame. Here is an example:


q <- '天'
q          # Works
# [1] "天"
matrix(q)  # Works
#      [,1]
# [1,] "天"
q2 <- data.frame(q, stringsAsFactors = FALSE)
q2         # Does not work
#   q
# 1
q2[1, ]    # Works again.
# [1] "天"

Clearly, my device is capable of displaying the character, but the character is not displayed when it is inside a data.frame.


Doing some digging, I found that the print.data.frame function runs format on each column. It turns out that if you run format.default directly, the same problem occurs:


format(q)
# ""
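
The column-by-column formatting described above can be checked with a small sketch. It is only an approximation of what the printing code does, not a copy of it. The names formatted and survived are made up for illustration, and "\u5929" is assumed to be the escaped form of the character used in the example, so that the script itself stays ASCII-only.

```r
# Assumption: "\u5929" is the escaped form of the character in the example.
q <- "\u5929"
q2 <- data.frame(q, stringsAsFactors = FALSE)

# Run format() on every column, roughly as the data.frame printing code
# is described as doing (hypothetical helper names).
formatted <- unname(vapply(q2, format, character(1)))

# TRUE means format() left the string intact in this session;
# FALSE means the string was altered on the way through format().
survived <- identical(enc2utf8(formatted), enc2utf8(q))
print(survived)
```

If survived is FALSE while printing q on its own works, the problem sits in format rather than in the data.frame code.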

Digging into format.default, I found that it calls the internal format function, which is written in C.


Before I dig any further, I want to know if others can reproduce this behaviour. Is there some configuration of R that would allow me to display these characters within data.frames?
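
For anyone trying to reproduce this, the following sketch collects the settings that seem most likely to matter. It assumes the behaviour depends on the session's character encoding, which is a guess rather than something established above. The variable native_ok is a hypothetical name.

```r
# Assumption: the behaviour depends on the session's native encoding.
q <- "\u5929"

print(Encoding(q))                # how R has marked the string
print(Sys.getlocale("LC_CTYPE"))  # character-type locale of the session
print(l10n_info())                # includes whether the session is UTF-8

# Convert to the native encoding and compare with the original.
# FALSE suggests the native encoding cannot represent the character.
native_ok <- identical(enc2native(q), q)
print(native_ok)
```

Comparing these values between a session where the data.frame prints correctly and one where it does not should narrow down whether the locale is involved.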


My sessionInfo(), if it helps:


R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1
