In the end I let it drop.. XD
Today I ran into the same problem again..
It turns out the explanation is as follows:
======================================================================
There is no such thing as a "UTF-8 String". A String is composed of characters, whereas UTF-8 is a method of converting between characters and bytes.
======================================================================
What do you mean by "a String in UTF-8 format"? Java Strings are composed of 16-bit chars, so they are UTF-16 (although Unicode surrogates aren't handled properly until 1.5). UTF-8 is an appropriate encoding for an array of bytes, which you already have.
===============================================================================
There is no such thing as a UTF-8 string. A String is a string of characters, each one of which can be returned by the charAt(int pos) method.
======================================================================
So, you want to store a byte-array (in this case it contains characters in UTF-8 format) into a String in such a way that the byte-array does not get changed/encoded? You want to circumvent the UTF-8 to UTF-16 encoding? I don't think that is possible.
A String contains an array of 'char', not an array of 'byte'. And a char is a UTF-16 code unit.... A 'byte' is not a 'char', so conversion is necessary. Any String-constructor taking a byte-array will do some kind of conversion on the input byte-array (to properly convert it into a char-array).
===============================================================================
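The point made in the replies above can be checked with a small sketch. The class name StringVsBytes and its helper methods are made up for illustration; the literal is written with Unicode escapes so the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.StandardCharsets;

public class StringVsBytes {
    // bytes -> chars: decoding always needs a charset
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // chars -> bytes: encoding always needs a charset
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u54C8\u56C9" is the two-character string 哈囉
        String s = "\u54C8\u56C9";
        byte[] utf8 = encodeUtf8(s);

        System.out.println(s.length());                 // 2 (UTF-16 code units)
        System.out.println(utf8.length);                // 6 (UTF-8 bytes)
        System.out.println(decodeUtf8(utf8).equals(s)); // true
    }
}
```

The String itself has no "UTF-8 form"; UTF-8 only shows up in the byte array produced by encoding it.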
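So for String str = new String("哈囉"); written in A.java,
the compiler (javac, not the JVM) decodes the literal "哈囉" using the source-file encoding and stores it in A.class; at run time the JVM holds it in memory as UTF-16 chars.
Then, when A.class converts the string to bytes without naming a charset (for example with getBytes() or a Writer created without one), "哈囉" is encoded with the platform default charset.
So when A.class runs on Chinese Windows, the bytes for "哈囉" typically come out as Big5.
And when A.class runs on Linux, the bytes for "哈囉" typically come out as UTF-8.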
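Suppose A.class is a server program running on Linux, and this "哈囉" is sent to the client.
If the client reads the stream as UTF-8, it displays correctly.
(If the client program is launched from an IDE, the IDE's console charset must also be UTF-8 for "哈囉" to show correctly in the console; otherwise it will still be garbled.)
But if A.class is run on Chinese Windows, the same client reading "哈囉" as UTF-8 will see garbled text.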
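A minimal sketch of the difference, assuming the server encodes the string itself before writing it to the stream. The class and method names are hypothetical. Note that newer JDKs (18 and later) use UTF-8 as the default charset on every platform, so the platform-dependent behaviour mainly shows up on older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Platform-dependent: encodes with whatever Charset.defaultCharset() returns
    static byte[] encodeWithPlatformDefault(String s) {
        return s.getBytes();
    }

    // Platform-independent: always UTF-8, whatever the OS locale is
    static byte[] encodeForWire(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u54C8\u56C9"; // 哈囉

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + encodeWithPlatformDefault(s).length);
        System.out.println("UTF-8 bytes:     " + encodeForWire(s).length);
    }
}
```

If the server always goes through something like encodeForWire, a client that reads UTF-8 gets the same bytes whether the server runs on Windows or Linux.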
To debug, print the string's byte array as raw data and check whether it is encoded as Big5 or UTF-8.
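One way to do that, as a sketch (HexDump and toHex are illustrative names, not from any library):

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Render each byte as two upper-case hex digits, separated by spaces
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as FFFFFFxx
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u54C8\u56C9"; // 哈囉
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8))); // E5 93 88 E5 9B 89
    }
}
```

UTF-8 uses three bytes for each of these two characters, while Big5 uses two, so the length of the dump alone already tells the two encodings apart here.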
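The fix for this encoding problem (localization) is to keep Chinese out of the program and put it in a text file instead, with that file saved as UTF-8.
At run time A.class reads the UTF-8 "哈囉" string from the file and sends it out; as long as both the read and the send name UTF-8 explicitly, A.class will not emit differently encoded "哈囉" strings depending on the platform.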
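A rough sketch of that approach, assuming the message lives in a UTF-8 text file and is written to an OutputStream such as a socket's. The class name, method names, and the temporary file are all placeholders for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8Messages {
    // Read the whole file and decode it as UTF-8, ignoring the platform default
    static String readMessage(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encode as UTF-8 on the way out, again ignoring the platform default
    static void send(String message, OutputStream out) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(message);
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real message file: a temp file holding 哈囉 in UTF-8
        Path file = Files.createTempFile("message", ".txt");
        Files.write(file, "\u54C8\u56C9".getBytes(StandardCharsets.UTF_8));

        // Stand-in for the socket's output stream
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        send(readMessage(file), out);
        System.out.println(out.size()); // 6

        Files.delete(file);
    }
}
```

Both conversions name the charset, so the bytes on the wire are the same on Windows and Linux.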
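Writing Chinese in source code is a bad habit,
whether it is in comments or in strings...
See the "許功蓋" issue (Big5 characters whose second byte is 0x5C, the backslash).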