Why does a wide file stream in C++ narrow written data by default?
Problem description
Honestly, I just don't get the following design decision in the C++ standard library. When writing wide characters to a file, wofstream converts wchar_t into char characters.
I am aware that this has to do with the standard codecvt. There is a codecvt for UTF-8 in Boost. Also, there is a codecvt for UTF-16 by Martin York here on SO. The question is: why does the standard codecvt convert wide characters? Why not write the characters as they are?
Also, are we going to get real Unicode streams with C++0x, or am I missing something here?
Recommended answer
The model used by C++ for character sets is inherited from C, and so dates back to at least 1989.
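Key points:
- IO is done in terms of char.
- It is the job of the locale to determine how wide characters are serialized.
- The default locale (named "C") is very minimal (I don't remember the constraints from the standard; here it can handle only 7-bit ASCII as both the narrow and the wide character set).
- There is an environment-determined locale, named "".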
So to get anything, you have to set the locale.
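As a rough sketch of what setting the locale looks like in practice: a stream picks up the global locale when it is constructed, and imbue replaces the locale for that one stream. The helper names below (write_with_locale, read_bytes, demo_ascii_roundtrip) and the file name are made up for illustration. The demo only relies on ASCII text, which the "C" locale can convert, and shows that each wide character ends up as a single byte in the file.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` through a wide stream that
// uses `loc` for the wchar_t -> char conversion.
// Returns false if the stream reports a failure.
bool write_with_locale(const std::string& path, const std::wstring& text,
                       const std::locale& loc)
{
    std::wofstream out;
    out.imbue(loc);                    // set the locale before any I/O happens
    out.open(path, std::ios::binary);
    out << text;
    out.flush();                       // conversion errors show up once the buffer is written
    return static_cast<bool>(out);
}

// Hypothetical helper: reads the raw bytes of a file back into a string.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// ASCII text goes through the "C" locale unchanged, one byte per character.
void demo_ascii_roundtrip()
{
    const bool ok = write_with_locale("narrowed.txt", L"Hi", std::locale::classic());
    assert(ok);
    assert(read_bytes("narrowed.txt") == "Hi");
}
```

Passing std::locale("") instead of std::locale::classic() selects the locale determined by the environment. Whether a character outside ASCII can then be written depends on the encoding that locale uses.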
Take a simple program that uses the environment's locale and writes the wide character with code 0x00FF to a file. If I ask it to use the "C" locale, the locale is unable to handle the wide character, and we are notified of the problem because the IO fails. If I run it under a UTF-8 locale and dump the file in hex with od -t x1, I get exactly what I expect for a UTF-8 encoded file.
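The program used for this experiment is not reproduced here. The following is only a sketch of how such a test might be written, not the original code. The names WriteResult, write_wide_char and hex_bytes are hypothetical, and the locale is passed by name rather than taken from the LC_ALL environment variable so that the different cases can be tried from one process.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <locale>
#include <stdexcept>
#include <string>

enum class WriteResult { Ok, OutputFailed, LocaleUnavailable };

// Writes one wide character plus a newline to `path`, converting through the
// locale called `locale_name` ("" selects the environment's locale).
WriteResult write_wide_char(const std::string& path, wchar_t c,
                            const char* locale_name)
{
    std::locale loc;
    try {
        loc = std::locale(locale_name);   // throws if the system lacks this locale
    } catch (const std::runtime_error&) {
        return WriteResult::LocaleUnavailable;
    }
    std::wofstream os;
    os.imbue(loc);
    os.open(path, std::ios::binary);
    os << c << L'\n';
    os.flush();                           // force the conversion so errors are visible
    return os ? WriteResult::Ok : WriteResult::OutputFailed;
}

// Formats the bytes of a file as space-separated hex, similar to `od -t x1`.
std::string hex_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::string out;
    char buf[4];
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        const unsigned value = static_cast<unsigned char>(*it);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling write_wide_char("test.dat", 0x00FF, "C") and write_wide_char("test.dat", 0x00FF, "en_US.utf8") would correspond to the two runs described above, assuming a locale with that name is installed. The result for the "C" locale is implementation-dependent. In UTF-8, the code point 0x00FF is encoded as the two bytes c3 bf, so that is what hex_bytes would be expected to report (followed by 0a) on a platform where wchar_t holds Unicode code points.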