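① Batch-convert image file names (Chinese characters) to Unicode code points. For example, 一.bmp should become 004E.bmp. Thanks! Looking for a method or a tool.
The Unicode code point of 一 is U+4E00, so 一.bmp would become 4E00.bmp, which is not what you asked for. (004E is the same two bytes in the opposite order: the UTF-16 little-endian bytes of 一 are 00 4E.)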
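A minimal Python sketch of both readings of the question, in case it helps. The function names `codepoint_name` and `utf16le_name` are made up for this example; the first writes each character of the stem as its code point (4E00), the second writes the UTF-16 little-endian bytes (004E), which is what the question's example looks like.

```python
from pathlib import PurePath


def codepoint_name(filename: str) -> str:
    """Replace each character of the stem with its hex code point (at least 4 digits)."""
    p = PurePath(filename)
    stem = "".join(f"{ord(ch):04X}" for ch in p.stem)
    return stem + p.suffix


def utf16le_name(filename: str) -> str:
    """Replace the stem with the hex dump of its UTF-16 little-endian bytes."""
    p = PurePath(filename)
    stem = p.stem.encode("utf-16-le").hex().upper()
    return stem + p.suffix


if __name__ == "__main__":
    for name in ["一.bmp"]:
        print(name, "->", codepoint_name(name), "or", utf16le_name(name))
```

Both functions only compute the new name. To rename real files, the result could be passed to `os.rename`; note that ASCII characters in the stem are converted too (a becomes 0061).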
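② Downloaded file names all come out garbled
This usually comes from a mismatch between how the website encodes the file name and how the browser decodes it. If the garbled name is percent-encoded (for example %E5%BC%A0...), then with Python installed a batch script can call a one-liner to repair it: python -c "import sys, urllib as ul; print ul.unquote_plus(sys.argv[1])". That one-liner is Python 2 syntax; the Python 3 equivalent is python -c "import sys, urllib.parse as up; print(up.unquote_plus(sys.argv[1]))".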
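The same idea as a small Python 3 function, as a sketch only. `decode_name` is a name invented here; the `encoding` parameter is an assumption for sites that percent-encode GBK bytes instead of UTF-8.

```python
from urllib.parse import unquote_plus


def decode_name(name: str, encoding: str = "utf-8") -> str:
    """Undo percent-encoding in a file name; '+' is treated as a space."""
    return unquote_plus(name, encoding=encoding, errors="replace")


if __name__ == "__main__":
    print(decode_name("%E5%BC%A0%E5%85%A8%E8%9B%8B.jpg"))
    print(decode_name("%D2%BB.bmp", encoding="gbk"))
```

`unquote_plus` also turns `+` into a space, which suits names taken from query strings; if a real `+` must survive, `urllib.parse.unquote` is the variant to use.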
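③ How to extract an archive whose file names are UTF-8 encoded
UTF-8 and "Unicode" (used here for the fixed-width 16-bit form, UCS-2) belong to the same family; they differ only in how characters are turned into bytes. The size of UTF-8 output varies, while UCS-2 output is always the same size. In UCS-2, the letter "a" and the Chinese character "好" take the same space: two bytes each. In UTF-8 they differ: "a" takes one byte and "好" takes three.

How UTF-8 works: letters and the symbols on a keyboard fit in 7 bits, and a byte has 8 bits, so UTF-8 uses one byte for them. But given a single encoded byte, how do we know what it belongs to? It could be a one-byte letter, or one of the three bytes of a Chinese character. That is why UTF-8 has marker bits.

When the value fits in 7 bits, one byte is used: 0******* The leading 0 is the marker, and the remaining bits cover ASCII 0-127.
When the value needs 8 to 11 bits, two bytes are used: 110***** 10****** The 110 in the first byte and the 10 in the second byte are markers.
When the value needs 12 to 16 bits, three bytes are used: 1110**** 10****** 10****** As above, the 1110 in the first byte and the 10 in the second and third bytes are markers, and the remaining 16 bits are just enough to represent Chinese characters.
And so on:
four bytes: 11110*** 10****** 10****** 10******
five bytes: 111110** 10****** 10****** 10****** 10******
six bytes: 1111110* 10****** 10****** 10****** 10****** 10******
(The five- and six-byte forms belong to the original UTF-8 design; RFC 3629 limits UTF-8 to four bytes.)

UTF-7: A Mail-Safe Transformation Format of Unicode (RFC 1642, later replaced by RFC 2152, which is the text quoted below). This encoding converts Unicode into 7-bit ASCII. It was designed so that text can pass through mail gateways that carry only 7-bit data. UTF-7 leaves English letters, digits, and common symbols as they are and encodes everything else in a modified Base64. The characters + and - mark the start and the end of an encoded run. So if garbled text contains English words together with + and - signs, it may be UTF-7.

More material on UTF-7 follows (all in English; read it if you want the details):

UTF-7
A Mail-Safe Transformation Format of Unicode

Status of this Memo

This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Abstract

The Unicode Standard, version 2.0, and ISO/IEC 10646-1:1993(E) (as amended) jointly define a character set (hereafter referred to as Unicode) which encompasses most of the world's writing systems. However, Internet mail (STD 11, RFC 822) currently supports only 7-bit US ASCII as a character set. MIME (RFC 2045 through 2049) extends Internet mail to support different media types and character sets, and thus could support Unicode in mail messages. MIME neither defines Unicode as a permitted character set nor specifies how it would be encoded, although it does provide for the registration of additional character sets over time.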
This document describes a transformation format of Unicode that contains only 7-bit ASCII octets and is intended to be readable by humans in the limiting case that the document consists of characters from the US-ASCII repertoire. It also specifies how this transformation format is used in the context of MIME and RFC 1641, "Using Unicode with MIME".

Motivation

Although other transformation formats of Unicode exist and could conceivably be used in this context (most notably UTF-8, also known as UTF-2 or UTF-FSS), they suffer the disadvantage that they use octets in the range decimal 128 through 255 to encode Unicode characters outside the US-ASCII range. Thus, in the context of mail, those octets must themselves be encoded. This requires putting text through two successive encoding processes, and leads to a significant expansion of characters outside the US-ASCII range, putting non-English speakers at a disadvantage. For example, using UTF-8 together with the Quoted-Printable content transfer encoding of MIME represents US-ASCII characters in one octet, but other characters may require up to nine octets.

Overview

UTF-7 encodes Unicode characters as US-ASCII octets, together with shift sequences to encode characters outside that range. For this purpose, one of the characters in the US-ASCII repertoire is reserved for use as a shift character.

Many mail gateways and systems cannot handle the entire US-ASCII character set (those based on EBCDIC, for example), and so UTF-7 contains provisions for encoding characters within US-ASCII in a way that all mail systems can accomodate.

UTF-7 should normally be used only in the context of 7 bit transports, such as mail. In other contexts, straight Unicode or UTF-8 is preferred.

See RFC 1641, "Using Unicode with MIME" for the overall specification on usage of Unicode transformation formats with MIME.

Definitions

First, the definition of Unicode:

The 16 bit character set Unicode is defined by "The Unicode Standard, Version 2.0".
This character set is identical with the character repertoire and coding of the international standard ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2; Subset=300; Implementation Level=3, including the first 7 amendments to 10646 plus editorial corrections.

Note. Unicode 2.0 further specifies the use and interaction of these character codes beyond the ISO standard. However, any valid 10646 sequence is a valid Unicode sequence, and vice versa; Unicode supplies interpretations of sequences on which the ISO standard is silent as to interpretation.

Next, some handy definitions of US-ASCII character subsets:

Set D (directly encoded characters) consists of the following characters (derived from RFC 1521, Appendix B, which no longer appears in RFC 2045): the upper and lower case letters A through Z and a through z, the 10 digits 0-9, and the following nine special characters (note that "+" and "=" are omitted):

Character   ASCII & Unicode Value (decimal)
'           39
(           40
)           41
,           44
-           45
.           46
/           47
:           58
?           63

Set O (optional direct characters) consists of the following characters (note that "\" and "~" are omitted):

Character   ASCII & Unicode Value (decimal)
!           33
"           34
#           35
$           36
%           37
&           38
*           42
;           59
<           60
=           61
>           62
@           64
[           91
]           93
^           94
_           95
`           96
{           123
|           124
}           125

Rationale. The characters "\" and "~" are omitted because they are often redefined in variants of ASCII.

Set B (Modified Base 64) is the set of characters in the Base64 alphabet defined in RFC 2045, excluding the pad character "=" (decimal value 61).

Rationale. The pad character = is excluded because UTF-7 is designed for use within header fields as set forth in RFC 2047. Since the only readable encoding in RFC 2047 is "Q" (based on RFC 2045's Quoted-Printable), the "=" character is not available for use (without a lot of escape sequences). This was very unfortunate but unavoidable. The "=" character could otherwise have been used as the UTF-7 escape character as well (rather than using "+").
Note that all characters in US-ASCII have the same value in Unicode when zero-extended to 16 bits.

UTF-7 Definition

A UTF-7 stream represents 16-bit Unicode characters using 7-bit US-ASCII octets as follows:

Rule 1: (direct encoding) Unicode characters in set D above may be encoded directly as their ASCII equivalents. Unicode characters in Set O may optionally be encoded directly as their ASCII equivalents, bearing in mind that many of these characters are illegal in header fields, or may not pass correctly through some mail gateways.

Rule 2: (Unicode shifted encoding) Any Unicode character sequence may be encoded using a sequence of characters in set B, when preceded by the shift character "+" (US-ASCII character value decimal 43). The "+" signals that subsequent octets are to be interpreted as elements of the Modified Base64 alphabet until a character not in that alphabet is encountered. Such characters include control characters such as carriage returns and line feeds; thus, a Unicode shifted sequence always terminates at the end of a line. As a special case, if the sequence terminates with the character "-" (US-ASCII decimal 45) then that character is absorbed; other terminating characters are not absorbed and are processed normally.

Note that if the first character after the shifted sequence is "-" then an extra "-" must be present to terminate the shifted sequence so that the actual "-" is not itself absorbed.

Rationale. A terminating character is necessary for cases where the next character after the Modified Base64 sequence is part of character set B or is itself the terminating character. It can also enhance readability by delimiting encoded sequences.

Also as a special case, the sequence "+-" may be used to encode the character "+". A "+" character followed immediately by any character other than members of set B or "-" is an ill-formed sequence.
Unicode is encoded using Modified Base64 by first converting Unicode 16-bit quantities to an octet stream (with the most significant octet first). Surrogate pairs (UTF-16) are converted by treating each half of the pair as a separate 16 bit quantity (i.e., no special treatment). Text with an odd number of octets is ill-formed. ISO 10646 characters outside the range addressable via surrogate pairs cannot be encoded.

Rationale. ISO/IEC 10646-1:1993(E) specifies that when characters the UCS-2 form are serialized as octets, that the most significant octet appear first. This is also in keeping with common network practice of choosing a canonical format for transmission.

Rationale. The policy for code point allocation within ISO 10646 and Unicode is that the repertoires be kept synchronized. No code points will be allocated in ISO 10646 outside the range addressable by surrogate pairs.

Next, the octet stream is encoded by applying the Base64 content transfer encoding algorithm as defined in RFC 2045, modified to omit the "=" pad character. Instead, when encoding, zero bits are added to pad to a Base64 character boundary. When decoding, any bits at the end of the Modified Base64 sequence that do not constitute a complete 16-bit Unicode character are discarded. If such discarded bits are non-zero the sequence is ill-formed.

Rationale. The pad character "=" is not used when encoding Modified Base64 because of the conflict with its use as an escape character for the Q content transfer encoding in RFC 2047 header fields, as mentioned above.

Rule 3: The space (decimal 32), tab (decimal 9), carriage return (decimal 13), and line feed (decimal 10) characters may be directly represented by their ASCII equivalents. However, note that MIME content transfer encodings have rules concerning the use of such characters.
Usage that does not conform to the restrictions of RFC 822, for example, would have to be encoded using MIME content transfer encodings other than 7bit or 8bit, such as quoted-printable, binary, or base64.

Given this set of rules, Unicode characters which may be encoded via rules 1 or 3 take one octet per character, and other Unicode characters are encoded on average with 2 2/3 octets per character plus one octet to switch into Modified Base64 and an optional octet to switch out.

Example. The Unicode sequence "A<NOT IDENTICAL TO><ALPHA>." (hexadecimal 0041,2262,0391,002E) may be encoded as follows: A+ImIDkQ.

Example. The Unicode sequence "Hi Mom -<WHITE SMILING FACE>-!" (hexadecimal 0048, 0069, 0020, 004D, 006F, 006D, 0020, 002D, 263A, 002D, 0021) may be encoded as follows: Hi Mom -+Jjo--!

Example. The Unicode sequence representing the Han characters for the Japanese word "nihongo" (hexadecimal 65E5,672C,8A9E) may be encoded as follows: +ZeVnLIqe-
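Both halves of this answer can be checked in Python. The sketch below is only an illustration: `utf8_encode` is a hand-written helper (a name invented here) that follows the one- to four-byte bit layouts described at the start of the answer, and `RFC_EXAMPLES` simply feeds the three RFC example strings to Python's built-in `utf-7` codec. The helper does not reject surrogate code points, which a strict encoder would.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by hand, following the marker-bit layouts."""
    if cp < 0x80:        # up to 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:   # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")


# The three UTF-7 examples from the RFC text, decoded with the built-in codec.
RFC_EXAMPLES = [b"A+ImIDkQ.", b"Hi Mom -+Jjo--!", b"+ZeVnLIqe-"]

if __name__ == "__main__":
    for ch in "a好":
        print(ch, utf8_encode(ord(ch)).hex(" "), ch.encode("utf-8").hex(" "))
    for raw in RFC_EXAMPLES:
        print(raw, "->", raw.decode("utf-7"))
```

The first loop prints the hand-made bytes next to the built-in encoder's bytes so the two can be compared; "a" comes out as one byte and "好" as three.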
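④ What encoding does Windows use for file names?
On English Windows 7 the internal encoding of file names is Unicode. Some users report from their own tests that on Simplified Chinese Windows 7 it is GB2312. The two observations fit together: NTFS stores file names as UTF-16 whatever the display language is, while GB2312/GBK (code page 936) is the default "ANSI" code page on Simplified Chinese Windows, so that is what programs using the non-Unicode APIs see. Simplified Chinese Windows normally uses GB2312 for text, and only a few text files use UTF-8. "File encoding" is also sometimes used loosely to mean the file type: Windows generally identifies a program or data file by its extension, which tells it whether the file is a data file, an executable, a text file, a music file, and so on. Are you on a Windows system? You can check a text file like this: open it in Notepad, choose File > Save As, and the Encoding box at the bottom of the dialog shows the encoding. Note that this is the encoding of the file's contents, not of its name.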
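To see what the same name looks like under the encodings mentioned here, a small Python sketch follows. `name_bytes` is a name invented for this example, and it only shows byte sequences; it does not query Windows itself.

```python
def name_bytes(name: str, encodings=("utf-16-le", "gbk", "utf-8")) -> dict:
    """Return the hex bytes of a name under each of the given encodings."""
    return {enc: name.encode(enc).hex(" ").upper() for enc in encodings}


if __name__ == "__main__":
    for enc, dump in name_bytes("一").items():
        print(f"{enc:10} {dump}")
```

For 一 the UTF-16 little-endian bytes are 00 4E, which is where a "004E" reading of U+4E00 comes from.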
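⑤ Why are file names garbled on Linux?
If you need to work with Windows files on Linux, you will often run into encoding conversion problems. The default text encoding on (Simplified Chinese) Windows is GBK (GB2312), while Linux generally uses UTF-8.

Checking the encoding

Method 1: file filename
Method 2: in Vim, show the encoding of the current file with :set fileencoding

If you only want to view files in other encodings, or to fix garbled text in Vim, add this line to ~/.vimrc:

    set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936

Vim will then detect the encoding automatically (UTF-8 or GBK files). What it really does is try the encodings listed in fileencodings in order; if none of them fits, it opens the file as latin-1.

Converting the encoding

Cross-platform tools: iconv provides a standard program and API for encoding conversion; convert_encoding.py is a Python-based text file conversion tool; decodeh.py provides algorithms and a module for detecting a character encoding.

Converting on Linux:

Method 1: convert directly in Vim. To convert one file to UTF-8:

    :set fileencoding=utf-8

Or, for several files:
1) Set the argument list, that is, the files to operate on. Wildcards are allowed; I usually convert C/C++ sources:

    :args *.h *.cpp

2) Give the command to run on each file, here the conversion:

    :argdo set fenc=utf-8 | update

Method 2: convert with iconv.

Example: suppose a UTF-8 file from Windows is copied to a Linux machine whose system encoding is GB18030. cat will show garbage, and that is the cue to convert. Here is a test with a file named UTF-8.sh copied from Windows, whose content is:

    我是中文编码UTF-8模式~

The system language on the Linux machine is set to:

    # cat /etc/sysconfig/i18n
    LANG=zh_CN.GB18030
    SYSFONT="latarcyrheb-sun16"

Check the file's encoding and content:

    # file UTF-8.sh
    UTF-8.sh: UTF-8 Unicode text, with no line terminators
    # cat UTF-8.sh
    锘挎垜鏄?腑鏂囩紪镰乁TF-8妯″纺~

Now convert it with iconv:

    # iconv -f UTF-8 -t GB18030 UTF-8.sh -o GB18030.sh
    # cat GB18030.sh
    ??我是中文编码UTF-8模式~
    # file GB18030.sh
    GB18030.sh: Non-ISO extended-ASCII text, with no line terminators

The leading ?? is most likely the byte order mark (BOM) that the Windows editor put at the start of the file.

convmv is a tool for changing the encoding of file names. For example, sudo convmv -f gbk -t utf-8 -r --notest /home converts every file name under /home from GBK to UTF-8. -f is the original encoding, -t is the target encoding, -r recurses into the directory, and --notest performs the rename immediately instead of only doing a test run. The command seems to need root here, hence sudo.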
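For machines without iconv, the same conversion can be sketched in Python. `convert_bytes` and `convert_file` are names invented for this example. The default source codec `utf-8-sig` is a deliberate choice: it drops a leading BOM if one is present, so the converted file does not start with stray characters.

```python
from pathlib import Path


def convert_bytes(data: bytes, src: str = "utf-8-sig", dst: str = "gb18030") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target one."""
    return data.decode(src).encode(dst)


def convert_file(src_path, dst_path, src: str = "utf-8-sig", dst: str = "gb18030") -> None:
    """Read src_path, convert its content, and write the result to dst_path."""
    Path(dst_path).write_bytes(convert_bytes(Path(src_path).read_bytes(), src, dst))


if __name__ == "__main__":
    sample = b"\xef\xbb\xbf" + "我是中文编码UTF-8模式~".encode("utf-8")
    print(convert_bytes(sample).decode("gb18030"))
```

Decoding raises `UnicodeDecodeError` when the input is not actually in the source encoding, which is a useful early warning before anything is overwritten.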
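⑥ PotPlayer shows a pile of information (file name, video decoder, audio decoder). How do I turn it off?
The OSD is switched on and off with the same shortcut: Tab.
The PotPlayer OSD shows most of the state of the current video and of the player, which is one of PotPlayer's main attractions. The information is easy to read and there is a lot of it, so in most cases you just press Tab and read the OSD to work out what is going on. It shows the video decoder, codec, resolution, frame rate, bit rate, audio decoder, and so on.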
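⑦ What encoding is used for Chinese file names when a file is saved on a computer?
The file name is encoded the same way as the system default: whatever encoding the operating system and file system use for names, not something chosen per file.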
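One way to see what "system default" means on a given machine is to ask Python, as in the sketch below. `describe_encodings` is a name invented for this example. On Windows, Python 3.6 and later report utf-8 for the file system encoding because Python converts to and from the UTF-16 names itself, so the value describes Python's view rather than the on-disk format.

```python
import locale
import os
import sys


def describe_encodings() -> dict:
    """Report the encodings Python uses for file names and for locale text."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(describe_encodings())
    # Bytes that would be passed to the OS for an ASCII name
    print(os.fsencode("report.txt"))
```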
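⑧ Excel file names have a string of code-like characters in them. How do I remove it?
It sounds like you want to rename files in bulk, which DOS commands can do. First use DIR to save the list of document names in the folder to a text file, for example: dir [drive][path] [/w] > ml.txt (dir /b prints one bare name per line, which is easier to edit). Then, in a text editor or in Excel, turn the names into lines of this form: ren old-name new-name. Save the result as a .txt file, change the extension to .bat, and run the batch file from a command prompt.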
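The editing step can also be scripted. The sketch below generates the ren lines from a list of names; everything specific in it is an assumption, since the question does not show what the code-like string looks like. `CODE_SUFFIX` guesses that it is an underscore or hyphen followed by eight or more hex digits at the end of the name, and `build_ren_lines` is a name invented here.

```python
import re
from pathlib import PurePath

# Hypothetical pattern: "_" or "-" followed by 8+ hex digits at the end of the stem.
CODE_SUFFIX = re.compile(r"[_-][0-9A-Fa-f]{8,}$")


def build_ren_lines(names):
    """Return one 'ren "old" "new"' line for each name that carries the suffix."""
    lines = []
    for name in names:
        p = PurePath(name)
        new_stem = CODE_SUFFIX.sub("", p.stem)
        if new_stem and new_stem != p.stem:
            lines.append(f'ren "{name}" "{new_stem}{p.suffix}"')
    return lines


if __name__ == "__main__":
    for line in build_ren_lines(["report_1a2b3c4d5e.xlsx", "plain.xlsx"]):
        print(line)
```

The printed lines can be pasted into the .bat file. Names that contain a percent sign would need each % doubled before going into a batch file.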
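⑨ Windows ren command: the file name contains URL encoding. How do I rename it?
The percent sign (%) is a special character in cmd, and that is what causes the problem. Put the whole file path in straight double quotes. Inside a .bat file the quotes alone are not enough, because %2 and the like are read as batch parameters; there, double every percent sign (%%20, %%E5 and so on). For example:
ren "C:4_%20%E5%BC%A0%E5%85%A8%E8%9B%8B%20_1234.jpg" 3456.jpg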
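If Python is available, the percent sign problem can be sidestepped, because a rename done from Python never passes through cmd. The following is a rough sketch: `rename_percent_encoded` is a name invented here, it decodes the escapes as UTF-8, and it skips any file whose decoded name already exists or would contain a path separator.

```python
from pathlib import Path
from urllib.parse import unquote


def rename_percent_encoded(folder):
    """Rename files whose names contain %XX escapes to the decoded names."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or "%" not in path.name:
            continue
        new_name = unquote(path.name)
        if new_name == path.name or "/" in new_name or "\\" in new_name:
            continue  # nothing to decode, or decoded name is not a plain file name
        target = path.with_name(new_name)
        if target.exists():
            continue  # do not overwrite an existing file
        path.rename(target)
        renamed.append((path.name, new_name))
    return renamed


if __name__ == "__main__":
    for old, new in rename_percent_encoded("."):
        print(old, "->", new)
```

It is safer to try this on a copy of the folder first, since the renames are not undone automatically.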
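⑩ "Failed to install the decoder file, please check whether the decoder file exists!" File name: tsccvid.dll
Try downloading a copy of tsccvid.dll into C:\WINDOWS\system32. Download link: http://download.pchome.net/dll/t/detail-182944-1.html That may solve the problem.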