module URI

URI is a module providing classes to handle Uniform Resource Identifiers (RFC2396).

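Features

- Uniform way of handling URIs.
- Flexibility to introduce custom URI schemes.
- Flexibility to have an alternate URI::Parser (or just different patterns and regexps).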

Basic example

require 'uri'

uri = URI("http://foo.com/posts?id=30&limit=5#time=1305298413")
#=> #<URI::HTTP http://foo.com/posts?id=30&limit=5#time=1305298413>

uri.scheme    #=> "http"
uri.host      #=> "foo.com"
uri.path      #=> "/posts"
uri.query     #=> "id=30&limit=5"
uri.fragment  #=> "time=1305298413"

uri.to_s      #=> "http://foo.com/posts?id=30&limit=5#time=1305298413"
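A rough sketch of going the other way: assembling a URI from named components instead of parsing a string. It assumes URI::HTTPS.build, which accepts a Hash of components; the host, path, and query are just the illustrative values used above:

```ruby
require 'uri'

# Build a URI from named components rather than parsing a string.
built = URI::HTTPS.build(host: 'foo.com', path: '/posts', query: 'id=30&limit=5')

built.to_s        #=> "https://foo.com/posts?id=30&limit=5"
# No port was given, so the HTTPS default port is filled in.
built.port        #=> 443
built.request_uri #=> "/posts?id=30&limit=5"
```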

Adding custom URIs

module URI
  class RSYNC < Generic
    DEFAULT_PORT = 873
  end
  register_scheme 'RSYNC', RSYNC
end
#=> URI::RSYNC

URI.scheme_list
#=> {"FILE"=>URI::File, "FTP"=>URI::FTP, "HTTP"=>URI::HTTP,
#    "HTTPS"=>URI::HTTPS, "LDAP"=>URI::LDAP, "LDAPS"=>URI::LDAPS,
#    "MAILTO"=>URI::MailTo, "RSYNC"=>URI::RSYNC}

uri = URI("rsync://rsync.foo.com")
#=> #<URI::RSYNC rsync://rsync.foo.com>

RFC References

A good place to view an RFC spec is www.ietf.org/rfc.html.

Here is a list of all related RFCs.

Class tree

URI::Generic - (in uri/generic.rb)
- URI::File - (in uri/file.rb)
- URI::FTP - (in uri/ftp.rb)
- URI::HTTP - (in uri/http.rb)
  - URI::HTTPS - (in uri/https.rb)
- URI::LDAP - (in uri/ldap.rb)
  - URI::LDAPS - (in uri/ldaps.rb)
- URI::MailTo - (in uri/mailto.rb)
URI::Parser - (in uri/common.rb)
URI::REGEXP - (in uri/common.rb)
- URI::REGEXP::PATTERN - (in uri/common.rb)
URI::Util - (in uri/common.rb)
URI::Error - (in uri/common.rb)
- URI::InvalidURIError - (in uri/common.rb)
- URI::InvalidComponentError - (in uri/common.rb)
- URI::BadURIError - (in uri/common.rb)

Copyright Info

Author: Akira Yamada <akira@ruby-lang.org>
Documentation: Akira Yamada <akira@ruby-lang.org>, Dmitry V. Sabanin <sdmitry@lrn.ru>, Vincent Batts <vbatts@hashbangbash.com>
License: Copyright © 2001 akira yamada <akira@ruby-lang.org>. You can redistribute it and/or modify it under the same terms as Ruby.

Constants

DEFAULT_PARSER
INITIAL_SCHEMES
RFC2396_PARSER
RFC3986_PARSER
TBLENCURICOMP_

Public Class Methods

const_missing (const)

# File lib/uri/common.rb, line 43
def self.const_missing(const)
  if const == :REGEXP
    warn "URI::REGEXP is obsolete. Use URI::RFC2396_REGEXP explicitly.", uplevel: 1 if $VERBOSE
    URI::RFC2396_REGEXP
  elsif value = RFC2396_PARSER.regexp[const]
    warn "URI::#{const} is obsolete. Use RFC2396_PARSER.regexp[#{const.inspect}] explicitly.", uplevel: 1 if $VERBOSE
    value
  elsif value = RFC2396_Parser.const_get(const)
    warn "URI::#{const} is obsolete. Use RFC2396_Parser::#{const} explicitly.", uplevel: 1 if $VERBOSE
    value
  else
    super
  end
end

Calls superclass method.

decode_uri_component (str, enc=Encoding::UTF_8)

# File lib/uri/common.rb, line 402
def self.decode_uri_component(str, enc=Encoding::UTF_8)
  _decode_uri_component(/%\h\h/, str, enc)
end

Like URI.decode_www_form_component, except that '+' is preserved.
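The only difference between the two decoders is how '+' is treated. A short comparison on an arbitrary input string, assuming Ruby 3.2 or later (where URI.decode_uri_component exists):

```ruby
require 'uri'

encoded = 'a+b%20c%2B'

# '+' is left alone; only %XX sequences are decoded.
uri_style  = URI.decode_uri_component(encoded)      #=> "a+b c+"
# '+' is additionally treated as an encoded space.
form_style = URI.decode_www_form_component(encoded) #=> "a b c+"
```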

decode_www_form (str, enc=Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)

# File lib/uri/common.rb, line 577
def self.decode_www_form(str, enc=Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)
  raise ArgumentError, "the input of #{self.name}.#{__method__} must be ASCII only string" unless str.ascii_only?
  ary = []
  return ary if str.empty?
  enc = Encoding.find(enc)
  str.b.each_line(separator) do |string|
    string.chomp!(separator)
    key, sep, val = string.partition('=')
    if isindex
      if sep.empty?
        val = key
        key = +''
      end
      isindex = false
    end

    if use__charset_ and key == '_charset_' and e = get_encoding(val)
      enc = e
      use__charset_ = false
    end

    key.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    if val
      val.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    else
      val = +''
    end

    ary << [key, val]
  end
  ary.each do |k, v|
    k.force_encoding(enc)
    k.scrub!
    v.force_encoding(enc)
    v.scrub!
  end
  ary
end

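Returns name/value pairs derived from the given string str, which must contain only ASCII characters (otherwise ArgumentError is raised).

The method may be used to decode the body of a Net::HTTPResponse object res for which res['Content-Type'] is 'application/x-www-form-urlencoded'.

The returned data is an array of 2-element subarrays; each subarray is a name/value pair (both are strings). Each returned string has encoding enc, and any invalid byte sequences in it have been replaced via String#scrub.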

A simple example:

URI.decode_www_form('foo=0&bar=1&baz')
# => [["foo", "0"], ["bar", "1"], ["baz", ""]]

The returned strings have certain conversions, similar to those performed in URI.decode_www_form_component:

URI.decode_www_form('f%23o=%2F&b-r=%24&b+z=%40')
# => [["f#o", "/"], ["b-r", "$"], ["b z", "@"]]

The given string may contain consecutive separators:

URI.decode_www_form('foo=0&&bar=1&&baz=2')
# => [["foo", "0"], ["", ""], ["bar", "1"], ["", ""], ["baz", "2"]]

A different separator may be specified:

URI.decode_www_form('foo=0--bar=1--baz', separator: '--')
# => [["foo", "0"], ["bar", "1"], ["baz", ""]]
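Because the result is an array of pairs rather than a Hash, repeated names survive in their original order. One possible way to collect the values per name is sketched below; the grouping step is plain Ruby, not part of the URI API, and the input strings are made up:

```ruby
require 'uri'

pairs = URI.decode_www_form('tag=ruby&tag=uri&page=2')
#=> [["tag", "ruby"], ["tag", "uri"], ["page", "2"]]

# Group values by name so repeated fields are not lost.
params = pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
#=> {"tag"=>["ruby", "uri"], "page"=>["2"]}

# Non-ASCII input is rejected; it has to be percent-encoded first.
begin
  URI.decode_www_form('name=café')
rescue ArgumentError => e
  error_message = e.message
end
```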

decode_www_form_component (str, enc=Encoding::UTF_8)

# File lib/uri/common.rb, line 391
def self.decode_www_form_component(str, enc=Encoding::UTF_8)
  _decode_uri_component(/\+|%\h\h/, str, enc)
end

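Returns a string decoded from the given URL-encoded string str.

The given string is first converted to Encoding::ASCII_8BIT (using String#b), then decoded (as below), and finally force-encoded to the given encoding enc. ArgumentError is raised if str contains a '%' that is not followed by two hexadecimal digits.

The returned string:

Preserves:
- Characters '*', '.', '-', and '_'.
- Characters in ranges 'a'..'z', 'A'..'Z', and '0'..'9'.
Example: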
```
URI.decode_www_form_component('*.-_azAZ09')
# => "*.-_azAZ09"
```

Converts:

- Character '+' to character ' '.
- Each "percent notation" %XX to the byte with hexadecimal value XX.

Example:

URI.decode_www_form_component('Here+are+some+punctuation+characters%3A+%2C%3B%3F%3A')
# => "Here are some punctuation characters: ,;?:"

Related: URI.decode_uri_component (preserves '+').
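A short sketch of two consequences of these rules, using made-up inputs: percent-encoded UTF-8 bytes are reassembled into characters of enc, and a '%' that is not followed by two hexadecimal digits raises ArgumentError (the check lives in the private helper URI._decode_uri_component):

```ruby
require 'uri'

# The bytes C3 A9 are reassembled into "é" once the result is tagged as UTF-8.
drink = URI.decode_www_form_component('caf%C3%A9+au+lait') #=> "café au lait"

# A trailing '%' has no hex digits after it, so decoding is refused.
begin
  URI.decode_www_form_component('100%')
rescue ArgumentError => e
  bad_input = e.message #=> "invalid %-encoding (100%)"
end
```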

encode_uri_component (str, enc=nil)

# File lib/uri/common.rb, line 397
def self.encode_uri_component(str, enc=nil)
  _encode_uri_component(/[^*\-.0-9A-Z_a-z]/, TBLENCURICOMP_, str, enc)
end

Like URI.encode_www_form_component, except that ' ' (space) is encoded as '%20' (instead of '+').
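A side-by-side comparison on an arbitrary string, assuming Ruby 3.2 or later (where URI.encode_uri_component exists). Note that a literal '+' is percent-encoded by both methods, so neither output is ambiguous:

```ruby
require 'uri'

text = 'a b+c'

uri_enc  = URI.encode_uri_component(text)      #=> "a%20b%2Bc"
form_enc = URI.encode_www_form_component(text) #=> "a+b%2Bc"
```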

encode_www_form (enum, enc=nil)

# File lib/uri/common.rb, line 524
def self.encode_www_form(enum, enc=nil)
  enum.map do |k,v|
    if v.nil?
      encode_www_form_component(k, enc)
    elsif v.respond_to?(:to_ary)
      v.to_ary.map do |w|
        str = encode_www_form_component(k, enc)
        unless w.nil?
          str << '='
          str << encode_www_form_component(w, enc)
        end
      end.join('&')
    else
      str = encode_www_form_component(k, enc)
      str << '='
      str << encode_www_form_component(v, enc)
    end
  end.join('&')
end

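Returns a URL-encoded string derived from the given Enumerable enum.

The result is suitable for use as form data for an HTTP request whose Content-Type is 'application/x-www-form-urlencoded'.

The returned string consists of the elements of enum, each converted to one or more URL-encoded strings, all joined with character '&'.

Simple examples: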

URI.encode_www_form([['foo', 0], ['bar', 1], ['baz', 2]])
# => "foo=0&bar=1&baz=2"
URI.encode_www_form({foo: 0, bar: 1, baz: 2})
# => "foo=0&bar=1&baz=2"

The returned string is formed using method URI.encode_www_form_component, which converts certain characters:

URI.encode_www_form('f#o': '/', 'b-r': '$', 'b z': '@')
# => "f%23o=%2F&b-r=%24&b+z=%40"

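When enum is Array-like, each element ele is converted to a field:

If ele is an array of two or more elements, the field is formed from its first two elements (and any additional elements are ignored):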

name = URI.encode_www_form_component(ele[0], enc)
value = URI.encode_www_form_component(ele[1], enc)
"#{name}=#{value}"

Example:

URI.encode_www_form([%w[foo bar], %w[baz bat bah]])
# => "foo=bar&baz=bat"
URI.encode_www_form([['foo', 0], ['bar', :baz, 'bat']])
# => "foo=0&bar=baz"

If ele is an array of one element, the field is formed from ele[0] alone (no '=' is emitted):

URI.encode_www_form_component(ele[0])

Example:

URI.encode_www_form([['foo'], [:bar], [0]])
# => "foo&bar&0"

Otherwise, the field is formed from ele:

URI.encode_www_form_component(ele)

Example:

URI.encode_www_form(['foo', :bar, 0])
# => "foo&bar&0"

The elements of an Array-like enum may be mixed:

URI.encode_www_form([['foo', 0], ['bar', 1, 2], ['baz'], :bat])
# => "foo=0&bar=1&baz&bat"

When enum is Hash-like, each key/value pair is converted to one or more fields:

If value is Array-convertible (responds to to_ary), each element ele in value is paired with key to form a field:

name = URI.encode_www_form_component(key, enc)
value = URI.encode_www_form_component(ele, enc)
"#{name}=#{value}"

Example:

URI.encode_www_form({foo: [:bar, 1], baz: [:bat, :bam, 2]})
# => "foo=bar&foo=1&baz=bat&baz=bam&baz=2"

Otherwise, key and value are paired to form a field (if value is nil, the field is the encoded key alone):

name = URI.encode_www_form_component(key, enc)
value = URI.encode_www_form_component(value, enc)
"#{name}=#{value}"

Example:

URI.encode_www_form({foo: 0, bar: 1, baz: 2})
# => "foo=0&bar=1&baz=2"

The elements of a Hash-like enum may be mixed:

URI.encode_www_form({foo: [0, 1], bar: 2})
# => "foo=0&foo=1&bar=2"
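A typical use, sketched under the assumption that the form data goes into the query string of a URI rather than into a request body (the URL and parameters are illustrative). It also shows URI.decode_www_form recovering the pairs:

```ruby
require 'uri'

# Encode a Hash of parameters (the Array value becomes repeated fields)
# and attach the result as the query of an existing URI.
search = URI('https://www.example.com/search')
search.query = URI.encode_www_form(q: 'ruby uri', tags: %w[a b], page: 2)
search.to_s
#=> "https://www.example.com/search?q=ruby+uri&tags=a&tags=b&page=2"

# Decoding reverses the process, yielding name/value string pairs.
round_trip = URI.decode_www_form(search.query)
#=> [["q", "ruby uri"], ["tags", "a"], ["tags", "b"], ["page", "2"]]
```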

encode_www_form_component (str, enc=nil)

# File lib/uri/common.rb, line 358
def self.encode_www_form_component(str, enc=nil)
  _encode_uri_component(/[^*\-.0-9A-Z_a-z]/, TBLENCWWWCOMP_, str, enc)
end

Returns a URL-encoded string derived from the given string str.

The returned string:

Preserves:
- Characters '*', '.', '-', and '_'.
- Characters in ranges 'a'..'z', 'A'..'Z', and '0'..'9'.
Example:
```
URI.encode_www_form_component('*.-_azAZ09')
# => "*.-_azAZ09"
```

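Converts:

- Character ' ' to character '+'.
- Any other character to "percent notation": each byte b of the character becomes '%%%02X' % b, so ',' becomes '%2C'.

Example: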

URI.encode_www_form_component('Here are some punctuation characters: ,;?:')
# => "Here+are+some+punctuation+characters%3A+%2C%3B%3F%3A"

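Encoding:

If str has encoding Encoding::ASCII_8BIT, argument enc is ignored.
Otherwise, if enc is given (and is not Encoding::ASCII_8BIT), str is first converted to Encoding::UTF_8 (with invalid or undefined characters replaced) and then to encoding enc; characters that enc cannot represent become numeric character references of the form "&#NNNN;" before percent-encoding. If enc is nil, the bytes of str are encoded as they are.

In all cases, the returned string has forced encoding Encoding::US_ASCII.

Related: URI.encode_uri_component (encodes ' ' as '%20').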
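A sketch of how enc changes the output, with arbitrary sample strings. Encoding::ISO_8859_1 is just one example of a non-UTF-8 target encoding:

```ruby
require 'uri'

# No enc: the UTF-8 bytes of "é" (C3 A9) are percent-encoded as-is.
utf8   = URI.encode_www_form_component('café')                       #=> "caf%C3%A9"
# With enc: "é" is first converted to its single ISO-8859-1 byte (E9).
latin1 = URI.encode_www_form_component('café', Encoding::ISO_8859_1) #=> "caf%E9"
# Characters missing from enc become "&#NNNN;" and are then percent-encoded.
cjk    = URI.encode_www_form_component('日本', Encoding::ISO_8859_1) #=> "%26%2326085%3B%26%2326412%3B"

utf8.encoding #=> #<Encoding:US-ASCII>
```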

for (scheme, *arguments, default: Generic)

# File lib/uri/common.rb, line 146
def self.for(scheme, *arguments, default: Generic)
  const_name = scheme.to_s.upcase

  uri_class = INITIAL_SCHEMES[const_name]
  uri_class ||= if /\A[A-Z]\w*\z/.match?(const_name) && Schemes.const_defined?(const_name, false)
    Schemes.const_get(const_name, false)
  end
  uri_class ||= default

  return uri_class.new(scheme, *arguments)
end

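Returns a new object constructed from the given scheme, arguments, and default:

The new object is an instance of URI.scheme_list[scheme.upcase] if that scheme is registered, and of default otherwise.
The object is initialized by calling the class initializer using scheme and arguments. See URI::Generic.new.

Examples: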

values = ['john.doe', 'www.example.com', '123', nil, '/forum/questions/', nil, 'tag=networking&order=newest', 'top']
URI.for('https', *values)
# => #<URI::HTTPS https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
URI.for('foo', *values, default: URI::HTTP)
# => #<URI::HTTP foo://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
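A sketch of the fallback and lookup rules. 'custom' is a hypothetical scheme standing in for any scheme that has not been registered; the positional arguments follow the same order as in the examples above (userinfo, host, port, registry, path, opaque, query, fragment):

```ruby
require 'uri'

# 'custom' is a hypothetical, unregistered scheme, so the default
# class (URI::Generic) is instantiated.
generic = URI.for('custom', nil, 'host.example', nil, nil, '/x', nil, nil, nil)
generic.class #=> URI::Generic
generic.to_s  #=> "custom://host.example/x"

# The scheme name is upcased before lookup, so case does not matter;
# the stored scheme is downcased and the default port 443 is omitted.
https = URI.for('HTTPS', nil, 'www.example.com', nil, nil, '/', nil, nil, nil)
https.class #=> URI::HTTPS
https.to_s  #=> "https://www.example.com/"
```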

join (*str)

# File lib/uri/common.rb, line 234
def self.join(*str)
  DEFAULT_PARSER.join(*str)
end

Merges the given URI strings str as per RFC 2396.

Each string in str is converted to an RFC3986 URI before being merged.

Examples:

URI.join("http://example.com/","main.rbx")
# => #<URI::HTTP http://example.com/main.rbx>

URI.join('http://example.com', 'foo')
# => #<URI::HTTP http://example.com/foo>

URI.join('http://example.com', '/foo', '/bar')
# => #<URI::HTTP http://example.com/bar>

URI.join('http://example.com', '/foo', 'bar')
# => #<URI::HTTP http://example.com/bar>

URI.join('http://example.com', '/foo/', 'bar')
# => #<URI::HTTP http://example.com/foo/bar>
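The resolution rules are easiest to check against the worked examples in RFC 3986, section 5.4.1, which all share the base URI http://a/b/c/d;p?q. A sketch applying a few of them, with the expected results taken from that section:

```ruby
require 'uri'

base = 'http://a/b/c/d;p?q'

sibling = URI.join(base, 'g').to_s       #=> "http://a/b/c/g"
parent  = URI.join(base, '../g').to_s    #=> "http://a/b/g"
grand   = URI.join(base, '../../g').to_s #=> "http://a/g"
# A fragment-only reference keeps the base path and query.
frag    = URI.join(base, '#s').to_s      #=> "http://a/b/c/d;p?q#s"
```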

open (name, *rest, &block)

# File lib/open-uri.rb, line 23
def self.open(name, *rest, &block)
  if name.respond_to?(:open)
    name.open(*rest, &block)
  elsif name.respond_to?(:to_str) &&
        %r{\A[A-Za-z][A-Za-z0-9+\-\.]*://} =~ name &&
        (uri = URI.parse(name)).respond_to?(:open)
    uri.open(*rest, &block)
  else
    super
  end
end

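Allows the opening of various resources, including URIs.

If the first argument responds to the 'open' method, 'open' is called on it with the rest of the arguments.

If the first argument is a string that begins with a scheme followed by :// (such as http://), it is parsed by URI.parse. If the parsed object responds to the 'open' method, 'open' is called on it with the rest of the arguments.

Otherwise, Kernel#open is called.

OpenURI::OpenRead#open provides URI::HTTP#open, URI::HTTPS#open, and URI::FTP#open, which is what makes such URI objects respond to 'open'.

We can accept URIs and strings that begin with http://, https://, and ftp://. In these cases, the opened file object is extended by OpenURI::Meta.

Calls superclass method Kernel#open.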
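A network-free sketch of the dispatch rules above. Greeting is a hypothetical class whose only purpose is to respond to open; the second case uses a temporary file to show a plain path falling through to Kernel#open:

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical class used only for illustration: any object that
# responds to #open receives the remaining arguments.
class Greeting
  def open(*args)
    "opened with #{args.inspect}"
  end
end

dispatched = URI.open(Greeting.new, 'r') #=> "opened with [\"r\"]"

# A path has no "scheme://" prefix, so it is handed to Kernel#open
# and opened as an ordinary file; the block's value is returned.
contents = Tempfile.create('uri-open-demo') do |f|
  f.write('hello')
  f.flush
  URI.open(f.path) { |io| io.read }
end
contents #=> "hello"
```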

parse (uri)

# File lib/uri/common.rb, line 207
def self.parse(uri)
  DEFAULT_PARSER.parse(uri)
end

Returns a new URI object constructed from the given string uri:

URI.parse('https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top')
# => #<URI::HTTPS https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
URI.parse('http://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top')
# => #<URI::HTTP http://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>

It's recommended to first escape string uri if it may contain invalid URI characters; otherwise URI.parse raises URI::InvalidURIError.
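A sketch of what happens when an unescaped character reaches URI.parse, using a made-up URL containing a space. It assumes URI::RFC2396_Parser#escape as one way to percent-encode unsafe characters; for individual query values, URI.encode_www_form_component is often the better tool:

```ruby
require 'uri'

raw = 'http://example.com/a path'

# The unescaped space makes the string an invalid URI.
begin
  URI.parse(raw)
rescue URI::InvalidURIError => e
  parse_error = e.class
end

# RFC2396_Parser#escape percent-encodes characters that are unsafe in a URI.
escaped = URI::RFC2396_Parser.new.escape(raw) #=> "http://example.com/a%20path"
URI.parse(escaped).path                       #=> "/a%20path"
```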

parser= (parser = RFC3986_PARSER)

# File lib/uri/common.rb, line 25
def self.parser=(parser = RFC3986_PARSER)
  remove_const(:Parser) if defined?(::URI::Parser)
  const_set("Parser", parser.class)

  remove_const(:REGEXP) if defined?(::URI::REGEXP)
  remove_const(:PATTERN) if defined?(::URI::PATTERN)
  if Parser == RFC2396_Parser
    const_set("REGEXP", URI::RFC2396_REGEXP)
    const_set("PATTERN", URI::RFC2396_REGEXP::PATTERN)
  end

  Parser.new.regexp.each_pair do |sym, str|
    remove_const(sym) if const_defined?(sym, false)
    const_set(sym, str)
  end
end

register_scheme (scheme, klass)

# File lib/uri/common.rb, line 102
def self.register_scheme(scheme, klass)
  Schemes.const_set(scheme.to_s.upcase, klass)
end

Registers the given klass as the class to be instantiated when parsing a URI with the given scheme:

URI.register_scheme('MS_SEARCH', URI::Generic) # => URI::Generic
URI.scheme_list['MS_SEARCH']                   # => URI::Generic

Note that after calling String#upcase on scheme, it must be a valid constant name.
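A sketch carrying registration through to parsing, assuming Ruby 3.1 or later (where URI.register_scheme exists). URI::Gopher is a hypothetical class defined only for this example; it shows the registered class being instantiated and its DEFAULT_PORT filled in when the URI has no explicit port:

```ruby
require 'uri'

module URI
  # Hypothetical class for the gopher scheme, for illustration only.
  class Gopher < Generic
    DEFAULT_PORT = 70
  end
  register_scheme 'GOPHER', Gopher
end

gopher = URI.parse('gopher://gopher.example.org/1/docs')
gopher.class #=> URI::Gopher
# No port in the string, so DEFAULT_PORT is used.
gopher.port  #=> 70
URI.scheme_list['GOPHER'] #=> URI::Gopher
```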

scheme_list ()

# File lib/uri/common.rb, line 120
def self.scheme_list
  Schemes.constants.map { |name|
    [name.to_s.upcase, Schemes.const_get(name)]
  }.to_h
end

Returns a hash of the defined schemes:

URI.scheme_list
# =>
{"MAILTO"=>URI::MailTo,
 "LDAPS"=>URI::LDAPS,
 "WS"=>URI::WS,
 "HTTP"=>URI::HTTP,
 "HTTPS"=>URI::HTTPS,
 "LDAP"=>URI::LDAP,
 "FILE"=>URI::File,
 "FTP"=>URI::FTP}

Private Class Methods

_decode_uri_component (regexp, str, enc)

# File lib/uri/common.rb, line 420
def self._decode_uri_component(regexp, str, enc)
  raise ArgumentError, "invalid %-encoding (#{str})" if /%(?!\h\h)/.match?(str)
  str.b.gsub(regexp, TBLDECWWWCOMP_).force_encoding(enc)
end

_encode_uri_component (regexp, table, str, enc)

# File lib/uri/common.rb, line 406
def self._encode_uri_component(regexp, table, str, enc)
  str = str.to_s.dup
  if str.encoding != Encoding::ASCII_8BIT
    if enc && enc != Encoding::ASCII_8BIT
      str.encode!(Encoding::UTF_8, invalid: :replace, undef: :replace)
      str.encode!(enc, fallback: ->(x){"&##{x.ord};"})
    end
    str.force_encoding(Encoding::ASCII_8BIT)
  end
  str.gsub!(regexp, table)
  str.force_encoding(Encoding::US_ASCII)
end