check the user v
component for RFC2396 compliance and against the URI::Parser
Regexp
for :USERINFO
Can not have a registry or opaque component defined, with a user component defined.
protect setter for the user
component, and password
if available. (with validation)
see also URI::Generic.userinfo=
returns the userinfo ui
as user, password if properly formatted as ‘user:password’
escapes ‘user:password’ v
based on RFC 1738 section 3.1
private method to cleanup attributes
, scope
, filter
and extensions
, from using the query
component attribute
private setter for filter val
private setter for headers v
returns Regexp
that is default self.regexp, unless schemes
is provided. Then it is a Regexp.union
with self.pattern
Constructs the default Hash
of patterns
Constructs the default Hash
of Regexp’s
returns Regexp
that is default self.regexp, unless schemes
is provided. Then it is a Regexp.union
with self.pattern
Constructs the default Hash
of patterns
Constructs the default Hash
of Regexp’s
Creates an error page for exception ex
with an optional backtrace
Finds a servlet for path
Attempts to obtain the lock and returns immediately. Returns true
if the lock was granted.
Returns the one-character string which cause Encoding::UndefinedConversionError
.
ec = Encoding::Converter.new("ISO-8859-1", "EUC-JP") begin ec.convert("\xa0") rescue Encoding::UndefinedConversionError puts $!.error_char.dump #=> "\xC2\xA0" p $!.error_char.encoding #=> #<Encoding:UTF-8> end
Returns the discarded bytes when Encoding::InvalidByteSequenceError
occurs.
ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1") begin ec.convert("abc\xA1\xFFdef") rescue Encoding::InvalidByteSequenceError p $! #=> #<Encoding::InvalidByteSequenceError: "\xA1" followed by "\xFF" on EUC-JP> puts $!.error_bytes.dump #=> "\xA1" puts $!.readagain_bytes.dump #=> "\xFF" end
possible opt elements:
hash form: :partial_input => true # source buffer may be part of larger source :after_output => true # stop conversion after output before input integer form: Encoding::Converter::PARTIAL_INPUT Encoding::Converter::AFTER_OUTPUT
possible results:
:invalid_byte_sequence :incomplete_input :undefined_conversion :after_output :destination_buffer_full :source_buffer_empty :finished
primitive_convert
converts source_buffer into destination_buffer.
source_buffer should be a string or nil. nil means an empty string.
destination_buffer should be a string.
destination_byteoffset should be an integer or nil. nil means the end of destination_buffer. If it is omitted, nil is assumed.
destination_bytesize should be an integer or nil. nil means unlimited. If it is omitted, nil is assumed.
opt should be nil, a hash or an integer. nil means no flags. If it is omitted, nil is assumed.
primitive_convert
converts the content of source_buffer from beginning and store the result into destination_buffer.
destination_byteoffset and destination_bytesize specify the region which the converted result is stored. destination_byteoffset specifies the start position in destination_buffer in bytes. If destination_byteoffset is nil, destination_buffer.bytesize is used for appending the result. destination_bytesize specifies maximum number of bytes. If destination_bytesize is nil, destination size is unlimited. After conversion, destination_buffer is resized to destination_byteoffset + actually produced number of bytes. Also destination_buffer’s encoding is set to destination_encoding.
primitive_convert
drops the converted part of source_buffer. the dropped part is converted in destination_buffer or buffered in Encoding::Converter
object.
primitive_convert
stops conversion when one of following condition met.
invalid byte sequence found in source buffer (:invalid_byte_sequence) primitive_errinfo
and last_error
methods returns the detail of the error.
unexpected end of source buffer (:incomplete_input) this occur only when :partial_input is not specified. primitive_errinfo
and last_error
methods returns the detail of the error.
character not representable in output encoding (:undefined_conversion) primitive_errinfo
and last_error
methods returns the detail of the error.
after some output is generated, before input is done (:after_output) this occur only when :after_output is specified.
destination buffer is full (:destination_buffer_full) this occur only when destination_bytesize is non-nil.
source buffer is empty (:source_buffer_empty) this occur only when :partial_input is specified.
conversion is finished (:finished)
example:
ec = Encoding::Converter.new("UTF-8", "UTF-16BE") ret = ec.primitive_convert(src="pi", dst="", nil, 100) p [ret, src, dst] #=> [:finished, "", "\x00p\x00i"] ec = Encoding::Converter.new("UTF-8", "UTF-16BE") ret = ec.primitive_convert(src="pi", dst="", nil, 1) p [ret, src, dst] #=> [:destination_buffer_full, "i", "\x00"] ret = ec.primitive_convert(src, dst="", nil, 1) p [ret, src, dst] #=> [:destination_buffer_full, "", "p"] ret = ec.primitive_convert(src, dst="", nil, 1) p [ret, src, dst] #=> [:destination_buffer_full, "", "\x00"] ret = ec.primitive_convert(src, dst="", nil, 1) p [ret, src, dst] #=> [:finished, "", "i"]
primitive_errinfo
returns important information regarding the last error as a 5-element array:
[result, enc1, enc2, error_bytes, readagain_bytes]
result is the last result of primitive_convert.
Other elements are only meaningful when result is :invalid_byte_sequence, :incomplete_input or :undefined_conversion.
enc1 and enc2 indicate a conversion step as a pair of strings. For example, a converter from EUC-JP to ISO-8859-1 converts a string as follows: EUC-JP -> UTF-8 -> ISO-8859-1. So [enc1, enc2] is either [“EUC-JP”, “UTF-8”] or [“UTF-8”, “ISO-8859-1”].
error_bytes and readagain_bytes indicate the byte sequences which caused the error. error_bytes is discarded portion. readagain_bytes is buffered portion which is read again on next conversion.
Example:
# \xff is invalid as EUC-JP. ec = Encoding::Converter.new("EUC-JP", "Shift_JIS") ec.primitive_convert(src="\xff", dst="", nil, 10) p ec.primitive_errinfo #=> [:invalid_byte_sequence, "EUC-JP", "UTF-8", "\xFF", ""] # HIRAGANA LETTER A (\xa4\xa2 in EUC-JP) is not representable in ISO-8859-1. # Since this error is occur in UTF-8 to ISO-8859-1 conversion, # error_bytes is HIRAGANA LETTER A in UTF-8 (\xE3\x81\x82). ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1") ec.primitive_convert(src="\xa4\xa2", dst="", nil, 10) p ec.primitive_errinfo #=> [:undefined_conversion, "UTF-8", "ISO-8859-1", "\xE3\x81\x82", ""] # partial character is invalid ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1") ec.primitive_convert(src="\xa4", dst="", nil, 10) p ec.primitive_errinfo #=> [:incomplete_input, "EUC-JP", "UTF-8", "\xA4", ""] # Encoding::Converter::PARTIAL_INPUT prevents invalid errors by # partial characters. ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1") ec.primitive_convert(src="\xa4", dst="", nil, 10, Encoding::Converter::PARTIAL_INPUT) p ec.primitive_errinfo #=> [:source_buffer_empty, nil, nil, nil, nil] # \xd8\x00\x00@ is invalid as UTF-16BE because # no low surrogate after high surrogate (\xd8\x00). # It is detected by 3rd byte (\00) which is part of next character. # So the high surrogate (\xd8\x00) is discarded and # the 3rd byte is read again later. # Since the byte is buffered in ec, it is dropped from src. ec = Encoding::Converter.new("UTF-16BE", "UTF-8") ec.primitive_convert(src="\xd8\x00\x00@", dst="", nil, 10) p ec.primitive_errinfo #=> [:invalid_byte_sequence, "UTF-16BE", "UTF-8", "\xD8\x00", "\x00"] p src #=> "@" # Similar to UTF-16BE, \x00\xd8@\x00 is invalid as UTF-16LE. # The problem is detected by 4th byte. ec = Encoding::Converter.new("UTF-16LE", "UTF-8") ec.primitive_convert(src="\x00\xd8@\x00", dst="", nil, 10) p ec.primitive_errinfo #=> [:invalid_byte_sequence, "UTF-16LE", "UTF-8", "\x00\xD8", "@\x00"] p src #=> ""
Inserts string into the encoding converter. The string will be converted to the destination encoding and output on later conversions.
If the destination encoding is stateful, string is converted according to the state and the state is updated.
This method should be used only when a conversion error occurs.
ec = Encoding::Converter.new("utf-8", "iso-8859-1") src = "HIRAGANA LETTER A is \u{3042}." dst = "" p ec.primitive_convert(src, dst) #=> :undefined_conversion puts "[#{dst.dump}, #{src.dump}]" #=> ["HIRAGANA LETTER A is ", "."] ec.insert_output("<err>") p ec.primitive_convert(src, dst) #=> :finished puts "[#{dst.dump}, #{src.dump}]" #=> ["HIRAGANA LETTER A is <err>.", ""] ec = Encoding::Converter.new("utf-8", "iso-2022-jp") src = "\u{306F 3041 3068 2661 3002}" # U+2661 is not representable in iso-2022-jp dst = "" p ec.primitive_convert(src, dst) #=> :undefined_conversion puts "[#{dst.dump}, #{src.dump}]" #=> ["\e$B$O$!$H".force_encoding("ISO-2022-JP"), "\xE3\x80\x82"] ec.insert_output "?" # state change required to output "?". p ec.primitive_convert(src, dst) #=> :finished puts "[#{dst.dump}, #{src.dump}]" #=> ["\e$B$O$!$H\e(B?\e$B!#\e(B".force_encoding("ISO-2022-JP"), ""]
Returns an exception object for the last conversion. Returns nil if the last conversion did not produce an error.
“error” means that Encoding::InvalidByteSequenceError
and Encoding::UndefinedConversionError
for Encoding::Converter#convert
and :invalid_byte_sequence, :incomplete_input and :undefined_conversion for Encoding::Converter#primitive_convert
.
ec = Encoding::Converter.new("utf-8", "iso-8859-1") p ec.primitive_convert(src="\xf1abcd", dst="") #=> :invalid_byte_sequence p ec.last_error #=> #<Encoding::InvalidByteSequenceError: "\xF1" followed by "a" on UTF-8> p ec.primitive_convert(src, dst, nil, 1) #=> :destination_buffer_full p ec.last_error #=> nil