An Encoding
instance represents a character encoding usable in Ruby. It is defined as a constant under the Encoding
namespace. It has a name and optionally, aliases:
Encoding::ISO_8859_1.name #=> "ISO-8859-1" Encoding::ISO_8859_1.names #=> ["ISO-8859-1", "ISO8859-1"]
Ruby methods dealing with encodings return or accept Encoding
instances as arguments (when a method accepts an Encoding
instance as an argument, it can be passed an Encoding
name or alias instead).
"some string".encoding #=> #<Encoding:UTF-8> string = "some string".encode(Encoding::ISO_8859_1) #=> "some string" string.encoding #=> #<Encoding:ISO-8859-1> "some string".encode "ISO-8859-1" #=> "some string"
Encoding::ASCII_8BIT is a special encoding that is usually used for a byte string, not a character string. But as the name insists, its characters in the range of ASCII are considered as ASCII characters. This is useful when you use ASCII-8BIT characters with other ASCII compatible characters.
The associated Encoding
of a String
can be changed in two different ways.
First, it is possible to set the Encoding
of a string to a new Encoding
without changing the internal byte representation of the string, with String#force_encoding
. This is how you can tell Ruby the correct encoding of a string.
string #=> "R\xC3\xA9sum\xC3\xA9" string.encoding #=> #<Encoding:ISO-8859-1> string.force_encoding(Encoding::UTF_8) #=> "R\u00E9sum\u00E9"
Second, it is possible to transcode a string, i.e. translate its internal byte representation to another encoding. Its associated encoding is also set to the other encoding. See String#encode
for the various forms of transcoding, and the Encoding::Converter
class for additional control over the transcoding process.
string #=> "R\u00E9sum\u00E9" string.encoding #=> #<Encoding:UTF-8> string = string.encode!(Encoding::ISO_8859_1) #=> "R\xE9sum\xE9" string.encoding #=> #<Encoding::ISO-8859-1>
All Ruby script code has an associated Encoding
which any String
literal created in the source code will be associated to.
The default script encoding is Encoding::UTF_8 after v2.0, but it can be changed by a magic comment on the first line of the source code file (or second line, if there is a shebang line on the first). The comment must contain the word coding
or encoding
, followed by a colon, space and the Encoding
name or alias:
# encoding: UTF-8 "some string".encoding #=> #<Encoding:UTF-8>
The __ENCODING__
keyword returns the script encoding of the file which the keyword is written:
# encoding: ISO-8859-1 __ENCODING__ #=> #<Encoding:ISO-8859-1>
ruby -K
will change the default locale encoding, but this is not recommended. Ruby source files should declare its script encoding by a magic comment even when they only depend on US-ASCII strings or regular expressions.
The default encoding of the environment. Usually derived from locale.
see Encoding.locale_charmap
, Encoding.find
(‘locale’)
The default encoding of strings from the filesystem of the environment. This is used for strings of file names or paths.
see Encoding.find
(‘filesystem’)
Each IO
object has an external encoding which indicates the encoding that Ruby will use to read its data. By default Ruby sets the external encoding of an IO
object to the default external encoding. The default external encoding is set by locale encoding or the interpreter -E
option. Encoding.default_external
returns the current value of the external encoding.
ENV["LANG"] #=> "UTF-8" Encoding.default_external #=> #<Encoding:UTF-8> $ ruby -E ISO-8859-1 -e "p Encoding.default_external" #<Encoding:ISO-8859-1> $ LANG=C ruby -e 'p Encoding.default_external' #<Encoding:US-ASCII>
The default external encoding may also be set through Encoding.default_external=
, but you should not do this as strings created before and after the change will have inconsistent encodings. Instead use ruby -E
to invoke ruby with the correct external encoding.
When you know that the actual encoding of the data of an IO
object is not the default external encoding, you can reset its external encoding with IO#set_encoding
or set it at IO
object creation (see IO.new
options).
To process the data of an IO
object which has an encoding different from its external encoding, you can set its internal encoding. Ruby will use this internal encoding to transcode the data when it is read from the IO
object.
Conversely, when data is written to the IO
object it is transcoded from the internal encoding to the external encoding of the IO
object.
The internal encoding of an IO
object can be set with IO#set_encoding
or at IO
object creation (see IO.new
options).
The internal encoding is optional and when not set, the Ruby default internal encoding is used. If not explicitly set this default internal encoding is nil
meaning that by default, no transcoding occurs.
The default internal encoding can be set with the interpreter option -E
. Encoding.default_internal
returns the current internal encoding.
$ ruby -e 'p Encoding.default_internal' nil $ ruby -E ISO-8859-1:UTF-8 -e "p [Encoding.default_external, \ Encoding.default_internal]" [#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>]
The default internal encoding may also be set through Encoding.default_internal=
, but you should not do this as strings created before and after the change will have inconsistent encodings. Instead use ruby -E
to invoke ruby with the correct internal encoding.
IO
encoding example In the following example a UTF-8 encoded string “Ru00E9sumu00E9” is transcoded for output to ISO-8859-1 encoding, then read back in and transcoded to UTF-8:
string = "R\u00E9sum\u00E9" open("transcoded.txt", "w:ISO-8859-1") do |io| io.write(string) end puts "raw text:" p File.binread("transcoded.txt") puts open("transcoded.txt", "r:ISO-8859-1:UTF-8") do |io| puts "transcoded text:" p io.read end
While writing the file, the internal encoding is not specified as it is only necessary for reading. While reading the file both the internal and external encoding must be specified to obtain the correct result.
$ ruby t.rb raw text: "R\xE9sum\xE9" transcoded text: "R\u00E9sum\u00E9"
EncodingError
is the base class for encoding errors.
Objects of class Binding
encapsulate the execution context at some particular place in the code and retain this context for future use. The variables, methods, value of self
, and possibly an iterator block that can be accessed in this context are all retained. Binding
objects can be created using Kernel#binding
, and are made available to the callback of Kernel#set_trace_func
and instances of TracePoint
.
These binding objects can be passed as the second argument of the Kernel#eval
method, establishing an environment for the evaluation.
class Demo def initialize(n) @secret = n end def get_binding binding end end k1 = Demo.new(99) b1 = k1.get_binding k2 = Demo.new(-3) b2 = k2.get_binding eval("@secret", b1) #=> 99 eval("@secret", b2) #=> -3 eval("@secret") #=> nil
Binding
objects have no class-specific methods.
MatchData
encapsulates the result of matching a Regexp
against string. It is returned by Regexp#match
and String#match
, and also stored in a global variable returned by Regexp.last_match
.
Usage:
url = 'https://docs.ruby-lang.org/en/2.5.0/MatchData.html' m = url.match(/(\d\.?)+/) # => #<MatchData "2.5.0" 1:"0"> m.string # => "https://docs.ruby-lang.org/en/2.5.0/MatchData.html" m.regexp # => /(\d\.?)+/ # entire matched substring: m[0] # => "2.5.0" # Working with unnamed captures m = url.match(%r{([^/]+)/([^/]+)\.html$}) m.captures # => ["2.5.0", "MatchData"] m[1] # => "2.5.0" m.values_at(1, 2) # => ["2.5.0", "MatchData"] # Working with named captures m = url.match(%r{(?<version>[^/]+)/(?<module>[^/]+)\.html$}) m.captures # => ["2.5.0", "MatchData"] m.named_captures # => {"version"=>"2.5.0", "module"=>"MatchData"} m[:version] # => "2.5.0" m.values_at(:version, :module) # => ["2.5.0", "MatchData"] # Numerical indexes are working, too m[1] # => "2.5.0" m.values_at(1, 2) # => ["2.5.0", "MatchData"]
Parts of last MatchData
(returned by Regexp.last_match
) are also aliased as global variables:
$~
is Regexp.last_match
;
$&
is Regexp.last_match
[ 0 ]
;
$1
, $2
, and so on are Regexp.last_match
[ i ]
(captures by number);
$`
is Regexp.last_match
.pre_match
;
$'
is Regexp.last_match
.post_match
;
$+
is Regexp.last_match
[ -1 ]
(the last capture).
See also “Special global variables” section in Regexp
documentation.
Raised when attempting to divide an integer by 0.
42 / 0 #=> ZeroDivisionError: divided by 0
Note that only division by an exact 0 will raise the exception:
42 / 0.0 #=> Float::INFINITY 42 / -0.0 #=> -Float::INFINITY 0 / 0.0 #=> NaN
This module provides a framework for message digest libraries.
You may want to look at OpenSSL::Digest
as it supports more algorithms.
A cryptographic hash function is a procedure that takes data and returns a fixed bit string: the hash value, also known as digest. Hash
functions are also called one-way functions, it is easy to compute a digest from a message, but it is infeasible to generate a message from a digest.
require 'digest' # Compute a complete digest Digest::SHA256.digest 'message' #=> "\xABS\n\x13\xE4Y..." sha256 = Digest::SHA256.new sha256.digest 'message' #=> "\xABS\n\x13\xE4Y..." # Other encoding formats Digest::SHA256.hexdigest 'message' #=> "ab530a13e459..." Digest::SHA256.base64digest 'message' #=> "q1MKE+RZFJgr..." # Compute digest by chunks md5 = Digest::MD5.new md5.update 'message1' md5 << 'message2' # << is an alias for update md5.hexdigest #=> "94af09c09bb9..." # Compute digest for a file sha256 = Digest::SHA256.file 'testfile' sha256.hexdigest
Additionally digests can be encoded in “bubble babble” format as a sequence of consonants and vowels which is more recognizable and comparable than a hexadecimal digest.
require 'digest/bubblebabble' Digest::SHA256.bubblebabble 'message' #=> "xopoh-fedac-fenyh-..."
See the bubble babble specification at web.mit.edu/kenta/www/one/bubblebabble/spec/jrtrjwzi/draft-huima-01.txt.
Digest
algorithms Different digest algorithms (or hash functions) are available:
MD5
See RFC 1321 The MD5
Message-Digest Algorithm
As Digest::RMD160
. See homes.esat.kuleuven.be/~bosselae/ripemd160.html.
SHA1
See FIPS 180 Secure Hash
Standard.
SHA2
family
See FIPS 180 Secure Hash
Standard which defines the following algorithms:
SHA512
SHA384
SHA256
The latest versions of the FIPS publications can be found here: csrc.nist.gov/publications/PubsFIPS.html.
The DidYouMean
gem adds functionality to suggest possible method/class names upon errors such as NameError
and NoMethodError
. In Ruby 2.3 or later, it is automatically activated during startup.
@example
methosd # => NameError: undefined local variable or method `methosd' for main:Object # Did you mean? methods # method OBject # => NameError: uninitialized constant OBject # Did you mean? Object @full_name = "Yuki Nishijima" first_name, last_name = full_name.split(" ") # => NameError: undefined local variable or method `full_name' for main:Object # Did you mean? @full_name @@full_name = "Yuki Nishijima" @@full_anme # => NameError: uninitialized class variable @@full_anme in Object # Did you mean? @@full_name full_name = "Yuki Nishijima" full_name.starts_with?("Y") # => NoMethodError: undefined method `starts_with?' for "Yuki Nishijima":String # Did you mean? start_with? hash = {foo: 1, bar: 2, baz: 3} hash.fetch(:fooo) # => KeyError: key not found: :fooo # Did you mean? :foo
did_you_mean
Occasionally, you may want to disable the did_you_mean
gem for e.g. debugging issues in the error object itself. You can disable it entirely by specifying --disable-did_you_mean
option to the ruby
command:
$ ruby --disable-did_you_mean -e "1.zeor?" -e:1:in `<main>': undefined method `zeor?' for 1:Integer (NameError)
When you do not have direct access to the ruby
command (e.g. +rails console+, irb
), you could applyoptions using the RUBYOPT
environment variable:
$ RUBYOPT='--disable-did_you_mean' irb irb:0> 1.zeor? # => NoMethodError (undefined method `zeor?' for 1:Integer)
Sometimes, you do not want to disable the gem entirely, but need to get the original error message without suggestions (e.g. testing). In this case, you could use the #original_message
method on the error object:
no_method_error = begin 1.zeor? rescue NoMethodError => error error end no_method_error.message # => NoMethodError (undefined method `zeor?' for 1:Integer) # Did you mean? zero? no_method_error.original_message # => NoMethodError (undefined method `zeor?' for 1:Integer)
Raised on redirection, only occurs when redirect
option for HTTP is false
.
FIXME: This isn’t documented in Nutshell.
Since MonitorMixin.new_cond
returns a ConditionVariable
, and the example above calls while_wait and signal, this class should be documented.
OpenSSL::Digest
allows you to compute message digests (sometimes interchangeably called “hashes”) of arbitrary data that are cryptographically secure, i.e. a Digest
implements a secure one-way function.
One-way functions offer some useful properties. E.g. given two distinct inputs the probability that both yield the same output is highly unlikely. Combined with the fact that every message digest algorithm has a fixed-length output of just a few bytes, digests are often used to create unique identifiers for arbitrary data. A common example is the creation of a unique id for binary documents that are stored in a database.
Another useful characteristic of one-way functions (and thus the name) is that given a digest there is no indication about the original data that produced it, i.e. the only way to identify the original input is to “brute-force” through every possible combination of inputs.
These characteristics make one-way functions also ideal companions for public key signature algorithms: instead of signing an entire document, first a hash of the document is produced with a considerably faster message digest algorithm and only the few bytes of its output need to be signed using the slower public key algorithm. To validate the integrity of a signed document, it suffices to re-compute the hash and verify that it is equal to that in the signature.
You can get a list of all digest algorithms supported on your system by running this command in your terminal:
openssl list -digest-algorithms
Among the OpenSSL
1.1.1 supported message digest algorithms are:
SHA224, SHA256, SHA384, SHA512, SHA512-224 and SHA512-256
SHA3-224, SHA3-256, SHA3-384 and SHA3-512
BLAKE2s256 and BLAKE2b512
Each of these algorithms can be instantiated using the name:
digest = OpenSSL::Digest.new('SHA256')
“Breaking” a message digest algorithm means defying its one-way function characteristics, i.e. producing a collision or finding a way to get to the original data by means that are more efficient than brute-forcing etc. Most of the supported digest algorithms can be considered broken in this sense, even the very popular MD5 and SHA1 algorithms. Should security be your highest concern, then you should probably rely on SHA224, SHA256, SHA384 or SHA512.
data = File.binread('document') sha256 = OpenSSL::Digest.new('SHA256') digest = sha256.digest(data)
data1 = File.binread('file1') data2 = File.binread('file2') data3 = File.binread('file3') sha256 = OpenSSL::Digest.new('SHA256') sha256 << data1 sha256 << data2 sha256 << data3 digest = sha256.digest
Digest
instance data1 = File.binread('file1') sha256 = OpenSSL::Digest.new('SHA256') digest1 = sha256.digest(data1) data2 = File.binread('file2') sha256.reset digest2 = sha256.digest(data2)
Subclass of Zlib::Error
When zlib returns a Z_NEED_DICT if a preset dictionary is needed at this point.
Used by Zlib::Inflate.inflate
and Zlib.inflate
Exception
raised when there is an invalid encoding detected