This module provides a framework for message digest libraries.
You may want to look at OpenSSL::Digest, as it supports more algorithms.
A cryptographic hash function is a procedure that takes data and returns a fixed-size bit string: the hash value, also known as the digest. Hash functions are also called one-way functions: it is easy to compute a digest from a message, but it is infeasible to generate a message from a digest.
  require 'digest'

  # Compute a complete digest
  Digest::SHA256.digest 'message'       #=> "\xABS\n\x13\xE4Y..."

  sha256 = Digest::SHA256.new
  sha256.digest 'message'               #=> "\xABS\n\x13\xE4Y..."

  # Other encoding formats
  Digest::SHA256.hexdigest 'message'    #=> "ab530a13e459..."
  Digest::SHA256.base64digest 'message' #=> "q1MKE+RZFJgr..."

  # Compute digest by chunks
  md5 = Digest::MD5.new
  md5.update 'message1'
  md5 << 'message2'                     # << is an alias for update

  md5.hexdigest                         #=> "94af09c09bb9..."

  # Compute digest for a file
  sha256 = Digest::SHA256.file 'testfile'
  sha256.hexdigest
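The chunked interface can be useful when the input is too large to hold in memory. The following is a minimal sketch, not part of the library: the helper name chunked_sha256 and the chunk size are assumptions made for illustration. It checks that feeding data piece by piece yields the same digest as hashing it in one call.

```ruby
require 'digest'
require 'stringio'

# Hypothetical helper: hash an IO in fixed-size chunks so the whole
# input never has to be held in memory at once.
def chunked_sha256(io, chunk_size = 4)
  sha256 = Digest::SHA256.new
  # IO#read(n) returns nil at end of input, which ends the loop.
  while (chunk = io.read(chunk_size))
    sha256 << chunk
  end
  sha256.hexdigest
end

data = 'message1message2'
streamed = chunked_sha256(StringIO.new(data))
one_shot = Digest::SHA256.hexdigest(data)

puts streamed == one_shot # => true
```

Any object that responds to read, such as an open File, could be passed in place of the StringIO.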
Additionally digests can be encoded in “bubble babble” format as a sequence of consonants and vowels which is more recognizable and comparable than a hexadecimal digest.
require 'digest/bubblebabble' Digest::SHA256.bubblebabble 'message' #=> "xopoh-fedac-fenyh-..."
See the bubble babble specification at web.mit.edu/kenta/www/one/bubblebabble/spec/jrtrjwzi/draft-huima-01.txt.
Digest algorithms
Different digest algorithms (or hash functions) are available:
MD5: See RFC 1321, The MD5 Message-Digest Algorithm.
RIPEMD-160: As Digest::RMD160. See homes.esat.kuleuven.be/~bosselae/ripemd160.html.
SHA1: See FIPS 180, Secure Hash Standard.
SHA2 family: See FIPS 180, Secure Hash Standard, which defines the following algorithms: SHA256, SHA384, and SHA512.
The latest versions of the FIPS publications can be found here: csrc.nist.gov/publications/PubsFIPS.html.
JSON is a lightweight data-interchange format.
A JSON value is one of the following:
Double-quoted text: "foo".
Number: 1, 1.0, 2.0e2.
Boolean: true, false.
Null: null.
Array: an ordered list of values, enclosed by square brackets:
["foo", 1, 1.0, 2.0e2, true, false, null]
Object: a collection of name/value pairs, enclosed by curly braces; each name is double-quoted text; the values may be any JSON values:
{"a": "foo", "b": 1, "c": 1.0, "d": 2.0e2, "e": true, "f": false, "g": null}
A JSON array or object may contain nested arrays, objects, and scalars to any depth:
  {"foo": {"bar": 1, "baz": 2}, "bat": [0, 1, 2]}
  [{"foo": 0, "bar": 1}, ["baz", 2]]
To make module JSON available in your code, begin with:
require 'json'
All examples here assume that this has been done.
You can parse a String containing JSON data using either of two methods:
JSON.parse(source, opts)
JSON.parse!(source, opts)
where
source is a Ruby object.
opts is a Hash object containing options that control both input allowed and output formatting.
The difference between the two methods is that JSON.parse! omits some checks and may not be safe for some source data; use it only for data from trusted sources. Use the safer method JSON.parse for less trusted sources.
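When the source is less trusted, malformed input is to be expected. The sketch below wraps JSON.parse so that a JSON::ParserError yields a fallback value instead of propagating. The helper name parse_or_default is hypothetical and is not part of the JSON module.

```ruby
require 'json'

# Hypothetical helper: parse text with the checked method JSON.parse,
# returning a fallback value when the text is not valid JSON.
def parse_or_default(text, default = nil)
  JSON.parse(text)
rescue JSON::ParserError
  default
end

good = parse_or_default('{"a": 1}')   # well-formed JSON object
bad  = parse_or_default('{"a": ', {}) # truncated input, falls back to {}

p good
p bad
```

Whether to swallow the error or report it depends on the application; the rescue here only illustrates which exception class to expect.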
When source is a JSON array, JSON.parse by default returns a Ruby Array:
  json = '["foo", 1, 1.0, 2.0e2, true, false, null]'
  ruby = JSON.parse(json)
  ruby # => ["foo", 1, 1.0, 200.0, true, false, nil]
  ruby.class # => Array
The JSON array may contain nested arrays, objects, and scalars to any depth:
  json = '[{"foo": 0, "bar": 1}, ["baz", 2]]'
  JSON.parse(json) # => [{"foo"=>0, "bar"=>1}, ["baz", 2]]
When the source is a JSON object, JSON.parse by default returns a Ruby Hash:
  json = '{"a": "foo", "b": 1, "c": 1.0, "d": 2.0e2, "e": true, "f": false, "g": null}'
  ruby = JSON.parse(json)
  ruby # => {"a"=>"foo", "b"=>1, "c"=>1.0, "d"=>200.0, "e"=>true, "f"=>false, "g"=>nil}
  ruby.class # => Hash
The JSON object may contain nested arrays, objects, and scalars to any depth:
  json = '{"foo": {"bar": 1, "baz": 2}, "bat": [0, 1, 2]}'
  JSON.parse(json) # => {"foo"=>{"bar"=>1, "baz"=>2}, "bat"=>[0, 1, 2]}
When the source is a JSON scalar (not an array or object), JSON.parse returns a Ruby scalar.
String:
  ruby = JSON.parse('"foo"')
  ruby # => 'foo'
  ruby.class # => String
Integer:
  ruby = JSON.parse('1')
  ruby # => 1
  ruby.class # => Integer
Float:
  ruby = JSON.parse('1.0')
  ruby # => 1.0
  ruby.class # => Float
  ruby = JSON.parse('2.0e2')
  ruby # => 200.0
  ruby.class # => Float
Boolean:
  ruby = JSON.parse('true')
  ruby # => true
  ruby.class # => TrueClass
  ruby = JSON.parse('false')
  ruby # => false
  ruby.class # => FalseClass
Null:
  ruby = JSON.parse('null')
  ruby # => nil
  ruby.class # => NilClass
Option max_nesting (Integer) specifies the maximum nesting depth allowed; defaults to 100; specify false to disable depth checking.
With the default, 100:
  source = '[0, [1, [2, [3]]]]'
  ruby = JSON.parse(source)
  ruby # => [0, [1, [2, [3]]]]
Too deep:
  # Raises JSON::NestingError (nesting of 2 is too deep):
  JSON.parse(source, {max_nesting: 1})
Bad value:
  # Raises TypeError (wrong argument type Symbol (expected Fixnum)):
  JSON.parse(source, {max_nesting: :foo})
Option allow_nan (boolean) specifies whether to allow NaN, Infinity, and -Infinity in source; defaults to false.
With the default, false:
  # Raises JSON::ParserError (225: unexpected token at '[NaN]'):
  JSON.parse('[NaN]')
  # Raises JSON::ParserError (232: unexpected token at '[Infinity]'):
  JSON.parse('[Infinity]')
  # Raises JSON::ParserError (248: unexpected token at '[-Infinity]'):
  JSON.parse('[-Infinity]')
Allow:
  source = '[NaN, Infinity, -Infinity]'
  ruby = JSON.parse(source, {allow_nan: true})
  ruby # => [NaN, Infinity, -Infinity]
Option symbolize_names (boolean) specifies whether returned Hash keys should be Symbols; defaults to false (use Strings).
With the default, false:
  source = '{"a": "foo", "b": 1.0, "c": true, "d": false, "e": null}'
  ruby = JSON.parse(source)
  ruby # => {"a"=>"foo", "b"=>1.0, "c"=>true, "d"=>false, "e"=>nil}
Use Symbols:
  ruby = JSON.parse(source, {symbolize_names: true})
  ruby # => {:a=>"foo", :b=>1.0, :c=>true, :d=>false, :e=>nil}
Option object_class (Class) specifies the Ruby class to be used for each JSON object; defaults to Hash.
With the default, Hash:
  source = '{"a": "foo", "b": 1.0, "c": true, "d": false, "e": null}'
  ruby = JSON.parse(source)
  ruby.class # => Hash
Use class OpenStruct:
  require 'ostruct'
  ruby = JSON.parse(source, {object_class: OpenStruct})
  ruby # => #<OpenStruct a="foo", b=1.0, c=true, d=false, e=nil>
Option array_class (Class) specifies the Ruby class to be used for each JSON array; defaults to Array.
With the default, Array:
  source = '["foo", 1.0, true, false, null]'
  ruby = JSON.parse(source)
  ruby.class # => Array
Use class Set:
  require 'set'
  ruby = JSON.parse(source, {array_class: Set})
  ruby # => #<Set: {"foo", 1.0, true, false, nil}>
Option create_additions (boolean) specifies whether to use JSON additions in parsing. See JSON Additions.
To generate a Ruby String containing JSON data, use method JSON.generate(source, opts), where
source is a Ruby object.
opts is a Hash object containing options that control both input allowed and output formatting.
When the source is a Ruby Array, JSON.generate returns a String containing a JSON array:
  ruby = [0, 's', :foo]
  json = JSON.generate(ruby)
  json # => '[0,"s","foo"]'
The Ruby Array may contain nested arrays, hashes, and scalars to any depth:
  ruby = [0, [1, 2], {foo: 3, bar: 4}]
  json = JSON.generate(ruby)
  json # => '[0,[1,2],{"foo":3,"bar":4}]'
When the source is a Ruby Hash, JSON.generate returns a String containing a JSON object:
  ruby = {foo: 0, bar: 's', baz: :bat}
  json = JSON.generate(ruby)
  json # => '{"foo":0,"bar":"s","baz":"bat"}'
The Ruby Hash may contain nested arrays, hashes, and scalars to any depth:
  ruby = {foo: [0, 1], bar: {baz: 2, bat: 3}, bam: :bad}
  json = JSON.generate(ruby)
  json # => '{"foo":[0,1],"bar":{"baz":2,"bat":3},"bam":"bad"}'
When the source is neither an Array nor a Hash, the generated JSON data depends on the class of the source.
When the source is a Ruby Integer or Float, JSON.generate returns a String containing a JSON number:
  JSON.generate(42) # => '42'
  JSON.generate(0.42) # => '0.42'
When the source is a Ruby String, JSON.generate returns a String containing a JSON string (with double-quotes):
  JSON.generate('A string') # => '"A string"'
When the source is true, false or nil, JSON.generate returns a String containing the corresponding JSON token:
  JSON.generate(true) # => 'true'
  JSON.generate(false) # => 'false'
  JSON.generate(nil) # => 'null'
When the source is none of the above, JSON.generate returns a String containing a JSON string representation of the source:
  JSON.generate(:foo) # => '"foo"'
  JSON.generate(Complex(0, 0)) # => '"0+0i"'
  JSON.generate(Dir.new('.')) # => '"#<Dir>"'
Option allow_nan (boolean) specifies whether NaN, Infinity, and -Infinity may be generated; defaults to false.
With the default, false:
  # Raises JSON::GeneratorError (920: NaN not allowed in JSON):
  JSON.generate(JSON::NaN)
  # Raises JSON::GeneratorError (917: Infinity not allowed in JSON):
  JSON.generate(JSON::Infinity)
  # Raises JSON::GeneratorError (917: -Infinity not allowed in JSON):
  JSON.generate(JSON::MinusInfinity)
Allow:
  ruby = [JSON::NaN, JSON::Infinity, JSON::MinusInfinity]
  JSON.generate(ruby, allow_nan: true) # => '[NaN,Infinity,-Infinity]'
Option max_nesting (Integer) specifies the maximum nesting depth in obj; defaults to 100.
With the default, 100:
  obj = [[[[[[0]]]]]]
  JSON.generate(obj) # => '[[[[[[0]]]]]]'
Too deep:
  # Raises JSON::NestingError (nesting of 2 is too deep):
  JSON.generate(obj, max_nesting: 2)
Option script_safe (boolean) specifies whether '\u2028', '\u2029' and '/' should be escaped so as to make the JSON object safe to interpolate in script tags.
Option ascii_only (boolean) specifies whether all characters outside the ASCII range should be escaped.
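A short sketch of the two escaping options follows. The sample text is made up for illustration, and the script_safe call assumes the installed json version supports that option; older releases may not. The exact escape sequences are printed rather than stated here, since they can vary by version.

```ruby
require 'json'

# Sample text with one non-ASCII character and a closing script tag.
text = "caf\u00e9 </script>"

# ascii_only: characters outside the ASCII range are written as escapes.
ascii_json = JSON.generate(text, ascii_only: true)

# script_safe: assumed to be supported by the installed json version.
script_json = JSON.generate(text, script_safe: true)

puts ascii_json
puts script_json

# Escaping changes only the JSON text; parsing restores the original String.
puts JSON.parse(ascii_json) == text
puts JSON.parse(script_json) == text
```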
The default formatting options generate the most compact JSON data, all on one line and with no whitespace.
You can use these formatting options to generate JSON data in a more open format, using whitespace. See also JSON.pretty_generate.
Option array_nl (String) specifies a string (usually a newline) to be inserted after each JSON array; defaults to the empty String, ''.
Option object_nl (String) specifies a string (usually a newline) to be inserted after each JSON object; defaults to the empty String, ''.
Option indent (String) specifies the string (usually spaces) to be used for indentation; defaults to the empty String, ''; has no effect unless options array_nl or object_nl specify newlines.
Option space (String) specifies a string (usually a space) to be inserted after the colon in each JSON object’s pair; defaults to the empty String, ''.
Option space_before (String) specifies a string (usually a space) to be inserted before the colon in each JSON object’s pair; defaults to the empty String, ''.
In this example, obj is used first to generate the shortest JSON data (no whitespace), then again with all formatting options specified:
  obj = {foo: [:bar, :baz], bat: {bam: 0, bad: 1}}
  json = JSON.generate(obj)
  puts 'Compact:', json
  opts = {
    array_nl: "\n",
    object_nl: "\n",
    indent: ' ',
    space_before: ' ',
    space: ' '
  }
  puts 'Open:', JSON.generate(obj, opts)
Output:
Compact: {"foo":["bar","baz"],"bat":{"bam":0,"bad":1}} Open: { "foo" : [ "bar", "baz" ], "bat" : { "bam" : 0, "bad" : 1 } }
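JSON.pretty_generate, mentioned above, applies a preset combination of these formatting options. The sketch below prints its result and confirms that the added whitespace does not change the data; the exact layout is left to the printed output rather than asserted here.

```ruby
require 'json'

obj = {foo: [:bar, :baz], bat: {bam: 0, bad: 1}}

# Generate the open, multi-line form in one call.
pretty = JSON.pretty_generate(obj)
puts pretty

# Whitespace is insignificant in JSON, so parsing recovers the same data
# (Symbols used as values come back as Strings).
round_trip = JSON.parse(pretty, symbolize_names: true)
puts round_trip == {foo: ['bar', 'baz'], bat: {bam: 0, bad: 1}}
```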
When you “round trip” a non-String object from Ruby to JSON and back, you have a new String, instead of the object you began with:
  ruby0 = Range.new(0, 2)
  json = JSON.generate(ruby0)
  json # => '"0..2"'
  ruby1 = JSON.parse(json)
  ruby1 # => '0..2'
  ruby1.class # => String
You can use JSON additions to preserve the original object. The addition is an extension of a Ruby class, so that:
JSON.generate stores more information in the JSON string.
JSON.parse, called with option create_additions, uses that information to create a proper Ruby object.
This example shows a Range being generated into JSON and parsed back into Ruby, both without and with the addition for Range:
  ruby = Range.new(0, 2)
  # This passage does not use the addition for Range.
  json0 = JSON.generate(ruby)
  ruby0 = JSON.parse(json0)
  # This passage uses the addition for Range.
  require 'json/add/range'
  json1 = JSON.generate(ruby)
  ruby1 = JSON.parse(json1, create_additions: true)
  # Make a nice display.
  display = <<~EOT
    Generated JSON:
      Without addition:  #{json0} (#{json0.class})
      With addition:     #{json1} (#{json1.class})
    Parsed JSON:
      Without addition:  #{ruby0.inspect} (#{ruby0.class})
      With addition:     #{ruby1.inspect} (#{ruby1.class})
  EOT
  puts display
This output shows the different results:
  Generated JSON:
    Without addition:  "0..2" (String)
    With addition:     {"json_class":"Range","a":[0,2,false]} (String)
  Parsed JSON:
    Without addition:  "0..2" (String)
    With addition:     0..2 (Range)
The JSON module includes additions for certain classes. You can also craft custom additions. See Custom JSON Additions.
The JSON module includes additions for certain classes. To use an addition, require its source:
BigDecimal: require 'json/add/bigdecimal'
Complex: require 'json/add/complex'
Date: require 'json/add/date'
DateTime: require 'json/add/date_time'
Exception: require 'json/add/exception'
OpenStruct: require 'json/add/ostruct'
Range: require 'json/add/range'
Rational: require 'json/add/rational'
Regexp: require 'json/add/regexp'
Set: require 'json/add/set'
Struct: require 'json/add/struct'
Symbol: require 'json/add/symbol'
Time: require 'json/add/time'
To reduce punctuation clutter, the examples below show the generated JSON via puts, rather than the usual inspect.
BigDecimal:
  require 'json/add/bigdecimal'
  ruby0 = BigDecimal(0) # 0.0
  json = JSON.generate(ruby0) # {"json_class":"BigDecimal","b":"27:0.0"}
  ruby1 = JSON.parse(json, create_additions: true) # 0.0
  ruby1.class # => BigDecimal
Complex:
  require 'json/add/complex'
  ruby0 = Complex(1+0i) # 1+0i
  json = JSON.generate(ruby0) # {"json_class":"Complex","r":1,"i":0}
  ruby1 = JSON.parse(json, create_additions: true) # 1+0i
  ruby1.class # Complex
Date:
  require 'json/add/date'
  ruby0 = Date.today # 2020-05-02
  json = JSON.generate(ruby0) # {"json_class":"Date","y":2020,"m":5,"d":2,"sg":2299161.0}
  ruby1 = JSON.parse(json, create_additions: true) # 2020-05-02
  ruby1.class # Date
DateTime:
  require 'json/add/date_time'
  ruby0 = DateTime.now # 2020-05-02T10:38:13-05:00
  json = JSON.generate(ruby0) # {"json_class":"DateTime","y":2020,"m":5,"d":2,"H":10,"M":38,"S":13,"of":"-5/24","sg":2299161.0}
  ruby1 = JSON.parse(json, create_additions: true) # 2020-05-02T10:38:13-05:00
  ruby1.class # DateTime
Exception (and its subclasses including RuntimeError):
  require 'json/add/exception'
  ruby0 = Exception.new('A message') # A message
  json = JSON.generate(ruby0) # {"json_class":"Exception","m":"A message","b":null}
  ruby1 = JSON.parse(json, create_additions: true) # A message
  ruby1.class # Exception
  ruby0 = RuntimeError.new('Another message') # Another message
  json = JSON.generate(ruby0) # {"json_class":"RuntimeError","m":"Another message","b":null}
  ruby1 = JSON.parse(json, create_additions: true) # Another message
  ruby1.class # RuntimeError
OpenStruct:
  require 'json/add/ostruct'
  ruby0 = OpenStruct.new(name: 'Matz', language: 'Ruby') # #<OpenStruct name="Matz", language="Ruby">
  json = JSON.generate(ruby0) # {"json_class":"OpenStruct","t":{"name":"Matz","language":"Ruby"}}
  ruby1 = JSON.parse(json, create_additions: true) # #<OpenStruct name="Matz", language="Ruby">
  ruby1.class # OpenStruct
Range:
  require 'json/add/range'
  ruby0 = Range.new(0, 2) # 0..2
  json = JSON.generate(ruby0) # {"json_class":"Range","a":[0,2,false]}
  ruby1 = JSON.parse(json, create_additions: true) # 0..2
  ruby1.class # Range
Rational:
  require 'json/add/rational'
  ruby0 = Rational(1, 3) # 1/3
  json = JSON.generate(ruby0) # {"json_class":"Rational","n":1,"d":3}
  ruby1 = JSON.parse(json, create_additions: true) # 1/3
  ruby1.class # Rational
Regexp:
  require 'json/add/regexp'
  ruby0 = Regexp.new('foo') # (?-mix:foo)
  json = JSON.generate(ruby0) # {"json_class":"Regexp","o":0,"s":"foo"}
  ruby1 = JSON.parse(json, create_additions: true) # (?-mix:foo)
  ruby1.class # Regexp
Set:
  require 'json/add/set'
  ruby0 = Set.new([0, 1, 2]) # #<Set: {0, 1, 2}>
  json = JSON.generate(ruby0) # {"json_class":"Set","a":[0,1,2]}
  ruby1 = JSON.parse(json, create_additions: true) # #<Set: {0, 1, 2}>
  ruby1.class # Set
Struct:
  require 'json/add/struct'
  Customer = Struct.new(:name, :address) # Customer
  ruby0 = Customer.new("Dave", "123 Main") # #<struct Customer name="Dave", address="123 Main">
  json = JSON.generate(ruby0) # {"json_class":"Customer","v":["Dave","123 Main"]}
  ruby1 = JSON.parse(json, create_additions: true) # #<struct Customer name="Dave", address="123 Main">
  ruby1.class # Customer
Symbol:
  require 'json/add/symbol'
  ruby0 = :foo # foo
  json = JSON.generate(ruby0) # {"json_class":"Symbol","s":"foo"}
  ruby1 = JSON.parse(json, create_additions: true) # foo
  ruby1.class # Symbol
Time:
  require 'json/add/time'
  ruby0 = Time.now # 2020-05-02 11:28:26 -0500
  json = JSON.generate(ruby0) # {"json_class":"Time","s":1588436906,"n":840560000}
  ruby1 = JSON.parse(json, create_additions: true) # 2020-05-02 11:28:26 -0500
  ruby1.class # Time
In addition to the JSON additions provided, you can craft JSON additions of your own, either for Ruby built-in classes or for user-defined classes.
Here’s a user-defined class Foo:
  class Foo
    attr_accessor :bar, :baz
    def initialize(bar, baz)
      self.bar = bar
      self.baz = baz
    end
  end
Here’s the JSON addition for it:
  # Extend class Foo with JSON addition.
  class Foo
    # Serialize Foo object with its class name and arguments
    def to_json(*args)
      {
        JSON.create_id => self.class.name,
        'a' => [ bar, baz ]
      }.to_json(*args)
    end
    # Deserialize JSON string by constructing new Foo object with arguments.
    def self.json_create(object)
      new(*object['a'])
    end
  end
Demonstration:
  require 'json'
  # This Foo object has no custom addition.
  foo0 = Foo.new(0, 1)
  json0 = JSON.generate(foo0)
  obj0 = JSON.parse(json0)
  # Load the custom addition.
  require_relative 'foo_addition'
  # This foo has the custom addition.
  foo1 = Foo.new(0, 1)
  json1 = JSON.generate(foo1)
  obj1 = JSON.parse(json1, create_additions: true)
  # Make a nice display.
  display = <<~EOT
    Generated JSON:
      Without custom addition:  #{json0} (#{json0.class})
      With custom addition:     #{json1} (#{json1.class})
    Parsed JSON:
      Without custom addition:  #{obj0.inspect} (#{obj0.class})
      With custom addition:     #{obj1.inspect} (#{obj1.class})
  EOT
  puts display
Output:
  Generated JSON:
    Without custom addition:  "#<Foo:0x0000000006534e80>" (String)
    With custom addition:     {"json_class":"Foo","a":[0,1]} (String)
  Parsed JSON:
    Without custom addition:  "#<Foo:0x0000000006534e80>" (String)
    With custom addition:     #<Foo:0x0000000006473bb8 @bar=0, @baz=1> (Foo)
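The demonstration above relies on a separate file for the addition. For experimenting, the same pattern can be kept in a single file. The sketch below uses a hypothetical class Point (not part of the JSON module) and shows that the class is only reconstructed when create_additions is given.

```ruby
require 'json'

# Point is a hypothetical class used only for this sketch.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Serialize with the class name stored under JSON.create_id.
  def to_json(*args)
    { JSON.create_id => self.class.name, 'a' => [x, y] }.to_json(*args)
  end

  # Rebuild a Point from the parsed Hash.
  def self.json_create(object)
    new(*object['a'])
  end
end

json = JSON.generate(Point.new(3, 4))

plain = JSON.parse(json)                         # stays a Hash
point = JSON.parse(json, create_additions: true) # becomes a Point

puts json
puts plain.class
puts point.class
```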
OpenSSL provides SSL, TLS and general purpose cryptography. It wraps the OpenSSL library.
All examples assume you have loaded OpenSSL with:
  require 'openssl'
These examples build atop each other. For example, the key created in the next example is used throughout these examples.
This example creates a 2048-bit RSA keypair and writes it to the current directory.
  key = OpenSSL::PKey::RSA.new 2048
  File.write 'private_key.pem', key.private_to_pem
  File.write 'public_key.pem', key.public_to_pem
Keys saved to disk without encryption are not secure, as anyone who gets ahold of the key may use it. To export a key securely, export it with a password.
  cipher = OpenSSL::Cipher.new 'aes-256-cbc'
  password = 'my secure password goes here'
  key_secure = key.private_to_pem cipher, password
  File.write 'private.secure.pem', key_secure
OpenSSL::Cipher.ciphers returns a list of available ciphers.
A key can also be loaded from a file.
  key2 = OpenSSL::PKey.read File.read 'private_key.pem'
  key2.public? # => true
  key2.private? # => true
or
  key3 = OpenSSL::PKey.read File.read 'public_key.pem'
  key3.public? # => true
  key3.private? # => false
OpenSSL will prompt you for your password when loading an encrypted key. If you will not be able to type in the password you may provide it when loading the key:
  key4_pem = File.read 'private.secure.pem'
  password = 'my secure password goes here'
  key4 = OpenSSL::PKey.read key4_pem, password
RSA provides encryption and decryption using the public and private keys. You can use a variety of padding methods depending upon the intended use of encrypted data.
Asymmetric public/private key encryption is slow and vulnerable to attack when it is used without padding or directly to encrypt larger chunks of data. Typical use cases for RSA encryption involve “wrapping” a symmetric key with the public key of the recipient, who would “unwrap” that symmetric key again using their private key. The following illustrates a simplified example of such a key transport scheme. It shouldn’t be used in practice, though; standardized protocols should always be preferred.
  # symmetric_key stands for a String holding the symmetric key to transport;
  # it is not defined in these examples.
  wrapped_key = key.public_encrypt symmetric_key
A symmetric key encrypted with the public key can only be decrypted with the corresponding private key of the recipient.
  original_key = key.private_decrypt wrapped_key
By default PKCS#1 padding will be used, but it is also possible to use other forms of padding, see PKey::RSA for further details.
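A self-contained sketch of the wrapping step follows. The variable names and the choice of OAEP padding are assumptions made for illustration, and the key pair is generated on the spot only so the example runs by itself. As noted above, a standardized protocol is preferable in practice.

```ruby
require 'openssl'

# Recipient's RSA key pair (generated here only to keep the sketch self-contained).
rsa_key = OpenSSL::PKey::RSA.new(2048)

# A random 256-bit symmetric key to be transported.
symmetric_key = OpenSSL::Random.random_bytes(32)

# OAEP padding is an assumption of this sketch, not the default.
padding = OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING

# Wrap with the public half, unwrap with the private half.
wrapped_key   = rsa_key.public_encrypt(symmetric_key, padding)
unwrapped_key = rsa_key.private_decrypt(wrapped_key, padding)

puts unwrapped_key == symmetric_key # => true
```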
Using “private_encrypt” to encrypt some data with the private key is equivalent to applying a digital signature to the data. A verifying party may validate the signature by comparing the result of decrypting the signature with “public_decrypt” to the original data. However, OpenSSL::PKey already has methods “sign” and “verify” that handle digital signatures in a standardized way; “private_encrypt” and “public_decrypt” shouldn’t be used in practice.
To sign a document, a cryptographically secure hash of the document is computed first, which is then signed using the private key.
  signature = key.sign 'SHA256', document
To validate the signature, again a hash of the document is computed and the signature is decrypted using the public key. The result is then compared to the hash just computed; if they are equal, the signature was valid.
  if key.verify 'SHA256', signature, document
    puts 'Valid'
  else
    puts 'Invalid'
  end
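The following sketch runs the sign and verify steps end to end, including a check that a modified document is rejected. The document text is made up, and an OpenSSL::Digest object is passed instead of the digest name, which is a choice of this sketch rather than a requirement stated above.

```ruby
require 'openssl'

key      = OpenSSL::PKey::RSA.new(2048)
digest   = OpenSSL::Digest.new('SHA256')
document = 'A document to be signed'

# The signer uses the private key.
signature = key.sign(digest, document)

# The verifier only needs the public key.
public_key = key.public_key
valid    = public_key.verify(digest, signature, document)
tampered = public_key.verify(digest, signature, document + '!')

puts valid    # => true
puts tampered # => false
```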
If supported by the underlying OpenSSL version used, Password-based Encryption should use the features of PKCS5. If not supported or if required by legacy applications, the older, less secure methods specified in RFC 2898 are also supported (see below).
PKCS5 supports PBKDF2 as it was specified in PKCS#5 v2.0. It still uses a password, a salt, and additionally a number of iterations that will slow the key derivation process down. The slower this is, the more work is required to brute-force the resulting key.
The strategy is to first instantiate a Cipher for encryption, and then to generate a random IV plus a key derived from the password using PBKDF2. PKCS #5 v2.0 recommends at least 8 bytes for the salt; the number of iterations largely depends on the hardware being used.
  cipher = OpenSSL::Cipher.new 'aes-256-cbc'
  cipher.encrypt
  iv = cipher.random_iv
  pwd = 'some hopefully not to easily guessable password'
  salt = OpenSSL::Random.random_bytes 16
  iter = 20000
  key_len = cipher.key_len
  digest = OpenSSL::Digest.new('SHA256')
  key = OpenSSL::PKCS5.pbkdf2_hmac(pwd, salt, iter, key_len, digest)
  cipher.key = key
Now encrypt the data:
  encrypted = cipher.update document
  encrypted << cipher.final
Use the same steps as before to derive the symmetric AES key, this time setting the Cipher up for decryption.
  cipher = OpenSSL::Cipher.new 'aes-256-cbc'
  cipher.decrypt
  cipher.iv = iv # the one generated with #random_iv
  pwd = 'some hopefully not to easily guessable password'
  salt = ... # the one generated above
  iter = 20000
  key_len = cipher.key_len
  digest = OpenSSL::Digest.new('SHA256')
  key = OpenSSL::PKCS5.pbkdf2_hmac(pwd, salt, iter, key_len, digest)
  cipher.key = key
Now decrypt the data:
  decrypted = cipher.update encrypted
  decrypted << cipher.final
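The two halves can be combined into one runnable round trip. In this sketch the helper derive_key, the sample password, and the sample document are hypothetical. The salt and IV are not secret, but both must be stored alongside the ciphertext so that decryption can derive the same key.

```ruby
require 'openssl'

# Hypothetical helper: derive a key of key_len bytes from a password with PBKDF2.
def derive_key(password, salt, key_len)
  iterations = 20_000
  digest = OpenSSL::Digest.new('SHA256')
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, key_len, digest)
end

password = 'a sample password for this sketch'
document = 'A secret document'

# Encrypt with a fresh random IV and salt.
cipher = OpenSSL::Cipher.new('aes-256-cbc')
cipher.encrypt
iv   = cipher.random_iv
salt = OpenSSL::Random.random_bytes(16)
cipher.key = derive_key(password, salt, cipher.key_len)
encrypted = cipher.update(document) + cipher.final

# Decrypt with the same password, salt, and IV.
decipher = OpenSSL::Cipher.new('aes-256-cbc')
decipher.decrypt
decipher.iv  = iv
decipher.key = derive_key(password, salt, decipher.key_len)
decrypted = decipher.update(encrypted) + decipher.final

puts decrypted == document # => true
```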
X509 Certificates
This example creates a self-signed certificate using an RSA key and a SHA1 signature.
  key = OpenSSL::PKey::RSA.new 2048
  name = OpenSSL::X509::Name.parse '/CN=nobody/DC=example'
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 0
  cert.not_before = Time.now
  cert.not_after = Time.now + 3600
  cert.public_key = key.public_key
  cert.subject = name
You can add extensions to the certificate with OpenSSL::X509::ExtensionFactory to indicate the purpose of the certificate.
  extension_factory = OpenSSL::X509::ExtensionFactory.new nil, cert
  cert.add_extension \
    extension_factory.create_extension('basicConstraints', 'CA:FALSE', true)
  cert.add_extension \
    extension_factory.create_extension(
      'keyUsage', 'keyEncipherment,dataEncipherment,digitalSignature')
  cert.add_extension \
    extension_factory.create_extension('subjectKeyIdentifier', 'hash')
The list of supported extensions (and in some cases their possible values) can be derived from the “objects.h” file in the OpenSSL source code.
To sign a certificate, set the issuer and use OpenSSL::X509::Certificate#sign with a digest algorithm. This creates a self-signed cert because we’re using the same name and key to sign the certificate as was used to create the certificate.
  cert.issuer = name
  cert.sign key, OpenSSL::Digest.new('SHA1')
  open 'certificate.pem', 'w' do |io|
    io.write cert.to_pem
  end
Like a key, a cert can also be loaded from a file.
  cert2 = OpenSSL::X509::Certificate.new File.read 'certificate.pem'
Certificate#verify will return true when a certificate was signed with the given public key.
  raise 'certificate can not be verified' unless cert2.verify key
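The steps above can be sketched in memory, without writing files. Two things differ from the text and are choices of this sketch: SHA256 is swapped in for SHA1, and the certificate is reloaded from its PEM string rather than from disk. A second, unrelated key is generated to show verification failing.

```ruby
require 'openssl'

key  = OpenSSL::PKey::RSA.new(2048)
name = OpenSSL::X509::Name.parse('/CN=nobody/DC=example')

cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 0
cert.not_before = Time.now
cert.not_after  = Time.now + 3600
cert.public_key = key.public_key
cert.subject    = name
cert.issuer     = name # self-signed: issuer equals subject

# SHA256 is swapped in here for the SHA1 digest used in the text.
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# Reload from PEM and verify against the signing key and an unrelated key.
reloaded  = OpenSSL::X509::Certificate.new(cert.to_pem)
other_key = OpenSSL::PKey::RSA.new(2048)

signed_by_key   = reloaded.verify(key)
signed_by_other = reloaded.verify(other_key)

puts signed_by_key   # => true
puts signed_by_other # => false
```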
A certificate authority (CA) is a trusted third party that allows you to verify the ownership of unknown certificates. The CA issues key signatures that indicate it trusts the user of that key. A user encountering the key can verify the signature by using the CA’s public key.
A CA key is valuable, so we encrypt it, save it to disk, and make sure it is not readable by other users.
  ca_key = OpenSSL::PKey::RSA.new 2048
  password = 'my secure password goes here'
  cipher = 'aes-256-cbc'
  open 'ca_key.pem', 'w', 0400 do |io|
    io.write ca_key.private_to_pem(cipher, password)
  end
A CA certificate is created the same way we created a certificate above, but with different extensions.
  ca_name = OpenSSL::X509::Name.parse '/CN=ca/DC=example'
  ca_cert = OpenSSL::X509::Certificate.new
  ca_cert.serial = 0
  ca_cert.version = 2
  ca_cert.not_before = Time.now
  ca_cert.not_after = Time.now + 86400
  ca_cert.public_key = ca_key.public_key
  ca_cert.subject = ca_name
  ca_cert.issuer = ca_name
  extension_factory = OpenSSL::X509::ExtensionFactory.new
  extension_factory.subject_certificate = ca_cert
  extension_factory.issuer_certificate = ca_cert
  ca_cert.add_extension \
    extension_factory.create_extension('subjectKeyIdentifier', 'hash')
This extension indicates the CA’s key may be used as a CA.
  ca_cert.add_extension \
    extension_factory.create_extension('basicConstraints', 'CA:TRUE', true)
This extension indicates the CA’s key may be used to verify signatures on both certificates and certificate revocations.
  ca_cert.add_extension \
    extension_factory.create_extension(
      'keyUsage', 'cRLSign,keyCertSign', true)
Root CA certificates are self-signed.
  ca_cert.sign ca_key, OpenSSL::Digest.new('SHA1')
The CA certificate is saved to disk so it may be distributed to all the users of the keys this CA will sign.
  open 'ca_cert.pem', 'w' do |io|
    io.write ca_cert.to_pem
  end
The CA signs keys through a Certificate Signing Request (CSR). The CSR contains the information necessary to identify the key.
  csr = OpenSSL::X509::Request.new
  csr.version = 0
  csr.subject = name
  csr.public_key = key.public_key
  csr.sign key, OpenSSL::Digest.new('SHA1')
A CSR is saved to disk and sent to the CA for signing.
  open 'csr.pem', 'w' do |io|
    io.write csr.to_pem
  end
Upon receiving a CSR the CA will verify it before signing it. A minimal verification would be to check the CSR’s signature.
  csr = OpenSSL::X509::Request.new File.read 'csr.pem'
  raise 'CSR can not be verified' unless csr.verify csr.public_key
After verification a certificate is created, marked for various usages, signed with the CA key and returned to the requester.
  csr_cert = OpenSSL::X509::Certificate.new
  csr_cert.serial = 0
  csr_cert.version = 2
  csr_cert.not_before = Time.now
  csr_cert.not_after = Time.now + 600
  csr_cert.subject = csr.subject
  csr_cert.public_key = csr.public_key
  csr_cert.issuer = ca_cert.subject
  extension_factory = OpenSSL::X509::ExtensionFactory.new
  extension_factory.subject_certificate = csr_cert
  extension_factory.issuer_certificate = ca_cert
  csr_cert.add_extension \
    extension_factory.create_extension('basicConstraints', 'CA:FALSE')
  csr_cert.add_extension \
    extension_factory.create_extension(
      'keyUsage', 'keyEncipherment,dataEncipherment,digitalSignature')
  csr_cert.add_extension \
    extension_factory.create_extension('subjectKeyIdentifier', 'hash')
  csr_cert.sign ca_key, OpenSSL::Digest.new('SHA1')
  open 'csr_cert.pem', 'w' do |io|
    io.write csr_cert.to_pem
  end
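The request, verification, and issuance steps can be condensed into one in-memory sketch. It omits the extensions and file handling for brevity, uses SHA256 instead of SHA1, and introduces the variable names user_key and issued; all of these are choices made for illustration rather than part of the walkthrough above.

```ruby
require 'openssl'

digest = OpenSSL::Digest.new('SHA256') # swapped in for SHA1

# The CA side: a key and a name (the CA certificate itself is omitted here).
ca_key  = OpenSSL::PKey::RSA.new(2048)
ca_name = OpenSSL::X509::Name.parse('/CN=ca/DC=example')

# The requester side: build and sign a CSR with the requester's own key.
user_key = OpenSSL::PKey::RSA.new(2048)
csr = OpenSSL::X509::Request.new
csr.version    = 0
csr.subject    = OpenSSL::X509::Name.parse('/CN=nobody/DC=example')
csr.public_key = user_key.public_key
csr.sign(user_key, digest)

# The CA checks the CSR's signature before issuing anything.
csr_ok = csr.verify(csr.public_key)

# The CA issues a certificate for the CSR's subject and public key.
issued = OpenSSL::X509::Certificate.new
issued.version    = 2
issued.serial     = 1
issued.not_before = Time.now
issued.not_after  = Time.now + 600
issued.subject    = csr.subject
issued.public_key = csr.public_key
issued.issuer     = ca_name
issued.sign(ca_key, digest)

issued_by_ca   = issued.verify(ca_key)
issued_by_user = issued.verify(user_key)

puts csr_ok         # => true
puts issued_by_ca   # => true
puts issued_by_user # => false
```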
SSL and TLS Connections
Using our created key and certificate we can create an SSL or TLS connection. An SSLContext is used to set up an SSL session.
  context = OpenSSL::SSL::SSLContext.new
SSL Server
An SSL server requires the certificate and private key to communicate securely with its clients:
  context.cert = cert
  context.key = key
Then create an SSLServer with a TCP server socket and the context. Use the SSLServer like an ordinary TCP server.
  require 'socket'
  tcp_server = TCPServer.new 5000
  ssl_server = OpenSSL::SSL::SSLServer.new tcp_server, context
  loop do
    ssl_connection = ssl_server.accept
    data = ssl_connection.gets
    response = "I got #{data.dump}"
    puts response
    ssl_connection.puts "I got #{data.dump}"
    ssl_connection.close
  end
SSL client
An SSL client is created with a TCP socket and the context. SSLSocket#connect must be called to initiate the SSL handshake and start encryption. A key and certificate are not required for the client socket.
Note that SSLSocket#close doesn’t close the underlying socket by default. Set SSLSocket#sync_close to true if you want the underlying socket to be closed as well.
  require 'socket'
  tcp_socket = TCPSocket.new 'localhost', 5000
  ssl_client = OpenSSL::SSL::SSLSocket.new tcp_socket, context
  ssl_client.sync_close = true
  ssl_client.connect
  ssl_client.puts "hello server!"
  puts ssl_client.gets
  ssl_client.close # shutdown the TLS connection and close tcp_socket
An unverified SSL connection does not provide much security. For enhanced security the client or server can verify the certificate of its peer.
The client can be modified to verify the server’s certificate against the certificate authority’s certificate:
  context.ca_file = 'ca_cert.pem'
  context.verify_mode = OpenSSL::SSL::VERIFY_PEER
  require 'socket'
  tcp_socket = TCPSocket.new 'localhost', 5000
  ssl_client = OpenSSL::SSL::SSLSocket.new tcp_socket, context
  ssl_client.connect
  ssl_client.puts "hello server!"
  puts ssl_client.gets
If the server certificate is invalid or context.ca_file is not set when verifying peers, an OpenSSL::SSL::SSLError will be raised.
FileTest implements file test operations similar to those used in File::Stat. It exists as a standalone module, and its methods are also insinuated into the File class. (Note that this is not done by inclusion: the interpreter cheats).
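A brief sketch of a few FileTest predicates, run against a temporary file so that it does not depend on any particular path. The temporary file name prefix and the results hash are made up for illustration.

```ruby
require 'tempfile'
require 'tmpdir'

# Create an empty temporary file to test against.
file = Tempfile.create('filetest')
path = file.path

results = {
  exist:        FileTest.exist?(path),
  file:         FileTest.file?(path),
  directory:    FileTest.directory?(Dir.tmpdir),
  empty:        FileTest.zero?(path),
  # The same test is available through the File class.
  same_as_file: FileTest.exist?(path) == File.exist?(path)
}

file.close
File.unlink(path)

p results
```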
The Singleton
module implements the Singleton
pattern.
To use Singleton
, include the module in your class.
class Klass include Singleton # ... end
This ensures that only one instance of Klass can be created.
a,b = Klass.instance, Klass.instance a == b # => true Klass.new # => NoMethodError - new is private ...
The instance is created at upon the first call of Klass.instance().
class OtherKlass include Singleton # ... end ObjectSpace.each_object(OtherKlass){} # => 0 OtherKlass.instance ObjectSpace.each_object(OtherKlass){} # => 1
This behavior is preserved under inheritance and cloning.
This above is achieved by:
Making Klass.new and Klass.allocate private.
Overriding Klass.inherited(sub_klass) and Klass.clone() to ensure that the Singleton
properties are kept when inherited and cloned.
Providing the Klass.instance() method that returns the same object each time it is called.
Overriding Klass._load(str) to call Klass.instance().
Overriding Klass#clone and Klass#dup to raise TypeErrors to prevent cloning or duping.
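The guarantees listed above can be observed with a short sketch. The class name `AppConfig` and its `verbose` attribute are hypothetical and exist only for illustration.

```ruby
require 'singleton'

# Hypothetical class used only to demonstrate the behavior.
class AppConfig
  include Singleton
  attr_accessor :verbose
end

first  = AppConfig.instance
second = AppConfig.instance
same_object = first.equal?(second)   # both names refer to one object

# new is private, so calling it from outside raises NoMethodError.
new_error = begin
  AppConfig.new
rescue NoMethodError => e
  e.class
end

# dup (like clone) on the instance raises TypeError.
dup_error = begin
  first.dup
rescue TypeError => e
  e.class
end
```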
Singleton
and Marshal
By default Singleton’s _dump(depth)
returns the empty string. Marshalling by default will strip state information, e.g. instance variables from the instance. Classes using Singleton
can provide custom _load(str) and _dump(depth) methods to retain some of the previous state of the instance.
require 'singleton' class Example include Singleton attr_accessor :keep, :strip def _dump(depth) # this strips the @strip information from the instance Marshal.dump(@keep, depth) end def self._load(str) instance.keep = Marshal.load(str) instance end end a = Example.instance a.keep = "keep this" a.strip = "get rid of this" stored_state = Marshal.dump(a) a.keep = nil a.strip = nil b = Marshal.load(stored_state) p a == b # => true p a.keep # => "keep this" p a.strip # => nil
define UnicodeNormalize module here so that we don’t have to look it up
Response class for Found
responses (status code 302).
The Found
response indicates that the client should look at (browse to) another URL.
References:
Response class for Moved Permanently
responses (status code 301).
The Moved Permanently
response indicates that links or records returning this response should be updated to use the given URL.
References:
Response class for Temporary Redirect
responses (status code 307).
The request should be repeated with another URI
; however, future requests should still use the original URI
.
References:
Signals that a remote operation cannot be conducted, probably due to not being connected (or just not finding host).
RemoteFetcher
handles the details of fetching gems and gem information from a remote source.
A Requirement
is a set of one or more version restrictions. It supports several restriction operators: =, !=, >, <, >=, <=, and ~>.
See Gem::Version
for a description on how versions and requirements work together in RubyGems.
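A brief sketch of checking versions against a requirement follows. The version numbers are arbitrary examples chosen for illustration.

```ruby
require 'rubygems'

# "~> 1.2" accepts 1.2 and later, but stays below 2.0.
pessimistic = Gem::Requirement.new('~> 1.2')
accepts_1_5 = pessimistic.satisfied_by?(Gem::Version.new('1.5.3'))
accepts_2_0 = pessimistic.satisfied_by?(Gem::Version.new('2.0'))

# Several restrictions can be combined; all of them must hold.
ranged   = Gem::Requirement.new('>= 1.0', '< 3')
in_range = ranged.satisfied_by?(Gem::Version.new('2.9'))
```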
Raised on an attempt to Ractor#take
if there was an uncaught exception in the Ractor
. Its cause
will contain the original exception, and ractor
is the original ractor it was raised in.
r = Ractor.new { raise "Something weird happened" } begin r.take rescue => e p e # => #<Ractor::RemoteError: thrown by remote Ractor.> p e.ractor == r # => true p e.cause # => #<RuntimeError: Something weird happened> end
Raised on an attempt to access an object which was moved in Ractor#send
or Ractor.yield.
r = Ractor.new { sleep } ary = [1, 2, 3] r.send(ary, move: true) ary.inspect # Ractor::MovedError (can not send any methods to a moved object)
A special object which replaces any value that was moved to another ractor in Ractor#send
or Ractor.yield. Any attempt to access the object results in Ractor::MovedError
.
r = Ractor.new { receive } ary = [1, 2, 3] r.send(ary, move: true) p Ractor::MovedObject === ary # => true ary.inspect # Ractor::MovedError (can not send any methods to a moved object)
Raised when an invalid operation is attempted on a Fiber
, in particular when attempting to call/resume a dead fiber, attempting to yield from the root fiber, or calling a fiber across threads.
fiber = Fiber.new{} fiber.resume #=> nil fiber.resume #=> FiberError: dead fiber called
Raised when a feature is not implemented on the current platform. For example, methods depending on the fsync
or fork
system calls may raise this exception if the underlying operating system or Ruby
runtime does not support them.
Note that if fork
raises a NotImplementedError
, then respond_to?(:fork)
returns false
.
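One way to act on this, sketched under the assumption that a fallback is acceptable when fork is missing, is to test for the method first and still rescue the exception. The method name `run_in_child` and its return symbols are hypothetical.

```ruby
# Hypothetical helper: returns :forked when a child process could be
# created, and :unsupported when the platform does not provide fork.
def run_in_child
  return :unsupported unless Process.respond_to?(:fork)

  pid = fork { exit!(0) }   # the child exits immediately
  Process.wait(pid)
  :forked
rescue NotImplementedError
  :unsupported
end

outcome = run_in_child
```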
A Module
is a collection of methods and constants. The methods in a module may be instance methods or module methods. Instance methods appear as methods in a class when the module is included; module methods do not. Conversely, module methods may be called without creating an encapsulating object, while instance methods may not. (See Module#module_function
.)
In the descriptions that follow, the parameter sym refers to a symbol, which is either a quoted string or a Symbol
(such as :name
).
module Mod include Math CONST = 1 def meth # ... end end Mod.class #=> Module Mod.constants #=> [:CONST, :PI, :E] Mod.instance_methods #=> [:meth]
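The difference between instance methods and module methods can be sketched as follows. The names `Greeter`, `Person`, `greet`, and `shout` are hypothetical.

```ruby
module Greeter
  # Instance method: available on objects of classes that include Greeter.
  def greet(name)
    "Hello, #{name}!"
  end

  # Methods defined after this call become module methods.
  module_function

  def shout(text)
    text.upcase
  end
end

class Person
  include Greeter
end

via_instance = Person.new.greet('Ann')    # needs an object
via_module   = Greeter.shout('hi')        # no object required
module_has_greet = Greeter.respond_to?(:greet)
```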
A regular expression (also called a regexp) is a match pattern (also simply called a pattern).
A common notation for a regexp uses enclosing slash characters:
/foo/
A regexp may be applied to a target string; the part of the string (if any) that matches the pattern is called a match, and may be said to match:
re = /red/ re.match?('redirect') # => true # Match at beginning of target. re.match?('bored') # => true # Match at end of target. re.match?('credit') # => true # Match within target. re.match?('foo') # => false # No match.
A regexp may be used:
To extract substrings based on a given pattern:
re = /foo/ # => /foo/ re.match('food') # => #<MatchData "foo"> re.match('good') # => nil
See sections Method match and Operator =~.
To determine whether a string matches a given pattern:
re.match?('food') # => true re.match?('good') # => false
See section Method match?.
As an argument for calls to certain methods in other classes and modules; most such methods accept an argument that may be either a string or the (much more powerful) regexp.
See Regexp Methods.
A regexp object has:
A source; see Sources.
Several modes; see Modes.
A timeout; see Timeouts.
An encoding; see Encodings.
A regular expression may be created with:
A regexp literal using slash characters (see Regexp Literals):
# This is a very common usage. /foo/ # => /foo/
A %r
regexp literal (see Regexp Literals):
# Same delimiter character at beginning and end; # useful for avoiding escaping characters %r/name\/value pair/ # => /name\/value pair/ %r:name/value pair: # => /name\/value pair/ %r|name/value pair| # => /name\/value pair/ # Certain "paired" characters can be delimiters. %r[foo] # => /foo/ %r{foo} # => /foo/ %r(foo) # => /foo/ %r<foo> # => /foo/
Method
match
Each of the methods Regexp#match
, String#match
, and Symbol#match
returns a MatchData
object if a match was found, nil
otherwise; each also sets global variables:
'food'.match(/foo/) # => #<MatchData "foo"> 'food'.match(/bar/) # => nil
=~
Each of the operators Regexp#=~
, String#=~
, and Symbol#=~
returns an integer offset if a match was found, nil
otherwise; each also sets global variables:
/bar/ =~ 'foo bar' # => 4 'foo bar' =~ /bar/ # => 4 /baz/ =~ 'foo bar' # => nil
Method
match?
Each of the methods Regexp#match?
, String#match?
, and Symbol#match?
returns true
if a match was found, false
otherwise; none sets global variables:
'food'.match?(/foo/) # => true 'food'.match?(/bar/) # => false
Certain regexp-oriented methods assign values to global variables:
match
: see Method match.
=~
: see Operator =~.
The affected global variables are:
$~
: Returns a MatchData
object, or nil
.
$&
: Returns the matched part of the string, or nil
.
$`
: Returns the part of the string to the left of the match, or nil
.
$'
: Returns the part of the string to the right of the match, or nil
.
$+
: Returns the last group matched, or nil
.
$1
, $2
, etc.: Returns the first, second, etc., matched group, or nil
. Note that $0
is quite different; it returns the name of the currently executing program.
These variables, except for $~
, are shorthands for methods of $~
. See Global variables equivalence at MatchData
.
Examples:
# Matched string, but no matched groups. 'foo bar bar baz'.match('bar') $~ # => #<MatchData "bar"> $& # => "bar" $` # => "foo " $' # => " bar baz" $+ # => nil $1 # => nil # Matched groups. /s(\w{2}).*(c)/.match('haystack') $~ # => #<MatchData "stac" 1:"ta" 2:"c"> $& # => "stac" $` # => "hay" $' # => "k" $+ # => "c" $1 # => "ta" $2 # => "c" $3 # => nil # No match. 'foo'.match('bar') $~ # => nil $& # => nil $` # => nil $' # => nil $+ # => nil $1 # => nil
Note that Regexp#match?
, String#match?
, and Symbol#match?
do not set global variables.
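The following sketch contrasts the two behaviors by reading $~ after each call. The variable names are illustrative.

```ruby
'food'.match(/foo/)
after_match = $~            # MatchData for "foo"

'bard'.match?(/bar/)        # true, but $~ is left untouched
after_predicate = $~        # still the MatchData for "foo"

/zzz/ =~ 'food'             # no match; $~ is reset to nil
after_failed = $~
```

Because match? skips the bookkeeping for these variables, it is a reasonable default when only a yes-or-no answer is needed.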
As seen above, the simplest regexp uses a literal expression as its source:
re = /foo/ # => /foo/ re.match('food') # => #<MatchData "foo"> re.match('good') # => nil
A rich collection of available subexpressions gives the regexp great power and flexibility:
Regexp special characters, called metacharacters, have special meanings in certain contexts; depending on the context, these are sometimes metacharacters:
. ? - + * ^ \ | $ ( ) [ ] { }
To match a metacharacter literally, backslash-escape it:
# Matches one or more 'o' characters. /o+/.match('foo') # => #<MatchData "oo"> # Would match 'o+'. /o\+/.match('foo') # => nil
To match a backslash literally, backslash-escape it:
/\./.match('\.') # => #<MatchData "."> /\\./.match('\.') # => #<MatchData "\\.">
Method
Regexp.escape
returns an escaped string:
Regexp.escape('.?-+*^\|$()[]{}') # => "\\.\\?\\-\\+\\*\\^\\\\\\|\\$\\(\\)\\[\\]\\{\\}"
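A common use, sketched here with a made-up input string, is to build a regexp that matches arbitrary text literally.

```ruby
# Hypothetical text that happens to contain metacharacters.
user_input = '1+1=2 (approx.)'
target     = 'Is 1+1=2 (approx.) true?'

# Escaped: every metacharacter is matched literally.
literal_re    = Regexp.new(Regexp.escape(user_input))
literal_found = literal_re.match?(target)

# Unescaped: "+", "(", ")" and "." keep their special meanings,
# so the same text no longer matches itself.
raw_found = Regexp.new(user_input).match?(target)
```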
The source literal largely behaves like a double-quoted string; see Regexp Literals.
In particular, a source literal may contain interpolated expressions:
s = 'foo' # => "foo" /#{s}/ # => /foo/ /#{s.capitalize}/ # => /Foo/ /#{2 + 2}/ # => /4/
There are differences between an ordinary string literal and a source literal; see Shorthand Character Classes.
\s
in an ordinary string literal is equivalent to a space character; in a source literal, it’s shorthand for matching a whitespace character.
In an ordinary string literal, these are (needlessly) escaped characters; in a source literal, they are shorthands for various matching characters:
\w \W \d \D \h \H \S \R
A character class is delimited by square brackets; it specifies that certain characters match at a given point in the target string:
# This character class will match any vowel. re = /B[aeiou]rd/ re.match('Bird') # => #<MatchData "Bird"> re.match('Bard') # => #<MatchData "Bard"> re.match('Byrd') # => nil
A character class may contain hyphen characters to specify ranges of characters:
# These regexps have the same effect. /[abcdef]/.match('foo') # => #<MatchData "f"> /[a-f]/.match('foo') # => #<MatchData "f"> /[a-cd-f]/.match('foo') # => #<MatchData "f">
When the first character of a character class is a caret (^
), the sense of the class is inverted: it matches any character except those specified.
/[^a-eg-z]/.match('f') # => #<MatchData "f">
A character class may contain another character class. By itself this isn’t useful because [a-z[0-9]]
describes the same set as [a-z0-9]
.
However, character classes also support the &&
operator, which performs set intersection on its arguments. The two can be combined as follows:
/[a-w&&[^c-g]z]/ # ([a-w] AND ([^c-g] OR z))
This is equivalent to:
/[abh-w]/
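The stated equivalence can be checked by applying both classes to every lowercase ASCII letter. This is only a sketch; Ruby may print a warning about the redundant z when warnings are enabled.

```ruby
intersection = /[a-w&&[^c-g]z]/
explicit     = /[abh-w]/

letters = ('a'..'z').to_a
matched_by_intersection = letters.grep(intersection)
matched_by_explicit     = letters.grep(explicit)
```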
Each of the following metacharacters serves as a shorthand for a character class:
/./
: Matches any character except a newline:
/./.match('foo') # => #<MatchData "f"> /./.match("\n") # => nil
/./m
: Matches any character, including a newline; see Multiline Mode:
/./m.match("\n") # => #<MatchData "\n">
/\w/
: Matches a word character: equivalent to [a-zA-Z0-9_]
:
/\w/.match(' foo') # => #<MatchData "f"> /\w/.match(' _') # => #<MatchData "_"> /\w/.match(' ') # => nil
/\W/
: Matches a non-word character: equivalent to [^a-zA-Z0-9_]
:
/\W/.match(' ') # => #<MatchData " "> /\W/.match('_') # => nil
/\d/
: Matches a digit character: equivalent to [0-9]
:
/\d/.match('THX1138') # => #<MatchData "1"> /\d/.match('foo') # => nil
/\D/
: Matches a non-digit character: equivalent to [^0-9]
:
/\D/.match('123Jump!') # => #<MatchData "J"> /\D/.match('123') # => nil
/\h/
: Matches a hexdigit character: equivalent to [0-9a-fA-F]
:
/\h/.match('xyz fedcba9876543210') # => #<MatchData "f"> /\h/.match('xyz') # => nil
/\H/
: Matches a non-hexdigit character: equivalent to [^0-9a-fA-F]
:
/\H/.match('fedcba9876543210xyz') # => #<MatchData "x"> /\H/.match('fedcba9876543210') # => nil
/\s/
: Matches a whitespace character: equivalent to /[ \t\r\n\f\v]/
:
/\s/.match('foo bar') # => #<MatchData " "> /\s/.match('foo') # => nil
/\S/
: Matches a non-whitespace character: equivalent to /[^ \t\r\n\f\v]/
:
/\S/.match(" \t\r\n\f\v foo") # => #<MatchData "f"> /\S/.match(" \t\r\n\f\v") # => nil
/\R/
: Matches a linebreak, platform-independently:
/\R/.match("\r") # => #<MatchData "\r"> # Carriage return (CR) /\R/.match("\n") # => #<MatchData "\n"> # Newline (LF) /\R/.match("\f") # => #<MatchData "\f"> # Formfeed (FF) /\R/.match("\v") # => #<MatchData "\v"> # Vertical tab (VT) /\R/.match("\r\n") # => #<MatchData "\r\n"> # CRLF /\R/.match("\u0085") # => #<MatchData "\u0085"> # Next line (NEL) /\R/.match("\u2028") # => #<MatchData "\u2028"> # Line separator (LSEP) /\R/.match("\u2029") # => #<MatchData "\u2029"> # Paragraph separator (PSEP)
An anchor is a metasequence that matches a zero-width position between characters in the target string.
For a subexpression with no anchor, matching may begin anywhere in the target string:
/real/.match('surrealist') # => #<MatchData "real">
For a subexpression with an anchor, matching must begin at the matched anchor.
Each of these anchors matches a boundary:
^
: Matches the beginning of a line:
/^bar/.match("foo\nbar") # => #<MatchData "bar"> /^ar/.match("foo\nbar") # => nil
$
: Matches the end of a line:
/bar$/.match("foo\nbar") # => #<MatchData "bar"> /ba$/.match("foo\nbar") # => nil
\A
: Matches the beginning of the string:
/\Afoo/.match('foo bar') # => #<MatchData "foo"> /\Afoo/.match(' foo bar') # => nil
\Z
: Matches the end of the string; if the string ends with a single newline, it matches just before the ending newline:
/foo\Z/.match('bar foo') # => #<MatchData "foo"> /foo\Z/.match('foo bar') # => nil /foo\Z/.match("bar foo\n") # => #<MatchData "foo"> /foo\Z/.match("bar foo\n\n") # => nil
\z
: Matches the end of the string:
/foo\z/.match('bar foo') # => #<MatchData "foo"> /foo\z/.match('foo bar') # => nil /foo\z/.match("bar foo\n") # => nil
\b
: Matches word boundary when not inside brackets; matches backspace ("0x08"
) when inside brackets:
/foo\b/.match('foo bar') # => #<MatchData "foo"> /foo\b/.match('foobar') # => nil
\B
: Matches non-word boundary:
/foo\B/.match('foobar') # => #<MatchData "foo"> /foo\B/.match('foo bar') # => nil
\G
: Matches first matching position:
In methods like String#gsub
and String#scan
, it changes on each iteration. It initially matches the beginning of subject, and in each following iteration it matches where the last match finished.
" a b c".gsub(/ /, '_') # => "____a_b_c" " a b c".gsub(/\G /, '_') # => "____a b c"
In methods like Regexp#match
and String#match
that take an optional offset, it matches where the search begins.
"hello, world".match(/,/, 3) # => #<MatchData ","> "hello, world".match(/\G,/, 3) # => nil
Lookahead anchors:
(?=pat)
: Positive lookahead assertion: ensures that the following characters match pat, but doesn’t include those characters in the matched substring.
(?!pat)
: Negative lookahead assertion: ensures that the following characters do not match pat, but doesn’t include those characters in the matched substring.
Lookbehind anchors:
(?<=pat)
: Positive lookbehind assertion: ensures that the preceding characters match pat, but doesn’t include those characters in the matched substring.
(?<!pat)
: Negative lookbehind assertion: ensures that the preceding characters do not match pat, but doesn’t include those characters in the matched substring.
The pattern below uses positive lookahead and positive lookbehind to match text appearing in <b>...</b> tags without including the tags in the match:
/(?<=<b>)\w+(?=<\/b>)/.match("Fortune favors the <b>bold</b>.") # => #<MatchData "bold">
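The negative forms work the same way. The sketch below uses made-up target strings.

```ruby
# Negative lookahead: "foo" only when it is NOT followed by "bar".
md = /foo(?!bar)/.match('foobar foobaz')
lookahead_offset = md.begin(0)   # start of the second "foo"

# Negative lookbehind: a number that is NOT preceded by "$".
quantity = 'price: $100, qty: 25'[/(?<!\$)\b\d+/]
```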
\K
: Match reset: the matched content preceding \K
in the regexp is excluded from the result. For example, the following two regexps are almost equivalent:
/ab\Kc/.match('abc') # => #<MatchData "c"> /(?<=ab)c/.match('abc') # => #<MatchData "c">
These match the same string and $&
equals 'c'
, while the matched position is different.
As are the following two regexps:
/(a)\K(b)\Kc/ /(?<=(?<=(a))(b))c/
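One practical use of \K, sketched with a hypothetical `token=` string, is replacing only the text that follows a required prefix.

```ruby
line = 'token=abc123'

# Only the part after \K is treated as the match, so only it is replaced.
with_reset      = line.sub(/token=\K\w+/, '[hidden]')

# The lookbehind form gives the same result here.
with_lookbehind = line.sub(/(?<=token=)\w+/, '[hidden]')
```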
The vertical bar metacharacter (|
) may be used within parentheses to express alternation: two or more subexpressions any of which may match the target string.
Two alternatives:
re = /(a|b)/ re.match('foo') # => nil re.match('bar') # => #<MatchData "b" 1:"b">
Four alternatives:
re = /(a|b|c|d)/ re.match('shazam') # => #<MatchData "a" 1:"a"> re.match('cold') # => #<MatchData "c" 1:"c">
Each alternative is a subexpression, and may be composed of other subexpressions:
re = /([a-c]|[x-z])/ re.match('bar') # => #<MatchData "b" 1:"b"> re.match('ooz') # => #<MatchData "z" 1:"z">
Method
Regexp.union
provides a convenient way to construct a regexp with alternatives.
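A small sketch of Regexp.union follows. The word lists are arbitrary; note that string arguments are escaped, so metacharacters in them match literally.

```ruby
animals = Regexp.union(%w[cat dog bird])
animals_source = animals.source          # alternatives joined with |

# Strings are escaped before being joined.
literal = Regexp.union('a.b', 'c+')
matches_dot_literally  = literal.match?('a.b')
matches_any_character  = literal.match?('axb')
matches_plus_literally = literal.match?('c+')
```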
A simple regexp matches one character:
/\w/.match('Hello') # => #<MatchData "H">
An added quantifier specifies how many matches are required or allowed:
*
- Matches zero or more times:
/\w*/.match('') # => #<MatchData ""> /\w*/.match('x') # => #<MatchData "x"> /\w*/.match('xyz') # => #<MatchData "xyz">
+
- Matches one or more times:
/\w+/.match('') # => nil /\w+/.match('x') # => #<MatchData "x"> /\w+/.match('xyz') # => #<MatchData "xyz">
?
- Matches zero or one times:
/\w?/.match('') # => #<MatchData ""> /\w?/.match('x') # => #<MatchData "x"> /\w?/.match('xyz') # => #<MatchData "x">
{
n}
- Matches exactly n times:
/\w{2}/.match('') # => nil /\w{2}/.match('x') # => nil /\w{2}/.match('xyz') # => #<MatchData "xy">
{
min,}
- Matches min or more times:
/\w{2,}/.match('') # => nil /\w{2,}/.match('x') # => nil /\w{2,}/.match('xy') # => #<MatchData "xy"> /\w{2,}/.match('xyz') # => #<MatchData "xyz">
{,
max}
- Matches max or fewer times:
/\w{,2}/.match('') # => #<MatchData ""> /\w{,2}/.match('x') # => #<MatchData "x"> /\w{,2}/.match('xyz') # => #<MatchData "xy">
{
min,
max}
- Matches at least min times and at most max times:
/\w{1,2}/.match('') # => nil /\w{1,2}/.match('x') # => #<MatchData "x"> /\w{1,2}/.match('xyz') # => #<MatchData "xy">
Quantifier matching may be greedy, lazy, or possessive:
In greedy matching, as many occurrences as possible are matched while still allowing the overall match to succeed. Greedy quantifiers: *
, +
, ?
, {min, max}
and its variants.
In lazy matching, the minimum number of occurrences are matched. Lazy quantifiers: *?
, +?
, ??
, {min, max}?
and its variants.
In possessive matching, once a match is found, there is no backtracking; that match is retained, even if it jeopardises the overall match. Possessive quantifiers: *+
, ++
, ?+
. Note that {min, max}
and its variants do not support possessive matching.
More:
About greedy and lazy matching, see Choosing Minimal or Maximal Repetition.
About possessive matching, see Eliminate Needless Backtracking.
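The three behaviors can be compared on one target string. This is a sketch using an arbitrary tag-like string.

```ruby
text = '<a><b>'

# Greedy: takes as much as possible, then backtracks to find the last ">".
greedy = text[/<.+>/]

# Lazy: takes as little as possible, stopping at the first ">".
lazy = text[/<.+?>/]

# Possessive: ".++" consumes the rest of the string and never gives
# anything back, so the final ">" in the pattern cannot match.
possessive = text[/<.++>/]
```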
A simple regexp has (at most) one match:
re = /\d\d\d\d-\d\d-\d\d/ re.match('1943-02-04') # => #<MatchData "1943-02-04"> re.match('1943-02-04').size # => 1 re.match('foo') # => nil
Adding one or more pairs of parentheses, (subexpression)
, defines groups, which may result in multiple matched substrings, called captures:
re = /(\d\d\d\d)-(\d\d)-(\d\d)/ re.match('1943-02-04') # => #<MatchData "1943-02-04" 1:"1943" 2:"02" 3:"04"> re.match('1943-02-04').size # => 4
The first capture is the entire matched string; the other captures are the matched substrings from the groups.
A group may have a quantifier:
re = /July 4(th)?/ re.match('July 4') # => #<MatchData "July 4" 1:nil> re.match('July 4th') # => #<MatchData "July 4th" 1:"th"> re = /(foo)*/ re.match('') # => #<MatchData "" 1:nil> re.match('foo') # => #<MatchData "foo" 1:"foo"> re.match('foofoo') # => #<MatchData "foofoo" 1:"foo"> re = /(foo)+/ re.match('') # => nil re.match('foo') # => #<MatchData "foo" 1:"foo"> re.match('foofoo') # => #<MatchData "foofoo" 1:"foo">
The returned MatchData object gives access to the matched substrings:
re = /(\d\d\d\d)-(\d\d)-(\d\d)/ md = re.match('1943-02-04') # => #<MatchData "1943-02-04" 1:"1943" 2:"02" 3:"04"> md[0] # => "1943-02-04" md[1] # => "1943" md[2] # => "02" md[3] # => "04"
A group may be made non-capturing; it is still a group (and, for example, can have a quantifier), but its matching substring is not included among the captures.
A non-capturing group begins with ?:
(inside the parentheses):
# Don't capture the year. re = /(?:\d\d\d\d)-(\d\d)-(\d\d)/ md = re.match('1943-02-04') # => #<MatchData "1943-02-04" 1:"02" 2:"04">
A group match may also be referenced within the regexp itself; such a reference is called a backreference
:
/[csh](..) [csh]\1 in/.match('The cat sat in the hat') # => #<MatchData "cat sat in" 1:"at">
This table shows how each subexpression in the regexp above matches a substring in the target string:
| Subexpression in Regexp | Matching Substring in Target String | |---------------------------|-------------------------------------| | First '[csh]' | Character 'c' | | '(..)' | First substring 'at' | | First space ' ' | First space character ' ' | | Second '[csh]' | Character 's' | | '\1' (backreference 'at') | Second substring 'at' | | ' in' | Substring ' in' |
A regexp may contain any number of groups:
For a large number of groups:
The ordinary \n
notation applies only for n in range (1..9).
The MatchData[n]
notation applies for any non-negative n.
\0
is a special backreference, referring to the entire matched string; it may not be used within the regexp itself, but may be used outside it (for example, in a substitution method call):
'The cat sat in the hat'.gsub(/[csh]at/, '\0s') # => "The cats sats in the hats"
As seen above, a capture can be referred to by its number. A capture can also have a name, prefixed as ?<name>
or ?'name'
, and the name (symbolized) may be used as an index in MatchData[]
:
md = /\$(?<dollars>\d+)\.(?'cents'\d+)/.match("$3.67") # => #<MatchData "$3.67" dollars:"3" cents:"67"> md[:dollars] # => "3" md[:cents] # => "67" # The capture numbers are still valid. md[2] # => "67"
When a regexp contains a named capture, there are no unnamed captures:
/\$(?<dollars>\d+)\.(\d+)/.match("$3.67") # => #<MatchData "$3.67" dollars:"3">
A named group may be backreferenced as \k<name>
:
/(?<vowel>[aeiou]).\k<vowel>.\k<vowel>/.match('ototomy') # => #<MatchData "ototo" vowel:"o">
When (and only when) a regexp contains named capture groups and appears before the =~
operator, the captured substrings are assigned to local variables with corresponding names:
/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ '$3.67' dollars # => "3" cents # => "67"
Method
MatchData#named_captures
returns a hash of the capture names and captured substrings (Regexp#named_captures instead maps each name to an array of group indexes); method Regexp#names
returns an array of the capture names.
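As a short sketch, the names come from the regexp, while the captured substrings come from the MatchData object returned by a match.

```ruby
re = /\$(?<dollars>\d+)\.(?<cents>\d+)/
md = re.match('$3.67')

names      = re.names            # capture names, in order
substrings = md.named_captures   # name => captured substring
indexes    = re.named_captures   # name => array of group numbers
```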
A group may be made atomic with (?>
subexpression)
.
This causes the subexpression to be matched independently of the rest of the expression, so that the matched substring becomes fixed for the remainder of the match, unless the entire subexpression must be abandoned and subsequently revisited.
In this way subexpression is treated as a non-divisible whole. Atomic grouping is typically used to optimise patterns to prevent needless backtracking.
Example (without atomic grouping):
/".*"/.match('"Quote"') # => #<MatchData "\"Quote\"">
Analysis:
The leading subexpression "
in the pattern matches the first character "
in the target string.
The next subexpression .*
matches the next substring Quote"
(including the trailing double-quote).
Now there is nothing left in the target string to match the trailing subexpression "
in the pattern; this would cause the overall match to fail.
The matched substring is backtracked by one position: Quote
.
The final subexpression "
now matches the final substring "
, and the overall match succeeds.
If subexpression .*
is grouped atomically, the backtracking is disabled, and the overall match fails:
/"(?>.*)"/.match('"Quote"') # => nil
Atomic grouping can affect performance; see Atomic Group.
As seen above, a backreference number (\n
) or name (\k<name>
) gives access to a captured substring; the corresponding regexp subexpression may also be accessed, via the number (\gn
) or name (\g<name>
):
/\A(?<paren>\(\g<paren>*\))*\z/.match('(())') # ^1 # ^2 # ^3 # ^4 # ^5 # ^6 # ^7 # ^8 # ^9 # ^10
The pattern:
Matches at the beginning of the string, i.e. before the first character.
Enters a named group paren
.
Matches the first character in the string, '('
.
Calls the paren
group again, i.e. recurses back to the second step.
Re-enters the paren
group.
Matches the second character in the string, '('
.
Attempts to call paren
a third time, but fails because doing so would prevent an overall successful match.
Matches the third character in the string, ')'
; marks the end of the second recursive call.
Matches the fourth character in the string, ')'
.
Matches the end of the string.
See Subexpression calls.
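The same pattern can serve as a balanced-parentheses check. The sample inputs below are illustrative.

```ruby
balanced = /\A(?<paren>\(\g<paren>*\))*\z/

nested     = balanced.match?('(())')   # properly nested
sequential = balanced.match?('()()')   # groups side by side
unclosed   = balanced.match?('(()')    # one "(" is never closed
reversed   = balanced.match?(')(')     # closes before opening
```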
The conditional construct takes the form (?(cond)yes|no)
, where:
cond may be a capture number or name.
The match to be applied is yes if cond is captured; otherwise the match to be applied is no.
If not needed, |no
may be omitted.
Examples:
re = /\A(foo)?(?(1)(T)|(F))\z/ re.match('fooT') # => #<MatchData "fooT" 1:"foo" 2:"T" 3:nil> re.match('F') # => #<MatchData "F" 1:nil 2:nil 3:"F"> re.match('fooF') # => nil re.match('T') # => nil re = /\A(?<xyzzy>foo)?(?(<xyzzy>)(T)|(F))\z/ re.match('fooT') # => #<MatchData "fooT" xyzzy:"foo"> re.match('F') # => #<MatchData "F" xyzzy:nil> re.match('fooF') # => nil re.match('T') # => nil
The absence operator is a special group that matches anything which does not match the contained subexpressions.
/(?~real)/.match('surrealist') # => #<MatchData "surrea"> /(?~real)ist/.match('surrealist') # => #<MatchData "ealist"> /sur(?~real)ist/.match('surrealist') # => nil
The /\p{property_name}/
construct (with lowercase p
) matches characters using a Unicode property name, much like a character class; property Alpha
specifies alphabetic characters:
/\p{Alpha}/.match('a') # => #<MatchData "a"> /\p{Alpha}/.match('1') # => nil
A property can be inverted by prefixing the name with a caret character (^
):
/\p{^Alpha}/.match('1') # => #<MatchData "1"> /\p{^Alpha}/.match('a') # => nil
Or by using \P
(uppercase P
):
/\P{Alpha}/.match('1') # => #<MatchData "1"> /\P{Alpha}/.match('a') # => nil
See Unicode Properties for regexps based on the numerous properties.
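A brief sketch of property matching on mixed-script text follows. The sample string is made up, and Greek is used here as an example of a script name accepted as a property.

```ruby
# "cafe" with an accented e, three Greek letters, and a number.
text = "caf\u00e9 \u03b1\u03b2\u03b3 42"

alphabetic = text.scan(/\p{Alpha}+/)   # runs of alphabetic characters
greek      = text.scan(/\p{Greek}+/)   # runs of Greek-script characters
non_alpha  = text.scan(/\P{Alpha}+/)   # everything else
```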
Some commonly-used properties correspond to POSIX bracket expressions:
/\p{Alnum}/
: Alphabetic and numeric character
/\p{Alpha}/
: Alphabetic character
/\p{Blank}/
: Space or tab
/\p{Cntrl}/
: Control character
/\p{Digit}/
: Digit character (and similar)
/\p{Lower}/
: Lowercase alphabetical character
/\p{Print}/
: Like \p{Graph}
, but includes the space character
/\p{Punct}/
: Punctuation character
/\p{Space}/
: Whitespace character ([:blank:]
, newline, carriage return, etc.)
/\p{Upper}/
: Uppercase alphabetical character
/\p{XDigit}/
: Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F)
These are also commonly used:
/\p{Emoji}/
: Unicode emoji.
/\p{Graph}/
: Characters excluding /\p{Cntrl}/
and /\p{Space}/
. Note that invisible characters under the Unicode “Format” category are included.
/\p{Word}/
: A member in one of these Unicode character categories (see below) or having one of these Unicode properties:
Unicode categories:
Mark
(M
).
Decimal Number
(Nd
)
Connector Punctuation
(Pc
).
Unicode properties:
Alpha
Join_Control
/\p{ASCII}/
: A character in the ASCII character set.
/\p{Any}/
: Any Unicode character (including unassigned characters).
/\p{Assigned}/
: An assigned character.
A Unicode character category name:
May be either its full name or its abbreviated name.
Is case-insensitive.
Treats a space, a hyphen, and an underscore as equivalent.
Examples:
/\p{lu}/ # => /\p{lu}/ /\p{LU}/ # => /\p{LU}/ /\p{Uppercase Letter}/ # => /\p{Uppercase Letter}/ /\p{Uppercase_Letter}/ # => /\p{Uppercase_Letter}/ /\p{UPPERCASE-LETTER}/ # => /\p{UPPERCASE-LETTER}/
Below are the Unicode character category abbreviations and names. Enumerations of characters in each category are at the links.
Letters:
L
, Letter
: LC
, Lm
, or Lo
.
LC
, Cased_Letter
: Ll
, Lt
, or Lu
.
Marks:
M
, Mark
: Mc
, Me
, or Mn
.
Numbers:
N
, Number
: Nd
, Nl
, or No
.
Punctuation:
P
, Punctuation
: Pc
, Pd
, Pe
, Pf
, Pi
, Po
, or Ps
.
S
, Symbol
: Sc
, Sk
, Sm
, or So
.
Z
, Separator
: Zl
, Zp
, or Zs
.
C
, Other
: Cc
, Cf
, Cn
, Co
, or Cs
.
Among the Unicode properties are:
A POSIX bracket expression is also similar to a character class. These expressions provide a portable alternative to the above, with the added benefit of encompassing non-ASCII characters:
/\d/
matches only ASCII decimal digits 0
through 9
.
/[[:digit:]]/
matches any character in the Unicode Decimal Number
(Nd
) category; see below.
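The difference shows up with a non-ASCII digit. The sketch below uses U+0663 (ARABIC-INDIC DIGIT THREE) as an example.

```ruby
arabic_indic_three = "\u0663"

ascii_shorthand = /\d/.match?(arabic_indic_three)           # ASCII digits only
posix_bracket   = /[[:digit:]]/.match?(arabic_indic_three)  # any Nd character
both_match_nine = /\d/.match?('9') && /[[:digit:]]/.match?('9')
```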
The POSIX bracket expressions:
/[[:digit:]]/
: Matches a Unicode digit:
/[[:digit:]]/.match('9') # => #<MatchData "9"> /[[:digit:]]/.match("\u1fbf9") # => #<MatchData "9">
/[[:xdigit:]]/
: Matches a digit allowed in a hexadecimal number; equivalent to [0-9a-fA-F]
.
/[[:upper:]]/
: Matches a Unicode uppercase letter:
/[[:upper:]]/.match('A') # => #<MatchData "A"> /[[:upper:]]/.match("\u00c6") # => #<MatchData "Æ">
/[[:lower:]]/
: Matches a Unicode lowercase letter:
/[[:lower:]]/.match('a') # => #<MatchData "a"> /[[:lower:]]/.match("\u01fd") # => #<MatchData "ǽ">
/[[:alpha:]]/
: Matches /[[:upper:]]/
or /[[:lower:]]/
.
/[[:alnum:]]/
: Matches /[[:alpha:]]/
or /[[:digit:]]/
.
/[[:space:]]/
: Matches Unicode space character:
/[[:space:]]/.match(' ') # => #<MatchData " "> /[[:space:]]/.match("\u2005") # => #<MatchData " ">
/[[:blank:]]/
: Matches /[[:space:]]/
or tab character:
/[[:blank:]]/.match(' ') # => #<MatchData " "> /[[:blank:]]/.match("\u2005") # => #<MatchData " "> /[[:blank:]]/.match("\t") # => #<MatchData "\t">
/[[:cntrl:]]/
: Matches Unicode control character:
/[[:cntrl:]]/.match("\u0000") # => #<MatchData "\u0000"> /[[:cntrl:]]/.match("\u009f") # => #<MatchData "\u009F">
/[[:graph:]]/
: Matches any character except /[[:space:]]/
or /[[:cntrl:]]/
.
/[[:print:]]/
: Matches /[[:graph:]]/
or space character.
/[[:punct:]]/
: Matches any Unicode punctuation character (see www.compart.com/en/unicode/category/Po).
Ruby
also supports these (non-POSIX) bracket expressions:
/[[:ascii:]]/
: Matches a character in the ASCII character set.
/[[:word:]]/
: Matches a character in one of these Unicode character categories or having one of these Unicode properties:
Unicode categories:
Mark
(M
).
Decimal Number
(Nd
)
Connector Punctuation
(Pc
).
Unicode properties:
Alpha
Join_Control
A comment may be included in a regexp pattern using the (?#
comment)
construct, where comment is arbitrary text that is ignored by the regexp engine:
/foo(?#Ignore me)bar/.match('foobar') # => #<MatchData "foobar">
The comment may not include an unescaped terminator character.
See also Extended Mode.
Each of these modifiers sets a mode for the regexp:
i
: /pattern/i
sets Case-Insensitive Mode.
m
: /pattern/m
sets Multiline Mode.
x
: /pattern/x
sets Extended Mode.
o
: /pattern/o
sets Interpolation Mode.
Any, all, or none of these may be applied.
Modifiers i
, m
, and x
may be applied to subexpressions:
(?modifier)
turns the mode “on” for ensuing subexpressions
(?-modifier)
turns the mode “off” for ensuing subexpressions
(?modifier:subexp)
turns the mode “on” for subexp within the group
(?-modifier:subexp)
turns the mode “off” for subexp within the group
Example:
re = /(?i)te(?-i)st/ re.match('test') # => #<MatchData "test"> re.match('TEst') # => #<MatchData "TEst"> re.match('TEST') # => nil re.match('teST') # => nil re = /t(?i:e)st/ re.match('test') # => #<MatchData "test"> re.match('tEst') # => #<MatchData "tEst"> re.match('tEST') # => nil
Method
Regexp#options
returns an integer whose value shows the settings for case-insensitivity mode, multiline mode, and extended mode.
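The returned integer is a bit mask, so individual modes can be tested against the Regexp constants. A minimal sketch:

```ruby
re = /foo/imx
all_bits = re.options   # IGNORECASE | EXTENDED | MULTILINE

ignore_case = (re.options & Regexp::IGNORECASE) != 0
multiline   = (re.options & Regexp::MULTILINE)  != 0
extended    = (re.options & Regexp::EXTENDED)   != 0

plain_bits = /foo/.options   # no modes set
```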
By default, a regexp is case-sensitive:
/foo/.match('FOO') # => nil
Modifier i
enables case-insensitive mode:
/foo/i.match('FOO') # => #<MatchData "FOO">
Method
Regexp#casefold?
returns whether the mode is case-insensitive.
The multiline-mode in Ruby
is what is commonly called a “dot-all mode”:
Without the m
modifier, the subexpression .
does not match newlines:
/a.c/.match("a\nc") # => nil
With the modifier, it does match:
/a.c/m.match("a\nc") # => #<MatchData "a\nc">
Unlike in some other languages, the modifier m
does not affect the anchors ^
and $
. These anchors always match at line-boundaries in Ruby
.
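Because ^ and $ match at every line boundary, a pattern meant to validate a whole string usually needs the string anchors \A and \z instead. The following sketch compares the two; the sample input and variable names are made up for illustration.

```ruby
input = "harmless\nadmin\nmore"

line_anchored   = /^admin$/    # matches any single line equal to "admin"
string_anchored = /\Aadmin\z/  # matches only if the whole string is "admin"

p line_anchored.match?(input)      # => true
p string_anchored.match?(input)    # => false
p string_anchored.match?('admin')  # => true
```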
Modifier x
enables extended mode, which means that:
Literal white space in the pattern is to be ignored.
Character #
marks the remainder of its containing line as a comment, which is also to be ignored for matching purposes.
In extended mode, whitespace and comments may be used to form a self-documented regexp.
Regexp
not in extended mode (matches some Roman numerals):
pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
re = /#{pattern}/
re.match('MCMXLIII') # => #<MatchData "MCMXLIII" 1:"CM" 2:"XL" 3:"III">
Regexp
in extended mode:
pattern = <<-EOT
  ^                   # beginning of string
  M{0,3}              # thousands - 0 to 3 Ms
  (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
                      #            or 500-800 (D, followed by 0 to 3 Cs)
  (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
                      #        or 50-80 (L, followed by 0 to 3 Xs)
  (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
                      #        or 5-8 (V, followed by 0 to 3 Is)
  $                   # end of string
EOT
re = /#{pattern}/x
re.match('MCMXLIII') # => #<MatchData "MCMXLIII" 1:"CM" 2:"XL" 3:"III">
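One practical consequence of extended mode is that a space or # that should be matched literally has to be escaped. The sketch below shows this with made-up patterns and variable names.

```ruby
unescaped_space = /a b/x   # the space is ignored, so this is equivalent to /ab/
escaped_space   = /a\ b/x  # the escaped space is matched literally
escaped_hash    = /a\#b/x  # the escaped # does not start a comment

p unescaped_space.match?('ab')   # => true
p unescaped_space.match?('a b')  # => false
p escaped_space.match?('a b')    # => true
p escaped_hash.match?('a#b')     # => true
```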
Modifier o
means that the first time a literal regexp with interpolations is encountered, the generated Regexp
object is saved and used for all future evaluations of that literal regexp. Without modifier o
, the generated Regexp
is not saved, so each evaluation of the literal regexp generates a new Regexp
object.
Without modifier o
:
def letters; sleep 5; /[A-Z][a-z]/; end
words = %w[abc def xyz]
start = Time.now
words.each {|word| word.match(/\A[#{letters}]+\z/) }
Time.now - start # => 15.0174892
With modifier o
:
start = Time.now
words.each {|word| word.match(/\A[#{letters}]+\z/o) }
Time.now - start # => 5.0010866
Note that if the literal regexp does not have interpolations, the o
behavior is the default.
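The saved object also means that later values of the interpolated expression are ignored, which is easy to overlook. A minimal sketch, using a hypothetical method greeting_re:

```ruby
# Hypothetical method: the literal uses modifier o, so the interpolation
# is performed only the first time this line is evaluated.
def greeting_re(name)
  /\Ahello #{name}\z/o
end

first  = greeting_re('alice')
second = greeting_re('bob') # 'bob' is never interpolated

p first.source                 # => "\\Ahello alice\\z"
p second.source                # => "\\Ahello alice\\z"
p first.equal?(second)         # => true
p second.match?('hello bob')   # => false
```

For this reason, modifier o fits best where the interpolated value is known not to change for the life of the program.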
By default, a regexp with only US-ASCII characters has US-ASCII encoding:
re = /foo/
re.source.encoding # => #<Encoding:US-ASCII>
re.encoding        # => #<Encoding:US-ASCII>
A regular expression containing non-US-ASCII characters is assumed to use the source encoding. This can be overridden with one of the following modifiers.
/pat/n
: US-ASCII if only containing US-ASCII characters, otherwise ASCII-8BIT:
/foo/n.encoding     # => #<Encoding:US-ASCII>
/foo\xff/n.encoding # => #<Encoding:ASCII-8BIT>
/foo\x7f/n.encoding # => #<Encoding:US-ASCII>
/pat/u
: UTF-8
/foo/u.encoding # => #<Encoding:UTF-8>
/pat/e
: EUC-JP
/foo/e.encoding # => #<Encoding:EUC-JP>
/pat/s
: Windows-31J
/foo/s.encoding # => #<Encoding:Windows-31J>
A regexp can be matched against a target string when either:
They have the same encoding.
The regexp’s encoding is a fixed encoding and the string contains only ASCII characters. Method
Regexp#fixed_encoding?
returns whether the regexp has a fixed encoding.
If a match between incompatible encodings is attempted, an Encoding::CompatibilityError
exception is raised.
Example:
re = eval("# encoding: ISO-8859-1\n/foo\\xff?/")
re.encoding                 # => #<Encoding:ISO-8859-1>
re =~ "foo".encode("UTF-8") # => 0
re =~ "foo\u0100"           # Raises Encoding::CompatibilityError
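The matching rules can also be checked with Regexp#fixed_encoding? and a rescue clause. This is a sketch only: the helper try_match and the sample strings are hypothetical, and the u modifier is used here simply as a convenient way to obtain a fixed-encoding regexp.

```ruby
# Hypothetical helper: returns the match position, nil for no match,
# or :incompatible if the encodings cannot be matched.
def try_match(re, str)
  re =~ str
rescue Encoding::CompatibilityError
  :incompatible
end

plain = /a/   # US-ASCII, not fixed
utf8  = /a/u  # UTF-8, fixed

euc_ascii = 'abc'.dup.force_encoding('EUC-JP')      # only ASCII characters
euc_wide  = "\xA1\xA2".dup.force_encoding('EUC-JP') # one non-ASCII character

p plain.fixed_encoding?        # => false
p utf8.fixed_encoding?         # => true
p try_match(utf8, euc_ascii)   # => 0
p try_match(utf8, euc_wide)    # => :incompatible
p try_match(plain, euc_wide)   # => nil
```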
The encoding may be explicitly fixed by including Regexp::FIXEDENCODING
in the second argument for Regexp.new
:
# Regexp with encoding ISO-8859-1.
re = Regexp.new("a".force_encoding('iso-8859-1'), Regexp::FIXEDENCODING)
re.encoding # => #<Encoding:ISO-8859-1>
# Target string with encoding UTF-8.
s = "a\u3042"
s.encoding  # => #<Encoding:UTF-8>
re.match(s) # Raises Encoding::CompatibilityError.
When either a regexp source or a target string comes from untrusted input, a malicious value could be used to mount a denial-of-service attack; to prevent such an attack, it is wise to set a timeout.
Regexp has two timeout values:
A class default timeout, used for a regexp whose instance timeout is nil
; this default is initially nil
, and may be set by method Regexp.timeout=
:
Regexp.timeout # => nil
Regexp.timeout = 3.0
Regexp.timeout # => 3.0
An instance timeout, which defaults to nil
and may be set in Regexp.new
:
re = Regexp.new('foo', timeout: 5.0)
re.timeout # => 5.0
When regexp.timeout is nil
, the timeout “falls through” to Regexp.timeout
; when regexp.timeout is non-nil
, that value controls timing out:
| regexp.timeout Value | Regexp.timeout Value | Result                      |
|----------------------|----------------------|-----------------------------|
| nil                  | nil                  | Never times out.            |
| nil                  | Float                | Times out in Float seconds. |
| Float                | Any                  | Times out in Float seconds. |
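The fall-through rule in the table can be sketched as a one-line helper, together with a wrapper that treats a timed-out match as no match. Both effective_timeout and match_or_nil are hypothetical names used only for this illustration; Regexp::TimeoutError is the exception raised when a match exceeds its timeout.

```ruby
# Hypothetical helper: the timeout that applies to a given regexp.
def effective_timeout(re)
  re.timeout || Regexp.timeout
end

# Hypothetical wrapper: returns nil instead of raising on timeout.
def match_or_nil(re, str)
  re.match(str)
rescue Regexp::TimeoutError
  nil
end

original = Regexp.timeout            # remember the class default
plain = Regexp.new('foo')            # instance timeout is nil
timed = Regexp.new('foo', timeout: 5.0)

both_nil = effective_timeout(plain)  # class default assumed unset
Regexp.timeout = 3.0
class_only = effective_timeout(plain) # falls through to 3.0
instance   = effective_timeout(timed) # instance value 5.0 wins
Regexp.timeout = original            # restore the class default

p [both_nil, class_only, instance]
p match_or_nil(timed, 'a foo b')
```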
For certain values of the pattern and target string, matching time can grow polynomially or exponentially in relation to the input size; the potential vulnerability arising from this is the regular expression denial-of-service (ReDoS) attack.
Regexp matching can apply an optimization to prevent ReDoS attacks. When the optimization is applied, matching time increases linearly (not polynomially or exponentially) in relation to the input size, and a ReDoS attack is not possible.
This optimization is applied if the pattern meets these criteria:
No backreferences.
No subexpression calls.
No nested lookaround anchors or atomic groups.
No nested quantifiers with counting (i.e. no nested {n}
, {min,}
, {,max}
, or {min,max}
style quantifiers).
You can use method Regexp.linear_time?
to determine whether a pattern meets these criteria:
Regexp.linear_time?(/a*/)     # => true
Regexp.linear_time?('a*')     # => true
Regexp.linear_time?(/(a*)\1/) # => false
However, an untrusted source may not be safe even if the method returns true
, because the optimization uses memoization (which may cause large memory consumption).
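One way to combine these safeguards when compiling a pattern from an untrusted source is sketched below. The method compile_untrusted, its 1.0-second default, and the choice of ArgumentError are assumptions made for this example, not part of the Regexp API; given the memory caveat above, a limit on input size may also be worth adding.

```ruby
# Hypothetical guard: compiles an untrusted pattern with an instance
# timeout and rejects it unless the linear-time optimization applies.
def compile_untrusted(source, timeout: 1.0)
  re = Regexp.new(source, timeout: timeout)
  unless Regexp.linear_time?(re)
    raise ArgumentError, "pattern may not match in linear time: #{source.inspect}"
  end
  re
end

safe = compile_untrusted('a*')
p safe.timeout # => 1.0

begin
  compile_untrusted('(a*)\1') # contains a backreference
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```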
Read (online PDF books):
Mastering Regular Expressions by Jeffrey E.F. Friedl.
Regular Expressions Cookbook by Jan Goyvaerts & Steven Levithan.
Explore, test (interactive online editor):
TCPServer
represents a TCP/IP server socket.
A simple TCP server may look like:
require 'socket'
server = TCPServer.new 2000 # Server bound to port 2000
loop do
  client = server.accept # Wait for a client to connect
  client.puts "Hello !"
  client.puts "Time is #{Time.now}"
  client.close
end
A more usable server (serving multiple clients):
require 'socket'
server = TCPServer.new 2000
loop do
  Thread.start(server.accept) do |client|
    client.puts "Hello !"
    client.puts "Time is #{Time.now}"
    client.close
  end
end
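To see such a server from the client side, the following self-contained sketch starts a server on the loopback interface, connects to it with TCPSocket, and reads one line. Passing port 0 so that the operating system picks a free port, and serving a single client in a background thread, are choices made for this example only.

```ruby
require 'socket'

# Bind to the loopback interface; port 0 asks the OS for any free port.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1] # the port that was actually assigned

# Serve exactly one client in a background thread.
server_thread = Thread.new do
  client = server.accept
  client.puts 'Hello !'
  client.close
end

# Connect as a client and read the greeting line.
socket = TCPSocket.new('127.0.0.1', port)
greeting = socket.gets
socket.close

server_thread.join
server.close

puts greeting
```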
UNIXServer
represents a UNIX domain stream server socket.