Returns a copy of the receiver with leading and trailing whitespace removed; see Whitespace in Strings:
whitespace = "\x00\t\n\v\f\r " s = whitespace + 'abc' + whitespace s # => "\u0000\t\n\v\f\r abc\u0000\t\n\v\f\r " s.strip # => "abc"
Related: String#lstrip
, String#rstrip
.
A String
object has an arbitrary sequence of bytes, typically representing text or binary data. A String
object may be created using String::new
or as literals.
String
objects differ from Symbol
objects in that Symbol
objects are designed to be used as identifiers, instead of text or data.
You can create a String
object explicitly with:
A string literal.
A string literal.
You can convert certain objects to Strings with:
Some String
methods modify self
. Typically, a method whose name ends with !
modifies self
and returns self
; often, a similarly named method (without the !
) returns a new string.
In general, if both bang and non-bang versions of a method exist, the bang method mutates and the non-bang method does not. However, a method without a bang can also mutate, such as String#replace
.
These methods perform substitutions:
String#sub
: One substitution (or none); returns a new string.
String#sub!
: One substitution (or none); returns self
if any changes, nil
otherwise.
String#gsub
: Zero or more substitutions; returns a new string.
String#gsub!
: Zero or more substitutions; returns self
if any changes, nil
otherwise.
Each of these methods takes:
A first argument, pattern
(String
or Regexp
), that specifies the substring(s) to be replaced.
Either of the following:
The examples in this section mostly use the String#sub
and String#gsub
methods; the principles illustrated apply to all four substitution methods.
Argument pattern
Argument pattern
is commonly a regular expression:
s = 'hello' s.sub(/[aeiou]/, '*') # => "h*llo" s.gsub(/[aeiou]/, '*') # => "h*ll*" s.gsub(/[aeiou]/, '') # => "hll" s.sub(/ell/, 'al') # => "halo" s.gsub(/xyzzy/, '*') # => "hello" 'THX1138'.gsub(/\d+/, '00') # => "THX00"
When pattern
is a string, all its characters are treated as ordinary characters (not as Regexp
special characters):
'THX1138'.gsub('\d+', '00') # => "THX1138"
String
replacement
If replacement
is a string, that string determines the replacing string that is substituted for the matched text.
Each of the examples above uses a simple string as the replacing string.
String
replacement
may contain back-references to the pattern’s captures:
\n
(n is a non-negative integer) refers to $n
.
\k<name>
refers to the named capture name
.
See Regexp
for details.
Note that within the string replacement
, a character combination such as $&
is treated as ordinary text, not as a special match variable. However, you may refer to some special match variables using these combinations:
\&
and \0
correspond to $&
, which contains the complete matched text.
\'
corresponds to $'
, which contains the string after the match.
\`
corresponds to $`
, which contains the string before the match.
\+
corresponds to $+
, which contains the last capture group.
See Regexp
for details.
Note that \\
is interpreted as an escape, i.e., a single backslash.
Note also that a string literal consumes backslashes. See string literal for details about string literals.
A back-reference is typically preceded by an additional backslash. For example, if you want to write a back-reference \&
in replacement
with a double-quoted string literal, you need to write "..\\&.."
.
If you want to write a non-back-reference string \&
in replacement
, you need to first escape the backslash to prevent this method from interpreting it as a back-reference, and then you need to escape the backslashes again to prevent a string literal from consuming them: "..\\\\&.."
.
You may want to use the block form to avoid excessive backslashes.
Hash replacement
If the argument replacement
is a hash, and pattern
matches one of its keys, the replacing string is the value for that key:
h = {'foo' => 'bar', 'baz' => 'bat'} 'food'.sub('foo', h) # => "bard"
Note that a symbol key does not match:
h = {foo: 'bar', baz: 'bat'} 'food'.sub('foo', h) # => "d"
Block
In the block form, the current match string is passed to the block; the block’s return value becomes the replacing string:
s = '@' '1234'.gsub(/\d/) { |match| s.succ! } # => "ABCD"
Special match variables such as $1
, $2
, $`
, $&
, and $'
are set appropriately.
In the class String
, whitespace is defined as a contiguous sequence of characters consisting of any mixture of the following:
NL (null): "\x00"
, "\u0000"
.
HT (horizontal tab): "\x09"
, "\t"
.
LF (line feed): "\x0a"
, "\n"
.
VT (vertical tab): "\x0b"
, "\v"
.
FF (form feed): "\x0c"
, "\f"
.
CR (carriage return): "\x0d"
, "\r"
.
SP (space): "\x20"
, " "
.
Whitespace is relevant for the following methods:
String
Slices A slice of a string is a substring selected by certain criteria.
These instance methods utilize slicing:
String#[]
(aliased as String#slice
): Returns a slice copied from self
.
String#[]=
: Mutates self
with the slice replaced.
String#slice!
: Mutates self
with the slice removed and returns the removed slice.
Each of the above methods takes arguments that determine the slice to be copied or replaced.
The arguments have several forms. For a string string
, the forms are:
string[index]
string[start, length]
string[range]
string[regexp, capture = 0]
string[substring]
string[index]
When a non-negative integer argument index
is given, the slice is the 1-character substring found in self
at character offset index
:
'bar'[0] # => "b" 'bar'[2] # => "r" 'bar'[20] # => nil 'тест'[2] # => "с" 'こんにちは'[4] # => "は"
When a negative integer index
is given, the slice begins at the offset given by counting backward from the end of self
:
'bar'[-3] # => "b" 'bar'[-1] # => "r" 'bar'[-20] # => nil
string[start, length]
When non-negative integer arguments start
and length
are given, the slice begins at character offset start
, if it exists, and continues for length
characters, if available:
'foo'[0, 2] # => "fo" 'тест'[1, 2] # => "ес" 'こんにちは'[2, 2] # => "にち" # Zero length. 'foo'[2, 0] # => "" # Length not entirely available. 'foo'[1, 200] # => "oo" # Start out of range. 'foo'[4, 2] # => nil
Special case: if start
equals the length of self
, the slice is a new empty string:
'foo'[3, 2] # => "" 'foo'[3, 200] # => ""
When a negative start
and non-negative length
are given, the slice begins by counting backward from the end of self
, and continues for length
characters, if available:
'foo'[-2, 2] # => "oo" 'foo'[-2, 200] # => "oo" # Start out of range. 'foo'[-4, 2] # => nil
When a negative length
is given, there is no slice:
'foo'[1, -1] # => nil 'foo'[-2, -1] # => nil
string[range]
When a Range
argument range
is given, it creates a substring of string
using the indices in range
. The slice is then determined as above:
'foo'[0..1] # => "fo" 'foo'[0, 2] # => "fo" 'foo'[2...2] # => "" 'foo'[2, 0] # => "" 'foo'[1..200] # => "oo" 'foo'[1, 200] # => "oo" 'foo'[4..5] # => nil 'foo'[4, 2] # => nil 'foo'[-4..-3] # => nil 'foo'[-4, 2] # => nil 'foo'[3..4] # => "" 'foo'[3, 2] # => "" 'foo'[-2..-1] # => "oo" 'foo'[-2, 2] # => "oo" 'foo'[-2..197] # => "oo" 'foo'[-2, 200] # => "oo"
string[regexp, capture = 0]
When the Regexp
argument regexp
is given, and the capture
argument is 0
, the slice is the first matching substring found in self
:
'foo'[/o/] # => "o" 'foo'[/x/] # => nil s = 'hello there' s[/[aeiou](.)\1/] # => "ell" s[/[aeiou](.)\1/, 0] # => "ell"
If the argument capture
is provided and not 0
, it should be either a capture group index (integer) or a capture group name (String
or Symbol
); the slice is the specified capture (see Groups at Regexp
and Captures):
s = 'hello there' s[/[aeiou](.)\1/, 1] # => "l" s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "non_vowel"] # => "l" s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, :vowel] # => "e"
If an invalid capture group index is given, there is no slice. If an invalid capture group name is given, IndexError
is raised.
string[substring]
When the single String
argument substring
is given, it returns the substring from self
if found, otherwise nil
:
'foo'['oo'] # => "oo" 'foo'['xx'] # => nil
First, what’s elsewhere. Class
String
:
Inherits from the Object class.
Includes the Comparable module.
Here, class String
provides methods that are useful for:
::new
: Returns a new string.
::try_convert
: Returns a new string created from a given object.
+@
: Returns a string that is not frozen: self
if not frozen; self.dup
otherwise.
-@
(aliased as dedup
): Returns a string that is frozen: self
if already frozen; self.freeze
otherwise.
freeze
: Freezes self
if not already frozen; returns self
.
Counts
length
(aliased as size
): Returns the count of characters (not bytes).
empty?
: Returns true
if self.length
is zero; false
otherwise.
bytesize
: Returns the count of bytes.
count
: Returns the count of substrings matching given strings.
Substrings
=~
: Returns the index of the first substring that matches a given Regexp
or other object; returns nil
if no match is found.
byteindex
: Returns the byte index of the first occurrence of a given substring.
index
: Returns the index of the first occurrence of a given substring; returns nil
if none found.
rindex
: Returns the index of the last occurrence of a given substring; returns nil
if none found.
include?
: Returns true
if the string contains a given substring; false
otherwise.
match
: Returns a MatchData
object if the string matches a given Regexp
; nil
otherwise.
match?
: Returns true
if the string matches a given Regexp
; false
otherwise.
start_with?
: Returns true
if the string begins with any of the given substrings.
end_with?
: Returns true
if the string ends with any of the given substrings.
Encodings
encoding
: Returns the Encoding
object that represents the encoding of the string.
unicode_normalized?
: Returns true
if the string is in Unicode normalized form; false
otherwise.
valid_encoding?
: Returns true
if the string contains only characters that are valid for its encoding.
ascii_only?
: Returns true
if the string has only ASCII characters; false
otherwise.
Other
sum
: Returns a basic checksum for the string: the sum of each byte.
hash
: Returns the integer hash code.
==
(aliased as ===
): Returns true
if a given other string has the same content as self
.
eql?
: Returns true
if the content is the same as the given other string.
<=>
: Returns -1, 0, or 1 as a given other string is smaller than, equal to, or larger than self
.
casecmp
: Ignoring case, returns -1, 0, or 1 as a given other string is smaller than, equal to, or larger than self
.
casecmp?
: Returns true
if the string is equal to a given string after Unicode case folding; false
otherwise.
Each of these methods modifies self
.
Insertion
insert
: Returns self
with a given string inserted at a specified offset.
<<
: Returns self
concatenated with a given string or integer.
append_as_bytes
: Returns self
concatenated with strings without performing any encoding validation or conversion.
Substitution
sub!
: Replaces the first substring that matches a given pattern with a given replacement string; returns self
if any changes, nil
otherwise.
gsub!
: Replaces each substring that matches a given pattern with a given replacement string; returns self
if any changes, nil
otherwise.
succ!
(aliased as next!
): Returns self
modified to become its own successor.
initialize_copy
(aliased as replace
): Returns self
with its entire content replaced by a given string.
reverse!
: Returns self
with its characters in reverse order.
setbyte
: Sets the byte at a given integer offset to a given value; returns the argument.
tr!
: Replaces specified characters in self
with specified replacement characters; returns self
if any changes, nil
otherwise.
tr_s!
: Replaces specified characters in self
with specified replacement characters, removing duplicates from the substrings that were modified; returns self
if any changes, nil
otherwise.
Casing
capitalize!
: Upcases the initial character and downcases all others; returns self
if any changes, nil
otherwise.
downcase!
: Downcases all characters; returns self
if any changes, nil
otherwise.
upcase!
: Upcases all characters; returns self
if any changes, nil
otherwise.
swapcase!
: Upcases each downcase character and downcases each upcase character; returns self
if any changes, nil
otherwise.
Encoding
encode!
: Returns self
with all characters transcoded from one encoding to another.
unicode_normalize!
: Unicode-normalizes self
; returns self
.
scrub!
: Replaces each invalid byte with a given character; returns self
.
force_encoding
: Changes the encoding to a given encoding; returns self
.
Deletion
clear
: Removes all content, so that self
is empty; returns self
.
slice!
, []=
: Removes a substring determined by a given index, start/length, range, regexp, or substring.
squeeze!
: Removes contiguous duplicate characters; returns self
.
delete!
: Removes characters as determined by the intersection of substring arguments.
lstrip!
: Removes leading whitespace; returns self
if any changes, nil
otherwise.
rstrip!
: Removes trailing whitespace; returns self
if any changes, nil
otherwise.
strip!
: Removes leading and trailing whitespace; returns self
if any changes, nil
otherwise.
chomp!
: Removes the trailing record separator, if found; returns self
if any changes, nil
otherwise.
chop!
: Removes trailing newline characters if found; otherwise removes the last character; returns self
if any changes, nil
otherwise.
Each of these methods returns a new String
based on self
, often just a modified copy of self
.
Extension
*
: Returns the concatenation of multiple copies of self
.
+
: Returns the concatenation of self
and a given other string.
center
: Returns a copy of self
centered between pad substrings.
concat
: Returns the concatenation of self
with given other strings.
prepend
: Returns the concatenation of a given other string with self
.
ljust
: Returns a copy of self
of a given length, right-padded with a given other string.
rjust
: Returns a copy of self
of a given length, left-padded with a given other string.
Encoding
b
: Returns a copy of self
with ASCII-8BIT encoding.
scrub
: Returns a copy of self
with each invalid byte replaced with a given character.
unicode_normalize
: Returns a copy of self
with each character Unicode-normalized.
encode
: Returns a copy of self
with all characters transcoded from one encoding to another.
Substitution
dump
: Returns a copy of self
with all non-printing characters replaced by xHH notation and all special characters escaped.
undump
: Returns a copy of self
with all \xNN
notations replaced by \uNNNN
notations and all escaped characters unescaped.
sub
: Returns a copy of self
with the first substring matching a given pattern replaced with a given replacement string.
gsub
: Returns a copy of self
with each substring that matches a given pattern replaced with a given replacement string.
succ
(aliased as next
): Returns the string that is the successor to self
.
reverse
: Returns a copy of self
with its characters in reverse order.
tr
: Returns a copy of self
with specified characters replaced with specified replacement characters.
tr_s
: Returns a copy of self
with specified characters replaced with specified replacement characters, removing duplicates from the substrings that were modified.
%
: Returns the string resulting from formatting a given object into self
.
Casing
capitalize
: Returns a copy of self
with the first character upcased and all other characters downcased.
downcase
: Returns a copy of self
with all characters downcased.
upcase
: Returns a copy of self
with all characters upcased.
swapcase
: Returns a copy of self
with all upcase characters downcased and all downcase characters upcased.
Deletion
delete
: Returns a copy of self
with characters removed.
delete_prefix
: Returns a copy of self
with a given prefix removed.
delete_suffix
: Returns a copy of self
with a given suffix removed.
lstrip
: Returns a copy of self
with leading whitespace removed.
rstrip
: Returns a copy of self
with trailing whitespace removed.
strip
: Returns a copy of self
with leading and trailing whitespace removed.
chomp
: Returns a copy of self
with a trailing record separator removed, if found.
chop
: Returns a copy of self
with trailing newline characters or the last character removed.
squeeze
: Returns a copy of self
with contiguous duplicate characters removed.
[]
(aliased as slice
): Returns a substring determined by a given index, start/length, range, regexp, or string.
byteslice
: Returns a substring determined by a given index, start/length, or range.
chr
: Returns the first character.
Duplication
to_s
(aliased as to_str
): If self
is a subclass of String
, returns self
copied into a String
; otherwise, returns self
.
Each of these methods converts the contents of self
to a non-String
.
Characters, Bytes, and Clusters
bytes
: Returns an array of the bytes in self
.
chars
: Returns an array of the characters in self
.
codepoints
: Returns an array of the integer ordinals in self
.
getbyte
: Returns the integer byte at the given index in self
.
grapheme_clusters
: Returns an array of the grapheme clusters in self
.
Splitting
lines
: Returns an array of the lines in self
, as determined by a given record separator.
partition
: Returns a 3-element array determined by the first substring that matches a given substring or regexp.
rpartition
: Returns a 3-element array determined by the last substring that matches a given substring or regexp.
split
: Returns an array of substrings determined by a given delimiter – regexp or string – or, if a block is given, passes those substrings to the block.
Matching
scan
: Returns an array of substrings matching a given regexp or string, or, if a block is given, passes each matching substring to the block.
unpack
: Returns an array of substrings extracted from self
according to a given format.
unpack1
: Returns the first substring extracted from self
according to a given format.
Numerics
hex
: Returns the integer value of the leading characters, interpreted as hexadecimal digits.
oct
: Returns the integer value of the leading characters, interpreted as octal digits.
ord
: Returns the integer ordinal of the first character in self
.
to_i
: Returns the integer value of leading characters, interpreted as an integer.
to_f
: Returns the floating-point value of leading characters, interpreted as a floating-point number.
Strings and Symbols
inspect
: Returns a copy of self
, enclosed in double quotes, with special characters escaped.
intern
(aliased as to_sym
): Returns the symbol corresponding to self
.
each_byte
: Calls the given block with each successive byte in self
.
each_char
: Calls the given block with each successive character in self
.
each_codepoint
: Calls the given block with each successive integer codepoint in self
.
each_grapheme_cluster
: Calls the given block with each successive grapheme cluster in self
.
each_line
: Calls the given block with each successive line in self
, as determined by a given record separator.
upto
: Calls the given block with each string value returned by successive calls to succ
.
IO streams for strings, with access similar to IO
; see IO
.
Examples on this page assume that StringIO has been required:
require 'stringio'
ScriptError
is the superclass for errors raised when a script can not be executed because of a LoadError
, NotImplementedError
or a SyntaxError
. Note these type of ScriptErrors
are not StandardError
and will not be rescued unless it is specified explicitly (or its ancestor Exception
).
Class
Struct provides a convenient way to create a simple class that can store and fetch values.
This example creates a subclass of Struct
, Struct::Customer
; the first argument, a string, is the name of the subclass; the other arguments, symbols, determine the members of the new subclass.
Customer = Struct.new('Customer', :name, :address, :zip) Customer.name # => "Struct::Customer" Customer.class # => Class Customer.superclass # => Struct
Corresponding to each member are two methods, a writer and a reader, that store and fetch values:
methods = Customer.instance_methods false methods # => [:zip, :address=, :zip=, :address, :name, :name=]
An instance of the subclass may be created, and its members assigned values, via method ::new
:
joe = Customer.new("Joe Smith", "123 Maple, Anytown NC", 12345) joe # => #<struct Struct::Customer name="Joe Smith", address="123 Maple, Anytown NC", zip=12345>
The member values may be managed thus:
joe.name # => "Joe Smith" joe.name = 'Joseph Smith' joe.name # => "Joseph Smith"
And thus; note that member name may be expressed as either a string or a symbol:
joe[:name] # => "Joseph Smith" joe[:name] = 'Joseph Smith, Jr.' joe['name'] # => "Joseph Smith, Jr."
See Struct::new
.
First, what’s elsewhere. Class
Struct:
Inherits from class Object.
Includes module Enumerable, which provides dozens of additional methods.
See also Data
, which is a somewhat similar, but stricter concept for defining immutable value objects.
Here, class Struct provides methods that are useful for:
Struct
Subclass ::new
: Returns a new subclass of Struct.
==
: Returns whether a given object is equal to self
, using ==
to compare member values.
eql?
: Returns whether a given object is equal to self
, using eql?
to compare member values.
[]
: Returns the value associated with a given member name.
to_a
(aliased as values
, deconstruct
): Returns the member values in self
as an array.
deconstruct_keys
: Returns a hash of the name/value pairs for given member names.
dig
: Returns the object in nested objects that is specified by a given member name and additional arguments.
members
: Returns an array of the member names.
select
(aliased as filter
): Returns an array of member values from self
, as selected by the given block.
values_at
: Returns an array containing values for given member names.
[]=
: Assigns a given value to a given member name.
each
: Calls a given block with each member name.
each_pair
: Calls a given block with each member name/value pair.
Ripper
is a Ruby
script parser.
You can get information from the parser with event-based style. Information such as abstract syntax trees or simple lexical analysis of the Ruby
program.
Ripper
provides an easy interface for parsing your program into a symbolic expression tree (or S-expression).
Understanding the output of the parser may come as a challenge, it’s recommended you use PP
to format the output for legibility.
require 'ripper' require 'pp' pp Ripper.sexp('def hello(world) "Hello, #{world}!"; end') #=> [:program, [[:def, [:@ident, "hello", [1, 4]], [:paren, [:params, [[:@ident, "world", [1, 10]]], nil, nil, nil, nil, nil, nil]], [:bodystmt, [[:string_literal, [:string_content, [:@tstring_content, "Hello, ", [1, 18]], [:string_embexpr, [[:var_ref, [:@ident, "world", [1, 27]]]]], [:@tstring_content, "!", [1, 33]]]]], nil, nil, nil]]]]
You can see in the example above, the expression starts with :program
.
From here, a method definition at :def
, followed by the method’s identifier :@ident
. After the method’s identifier comes the parentheses :paren
and the method parameters under :params
.
Next is the method body, starting at :bodystmt
(stmt
meaning statement), which contains the full definition of the method.
In our case, we’re simply returning a String
, so next we have the :string_literal
expression.
Within our :string_literal
you’ll notice two @tstring_content
, this is the literal part for Hello,
and !
. Between the two @tstring_content
statements is a :string_embexpr
, where embexpr is an embedded expression. Our expression consists of a local variable, or var_ref
, with the identifier (@ident
) of world
.
ruby 1.9 (support CVS HEAD only)
bison 1.28 or later (Other yaccs do not work)
Ruby
License.
Minero Aoki
aamine@loveruby.net
Raised in case of a stack overflow.
def me_myself_and_i me_myself_and_i end me_myself_and_i
raises the exception:
SystemStackError: stack level too deep
Here we are going to patch StringQuery
to put in the class-level methods so that it can maintain a consistent interface
Query methods that allow categorizing strings based on their context for where they could be valid in a Ruby
syntax tree.
Acts like a StringIO
with reduced API, but without having to require that class.
The original codebase emitted directly to $stderr, but now SyntaxError#detailed_message
needs a string output. To accomplish that we kept the original print infrastructure in place and added this class to accumulate the print output into a string.
AbstractSyntaxTree
provides methods to parse Ruby
code into abstract syntax trees. The nodes in the tree are instances of RubyVM::AbstractSyntaxTree::Node
.
This module is MRI specific as it exposes implementation details of the MRI abstract syntax tree.
This module is experimental and its API is not stable, therefore it might change without notice. As examples, the order of children nodes is not guaranteed, the number of children nodes might change, there is no way to access children nodes by name, etc.
If you are looking for a stable API or an API working under multiple Ruby
implementations, consider using the prism gem, which is the official Ruby
API to parse Ruby
code.
Psych::Stream
is a streaming YAML
emitter. It will not buffer your YAML
, but send it straight to an IO
.
Here is an example use:
stream = Psych::Stream.new($stdout) stream.start stream.push({:foo => 'bar'}) stream.finish
YAML
will be immediately emitted to $stdout with no buffering.
Psych::Stream#start
will take a block and ensure that Psych::Stream#finish
is called, so you can do this form:
stream = Psych::Stream.new($stdout) stream.start do |em| em.push(:foo => 'bar') end
Subclass of Zlib::Error
When zlib returns a Z_STREAM_END is return if the end of the compressed data has been reached and all uncompressed out put has been produced.
Subclass of Zlib::Error
When zlib returns a Z_STREAM_ERROR, usually if the stream state was inconsistent.