A String object holds an arbitrary sequence of bytes, typically representing text or binary data. A String object may be created using String::new or as a literal.
String objects differ from Symbol objects in that Symbol objects are designed to be used as identifiers, instead of text or data.
You can create a String object explicitly with:
You can convert certain objects to Strings with:
Method String.
Some String methods modify self. Typically, a method whose name ends with ! modifies self and returns self; often a similarly named method (without the !) returns a new string.
In general, if both bang and non-bang versions of a method exist, the bang version mutates the receiver and the non-bang version does not. However, a method without a bang can also mutate, such as String#replace.
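The distinction can be seen in a minimal sketch using String#upcase, String#upcase!, and String#replace. The variable names are illustrative only, and the unary plus is there to make sure the literal is not frozen:

```ruby
s = +'hello'            # unary plus guarantees an unfrozen string

t = s.upcase            # non-bang: returns a new string, s stays "hello"

r = s.upcase!           # bang: modifies s in place and returns s itself

again = s.upcase!       # nothing left to change, so this returns nil

u = s.replace('bye')    # no bang in the name, yet s is overwritten in place
```

Note that many bang methods return nil when they make no change, so their return value should not be relied on for chaining.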
These methods perform substitutions:
String#sub: One substitution (or none); returns a new string.
String#sub!: One substitution (or none); returns self if any changes, nil otherwise.
String#gsub: Zero or more substitutions; returns a new string.
String#gsub!: Zero or more substitutions; returns self if any changes, nil otherwise.
Each of these methods takes:
A first argument, pattern (string or regexp), that specifies the substring(s) to be replaced.
Either of these:
A second argument, replacement (string or hash), that determines the replacing string.
A block that will determine the replacing string.
The examples in this section mostly use methods String#sub and String#gsub; the principles illustrated apply to all four substitution methods.
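As a quick orientation, the four substitution methods can be compared side by side. This is a sketch with made-up sample data, not an exhaustive description:

```ruby
s = +'banana'

first_only = s.sub('an', 'AN')    # => "bANana"; s is unchanged
every_one  = s.gsub('an', 'AN')   # => "bANANa"; s is unchanged

no_match = s.gsub!(/x/, '-')      # => nil, because nothing matched
in_place = s.sub!(/a/, '_')       # => "b_nana"; s itself was modified
```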
Argument pattern
Argument pattern is commonly a regular expression:
s = 'hello'
s.sub(/[aeiou]/, '*') # => "h*llo"
s.gsub(/[aeiou]/, '*') # => "h*ll*"
s.gsub(/[aeiou]/, '') # => "hll"
s.sub(/ell/, 'al') # => "halo"
s.gsub(/xyzzy/, '*') # => "hello"
'THX1138'.gsub(/\d+/, '00') # => "THX00"
When pattern is a string, all its characters are treated as ordinary characters (not as regexp special characters):
'THX1138'.gsub('\d+', '00') # => "THX1138"
String replacement
If replacement is a string, that string will determine the replacing string that is to be substituted for the matched text.
Each of the examples above uses a simple string as the replacing string.
String replacement may contain back-references to the pattern’s captures:
\n (n a non-negative integer) refers to $n.
\k<name> refers to the named capture name.
See regexp.rdoc for details.
Note that within the string replacement, a character combination such as $& is treated as ordinary text, and not as a special match variable. However, you may refer to some special match variables using these combinations:
\& and \0 correspond to $&, which contains the complete matched text.
\' corresponds to $', which contains the string after the match.
\` corresponds to $`, which contains the string before the match.
\+ corresponds to $+, which contains the last capture group.
See regexp.rdoc for details.
Note that \\ is interpreted as an escape, i.e., a single backslash.
Note also that a string literal consumes backslashes. See String Literals for details about string literals.
A back-reference is typically preceded by an additional backslash. For example, if you want to write a back-reference \& in replacement with a double-quoted string literal, you need to write "..\\&..".
If you want to write a non-back-reference string \& in replacement, you need first to escape the backslash to prevent this method from interpreting it as a back-reference, and then you need to escape the backslashes again to prevent a string literal from consuming them: "..\\\\&..".
You may want to use the block form to avoid a lot of backslashes.
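The escaping rules above are easier to follow with concrete values. The following sketch uses made-up sample strings; single-quoted literals are used where possible so that fewer backslashes are consumed by the literal itself:

```ruby
name = 'John Smith'

# Numbered and named back-references in a single-quoted replacement.
swapped = name.sub(/(\w+) (\w+)/, '\2, \1')                               # => "Smith, John"
named   = name.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')   # => "Smith, John"

# \0 stands for the whole match; a double-quoted literal needs the extra backslash.
wrapped_sq = 'hello'.gsub(/l+/, '<\0>')     # => "he<ll>o"
wrapped_dq = 'hello'.gsub(/l+/, "<\\0>")    # => "he<ll>o"

# A literal backslash followed by & needs four backslashes in the literal.
literal = 'hello'.sub(/ll/, '\\\\&')        # h, e, backslash, &, o

# The block form is not scanned for back-references at all.
from_block = 'hello'.sub(/ll/) { '\&' }     # same five characters as above
```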
Hash replacement
If argument replacement is a hash, and pattern matches one of its keys, the replacing string is the value for that key:
h = {'foo' => 'bar', 'baz' => 'bat'}
'food'.sub('foo', h) # => "bard"
Note that a symbol key does not match:
h = {foo: 'bar', baz: 'bat'}
'food'.sub('foo', h) # => "d"
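A hash replacement is most useful together with a regexp pattern and String#gsub, where each matched substring is looked up separately. A small sketch (the hash contents are illustrative):

```ruby
entities = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }

# Each matched character is replaced by the value stored under that key.
escaped = '<a & b>'.gsub(/[&<>]/, entities)   # => "&lt;a &amp; b&gt;"

# A match that has no corresponding key is replaced by an empty string.
partial = 'a-b'.gsub(/[-b]/, { '-' => '+' })  # => "a+"
```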
Block
In the block form, the current match string is passed to the block; the block’s return value becomes the replacing string:
s = '@'
'1234'.gsub(/\d/) {|match| s.succ! } # => "ABCD"
Special match variables such as $1, $2, $`, $&, and $' are set appropriately.
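Because the special match variables are set for each match, a block can compute a replacement from the capture groups. A minimal sketch with made-up data:

```ruby
prices = 'apple: 3, pear: 12'

# $1 and $2 refer to the capture groups of the current match.
doubled = prices.gsub(/(\w+): (\d+)/) { "#{$1}: #{$2.to_i * 2}" }
# => "apple: 6, pear: 24"

# The block parameter is the whole matched text, the same as $&.
shouted = 'hello world'.gsub(/o/) { |match| match.upcase }
# => "hellO wOrld"
```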
First, what’s elsewhere. Class String:
Inherits from class Object.
Includes module Comparable.
Here, class String provides methods that are useful for:
::new: Returns a new string.
::try_convert: Returns a new string created from a given object.
String
+@: Returns a string that is not frozen: self, if not frozen; self.dup otherwise.
-@: Returns a string that is frozen: self, if already frozen; self.freeze otherwise.
freeze: Freezes self, if not already frozen; returns self.
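The frozen and unfrozen entries above can be tried out with the unary operators String#+@ and String#-@ together with String#freeze. This is a sketch; the variable names are illustrative:

```ruby
frozen = 'abc'.freeze

thawed = +frozen          # frozen receiver, so an unfrozen duplicate is returned
thawed << 'd'             # safe: only the duplicate is modified

same = +thawed            # already unfrozen, so the receiver itself is returned

interned = -thawed        # a frozen string with the same content
```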
Counts
empty?: Returns true if self.length is zero; false otherwise.
bytesize: Returns the count of bytes.
count: Returns the count of characters matching the given character selectors.
Substrings
index: Returns the index of the first occurrence of a given substring; returns nil if none found.
rindex: Returns the index of the last occurrence of a given substring; returns nil if none found.
include?: Returns true if the string contains a given substring; false otherwise.
start_with?: Returns true if the string begins with any of the given substrings.
end_with?: Returns true if the string ends with any of the given substrings.
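The substring queries listed above can be sketched on a single sample value (the file name is made up):

```ruby
file = 'report.final.csv'

starts   = file.start_with?('report', 'summary')   # => true (any one prefix suffices)
ends     = file.end_with?('.csv', '.tsv')          # => true
has_part = file.include?('final')                  # => true

first_dot = file.index('.')     # => 6
last_dot  = file.rindex('.')    # => 12
missing   = file.index('?')     # => nil
```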
Encodings
unicode_normalized?: Returns true if the string is in Unicode normalized form; false otherwise.
valid_encoding?: Returns true if the string contains only characters that are valid for its encoding.
ascii_only?: Returns true if the string has only ASCII characters; false otherwise.
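A short sketch of the encoding queries, assuming UTF-8 strings. The invalid byte is introduced deliberately, and String#scrub is shown only as one possible way to repair it:

```ruby
ascii    = 'abc'
accented = "caf\u00e9"
broken   = "abc\xFF".dup.force_encoding(Encoding::UTF_8)   # 0xFF is never valid UTF-8

ascii_flags = [ascii.ascii_only?, accented.ascii_only?]            # => [true, false]
valid_flags = [accented.valid_encoding?, broken.valid_encoding?]   # => [true, false]

repaired = broken.scrub('?')    # => "abc?"
```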
Other
sum: Returns a basic checksum for the string: the sum of each byte.
hash: Returns the integer hash code.
==, ===: Returns true if a given other string has the same content as self.
eql?: Returns true if the content is the same as the given other string.
<=>: Returns -1, 0, or 1 as self is smaller than, equal to, or larger than a given other string.
casecmp: Ignoring case, returns -1, 0, or 1 as self is smaller than, equal to, or larger than a given other string.
casecmp?: Returns true if the string is equal to a given string after Unicode case folding; false otherwise.
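The direction of the comparison results is easy to get backwards, so here is a minimal sketch. The result describes the receiver relative to the argument:

```ruby
smaller = 'a' <=> 'b'            # => -1 (receiver sorts before the argument)
larger  = 'b' <=> 'a'            # => 1
equal   = 'a' <=> 'a'            # => 0
not_comparable = 'a' <=> 1       # => nil (argument is not a string)

ignoring_case = 'a'.casecmp('B')      # => -1
folded_equal  = 'abc'.casecmp?('ABC') # => true
same_content  = 'abc'.eql?('abc')     # => true
```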
Each of these methods modifies self.
Insertion
insert: Returns self with a given string inserted at a given offset.
<<: Returns self concatenated with a given string or integer.
Substitution
sub!: Replaces the first substring that matches a given pattern with a given replacement string; returns self if any changes, nil otherwise.
gsub!: Replaces each substring that matches a given pattern with a given replacement string; returns self if any changes, nil otherwise.
replace: Returns self with its entire content replaced by a given string.
reverse!: Returns self with its characters in reverse order.
setbyte: Sets the byte at a given integer offset to a given value; returns the argument.
tr!: Replaces specified characters in self with specified replacement characters; returns self if any changes, nil otherwise.
tr_s!: Replaces specified characters in self with specified replacement characters, removing duplicates from the substrings that were modified; returns self if any changes, nil otherwise.
Casing
capitalize!: Upcases the initial character and downcases all others; returns self if any changes, nil otherwise.
downcase!: Downcases all characters; returns self if any changes, nil otherwise.
upcase!: Upcases all characters; returns self if any changes, nil otherwise.
swapcase!: Upcases each downcase character and downcases each upcase character; returns self if any changes, nil otherwise.
Encoding
encode!: Returns self with all characters transcoded from one given encoding into another.
unicode_normalize!: Unicode-normalizes self; returns self.
scrub!: Replaces each invalid byte with a given character; returns self.
force_encoding: Changes the encoding to a given encoding; returns self.
Deletion
clear: Removes all content, so that self is empty; returns self.
squeeze!: Removes contiguous duplicate characters; returns self if any changes, nil otherwise.
delete!: Removes characters as determined by the intersection of substring arguments; returns self if any changes, nil otherwise.
lstrip!: Removes leading whitespace; returns self if any changes, nil otherwise.
rstrip!: Removes trailing whitespace; returns self if any changes, nil otherwise.
strip!: Removes leading and trailing whitespace; returns self if any changes, nil otherwise.
chomp!: Removes trailing record separator, if found; returns self if any changes, nil otherwise.
chop!: Removes the last character, or a trailing \r\n pair if present; returns self, or nil if self is empty.
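Since many of the methods above return nil when nothing changes, chaining on their result is fragile. A minimal sketch of the pitfall and two ways around it (the sample value is made up):

```ruby
name = +'  alice '

first  = name.strip!     # => "alice" (the receiver itself)
second = name.strip!     # => nil, there was nothing left to strip

# name.strip!.capitalize would raise NoMethodError here, because strip! returned nil.
# Option 1: chain on the non-bang forms, which always return a string.
label = name.strip.capitalize    # => "Alice"

# Option 2: mutate in place and ignore the return value.
name.upcase!                     # name is now "ALICE"
```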
Each of these methods returns a new String based on self, often just a modified copy of self.
Extension
*: Returns the concatenation of multiple copies of self.
+: Returns the concatenation of self and a given other string.
center: Returns a copy of self centered between pad substrings.
concat: Returns the concatenation of self with given other strings; unlike the other methods listed here, it modifies self.
prepend: Returns the concatenation of a given other string with self; like concat, it modifies self.
ljust: Returns a copy of self of a given length, right-padded with a given other string.
rjust: Returns a copy of self of a given length, left-padded with a given other string.
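The padding methods are commonly combined to line up columns of text. The sketch below uses made-up data and also shows that String#concat and String#prepend act on the receiver itself:

```ruby
rows = [['apple', 3], ['kiwi', 12]]

# Left-justify the name to 8 columns, right-justify the quantity to 4.
table = rows.map { |name, qty| name.ljust(8, '.') + qty.to_s.rjust(4) }
# => ["apple...   3", "kiwi....  12"]

title  = 'Stock'.center(11, '*')   # => "***Stock***"
banner = '-' * 11                  # => "-----------"

# concat and prepend return the receiver after modifying it.
base = +'b'
base.concat('c').prepend('a')      # base is now "abc"
```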
Encoding
b: Returns a copy of self with ASCII-8BIT encoding.
scrub: Returns a copy of self with each invalid byte replaced with a given character.
unicode_normalize: Returns a copy of self with each character Unicode-normalized.
encode: Returns a copy of self with all characters transcoded from one given encoding into another.
Substitution
dump: Returns a copy of self with all non-printing characters replaced by \xHH notation and all special characters escaped.
undump: Returns a copy of self with all \xNN notation replaced by \uNNNN notation and all escaped characters unescaped.
sub: Returns a copy of self with the first substring matching a given pattern replaced with a given replacement string.
gsub: Returns a copy of self with each substring that matches a given pattern replaced with a given replacement string.
reverse: Returns a copy of self with its characters in reverse order.
tr: Returns a copy of self with specified characters replaced with specified replacement characters.
tr_s: Returns a copy of self with specified characters replaced with specified replacement characters, removing duplicates from the substrings that were modified.
%: Returns the string resulting from formatting a given object into self.
Casing
capitalize: Returns a copy of self with the first character upcased and all other characters downcased.
downcase: Returns a copy of self with all characters downcased.
upcase: Returns a copy of self with all characters upcased.
swapcase: Returns a copy of self with all upcase characters downcased and all downcase characters upcased.
Deletion
delete: Returns a copy of self with characters removed.
delete_prefix: Returns a copy of self with a given prefix removed.
delete_suffix: Returns a copy of self with a given suffix removed.
lstrip: Returns a copy of self with leading whitespace removed.
rstrip: Returns a copy of self with trailing whitespace removed.
strip: Returns a copy of self with leading and trailing whitespace removed.
chomp: Returns a copy of self with a trailing record separator removed, if found.
chop: Returns a copy of self with the last character removed, or a trailing \r\n pair if present.
squeeze: Returns a copy of self with contiguous duplicate characters removed.
byteslice: Returns a substring determined by a given index, start/length, or range.
chr: Returns the first character.
Duplication
to_s, to_str: If self is an instance of a subclass of String, returns self copied into a String; otherwise, returns self.
Each of these methods converts the contents of self to a non-String.
Characters, Bytes, and Clusters
bytes: Returns an array of the bytes in self.
chars: Returns an array of the characters in self.
codepoints: Returns an array of the integer ordinals in self.
getbyte: Returns an integer byte as determined by a given index.
grapheme_clusters: Returns an array of the grapheme clusters in self.
Splitting
lines: Returns an array of the lines in self, as determined by a given record separator.
partition: Returns a 3-element array determined by the first substring that matches a given substring or regexp.
rpartition: Returns a 3-element array determined by the last substring that matches a given substring or regexp.
split: Returns an array of substrings determined by a given delimiter (regexp or string) or, if a block is given, passes those substrings to the block.
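The difference between the splitting methods shows up most clearly on one input. A minimal sketch with a made-up path:

```ruby
path = 'a/b/c.txt'

head_first = path.partition('/')     # => ["a", "/", "b/c.txt"]
tail_last  = path.rpartition('/')    # => ["a/b", "/", "c.txt"]
pieces     = path.split('/')         # => ["a", "b", "c.txt"]

# When the separator is absent, the whole string lands at opposite ends.
no_sep_first = 'plain'.partition('/')    # => ["plain", "", ""]
no_sep_last  = 'plain'.rpartition('/')   # => ["", "", "plain"]
```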
Matching
scan: Returns an array of substrings matching a given regexp or string, or, if a block is given, passes each matching substring to the block.
unpack: Returns an array of substrings extracted from self according to a given format.
unpack1: Returns the first substring extracted from self according to a given format.
Numerics
hex: Returns the integer value of the leading characters, interpreted as hexadecimal digits.
oct: Returns the integer value of the leading characters, interpreted as octal digits.
ord: Returns the integer ordinal of the first character in self.
to_i: Returns the integer value of leading characters, interpreted as an integer.
to_f: Returns the floating-point value of leading characters, interpreted as a floating-point number.
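All of the numeric conversions above read only the leading characters and never raise on trailing junk. A brief sketch with made-up inputs:

```ruby
int_prefix  = '42abc'.to_i      # => 42
no_digits   = 'abc'.to_i        # => 0 (no leading digits)
binary      = '1010'.to_i(2)    # => 10 (explicit base)
float_value = '3.14xyz'.to_f    # => 3.14

hex_plain   = 'ff'.hex          # => 255
hex_prefix  = '0x1A'.hex        # => 26 (an optional 0x prefix is accepted)
octal       = '777'.oct         # => 511
ordinal     = 'a'.ord           # => 97
```

For strict parsing that rejects malformed input, Kernel#Integer and Kernel#Float raise ArgumentError instead.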
Strings and Symbols
inspect: Returns a copy of self, enclosed in double-quotes, with special characters escaped.
each_byte: Calls the given block with each successive byte in self.
each_char: Calls the given block with each successive character in self.
each_codepoint: Calls the given block with each successive integer codepoint in self.
each_grapheme_cluster: Calls the given block with each successive grapheme cluster in self.
each_line: Calls the given block with each successive line in self, as determined by a given record separator.
Returns the string being scanned.
Returns a frozen copy of the string passed in to match.
m = /(.)(.)(\d+)(\d)/.match("THX1138.")
m.string #=> "THX1138."
Returns arg as a String.
First tries to call its to_str method, then its to_s method.
String(self) #=> "main"
String(self.class) #=> "Object"
String(123456) #=> "123456"
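The lookup order described above (to_str first, then to_s) can be observed with two throwaway objects. The objects and their return values are hypothetical and exist only for this sketch:

```ruby
# An object that defines both conversions: to_str takes precedence.
label = Object.new
def label.to_str; 'via to_str'; end
def label.to_s;   'via to_s';   end

# An object that defines only to_s.
reading = Object.new
def reading.to_s; '21 degrees'; end

from_label   = String(label)     # => "via to_str"
from_reading = String(reading)   # => "21 degrees"
from_nil     = String(nil)       # => ""
from_symbol  = String(:sym)      # => "sym"
```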
Returns a copy of the receiver with leading and trailing whitespace removed.
Whitespace is defined as any of the following characters: null, horizontal tab, line feed, vertical tab, form feed, carriage return, space.
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
"\x00\t\n\v\f\r ".strip #=> ""
"hello".strip #=> "hello"
Returns a copy of the receiver with leading whitespace removed. See also String#rstrip and String#strip.
Refer to String#strip for the definition of whitespace.
" hello ".lstrip #=> "hello "
"hello".lstrip #=> "hello"
Returns a copy of the receiver with trailing whitespace removed. See also String#lstrip and String#strip.
Refer to String#strip for the definition of whitespace.
" hello ".rstrip #=> " hello"
"hello".rstrip #=> "hello"
Removes leading and trailing whitespace from the receiver. Returns the altered receiver, or nil if there was no change.
Refer to String#strip for the definition of whitespace.
" hello ".strip! #=> "hello"
"hello".strip! #=> nil
Removes leading whitespace from the receiver. Returns the altered receiver, or nil if no change was made. See also String#rstrip! and String#strip!.
Refer to String#strip for the definition of whitespace.
" hello ".lstrip! #=> "hello "
"hello ".lstrip! #=> nil
"hello".lstrip! #=> nil
Removes trailing whitespace from the receiver. Returns the altered receiver, or nil if no change was made. See also String#lstrip! and String#strip!.
Refer to String#strip for the definition of whitespace.
" hello ".rstrip! #=> " hello"
" hello".rstrip! #=> nil
"hello".rstrip! #=> nil
Returns the Integer index of the last occurrence of the given substring, or nil if none found:
'foo'.rindex('f') # => 0
'foo'.rindex('o') # => 2
'foo'.rindex('oo') # => 1
'foo'.rindex('ooo') # => nil
Returns the Integer index of the last match for the given Regexp regexp, or nil if none found:
'foo'.rindex(/f/) # => 0
'foo'.rindex(/o/) # => 2
'foo'.rindex(/oo/) # => 1
'foo'.rindex(/ooo/) # => nil
The last match means the match starting at the last possible position, not the last of the longest matches:
'foo'.rindex(/o+/) # => 2
$~ #=> #<MatchData "o">
To get the last longest match, combine the pattern with a negative lookbehind:
'foo'.rindex(/(?<!o)o+/) # => 1
$~ #=> #<MatchData "oo">
Or use String#index with a negative lookahead:
'foo'.index(/o+(?!.*o)/) # => 1
$~ #=> #<MatchData "oo">
Integer argument offset, if given and non-negative, specifies the maximum starting position in the string to end the search:
'foo'.rindex('o', 0) # => nil
'foo'.rindex('o', 1) # => 1
'foo'.rindex('o', 2) # => 2
'foo'.rindex('o', 3) # => 2
If offset is a negative Integer, the maximum starting position in the string to end the search is the sum of the string’s length and offset:
'foo'.rindex('o', -1) # => 2
'foo'.rindex('o', -2) # => 1
'foo'.rindex('o', -3) # => nil
'foo'.rindex('o', -4) # => nil
Related: String#index.
Returns the Encoding object that represents the encoding of obj.
Inserts the given other_string into self; returns self.
If the Integer index is positive, inserts other_string at offset index:
'foo'.insert(1, 'bar') # => "fbaroo"
If the Integer index is negative, counts backward from the end of self and inserts other_string at offset index+1 (that is, after self[index]):
'foo'.insert(-2, 'bar') # => "fobaro"
Returns the count of characters (not bytes) in self:
"\x80\u3042".length # => 2
"hello".length # => 5
String#size is an alias for String#length.
Related: String#bytesize.
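The difference between String#length and String#bytesize only shows up with multibyte characters. A minimal sketch, assuming a UTF-8 string:

```ruby
word = "h\u00e9llo"             # "héllo"; the accented letter takes two bytes in UTF-8

char_count = word.length        # => 5
byte_count = word.bytesize      # => 6
same_as_size = word.size        # => 5 (alias for length)
byte_array_len = word.bytes.length   # => 6
```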
Returns the Integer index of the first occurrence of the given substring, or nil if none found:
'foo'.index('f') # => 0
'foo'.index('o') # => 1
'foo'.index('oo') # => 1
'foo'.index('ooo') # => nil
Returns the Integer index of the first match for the given Regexp regexp, or nil if none found:
'foo'.index(/f/) # => 0
'foo'.index(/o/) # => 1
'foo'.index(/oo/) # => 1
'foo'.index(/ooo/) # => nil
Integer argument offset, if given, specifies the position in the string to begin the search:
'foo'.index('o', 1) # => 1
'foo'.index('o', 2) # => 2
'foo'.index('o', 3) # => nil
If offset is negative, counts backward from the end of self:
'foo'.index('o', -1) # => 2
'foo'.index('o', -2) # => 1
'foo'.index('o', -3) # => 1
'foo'.index('o', -4) # => nil
Related: String#rindex.
Returns a printable version of self, enclosed in double-quotes, and with special characters escaped:
s = "foo\tbar\tbaz\n" # => "foo\tbar\tbaz\n"
s.inspect # => "\"foo\\tbar\\tbaz\\n\""
Returns an array of lines in str split using the supplied record separator ($/ by default). This is a shorthand for str.each_line(separator, getline_args).to_a.
If chomp is true, separator will be removed from the end of each line.
"hello\nworld\n".lines #=> ["hello\n", "world\n"]
"hello  world".lines(' ') #=> ["hello ", " ", "world"]
"hello\nworld\n".lines(chomp: true) #=> ["hello", "world"]
If a block is given (a deprecated form), this method works the same as each_line.
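A short sketch of how the separator and the chomp option interact, using made-up input:

```ruby
text = "one\ntwo\nthree"

raw     = text.lines                 # => ["one\n", "two\n", "three"]
chomped = text.lines(chomp: true)    # => ["one", "two", "three"]

# The same result via each_line without a block, which returns an enumerator.
via_each = text.each_line(chomp: true).to_a

# A custom separator stays attached to each piece unless chomp is set.
fields = 'a,b,c'.lines(',')          # => ["a,", "b,", "c"]
```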
Returns an array of the Integer ordinals of the characters in str. This is a shorthand for str.each_codepoint.to_a.
If a block is given (a deprecated form), this method works the same as each_codepoint.
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist. See Symbol#id2name.
"Koala".intern #=> :Koala
s = 'cat'.to_sym #=> :cat
s == :cat #=> true
s = '@cat'.to_sym #=> :@cat
s == :@cat #=> true
This can also be used to create symbols that cannot be represented using the :xxx notation.
'cat and dog'.to_sym #=> :"cat and dog"
Returns true if self contains other_string, false otherwise:
s = 'foo'
s.include?('f') # => true
s.include?('fo') # => true
s.include?('food') # => false
If integer is greater than the length of str, returns a new String of length integer with str left justified and padded with padstr; otherwise, returns str.
"hello".ljust(4) #=> "hello"
"hello".ljust(20) #=> "hello               "
"hello".ljust(20, '1234') #=> "hello123412341234123"
If integer is greater than the length of str, returns a new String of length integer with str right justified and padded with padstr; otherwise, returns str.
"hello".rjust(4) #=> "hello"
"hello".rjust(20) #=> "               hello"
"hello".rjust(20, '1234') #=> "123412341234123hello"
Returns a copy of str with the characters in from_str replaced by the corresponding characters in to_str. If to_str is shorter than from_str, it is padded with its last character in order to maintain the correspondence.
"hello".tr('el', 'ip') #=> "hippo"
"hello".tr('aeiou', '*') #=> "h*ll*"
"hello".tr('aeiou', 'AA*') #=> "hAll*"
Both strings may use the c1-c2 notation to denote ranges of characters, and from_str may start with a ^, which denotes all characters except those listed.
"hello".tr('a-y', 'b-z') #=> "ifmmp"
"hello".tr('^aeiou', '*') #=> "*e**o"
The backslash character \ can be used to escape ^ or - and is otherwise ignored unless it appears at the end of a range or the end of the from_str or to_str:
"hello^world".tr("\\^aeiou", "*") #=> "h*ll**w*rld"
"hello-world".tr("a\\-eo", "*") #=> "h*ll**w*rld"
"hello\r\nworld".tr("\r", "") #=> "hello\nworld"
"hello\r\nworld".tr("\\r", "") #=> "hello\r\nwold"
"hello\r\nworld".tr("\\\r", "") #=> "hello\nworld"
"X['\\b']".tr("X\\", "") #=> "['b']"
"X['\\b']".tr("X-\\]", "") #=> "'b'"
Translates str in place, using the same rules as String#tr. Returns str, or nil if no changes were made.
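A brief sketch contrasting the in-place and copying forms, plus a range-based translation. The ROT13 mapping is just a familiar illustration of the c1-c2 notation:

```ruby
word = +'hello'

changed   = word.tr!('el', 'ip')   # => "hippo"; word itself is modified
unchanged = word.tr!('x', 'y')     # => nil, no character matched

# Ranges on both sides map character by character.
rot13 = 'Hello'.tr('A-Za-z', 'N-ZA-Mn-za-m')   # => "Uryyb"
back  = rot13.tr('A-Za-z', 'N-ZA-Mn-za-m')     # => "Hello"
```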