StringClass
- Namespace
- Ancestors
-
-
Comparable
-
Data
Object
-
A U::String is a sequence of zero or more Unicode characters encoded as UTF-8. Its interface is an extension of that of Ruby’s built-in String class that provides better Unicode support, as it handles things such as casing, width, collation, and various other Unicode properties that Ruby’s built-in String class simply doesn’t bother itself with. It also provides “backwards compatibility” with Ruby 1.8.7 so that you can use Unicode without upgrading to Ruby 2.0 (which you probably should do, though).
It differs from Ruby’s built-in String class in one other very important way in that it doesn’t provide any way to change an existing object. That is, a U::String is a value object.
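The value-object idea can be illustrated with a frozen plain Ruby String. This is only an analogy, not U::String itself, and `try_append` is a hypothetical helper introduced for the illustration.

```ruby
# Plain-Ruby analogy for a value object: a frozen String can't be changed
# in place, so every "modification" has to produce a new object.
def try_append(string, suffix)
  string << suffix
  :mutated
rescue RuntimeError => e   # FrozenError (a RuntimeError) on current Rubies
  e.class
end

original = "stra\u00DFe".freeze
shouted  = original.upcase          # a new String; original is untouched
outcome  = try_append(original, "!") # the in-place change is rejected
```

Non-mutating calls such as `upcase` keep working on the frozen object, which is the style of interface the paragraph above describes.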
A U::String is most easily created from a String by calling String#u. Most U::String methods that return a stringy result will return a U::String, so you only have to do that once. You can get back a String by calling U::String#to_str.
Validation of a U::String’s content isn’t performed until any access to it is made, at which time an ArgumentError will be raised if it isn’t valid.
U::String has a lot of methods defined upon it, so let’s break them up into categories to get a proper overview of what’s possible to do with one. Let’s begin with the interrogators. There are three kinds of interrogators: validity-checking ones, property-checking ones, and content-matching ones.
The validity-checking interrogator is #valid_encoding?, which makes sure that the UTF-8 sequence itself is valid.
The property-checking interrogators are #alnum?, #alpha?, #ascii_only?, #assigned?, #case_ignorable?, #cased?, #cntrl?, #defined?, #digit?, #graph?, #newline?, #print?, #punct?, #soft_dotted?, #space?, #title?, #valid?, #wide?, #wide_cjk?, #xdigit?, and #zero_width?. These interrogators check the corresponding Unicode property of each character in the U::String and if all characters have this property, they’ll return true.
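The "all characters have the property" rule can be sketched with Ruby's built-in Regexp property classes. The helper name `all_chars_match?` is hypothetical, and this uses plain String rather than U::String. How U::String treats the empty string is not stated here, so the sketch makes no claim about it.

```ruby
# True if every character of +string+ matches +property+, a Regexp such as
# /\p{Nd}/ (general category Number, decimal digit).
def all_chars_match?(string, property)
  string.each_char.all? { |char| char.match?(property) }
end

all_chars_match?("\u0661\u0662\u0663", /\p{Nd}/)  # Arabic-Indic digits
all_chars_match?("abc1", /\p{Nd}/)                # letters break the rule
```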
Very close relatives to the property-checking interrogators are #folded?, #lower?, and #upper?, which check whether a string has been cased in a given way, and #normalized?, which checks whether the receiver has been normalized, optionally to a specific normalization form.
The content-matching interrogators are #==, #===, #=~, #match, #empty?, #end_with?, #eql?, #include?, #index, #rindex, and #start_with?. These interrogators check that a substring of the U::String matches another string or Regexp and return either a Boolean result, an index into the U::String where the match begins, or MatchData for full matching information.
Related to the content-matching interrogators are #<=>, #casecmp, and #collation_key, all of which compare a U::String against another for ordering.
Related to the property-checking interrogators are #canonical_combining_class, #general_category, #grapheme_break, #line_break, #script, and #word_break, which return the value of the Unicode property in question, the general category being the one often interrogated.
There are a couple of other “interrogators” in #bytesize, #length, #size, and #width that return integer properties of the U::String as a whole, where #length and #width are probably the most useful.
Beyond interrogators there are quite a few methods for iterating over the content of a U::String, each viewing it in its own way: #each_byte, #each_char, #each_codepoint, #each_grapheme_cluster, #each_line, and #each_word. They all have respective methods (#bytes, #chars, #codepoints, #grapheme_clusters, #lines, #words) that return an Array instead of yielding each result.
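The different views give different counts for the same text. The sketch below uses plain Ruby String methods with the same names (`grapheme_clusters` needs Ruby 2.5 or later); `view_counts` is a hypothetical helper, and U::String results are assumed, not verified, to line up.

```ruby
# Counts the units each view sees in the same text.
def view_counts(text)
  {
    bytes: text.bytes.length,
    chars: text.chars.length,
    codepoints: text.codepoints.length,
    grapheme_clusters: text.grapheme_clusters.length
  }
end

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
# two code points, three UTF-8 bytes.
view_counts("e\u0301")
```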
Quite a few methods are devoted to extracting a substring of a U::String, namely #[], #slice, #byteslice, #chomp, #chop, #chr, #getbyte, #lstrip, #ord, #rstrip, and #strip.
There are a few methods for case-shifting: #downcase, #foldcase, #titlecase, and #upcase. Then there’s #mirror, #normalize, and #reverse that alter the string in other ways.
The methods #center, #ljust, and #rjust pad a U::String to make it a certain number of cells wide.
Then there’s a couple of methods that are more related in the arguments they take than in function: #count, #delete, #squeeze, #tr, and #tr_s. These methods all take specifications of character/code point ranges that should be counted, deleted, squeezed, and translated (plus squeezed).
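The character-set specifications these methods take look like the following. The example uses the same-named methods on plain Ruby String as a stand-in; that U::String accepts identical specifications is an assumption based on the description above.

```ruby
sample = "hello"

sample.count("lo")     # counts every "l" and every "o"
sample.count("a-h")    # a range: every character from "a" through "h"
sample.delete("l")     # drops every "l"
sample.squeeze("l")    # collapses each run of "l" into a single "l"
sample.tr("el", "ip")  # translates "e" to "i" and "l" to "p"
sample.tr_s("l", "r")  # translates "l" to "r", then squeezes the run
```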
Deconstructing a U::String can be done with #partition and #rpartition, which split it around a divider, #scan, which extracts matches to a pattern, and #split, which splits it on a divider.
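The difference between these four is easiest to see side by side. This sketch calls the same-named methods on a plain Ruby String; the U::String versions are assumed to behave analogously but return U::Strings.

```ruby
record = "key=value=more"

record.partition("=")   # splits around the first divider
record.rpartition("=")  # splits around the last divider
record.scan(/[a-z]+/)   # collects every match of the pattern
record.split("=")       # splits on every divider
```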
Substitution of all matches to a pattern can be made with #gsub and of the first match to a pattern with #sub.
Creating larger U::Strings from smaller ones is done with #+, which concatenates two of them, and #*, which concatenates a U::String to itself a number of times.
A U::String can also be used as a specification as to how to format a number of values via #% (and its alias #format) into a new U::String, much like snprintf(3) in C.
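A format specification of this kind looks as follows on a plain Ruby String, where the counterpart of the alias is Kernel#format. Whether U::String supports every directive shown is an assumption.

```ruby
# Each directive consumes one value, as with snprintf(3).
padded  = "%05.1f %s" % [3.14159, "pi"]  # zero-padded float, then a string
left    = format("%-5s|", "ab")          # left-justified in five columns
hex     = "%x" % 255                     # lowercase hexadecimal
```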
The content of a U::String can be #dumped and #inspected to make it reader-friendly, but also debugger-friendly.
Finally, a U::String has a few methods to turn its content into other values: #hash, which turns it into a hash value to be used for hashing, #hex, #oct, #to_i, which turn it into an Integer, #to_str, #to_s, #b, which turn it into a String, and #to_sym (and its alias #intern), which turns it into a Symbol.
Note that some methods defined on String are missing. #capitalize doesn’t exist, as capitalization isn’t a Unicode concept. #sum doesn’t exist, as a U::String generally doesn’t contain content that you need a checksum of. #crypt doesn’t exist for similar reasons. #swapcase isn’t useful on a String and it certainly isn’t useful in a Unicode context. As a U::String doesn’t contain arbitrary data, #unpack is left to String. #next/#succ would perhaps be implementable, but haven’t been, as a satisfactory implementation hasn’t been thought of.
Constructor
initialize(stringString? = nil)#⚙
Sets up a U::String wrapping string after encoding it as UTF-8 and freezing it.
Instance Methods
uself#⚙
Returns the receiver; mostly for completeness, but allows you to always call #u on something that’s either a String or a U::String.
valid_encoding?Boolean#⚙
Returns true if the receiver contains only valid UTF-8 sequences.
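What counts as a valid UTF-8 sequence can be tried out with the same-named method on plain Ruby String. `utf8_from_bytes` is a hypothetical helper that builds a String from raw bytes; nothing here calls U::String.

```ruby
# Builds a String tagged as UTF-8 from raw byte values, valid or not.
def utf8_from_bytes(bytes)
  bytes.pack("C*").force_encoding(Encoding::UTF_8)
end

well_formed = utf8_from_bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])  # "café"
latin1_byte = utf8_from_bytes([0x63, 0x61, 0x66, 0xE9])        # bare 0xE9
truncated   = utf8_from_bytes([0xC3])                          # lead byte only

[well_formed, latin1_byte, truncated].map(&:valid_encoding?)
```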
alnum?Boolean#⚙
Returns true if the receiver contains only characters in the general categories Letter and Number.
alpha?Boolean#⚙
Returns true if the receiver contains only characters in the general category Alpha.
ascii_only?Boolean#⚙
Returns true if the receiver contains only characters in the ASCII region, that is, U+0000 through U+007F.
assigned?Boolean#⚙
Returns true if the receiver contains only code points that have been assigned a code value.
case_ignorable?Boolean#⚙
Returns true if the receiver contains only “case ignorable” characters, that is, characters in the general categories
Other, format (Cf)
Letter, modifier (Lm)
Mark, enclosing (Me)
Mark, nonspacing (Mn)
Symbol, modifier (Sk)
and the characters
U+0027 APOSTROPHE
U+00AD SOFT HYPHEN
U+2019 RIGHT SINGLE QUOTATION MARK.
cased?Boolean#⚙
Returns true if the receiver only contains characters in the general categories
Letter, uppercase (Lu)
Letter, lowercase (Ll)
Letter, titlecase (Lt)
or has the derived properties Other_Uppercase or Other_Lowercase.
cntrl?Boolean#⚙
Returns true if the receiver contains only characters in the general category Other, control (Cc).
defined?Boolean#⚙
Returns true if the receiver contains only characters not in the general categories Other, not assigned (Cn) and Other, surrogate (Cs).
digit?Boolean#⚙
Returns true if the receiver contains only characters in the general category Number, decimal digit (Nd).
folded?(locale#to_str = ENV['LC_CTYPE'])Boolean#⚙
Returns true if the receiver has been case-folded according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, that is, if a = a#foldcase(locale), where a = #normalize(:nfd).
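The condition a = a#foldcase, where a = #normalize(:nfd), can be approximated in plain Ruby (2.4 or later) with `unicode_normalize` and `downcase(:fold)`. This is a sketch only: `folded_sketch?` is a hypothetical name, it ignores locale entirely, and Ruby's folding is assumed, not verified, to agree with U::String#foldcase.

```ruby
# A string counts as folded if folding its NFD form changes nothing.
def folded_sketch?(string)
  a = string.unicode_normalize(:nfd)
  a == a.downcase(:fold)   # language-independent Unicode case folding
end

folded_sketch?("strasse")       # already folded
folded_sketch?("Stra\u00DFe")   # "S" and U+00DF both change under folding
```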
graph?Boolean#⚙
Returns true if the receiver contains only non-space “printable” characters.
Non-space “printable” characters are those not in the general categories Other or Space, separator (Zs):
Other, control (Cc)
Other, format (Cf)
Other, not assigned (Cn)
Other, surrogate (Cs)
Space, separator (Zs)
lower?(locale#to_str = ENV['LC_CTYPE'])Boolean#⚙
Returns true if the receiver has been downcased according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, that is, if a = a#downcase(locale), where a = #normalize(:nfd).
newline?Boolean#⚙
Returns true if the receiver contains only “newline” characters. A character is a “newline” character if it is any of the following characters:
U+000A (LINE FEED (LF))
U+000C (FORM FEED (FF))
U+000D (CARRIAGE RETURN (CR))
U+0085 (NEXT LINE)
U+2028 (LINE SEPARATOR)
U+2029 (PARAGRAPH SEPARATOR)
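The list above translates directly into a membership test. The sketch below is plain Ruby, with `NEWLINE_CODE_POINTS` and `newline_only?` as hypothetical names; it is not how U::String necessarily implements #newline?.

```ruby
# The six code points listed above.
NEWLINE_CODE_POINTS = [0x000A, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029].freeze

# True if every code point of +string+ is one of the listed newlines.
def newline_only?(string)
  string.each_codepoint.all? { |cp| NEWLINE_CODE_POINTS.include?(cp) }
end

newline_only?("\r\n\u2028")  # CR, LF and LINE SEPARATOR are all listed
newline_only?("line\n")      # letters are not newline characters
```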
print?Boolean#⚙
Returns true if the receiver contains only characters not in the general category Other.
punct?Boolean#⚙
Returns true if the receiver contains only characters in the general categories Punctuation and Symbol.
soft_dotted?Boolean#⚙
Returns true if this U::String only contains soft-dotted characters.
- Note
-
Soft-dotted characters have the soft-dotted property and thus lose their dot if an accent is applied to them, for example, ‘i’ and ‘j’.
space?Boolean#⚙
Returns true if the receiver contains only “space” characters. Space characters are those in the general category Separator:
Separator, space (Zs)
Separator, line (Zl)
Separator, paragraph (Zp)
such as ‘ ’, or a control character acting as such, namely
U+0009 CHARACTER TABULATION (HT)
U+000A LINE FEED (LF)
U+000C FORM FEED (FF)
U+000D CARRIAGE RETURN (CR)
title?Boolean#⚙
Returns true if the receiver contains only characters in the general category Letter, Titlecase (Lt).
upper?(locale#to_str = ENV['LC_CTYPE'])Boolean#⚙
Returns true if the receiver has been upcased according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, that is, if a = a#upcase(locale), where a = #normalize(:nfd).
valid?Boolean#⚙
Returns true if the receiver contains only valid Unicode characters.
wide?Boolean#⚙
Returns true if the receiver contains only “wide” characters. Wide characters are those that have their East_Asian_Width property set to Wide or Fullwidth.
This is mostly useful for determining how many “cells” a character will take up on a terminal or similar cell-based display.
wide_cjk?Boolean#⚙
Returns true if the receiver contains only “wide” and “ambiguously wide” characters. Wide and ambiguously wide characters are those that have their East_Asian_Width property set to Ambiguous, Wide or Fullwidth.
This is mostly useful for determining how many “cells” a character will take up on a terminal or similar cell-based display.
xdigit?Boolean#⚙
Returns true if the receiver contains only characters that are in the general category Number, decimal digit (Nd) or are lower- or uppercase letters between ‘a’ and ‘f’. Specifically, any character that
Belongs to the general category Number, decimal digit (Nd)
Falls in the range U+0041 (LATIN CAPITAL LETTER A) through U+0046 (LATIN CAPITAL LETTER F)
Falls in the range U+0061 (LATIN SMALL LETTER A) through U+0066 (LATIN SMALL LETTER F)
Falls in the range U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A) through U+FF26 (FULLWIDTH LATIN CAPITAL LETTER F)
Falls in the range U+FF41 (FULLWIDTH LATIN SMALL LETTER A) through U+FF46 (FULLWIDTH LATIN SMALL LETTER F)
will do.
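The five conditions above can be written out as a small predicate. This is a plain-Ruby sketch with hypothetical names (`HEX_LETTER_RANGES`, `xdigit_char?`, `xdigit_only?`), relying on Regexp's `\p{Nd}` for the general category test.

```ruby
# The four letter ranges listed above: ASCII and fullwidth A-F and a-f.
HEX_LETTER_RANGES = [
  0x0041..0x0046, 0x0061..0x0066,
  0xFF21..0xFF26, 0xFF41..0xFF46
].freeze

# True if +char+ is a decimal digit (Nd) or falls in one of the ranges.
def xdigit_char?(char)
  char.match?(/\p{Nd}/) || HEX_LETTER_RANGES.any? { |r| r.cover?(char.ord) }
end

def xdigit_only?(string)
  string.each_char.all? { |char| xdigit_char?(char) }
end

xdigit_only?("DEADbeef09")    # ASCII hex digits
xdigit_only?("\uFF21\uFF46")  # fullwidth "A" and "f"
```

Note that, by the definition above, any Nd character qualifies, not just ASCII 0 through 9.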
zero_width?Boolean#⚙
Returns true if the receiver contains only “zero-width” characters. A zero-width character is defined as a character in the general categories Mark, nonspacing (Mn), Mark, enclosing (Me) or Other, format (Cf), excluding the character U+00AD (SOFT HYPHEN), or is a Hangul character between U+1160 and U+1200 or U+200B (ZERO WIDTH SPACE).
normalized?(mode#to_sym = :default)Boolean#⚙
Returns true if it can be determined that the receiver is normalized according to mode.
See #normalize for a discussion on normalization and a list of the possible normalization modes.
==(otherU::String, #to_str)Boolean#⚙
Returns true if the receiver’s bytes equal those of other.
===(otherU::String, #to_str)Boolean#⚙
This is an alias for #==.
=~(otherRegexp, #=~)Numeric?#⚙
Returns the result of other#=~(self), that is, the index of the first character of the match of other in the receiver, if one exists.
- RaisesTypeError
-
If other is a U::String or String
match(patternRegexp, #to_str, index#to_int = 0)MatchData?#⚙
Returns the result of r#match(self, index), that is, the match data of the first match of r in the receiver, inheriting any taint and untrust from both the receiver and from pattern, if one exists, where r = pattern, if pattern is a Regexp, r = Regexp.new(pattern) otherwise.
match(patternRegexp, #to_str, index#to_int = 0){ |matchdataMatchData| … }Object?#⚙
Returns the result of calling the given block with the result of r#match(self, index), that is, the match data of the first match of r in the receiver, inheriting any taint and untrust from both the receiver and from pattern, if one exists, where r = pattern, if pattern is a Regexp, r = Regexp.new(pattern) otherwise.
empty?Boolean#⚙
Returns true if #bytesize = 0.
end_with?(*suffixesArray)Boolean#⚙
Returns true if any element of suffixes that responds to #to_str is a byte-level suffix of the receiver.
eql?(otherU::String)Boolean#⚙
Returns true if the receiver’s bytes equal those of other.
include?(substring#to_str)Boolean#⚙
Returns true if #index(substring) ≠ nil.
index(patternRegexp, #to_str, offset#to_int = 0)Integer?#⚙
Returns the minimal index of the receiver where pattern matches, equal to or greater than i, where i = offset if offset ≥ 0, i = #length - abs(offset) otherwise, or nil if there is no match.
If pattern is a Regexp, the Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
If pattern responds to #to_str, the matching is performed by byte comparison.
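The offset rule, i = offset if offset ≥ 0 and i = #length - abs(offset) otherwise, can be checked against plain Ruby's String#index, which treats negative offsets the same way. `start_index` is a hypothetical helper, and the example is a sketch of the rule, not of U::String's implementation.

```ruby
# Resolves an offset to the index where the search begins.
def start_index(offset, length)
  offset >= 0 ? offset : length - offset.abs
end

word = "hello"
start_index(-2, word.length)  # the search starts at index 3
word.index("l", -2)           # first "l" at or after index 3
word.index("l")               # first "l" from the start
word.index("l", 4)            # no "l" at or after index 4
```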
rindex(patternRegexp, #to_str, offset#to_int = -1)Integer?#⚙
Returns the maximal index of the receiver where pattern matches, equal to or less than i, where i = offset if offset ≥ 0, i = #length - abs(offset) otherwise, or nil if there is no match.
If pattern is a Regexp, the Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
If pattern responds to #to_str, the matching is performed by a byte comparison.
start_with?(*prefixesArray)Boolean#⚙
Returns true if any element of prefixes that responds to #to_str is a byte-level prefix of the receiver.
<=>
(otherU::String, #to_str, locale#to_str = ENV['LC_COLLATE'])Fixnum#⚙
Returns the comparison of the receiver and other using the linguistically correct rules of locale. The locale must be given as a language, region, and encoding, for example, “en_US.UTF-8”.
This operation is known as “collation” and you can find more information about the collation algorithm employed in the Unicode Technical Standard #10, see http://unicode.org/reports/tr10/.
- RaisesErrno::EILSEQ
-
If a character in the receiver can’t be converted into the encoding of the locale
casecmp(otherU::String, #to_str, locale#to_str = ENV['LC_COLLATE'])Fixnum#⚙
Returns the comparison of #foldcase to other#foldcase using the linguistically correct rules of locale. This is, however, only an approximation of a case-insensitive comparison. The locale must be given as a language, region, and encoding, for example, “en_US.UTF-8”.
This operation is known as “collation” and you can find more information about the collation algorithm employed in the Unicode Technical Standard #10, see http://unicode.org/reports/tr10/.
collation_key(locale)U::String#⚙
Returns the locale-dependent collation key of the receiver in locale, inheriting any taint and untrust.
- Note
-
-
Use the collation key when comparing U::Strings to each other repeatedly, as occurs when, for example, sorting a list of U::Strings.
-
The locale must be given as a language, region, and encoding, for example, “en_US.UTF-8”.
-
- RaisesErrno::EILSEQ
-
If a character in the receiver can’t be converted into the encoding of the locale
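The advice in the note, computing a key once per string instead of collating on every comparison, is the pattern that `sort_by` implements. The sketch below is plain Ruby: `stand_in_key` is a hypothetical placeholder that merely normalizes and downcases, and is not a real collation key.

```ruby
# Hypothetical stand-in for a collation key. A real key would come from
# the locale's collation rules; this one only normalizes and downcases.
def stand_in_key(string)
  string.unicode_normalize(:nfkd).downcase
end

# sort_by computes each key exactly once, then compares the keys.
def sort_with_keys(strings)
  strings.sort_by { |string| stand_in_key(string) }
end

sort_with_keys(%w[banana apple Cherry])
```

With U::String, the block would presumably call #collation_key with the desired locale instead of the stand-in.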
canonical_combining_classFixnum#⚙
Returns the canonical combining class of the characters of the receiver.
The canonical combining class of a character is a number in the range [0, 254]. The canonical combining class is used when generating a canonical ordering of the characters in a string.
The empty string has a canonical combining class of 0.
- Raises
-
- ArgumentError
-
If the receiver contains two characters belonging to different combining classes
- ArgumentError
-
If the receiver contains an incomplete UTF-8 sequence
- ArgumentError
-
If the receiver contains an invalid UTF-8 sequence
general_categorySymbol#⚙
Returns the general category of the characters of the receiver.
The general category identifies what kind of symbol the character is.
Category (major, minor) | Unicode Value | Ruby Value |
---|---|---|
Other, control | Cc | :other_control |
Other, format | Cf | :other_format |
Other, not assigned | Cn | :other_not_assigned |
Other, private use | Co | :other_private_use |
Other, surrogate | Cs | :other_surrogate |
Letter, lowercase | Ll | :letter_lowercase |
Letter, modifier | Lm | :letter_modifier |
Letter, other | Lo | :letter_other |
Letter, titlecase | Lt | :letter_titlecase |
Letter, uppercase | Lu | :letter_uppercase |
Mark, spacing combining | Mc | :mark_spacing_combining |
Mark, enclosing | Me | :mark_enclosing |
Mark, nonspacing | Mn | :mark_non_spacing |
Number, decimal digit | Nd | :number_decimal |
Number, letter | Nl | :number_letter |
Number, other | No | :number_other |
Punctuation, connector | Pc | :punctuation_connector |
Punctuation, dash | Pd | :punctuation_dash |
Punctuation, close | Pe | :punctuation_close |
Punctuation, final quote | Pf | :punctuation_final_quote |
Punctuation, initial quote | Pi | :punctuation_initial_quote |
Punctuation, other | Po | :punctuation_other |
Punctuation, open | Ps | :punctuation_open |
Symbol, currency | Sc | :symbol_currency |
Symbol, modifier | Sk | :symbol_modifier |
Symbol, math | Sm | :symbol_math |
Symbol, other | So | :symbol_other |
Separator, line | Zl | :separator_line |
Separator, paragraph | Zp | :separator_paragraph |
Separator, space | Zs | :separator_space |
- Raises
-
- ArgumentError
-
If the receiver contains two characters belonging to different general categories
- ArgumentError
-
If the receiver contains an incomplete UTF-8 sequence
- ArgumentError
-
If the receiver contains an invalid UTF-8 sequence
- See Also
-
Unicode Technical Note #36: A Categorization of Unicode Characters
grapheme_breakSymbol#⚙
Returns the grapheme break property value of the characters of the receiver.
The possible break values are
:control
:cr
:extend
:l
:lf
:lv
:lvt
:other
:prepend
:regional_indicator
:spacingmark
:t
:v
- RaisesArgumentError
-
If the string consists of more than one break type
line_breakSymbol#⚙
Returns the line break property value of the characters of the receiver.
The possible break values are
:after
:alphabetic
:ambiguous
:before
:before_and_after
:carriage_return
:close_parenthesis
:close_punctuation
:combining_mark
:complex_context
:conditional_japanese_starter
:contingent
:exclamation
:hangul_l_jamo
:hangul_lv_syllable
:hangul_lvt_syllable
:hangul_t_jamo
:hangul_v_jamo
:hebrew_letter
:hyphen
:ideographic
:infix_separator
:inseparable
:line_feed
:mandatory
:next_line
:non_breaking_glue
:non_starter
:numeric
:open_punctuation
:postfix
:prefix
:quotation
:regional_indicator
:space
:surrogate
:symbol
:unknown
:word_joiner
:zero_width_space
- RaisesArgumentError
-
If the string consists of more than one break type
scriptSymbol#⚙
Returns the script of the characters of the receiver.
The script of a character identifies the primary writing system that uses the character.
Script | Description |
---|---|
:arabic | Arabic |
:armenian | Armenian |
:avestan | Avestan |
:balinese | Balinese |
:bamum | Bamum |
:batak | Batak |
:bengali | Bengali |
:bopomofo | Bopomofo |
:brahmi | Brahmi |
:braille | Braille |
:buginese | Buginese |
:buhid | Buhid |
:canadian_aboriginal | Canadian Aboriginal |
:carian | Carian |
:chakma | Chakma |
:cham | Cham |
:cherokee | Cherokee |
:common | For other characters that may be used with multiple scripts |
:coptic | Coptic |
:cuneiform | Cuneiform |
:cypriot | Cypriot |
:cyrillic | Cyrillic |
:deseret | Deseret |
:devanagari | Devanagari |
:egyptian_hieroglyphs | Egyptian Hieroglyphs |
:ethiopic | Ethiopic |
:georgian | Georgian |
:glagolitic | Glagolitic |
:gothic | Gothic |
:greek | Greek |
:gujarati | Gujarati |
:gurmukhi | Gurmukhi |
:han | Han |
:hangul | Hangul |
:hanunoo | Hanunoo |
:hebrew | Hebrew |
:hiragana | Hiragana |
:imperial_aramaic | Imperial Aramaic |
:inherited | For characters that may be used with multiple scripts, and that inherit their script from the preceding characters; these include nonspacing marks, enclosing marks, and the zero-width joiner/non-joiner characters |
:inscriptional_pahlavi | Inscriptional Pahlavi |
:inscriptional_parthian | Inscriptional Parthian |
:javanese | Javanese |
:kaithi | Kaithi |
:kannada | Kannada |
:katakana | Katakana |
:kayah_li | Kayah Li |
:kharoshthi | Kharoshthi |
:khmer | Khmer |
:lao | Lao |
:latin | Latin |
:lepcha | Lepcha |
:limbu | Limbu |
:linear_b | Linear B |
:lisu | Lisu |
:lycian | Lycian |
:lydian | Lydian |
:malayalam | Malayalam |
:mandaic | Mandaic |
:meetei_mayek | Meetei Mayek |
:meroitic_hieroglyphs | Meroitic Hieroglyphs |
:meroitic_cursive | Meroitic Cursive |
:miao | Miao |
:mongolian | Mongolian |
:myanmar | Myanmar |
:new_tai_lue | New Tai Lue |
:nko | N'Ko |
:ogham | Ogham |
:old_italic | Old Italic |
:old_persian | Old Persian |
:old_south_arabian | Old South Arabian |
:old_turkic | Old Turkic |
:ol_chiki | Ol Chiki |
:oriya | Oriya |
:osmanya | Osmanya |
:phags_pa | Phags-pa |
:phoenician | Phoenician |
:rejang | Rejang |
:runic | Runic |
:samaritan | Samaritan |
:saurashtra | Saurashtra |
:sharada | Sharada |
:shavian | Shavian |
:sinhala | Sinhala |
:sora_sompeng | Sora Sompeng |
:sundanese | Sundanese |
:syloti_nagri | Syloti Nagri |
:syriac | Syriac |
:tagalog | Tagalog |
:tagbanwa | Tagbanwa |
:tai_le | Tai Le |
:tai_tham | Tai Tham |
:tai_viet | Tai Viet |
:takri | Takri |
:tamil | Tamil |
:telugu | Telugu |
:thaana | Thaana |
:thai | Thai |
:tibetan | Tibetan |
:tifinagh | Tifinagh |
:ugaritic | Ugaritic |
:unknown | For not assigned, private-use, non-character, and surrogate code points |
:vai | Vai |
:yi | Yi |
- Raises
-
- ArgumentError
-
If the receiver contains two characters belonging to different scripts
- ArgumentError
-
If the receiver contains an incomplete UTF-8 sequence
- ArgumentError
-
If the receiver contains an invalid UTF-8 sequence
word_breakSymbol#⚙
Returns the word break property value of the characters of the receiver.
The possible word break values are
:aletter
:cr
:extend
:extendnumlet
:format
:katakana
:lf
:midletter
:midnum
:midnumlet
:newline
:numeric
:other
:regional_indicator
- RaisesArgumentError
-
If the string consists of more than one break type
bytesizeInteger#⚙
Returns the number of bytes required to represent the receiver.
lengthInteger#⚙
Returns the number of characters in the receiver.
sizeInteger#⚙
This is an alias for #length.
widthInteger#⚙
Returns the width of the receiver. The width is defined as the sum of the number of “cells” on a terminal or similar cell-based display that the characters in the string will require.
Characters that are #wide? have a width of 2. Characters that are #zero_width? have a width of 0. Other characters have a width of 1.
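The width rule above (2 for wide, 0 for zero-width, 1 otherwise) can be sketched in plain Ruby. Ruby has no built-in East_Asian_Width lookup, so `WIDE_RANGES` below is a hand-picked, incomplete set of ranges assumed for illustration. The zero-width test follows the #zero_width? description, reading "between U+1160 and U+1200" as excluding U+1200, which is also an assumption. All names are hypothetical.

```ruby
# A few well-known wide blocks; NOT the full East_Asian_Width data.
WIDE_RANGES = [
  0x1100..0x115F,  # Hangul Jamo leading consonants
  0x3041..0x30FF,  # Hiragana and Katakana
  0x4E00..0x9FFF,  # CJK Unified Ideographs
  0xAC00..0xD7A3,  # Hangul syllables
  0xFF01..0xFF60   # Fullwidth forms
].freeze

def wide_char?(char)
  WIDE_RANGES.any? { |range| range.cover?(char.ord) }
end

def zero_width_char?(char)
  cp = char.ord
  return false if cp == 0x00AD  # SOFT HYPHEN is explicitly excluded
  char.match?(/[\p{Mn}\p{Me}\p{Cf}]/) ||
    (0x1160...0x1200).cover?(cp) || cp == 0x200B
end

# Sums the cell width of every character.
def display_width(string)
  string.each_char.sum do |char|
    if wide_char?(char) then 2
    elsif zero_width_char?(char) then 0
    else 1
    end
  end
end

display_width("\u65E5\u672C")  # two ideographs, two cells each
display_width("e\u0301")       # the combining accent adds no cells
```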
each_byte{ |byteFixnum| … }self#⚙
Enumerates the bytes in the receiver.
each_byteEnumerator#⚙
Returns an Enumerator over the bytes in the receiver.
bytesArray<Fixnum>#⚙
Returns the bytes of the receiver.
each_char{ |charU::String| … }self#⚙
Enumerates the characters in the receiver, each inheriting any taint and untrust.
each_charEnumerator#⚙
Returns an Enumerator over the characters in the receiver.
charsArray<U::String>#⚙
Returns the characters of the receiver, each inheriting any taint and untrust.
each_codepoint{ |codepointInteger| … }self#⚙
Enumerates the code points of the receiver.
each_codepointEnumerator#⚙
Returns an Enumerator over the code points of the receiver.
codepointsArray<Integer>#⚙
Returns the code points of the receiver.
each_grapheme_cluster{ |clusterU::String| … }self#⚙
Enumerates the grapheme clusters in the receiver, each inheriting any taint and untrust.
each_grapheme_clusterEnumerator#⚙
Returns an Enumerator over the grapheme clusters in the receiver.
grapheme_clusters{ |clusterU::String| … }self#⚙
This is an alias for #each_grapheme_cluster.
grapheme_clustersEnumerator#⚙
This is an alias for #each_grapheme_cluster.
each_line(separatorU::String, #to_str = $/){ |lpU::String, self| … }self#⚙
Enumerates the lines of the receiver, inheriting any taint and untrust.
If separator is nil, yields self. If separator is #empty?, separates each line (paragraph) by two or more U+000A LINE FEED characters.
each_line(separatorU::String, #to_str = $/)Enumerator#⚙
Returns an Enumerator over the lines of the receiver.
If separator is nil, self will be yielded. If separator is #empty?, separates each line (paragraph) by two or more U+000A LINE FEED characters.
lines(separatorU::String, #to_str = $/)Array<U::String>#⚙
Returns the lines of the receiver, inheriting any taint and untrust.
If separator is nil, yields self. If separator is #empty?, separates each line (paragraph) by two or more U+000A LINE FEED characters.
each_word{ |wordU::String| … }self#⚙
Enumerates the words in the receiver, each inheriting any taint and untrust.
each_wordEnumerator#⚙
Returns an Enumerator over the words in the receiver.
words{ |wordU::String| … }self#⚙
This is an alias for #each_word.
wordsEnumerator#⚙
This is an alias for #each_word.
[](index#to_int)U::String?#⚙
Returns the substring [max(i, 0), min(#length, i + 1)], where i = index if index ≥ 0, i = #length - abs(index) otherwise, inheriting any taint and untrust, or nil if this substring is empty.
[](index#to_int, length#to_int)U::String?#⚙
Returns the substring [max(i, 0), min(#length, i + length)], where i = index if index ≥ 0, i = #length - abs(index) otherwise, inheriting any taint or untrust, or nil if length < 0.
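The bounds [max(i, 0), min(#length, i + length)] can be computed explicitly. `substring_bounds` is a hypothetical helper written in plain Ruby; the comparison with String#[] only covers in-range cases and is a sketch, not a statement about how U::String is implemented.

```ruby
# Returns [start, stop) for the given index and length, following the
# formula above; +total+ plays the role of #length.
def substring_bounds(index, length, total)
  i = index >= 0 ? index : total - index.abs
  [[i, 0].max, [total, i + length].min]
end

word = "hello"
substring_bounds(1, 10, word.length)  # clamped to the end of the string
substring_bounds(-2, 1, word.length)  # a negative index counts from the end
word[1, 10]                           # plain String clamps the same way
word[-2, 1]
```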
[](rangeRange)U::String?#⚙
Returns the result of #[i, j - k], where i = range#begin if range#begin ≥ 0, i = #length - abs(range#begin) otherwise, j = range#end if range#end ≥ 0, j = #length - abs(range#end) otherwise, and k = 1 if range#exclude_end?, k = 0 otherwise, or nil if j - k < 0.
[](regexpRegexp, reference#to_int, #to_str, Symbol = 0)U::String?#⚙
Returns the submatch reference from the first match of regexp in the receiver, inheriting any taint and untrust from both the receiver and from regexp, or nil if there is no match or if the submatch isn’t part of the overall match.
- RaisesIndexError
-
If reference doesn’t refer to a submatch
[](stringU::String, ::String)U::String?#⚙
Returns the substring string, inheriting any taint and untrust from string, if string is a substring of the receiver.
[](objectObject)nil#⚙
Returns nil for any object that doesn’t satisfy the other cases.
slice(index#to_int)U::String?#⚙
This is an alias for #[].
slice(index#to_int, length#to_int)U::String?#⚙
This is an alias for #[].
slice(rangeRange)U::String?#⚙
This is an alias for #[].
slice(regexpRegexp, reference#to_int, #to_str, Symbol = 0)U::String?#⚙
This is an alias for #[].
slice(stringU::String, ::String)U::String?#⚙
This is an alias for #[].
slice(objectObject)nil#⚙
This is an alias for #[].
byteslice(index#to_int)U::String?#⚙
Returns the byte-index-based substring [max(i, 0), min(#bytesize, i + 1)], where i = index if index ≥ 0, i = #bytesize - abs(index) otherwise, inheriting any taint and untrust, or nil if this substring is empty.
byteslice(index#to_int, length#to_int)U::String?#⚙
Returns the byte-index-based substring [max(i, 0), min(#bytesize, i + length)], where i = index if index ≥ 0, i = #bytesize - abs(index) otherwise, inheriting any taint and untrust, or nil if length < 0.
byteslice(rangeRange)U::String?#⚙
Returns the result of #[i, j - k], where i = range#begin if range#begin ≥ 0, i = #bytesize - abs(range#begin) otherwise, j = range#end if range#end ≥ 0, j = #bytesize - abs(range#end) otherwise, and k = 1 if range#exclude_end?, k = 0 otherwise, or nil if j - k < 0.
byteslice(objectObject)nil#⚙
Returns nil for any object that doesn’t satisfy the other cases.
chomp(separatorU::String, #to_str, nil = $/)U::String, self, nil#⚙
Returns the receiver, minus any separator suffix, inheriting any taint and untrust, unless #length = 0, in which case nil is returned. If separator is nil or invalidly encoded, the receiver is returned.
If separator is $/ and $/ has its default value or if separator is U+000A LINE FEED, the longest suffix consisting of any of
U+000A LINE FEED
U+000D CARRIAGE RETURN
U+000D CARRIAGE RETURN, U+000A LINE FEED
will be removed. If no such suffix exists and the last character is a #newline?, it will be removed instead.
If separator is #empty?, remove the longest #newline? suffix.
chopU::String#⚙
Returns the receiver, minus its last character, inheriting any taint and untrust, unless the receiver is #empty? or if the last character is invalidly encoded, in which case the receiver is returned.
If the last character is U+000A LINE FEED and the second-to-last character is U+000D CARRIAGE RETURN, both characters are removed.
chrU::String#⚙
Returns the substring [0, min(#length, 1)], inheriting any taint and untrust.
getbyte(index#to_int)Fixnum?#⚙
Returns the byte at byte-index i, where i = index if index ≥ 0, i = #bytesize - abs(index) otherwise, or nil if i lies outside of [0, #bytesize].
lstripU::String#⚙
Returns the receiver with its maximum #space? prefix removed, inheriting any taint and untrust.
ordInteger#⚙
Returns the code point of the first character of the receiver.
rstripU::String#⚙
Returns the receiver with its maximum #space? suffix removed, inheriting any taint and untrust from the receiver.
stripU::String#⚙
Returns the receiver with its maximum #space? prefix and suffix removed, inheriting any taint and untrust.
downcase(locale#to_str = ENV['LC_CTYPE'])U::String#⚙
Returns the downcasing of the receiver according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, inheriting any taint and untrust.
foldcase(locale#to_str = ENV['LC_CTYPE'])U::String#⚙
Returns the case-folding of the receiver according to the rules of the language of locale, which may be empty to specifically use the default rules, inheriting any taint and untrust.
titlecase(locale#to_str = ENV['LC_CTYPE'])U::String#⚙
Returns the title-casing of the receiver according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, inheriting any taint and untrust.
upcase(locale#to_str = ENV['LC_CTYPE'])U::String#⚙
Returns the upcasing of the receiver according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, inheriting any taint and untrust.
mirrorU::String#⚙
Returns the mirroring of the receiver, inheriting any taint and untrust.
Mirroring is done by replacing characters in the string with their horizontal mirror image, if any, in text that is laid out from right to left. For example, ‘(’ becomes ‘)’ and ‘)’ becomes ‘(’.
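The character-for-character replacement can be sketched with a small lookup table. `MIRROR_PAIRS` below covers only a handful of ASCII pairs chosen for illustration; the real mapping comes from the Unicode mirroring data, and `mirror_sketch` is a hypothetical name.

```ruby
# A tiny, hand-written subset of mirrored pairs.
MIRROR_PAIRS = {
  "(" => ")", ")" => "(",
  "[" => "]", "]" => "[",
  "{" => "}", "}" => "{",
  "<" => ">", ">" => "<"
}.freeze

# Replaces each character with its mirror image, if it has one.
# The order of the characters is left unchanged.
def mirror_sketch(string)
  string.each_char.map { |char| MIRROR_PAIRS.fetch(char, char) }.join
end

mirror_sketch("(a)[b]")
```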
normalize(form#to_sym = :nfd)U::String#⚙
Returns the receiver normalized into form, inheriting any taint and untrust.
Normalization is the process of converting characters and sequences of characters in a string into a canonical form. This process includes dealing with whether characters are represented by a composed character or a base character and combining marks, such as accents.
The possible normalization forms are
Form | Description |
---|---|
:nfd | Normalizes characters to their maximally decomposed form, ordering accents and so on according to their combining class |
:nfc | Normalizes according to :nfd, then composes any decomposed characters |
:nfkd | Normalizes according to :nfd and also normalizes “compatibility” characters, such as replacing U+00B3 SUPERSCRIPT THREE with U+0033 DIGIT THREE |
:nfkc | Normalizes according to :nfkd, then composes any decomposed characters |
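The forms can be tried out with String#unicode_normalize, Ruby’s built-in counterpart (2.2 or later), used here as a stand-in on the assumption that the u library isn’t loaded. Note that the built-in method defaults to :nfc, whereas the signature above defaults to :nfd.

```ruby
# Stand-in sketch: built-in String#unicode_normalize, not U::String#normalize.
composed = "\u00E9"                        # e with acute as one code point

nfd  = composed.unicode_normalize(:nfd)    # e followed by U+0301 COMBINING ACUTE ACCENT
nfc  = nfd.unicode_normalize(:nfc)         # recomposed into U+00E9
nfkd = "x\u00B3".unicode_normalize(:nfkd)  # SUPERSCRIPT THREE becomes DIGIT THREE

p nfd.codepoints, nfc.codepoints, nfkd
```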
reverse U::String
Returns the reversal of the receiver, inheriting any taint and untrust from the receiver.
- Note
-
This doesn’t take into account proper handling of combining marks, direction indicators, and similarly relevant characters, so this method is mostly useful when you know the contents of the string are simple and the result isn’t intended for display.
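The caveat in the note can be seen with Ruby’s built-in String#reverse, which also reverses code point by code point; it is used here as a stand-in, and the sample string is made up for illustration.

```ruby
# Stand-in sketch using Ruby's built-in String#reverse.
word     = "e\u0301a"   # "e" + COMBINING ACUTE ACCENT, then "a"
reversed = word.reverse

# The accent now follows "a" instead of "e", so it attaches to the wrong letter.
p word.codepoints      # e, accent, a
p reversed.codepoints  # a, accent, e
```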
center(width #to_int, padding U::String, #to_str = ' ') U::String
Returns the receiver padded as evenly as possible on both sides with padding to make it max(#length, width) wide, inheriting any taint and untrust from the receiver and also from padding if padding is used.
- Raises
-
- ArgumentError
-
If padding#width = 0
- ArgumentError
-
If characters inside padding that should be used for round-off padding are too wide
ljust(width #to_int, padding U::String, #to_str = ' ') U::String
Returns the receiver padded on the right with padding to make it max(#length, width) wide, inheriting any taint and untrust from the receiver and also from padding if padding is used.
- Raises
-
- ArgumentError
-
If padding#width = 0
- ArgumentError
-
If characters inside padding that should be used for round-off padding are too wide
rjust(width #to_int, padding U::String, #to_str = ' ') U::String
Returns the receiver padded on the left with padding to make it max(#length, width) wide, inheriting any taint and untrust from the receiver and also from padding if padding is used.
- Raises
-
- ArgumentError
-
If padding#width = 0
- ArgumentError
-
If characters inside padding that should be used for round-off padding are too wide
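A small sketch of the three justification methods, using Ruby’s built-in String as a stand-in (an assumption). The built-in versions count characters rather than display cells, so they only approximate the width handling described above.

```ruby
# Stand-in sketch using Ruby's built-in String#center, #ljust, and #rjust.
centered  = "ab".center(7, "12")  # uneven remainder: the extra character goes on the right
left      = "ab".ljust(5, ".")
right     = "ab".rjust(5)         # padding defaults to a single space
unchanged = "abcdef".center(3)    # width smaller than the receiver: nothing is added

# Empty padding is rejected.
empty_padding_error =
  begin
    "ab".center(6, "")
  rescue ArgumentError => e
    e.class
  end

p centered, left, right, unchanged, empty_padding_error
```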
count(set U::String, #to_str, *sets Array<U::String, #to_str>) Integer
Returns the number of characters in the receiver that are included in the intersection of set and any additional sets of characters.
The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).
Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
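The set notation behaves the same way in Ruby’s built-in String#count, which is used below as a stand-in (an assumption) to show union within a set, intersection across sets, complement, and ranges.

```ruby
# Stand-in sketch using Ruby's built-in String#count.
text = "hello world"

either     = text.count("lo")       # characters that are 'l' or 'o'
both       = text.count("lo", "o")  # intersection of the two sets: only 'o'
complement = text.count("^l")       # everything except 'l'
range      = text.count("a-e")      # 'a' through 'e'

p either, both, complement, range
```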
delete(set U::String, #to_str, *sets Array<U::String, #to_str>) U::String
Returns the receiver, minus any characters that are included in the intersection of set and any additional sets of characters, inheriting any taint and untrust.
The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).
Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
squeeze(*sets Array<U::String, #to_str>) U::String
Returns the receiver, replacing any substrings of #length > 1 consisting of the same character c with c, where c is a member of the intersection of the character sets in sets, inheriting any taint and untrust.
If sets is empty, then the set of all Unicode characters is used.
The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).
Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
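A short sketch of #delete and #squeeze with character sets, again using Ruby’s built-in String as a stand-in (an assumption); the sample text is made up.

```ruby
# Stand-in sketch using Ruby's built-in String#delete and String#squeeze.
text = "bookkeeper  aaa"

deleted       = text.delete("a-e")         # removes a, b, c, d, and e
deleted_both  = text.delete("a-k", "e-z")  # intersection is e through k
squeezed_all  = text.squeeze               # no sets: every run is squeezed
squeezed_some = text.squeeze("a-k")        # only runs of a through k are squeezed

p deleted, deleted_both, squeezed_all, squeezed_some
```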
tr(from #to_str, to #to_str) U::String
Returns the receiver, translating characters in from to their equivalent character, by index, in to, inheriting any taint and untrust. If to#length < from#length, to[-1] will be used for any index i > to#length.
The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).
Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
tr_s(from #to_str, to #to_str) U::String
Returns the receiver, translating characters in from to their equivalent character, by index, in to and then squeezing any substrings of #length > 1 consisting of the same character c with c, inheriting any taint and untrust. If to#length < from#length, to[-1] will be used for any index i > to#length.
The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).
Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
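The translation rules for #tr and #tr_s can be sketched with the built-in String methods of the same names, used as a stand-in here (an assumption).

```ruby
# Stand-in sketch using Ruby's built-in String#tr and String#tr_s.
swapped  = "hello".tr("el", "ip")    # e becomes i, l becomes p
shifted  = "hello".tr("a-y", "b-z")  # ranges are paired up by index
filled   = "hello".tr("a-z", "*")    # short `to`: its last character is reused
negated  = "hello".tr("^l", "-")     # everything except 'l' is translated
squeezed = "hello".tr_s("l", "r")    # translate, then squeeze the resulting run

p swapped, shifted, filled, negated, squeezed
```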
partition(separator Regexp, #to_str) Array<U::String>
Returns the receiver split into s₁ = #slice(0, i), s₂ = #slice(i, n), s₃ = #slice(i+n, -1), where i = j if j ≠ nil, i = #length otherwise, j = #index(separator), n = separator#length, where s₁ and s₃ inherit any taint and untrust from the receiver and s₂ inherits any taint and untrust from separator and also from the receiver if separator is a Regexp.
rpartition(separator Regexp, #to_str) Array<U::String>
Returns the receiver split into s₁ = #slice(0, i), s₂ = #slice(i, n), s₃ = #slice(i + n, -1), where i = j if j ≠ nil, i = 0 otherwise, j = #rindex(separator), n = separator#length, where s₁ and s₃ inherit any taint and untrust from the receiver and s₂ inherits any taint and untrust from separator and also from the receiver if separator is a Regexp.
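The two definitions differ mainly in which match is used and in where the whole receiver ends up when there is no match. A sketch with Ruby’s built-in String, used as a stand-in (an assumption):

```ruby
# Stand-in sketch using Ruby's built-in String#partition and String#rpartition.
first_split = "key=a=b".partition("=")   # splits at the first separator
last_split  = "key=a=b".rpartition("=")  # splits at the last separator

# Without a match, i = #length for partition and i = 0 for rpartition.
unmatched_first = "plain".partition("=")
unmatched_last  = "plain".rpartition("=")

p first_split, last_split, unmatched_first, unmatched_last
```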
scan(pattern Regexp) Array<U::String>+
Returns all matches of pattern in the receiver, or the sub-matches of each match, if pattern defines any, each inheriting any taint and untrust from both the receiver and from pattern.
- Note
-
The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
scan(pattern #to_str) Array<U::String>
Returns all matches of pattern in the receiver, each inheriting any taint and untrust from the receiver.
scan(pattern Regexp){ |submatches Array<U::String>| … } self
Enumerates the sub-matches of matches of pattern in the receiver, each inheriting any taint and untrust from both the receiver and from pattern.
- Note
-
The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
scan(pattern #to_str){ |match U::String| … } self
Enumerates the matches of pattern in the receiver, each inheriting any taint and untrust from the receiver.
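The four forms of #scan can be compared side by side using Ruby’s built-in String#scan as a stand-in (an assumption); the sample text is made up.

```ruby
# Stand-in sketch using Ruby's built-in String#scan.
text = "k1=v1, k2=v2"

words   = text.scan(/\w+/)          # no groups: whole matches
pairs   = text.scan(/(\w+)=(\w+)/)  # groups: one array of sub-matches per match
literal = text.scan("=")            # string pattern: plain matches

# Block form: the sub-matches are yielded and the receiver is returned.
seen   = []
result = text.scan(/(\w+)=(\w+)/) { |key, value| seen << "#{key}:#{value}" }

p words, pairs, literal, seen
```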
split(pattern Regexp, #to_str = $;, limit #to_int = 0) Array<U::String>
Returns the receiver split into limit substrings separated by pattern, each inheriting any taint and untrust.
If pattern = $; = nil or pattern = ' ', splits according to AWK rules, that is, any #space? prefix is skipped, then substrings are separated by non-empty #space? substrings.
If limit < 0, then no limit is imposed and trailing #empty? substrings aren’t removed.
If limit = 0, then no limit is imposed and trailing #empty? substrings are removed.
If limit = 1, then, if #length = 0, the result will be empty, otherwise it will consist of the receiver only.
If limit > 1, then the receiver is split into at most limit substrings.
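The limit rules are easiest to see by example. The sketch below uses Ruby’s built-in String#split as a stand-in (an assumption), which follows the same rules for the cases shown.

```ruby
# Stand-in sketch using Ruby's built-in String#split.
awk     = "  a  b c ".split(" ")  # AWK rules: leading space skipped, runs collapsed
trimmed = "a,b,,".split(",")      # limit = 0: trailing empty substrings removed
kept    = "a,b,,".split(",", -1)  # limit < 0: trailing empty substrings kept
limited = "a,b,c".split(",", 2)   # limit > 1: at most two substrings
whole   = "a,b".split(",", 1)     # limit = 1: the receiver only
nothing = "".split(",", 1)        # limit = 1 on an empty receiver: empty result

p awk, trimmed, kept, limited, whole, nothing
```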
gsub(pattern Regexp, #to_str, replacement #to_str) U::String
Returns the receiver with all matches of pattern replaced by replacement, inheriting any taint and untrust from the receiver and from replacement.
The replacement is used as a specification for what to replace matches with:
Specification | Replacement |
---|---|
\1, \2, …, \n | Numbered sub-match n |
\k<name> | Named sub-match name |
The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
gsub(pattern Regexp, #to_str, replacements #to_hash) U::String
Returns the receiver with all matches of pattern replaced by replacements#[match], where match is the matched substring, inheriting any taint and untrust from the receiver and from the replacements#[match]es, as well as any taint on replacements.
The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
- Raises
-
- RuntimeError
-
If any replacement is the result being constructed
- Exception
-
Any error raised by replacements#default, if it gets called
gsub(pattern Regexp, #to_str){ |match U::String| #to_str … } U::String
Returns the receiver with all matches of pattern replaced by the results of the given block, inheriting any taint and untrust from the receiver and from the results of the given block.
The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
gsub(pattern Regexp, #to_str) Enumerator
Returns an Enumerator over the matches of pattern in the receiver.
The Regexp special variables $&, $', $`, $1, $2, …, $n will be updated accordingly.
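The four forms of #gsub can be sketched with Ruby’s built-in String#gsub, used here as a stand-in (an assumption); the sample strings are made up.

```ruby
# Stand-in sketch using Ruby's built-in String#gsub.
name = "John Smith"

numbered = name.gsub(/(\w+) (\w+)/, '\2, \1')                             # numbered sub-matches
named    = name.gsub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>') # named sub-matches
mapped   = "cat hat".gsub(/[ch]at/, "cat" => "dog", "hat" => "cap")       # hash lookup per match
blocked  = name.gsub(/[aeiou]/) { |vowel| vowel.upcase }                  # block result per match
initials = name.gsub(/[A-Z]/).to_a                                        # Enumerator over matches

p numbered, named, mapped, blocked, initials
```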
sub(pattern Regexp, #to_str, replacement #to_str) U::String?
Returns the receiver with the first match of pattern replaced by replacement, inheriting any taint and untrust from the receiver and from replacement, or nil if there’s no match.
The replacement is used as a specification for what to replace matches with:
Specification | Replacement |
---|---|
\1, \2, …, \n | Numbered sub-match n |
\k<name> | Named sub-match name |
The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
sub(pattern Regexp, #to_str, replacements #to_hash) U::String?
Returns the receiver with the first match of pattern replaced by replacements#[match], where match is the matched substring, inheriting any taint and untrust from the receiver, replacements, and replacements#[match], or nil if there’s no match.
The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
- Raises
-
- Exception
-
Any error raised by replacements#default, if it gets called
sub(pattern Regexp, #to_str){ |match U::String| #to_str … } U::String?
Returns the receiver with the first match of pattern replaced by the result of the given block, inheriting any taint and untrust from the receiver and from the result of the given block, or nil if there’s no match.
The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.
+(other U::String, #to_str) U::String
Returns the concatenation of other to the receiver, inheriting any taint on either.
*(n #to_int) U::String
Returns the concatenation of n copies of the receiver, inheriting any taint and untrust.
- Raises
-
- ArgumentError
-
If n < 0
- ArgumentError
-
If n > 0 and n × #bytesize > LONG_MAX
%(value) U::String
Returns a formatted string of the values in Array(value) by treating the receiver as a format specification of this formatted string.
A format specification is a string consisting of sequences of normal characters that are copied verbatim and field specifiers. A field specifier consists of a %, followed by any optional flags, an optional width, an optional precision, and a directive:
%[flags][width][.[precision]]directive
Note that this means that a lone % at the end of the string is simply copied verbatim as it, by this definition, isn’t a field specifier.
The directive determines how this field should be formatted. The flags, width, and precision modify this interpretation.
The field often takes a value from value and formats it according to a given set of rules, which depend on the flags, width, and precision, but can also output other, hardwired, values.
The directives that don’t take a value are
Directive | Description |
---|---|
% | Outputs ‘%’. |
\n | Outputs “%\n”. |
\0 | Outputs “%\0”. |
None of these directives take any flags, width, or precision.
All of the following directives allow you to specify a width. The width only ever limits the minimum width of the field, that is, at least width cells will be filled by the field, but perhaps more will actually be required in the end.
- c
-
Outputs
[left-padding]character[right-padding]
If a width w has been specified and the ‘-’ flag hasn’t been given, left-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.
Character is the result of #to_str#chr on the argument, if it responds to #to_str, otherwise it’s the result of #to_int turned into a string containing the character at that code point. A precision isn’t allowed. The #width of the character is used in any width calculations.
If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.
- s
-
Outputs
[left-padding]string[right-padding]
Left-padding and right-padding are the same as for the ‘c’ directive described above.
String is a substring of the result of #to_s on the argument that is w cells wide, where w = precision, if a precision has been specified, w = #width otherwise.
- p
-
Outputs
[left-padding]inspect[right-padding]
Left-padding and right-padding are the same as for the ‘c’ directive described above.
Inspect is a substring of the result of #inspect on the argument that is w cells wide, where w = precision, if a precision has been specified, w = #width otherwise.
- d
- i
- u
-
Outputs
[left-padding][prefix/sign][zeroes] [precision-filler]digits[right-padding]
If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.
Prefix/sign is “-” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given.
If a width w has been specified and the ‘0’ flag has been given and neither the ‘-’ flag has been given nor a precision has been specified, zeroes consists of enough zeroes to make the whole field at least w cells wide, otherwise it’s empty.
If a precision p has been specified, precision-filler consists of enough zeroes to make for p digits of output, otherwise it’s empty.
Digits consists of the digits in base 10 that represent the result of calling Integer with the argument as its argument.
If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.
Flag | Description |
---|---|
(Space) | Add a “ ” prefix to non-negative numbers |
+ | Add a “+” sign to non-negative numbers; overrides the ‘ ’ flag |
0 | Use ‘0’ for any width padding; ignored when a precision has been specified |
- | Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag |
- o
-
Outputs
[left-padding][prefix/sign][zeroes/sevens] [precision-filler]octal-digits[right-padding]
If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.
Prefix/sign is “-” if the argument is negative and the ‘ ’ or ‘+’ flag was given, and “ ” if the ‘ ’ flag was given.
If a width w has been specified and the ‘0’ flag has been given and neither the ‘-’ flag has been given nor a precision has been specified, zeroes/sevens consists of enough zeroes, if the argument is non-negative or if the ‘+’ or ‘ ’ flag has been given, or sevens otherwise, to make the whole field at least w cells wide, otherwise it’s empty.
If a precision p has been specified, precision-filler consists of enough zeroes, if the argument is non-negative or if the ‘+’ or ‘ ’ flag has been given, or sevens otherwise, to make for p digits of output, otherwise it’s empty.
Octal-digits consists of the digits in base 8 that represent the result of #to_int on the argument, using ‘0’ through ‘7’. A negative value will be output as a two’s complement value.
If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.
Flag | Description |
---|---|
(Space) | Add a “ ” prefix to non-negative numbers and don’t output negative numbers as two’s complement values |
+ | Add a “+” sign to non-negative numbers and don’t output negative numbers as two’s complement values; overrides the ‘ ’ flag |
0 | Use ‘0’ for any width padding; ignored when a precision has been specified |
- | Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag |
# | Increase precision to include as many digits as necessary to make the first digit ‘0’, but don’t include the ‘0’ itself |
- x
-
Outputs
[left-padding][sign][base-prefix][prefix][zeroes/fs] [precision-filler]hexadecimal-digits[right-padding]
Left-padding and right-padding are the same as for the ‘o’ directive described above. Zeroes/fs is the same as zeroes/sevens for the ‘o’ directive, except that it uses ‘f’ characters instead of sevens. The same goes for precision-filler.
Sign is “-” if the argument is negative and the ‘ ’ or ‘+’ flag was given, and “ ” if the argument is non-negative and the ‘ ’ flag was given.
Base-prefix is “0x” if the ‘#’ flag was given and the result of #to_int on the argument is non-zero.
Prefix is “..” if the argument is negative and neither the ‘+’ nor the ‘ ’ flag was given.
Hexadecimal-digits consists of the digits in base 16 that represent the result of #to_int on the argument, using ‘0’ through ‘9’ and ‘a’ through ‘f’. A negative value will be output as a two’s complement value.
Flag | Description |
---|---|
(Space) | Same as for ‘o’ |
+ | Same as for ‘o’ |
0 | Same as for ‘o’ |
- | Same as for ‘o’ |
# | Prefix non-zero values with “0x” |
- X
-
Same as ‘x’, except that it uses uppercase letters instead.
- b
-
Outputs
[left-padding][sign][base-prefix][prefix][zeroes/ones] [precision-filler]binary-digits[right-padding]
Left-padding and right-padding are the same as for the ‘o’ directive described above. Base-prefix and prefix are the same as for the ‘x’ directive, except that base-prefix outputs “0b”. Zeroes/ones is the same as zeroes/fs for the ‘x’ directive, except that it uses ones instead of ‘f’ characters. The same goes for precision-filler.
Binary-digits consists of the digits in base 2 that represent the result of #to_int on the argument, using ‘0’ and ‘1’. A negative value will be output as a two’s complement value.
Flag | Description |
---|---|
(Space) | Same as for ‘o’ |
+ | Same as for ‘o’ |
0 | Same as for ‘o’ |
- | Same as for ‘o’ |
# | Prefix non-zero values with “0b” |
- B
-
Same as ‘b’, except that it uses a “0B” prefix for the ‘#’ flag.
- f
-
Outputs
[left-padding][prefix/sign][zeroes] integer-part[decimal-point][fractional-part][right-padding]
If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.
Prefix/sign is “-” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given.
If a width w has been specified and the ‘0’ flag has been given and the ‘-’ flag has not been given, zeroes consists of enough zeroes to make the whole field at least w cells wide, otherwise it’s empty.
Integer-part consists of the digits in base 10 that represent the integer part of the result of calling Float with the argument as its argument.
Decimal-point is “.” if the precision isn’t 0 or if the ‘#’ flag has been given.
Fractional-part consists of p digits in base 10 that represent the fractional part of the result of calling Float with the argument as its argument, where p = precision, if one has been specified, p = 6 otherwise.
If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.
Flag | Description |
---|---|
(Space) | Add a “ ” prefix to non-negative numbers |
+ | Add a “+” sign to non-negative numbers; overrides the ‘ ’ flag |
0 | Use ‘0’ for any width padding; ignored when a precision has been specified |
- | Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag |
# | Output a decimal point, even if no fractional part follows |
- e
-
Outputs
[left-padding][prefix/sign][zeroes] digit[decimal-point][fractional-part]exponent[right-padding]
If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w + e cells wide, where e ≥ 4 is the width of the exponent, otherwise it’s empty.
Prefix/sign is “-” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given.
If a width w has been specified and the ‘0’ flag has been given and the ‘-’ flag has not been given, zeroes consists of enough zeroes to make the whole field w + e cells wide, where e ≥ 4 is the width of the exponent, otherwise it’s empty.
Digit consists of one digit in base 10 that represents the most significant digit of the result of calling Float with the argument as its argument.
Decimal-point is “.” if the precision isn’t 0 or if the ‘#’ flag has been given.
Fractional-part consists of p digits in base 10 that represent all but the most significant digit of the result of calling Float with the argument as its argument, where p = precision, if one has been specified, p = 6 otherwise.
Exponent consists of “e” followed by the exponent in base 10 required to turn the result of calling Float with the argument as its argument into a decimal fraction with one non-zero digit in the integer part. If the exponent is 0, “+00” will be output.
If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w + e cells wide, where e ≥ 4 is the width of the exponent, otherwise it’s empty.
Flag | Description |
---|---|
(Space) | Add a “ ” prefix to non-negative numbers |
+ | Add a “+” sign to non-negative numbers; overrides the ‘ ’ flag |
0 | Use ‘0’ for any width padding; ignored when a precision has been specified |
- | Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag |
# | Output a decimal point, even if no fractional part follows |
- E
-
Same as ‘e’, except that it uses an uppercase ‘E’ for the exponent separator.
- g
-
Same as ‘e’ if the exponent is less than -4 or if the exponent is greater than or equal to the precision, otherwise ‘f’ is used. The precision defaults to 6 and a precision of 0 is treated as a precision of 1. Trailing zeros are removed from the fractional part of the result.
- G
-
Same as ‘g’, except that it uses an uppercase ‘E’ for the exponent separator.
- a
-
Outputs
[left-padding][prefix/sign][zeroes] digit[hexadecimal-point][fractional-part]exponent[right-padding]
If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w + e cells wide, where e ≥ 3 is the width of the exponent, otherwise it’s empty.
Prefix/sign is “-” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given.
If a width w has been specified and the ‘0’ flag has been given and the ‘-’ flag has not been given, zeroes consists of enough zeroes to make the whole field w + e cells wide, where e ≥ 3 is the width of the exponent, otherwise it’s empty.
Digit consists of one digit in base 16 that represents the most significant digit of the result of calling Float with the argument as its argument, using ‘0’ through ‘9’ and ‘a’ through ‘f’.
Hexadecimal-point is “.” if the precision isn’t 0 or if the ‘#’ flag has been given.
Fractional-part consists of p digits in base 16 that represent all but the most significant digit of the result of calling Float with the argument as its argument, where p = precision, if one has been specified, p = q, where q is the number of digits required to represent the number exactly, otherwise. Digits are output using ‘0’ through ‘9’ and ‘a’ through ‘f’.
Exponent consists of “p” followed by the exponent of 2 in base 10 required to turn the result of calling Float with the argument as its argument into a decimal fraction with one non-zero digit in the integer part. If the exponent is 0, “+0” will be output.
If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w + e cells wide, where e ≥ 3 is the width of the exponent, otherwise it’s empty.
Flag | Description |
---|---|
(Space) | Add a “ ” prefix to non-negative numbers |
+ | Add a “+” sign to non-negative numbers; overrides the ‘ ’ flag |
0 | Use ‘0’ for any width padding; ignored when a precision has been specified |
- | Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag |
# | Output a decimal point, even if no fractional part follows |
- A
-
Same as ‘a’, except that it uses uppercase letters instead.
A warning is issued if the ‘0’ flag is given when the ‘-’ flag has also been given to the ‘d’, ‘i’, ‘u’, ‘o’, ‘x’, ‘X’, ‘b’, or ‘B’ directives.
A warning is issued if the ‘0’ flag is given when a precision has been specified for the ‘d’, ‘i’, ‘u’, ‘o’, ‘x’, ‘X’, ‘b’, or ‘B’ directives.
A warning is issued if the ‘ ’ flag is given when the ‘+’ flag has also been given to the ‘d’, ‘i’, ‘u’, ‘o’, ‘x’, ‘X’, ‘b’, or ‘B’ directives.
A warning is issued if the ‘0’ flag is given when the ‘o’, ‘x’, ‘X’, ‘b’, or ‘B’ directive has been given a negative argument.
A warning is issued if the ‘#’ flag is given when the ‘o’ directive has been given a negative argument.
Any taint on the receiver and any taint on arguments to any ‘s’ and ‘p’ directives is inherited by the result.
- Raises
-
- ArgumentError
-
If the receiver isn’t a valid format specification
- ArgumentError
-
If any flags are given to the ‘%’, ‘\n’, or ‘\0’ directives
- ArgumentError
-
If an argument is given to the ‘%’, ‘\n’, or ‘\0’ directives
- ArgumentError
-
If a width is specified for the ‘%’, ‘\n’, or ‘\0’ directives
- ArgumentError
-
If a precision is specified for the ‘%’, ‘\n’, ‘\0’, or ‘c’ directives
- ArgumentError
-
If any of the flags ‘+’, ‘0’, or ‘#’ are given to the ‘c’, ‘s’, or ‘p’ directives
- ArgumentError
-
If the ‘#’ flag is given to the ‘d’, ‘i’, or ‘u’ directives
- ArgumentError
-
If the argument to the ‘c’ directive doesn’t respond to #to_str or #to_int
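A handful of field specifiers in action, using Ruby’s built-in String#% as a stand-in (an assumption; it measures width in characters rather than cells, and the u library is not used). The cases below are ones where the built-in method matches the rules described above.

```ruby
# Stand-in sketch using Ruby's built-in String#%.
padded_left  = "%5s|"  % "ab"       # width pads on the left by default
padded_right = "%-5s|" % "ab"       # '-' flag pads on the right instead
truncated    = "%.3s"  % "abcdef"   # precision limits the string
character    = "%c"    % 65         # integer argument: character at that code point
zero_filled  = "%05d"  % 42         # '0' flag pads the width with zeroes
signed       = "%+d"   % 42         # '+' flag adds a sign to non-negative numbers
min_digits   = "%.4d"  % 42         # precision gives the minimum number of digits
prefixed     = "%#x"   % 255        # '#' flag adds the base prefix
twos_comp    = "%x"    % -255       # negative values use two's complement with ".."
signed_hex   = "%+x"   % -255       # '+' flag turns two's complement off
binary       = "%#b"   % 5
fixed        = "%.2f"  % 3.14159
scientific   = "%e"    % 12345.678
percent      = "100%%" % []         # '%' directive takes no value
several      = "%s=%d" % ["n", 7]   # values are taken from Array(value) in order

puts padded_left, padded_right, twos_comp, scientific
```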
format(value) U::String
This is an alias for #%.
dump U::String
Returns the receiver in a reader-friendly format, inheriting any taint and untrust.
The reader-friendly format looks like “"…".u”. Inside the “…”, any #print? characters in the ASCII range are output as-is, the following special characters are escaped according to the following table:
Character | Dumped Sequence |
---|---|
U+0022 QUOTATION MARK | \" |
U+005C REVERSE SOLIDUS | \\ |
U+000A LINE FEED (LF) | \n |
U+000D CARRIAGE RETURN (CR) | \r |
U+0009 CHARACTER TABULATION | \t |
U+000C FORM FEED (FF) | \f |
U+000B LINE TABULATION | \v |
U+0008 BACKSPACE | \b |
U+0007 BELL | \a |
U+001B ESCAPE | \e |
the following special sequences are also escaped:
Character | Dumped Sequence |
---|---|
#$ | \#$ |
#@ | \#@ |
#{ | \#{ |
any other valid UTF-8 byte sequence is output as “\u{n}”, where n is the lowercase hexadecimal representation of the code point encoded by the UTF-8 sequence, and any other byte is output as “\xn”, where n is the two-digit uppercase hexadecimal representation of the byte’s value.
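The escaping rules above can be approximated in plain Ruby. The helper below, dump_sketch, is hypothetical and written only for illustration; it is not how U::String#dump is implemented, and it treats every ASCII character from U+0020 through U+007E as #print?, which is an assumption.

```ruby
# Hypothetical helper following the dump rules described above; a sketch only.
def dump_sketch(string)
  escapes = {
    '"' => '\\"', '\\' => '\\\\', "\n" => '\\n', "\r" => '\\r',
    "\t" => '\\t', "\f" => '\\f', "\v" => '\\v', "\b" => '\\b',
    "\a" => '\\a', "\e" => '\\e'
  }
  chars = string.dup.force_encoding(Encoding::UTF_8).each_char.to_a
  body = chars.each_with_index.map do |char, i|
    if !char.valid_encoding?
      # Bytes that are not part of a valid UTF-8 sequence
      char.bytes.map { |byte| format('\\x%02X', byte) }.join
    elsif escapes.key?(char)
      escapes[char]
    elsif char == '#' && ['$', '@', '{'].include?(chars[i + 1])
      '\\#'
    elsif char.ord.between?(0x20, 0x7E)
      char
    else
      format('\\u{%x}', char.ord)
    end
  end.join
  %("#{body}".u)
end

puts dump_sketch("a\tb\u00E9")
```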
inspect String
Returns the receiver in a reader-friendly inspectable format, inheriting any taint and untrust, encoded using UTF-8.
The reader-friendly inspectable format looks like “"…".u”. Inside the “…”, any #print? characters are output as-is, the following special characters are escaped according to the following table:
Character | Dumped Sequence |
---|---|
U+0022 QUOTATION MARK | \" |
U+005C REVERSE SOLIDUS | \\ |
U+000A LINE FEED (LF) | \n |
U+000D CARRIAGE RETURN (CR) | \r |
U+0009 CHARACTER TABULATION | \t |
U+000C FORM FEED (FF) | \f |
U+000B LINE TABULATION | \v |
U+0008 BACKSPACE | \b |
U+0007 BELL | \a |
U+001B ESCAPE | \e |
the following special sequences are also escaped:
Character | Dumped Sequence |
---|---|
#$ | \#$ |
#@ | \#@ |
#{ | \#{ |
Valid UTF-8 byte sequences representing code points < 0x10000 are output as \un, where n is the four-digit uppercase hexadecimal representation of the code point.
Valid UTF-8 byte sequences representing code points ≥ 0x10000 are output as \u{n}, where n is the uppercase hexadecimal representation of the code point.
Any other byte is output as \xn, where n is the two-digit uppercase hexadecimal representation of the byte’s value.
hash Fixnum
Returns the hash value of the receiver’s content.
hex Integer
Returns the result of #to_i(16).
oct Integer
Returns the result of #to_i(8), but with the added provision that any leading base specification in the receiver will override the suggested octal (8) base, that is, '0b11'.u#oct = 3, not 9.
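The prefix override can be checked with Ruby’s built-in String#hex and String#oct, used here as a stand-in (an assumption) since they behave the same way for these inputs.

```ruby
# Stand-in sketch using Ruby's built-in String#hex and String#oct.
hex_value   = "ff".hex      # read as base 16
octal_value = "17".oct      # read as base 8
binary_wins = "0b11".oct    # the "0b" prefix overrides the octal base
hex_wins    = "0x1f".oct    # so does the "0x" prefix

p hex_value, octal_value, binary_wins, hex_wins
```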
to_i(base #to_int = 16) Integer
Returns the Integer value that results from treating the receiver as a string of digits in base.
The conversion algorithm is
Skip any leading #space?s
Check for an optional sign, ‘+’ or ‘-’
If base is 2, skip an optional “0b” or “0B” prefix
If base is 8, skip an optional “0o” or “0O” prefix
If base is 10, skip an optional “0d” or “0D” prefix
If base is 16, skip an optional “0x” or “0X” prefix
Skip any ‘0’s
-
Read as long a sequence of digits in base as possible, separated by optional U+005F LOW LINE characters, using letters in the following ranges of characters as digits, or the character’s digit value, if any
U+0041 LATIN CAPITAL LETTER A through U+005A LATIN CAPITAL LETTER Z
U+0061 LATIN SMALL LETTER A through U+007A LATIN SMALL LETTER Z
U+FF21 FULLWIDTH LATIN CAPITAL LETTER A through U+FF3A FULLWIDTH LATIN CAPITAL LETTER Z
U+FF41 FULLWIDTH LATIN SMALL LETTER A through U+FF5A FULLWIDTH LATIN SMALL LETTER Z
Note that only one separator is allowed in a row.
- Raises
-
- ArgumentError
-
Unless 2 ≤ base ≤ 36
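Several steps of the conversion algorithm can be exercised with Ruby’s built-in String#to_i, used here as a stand-in (an assumption). The built-in method always needs the base spelled out to match the examples, and it doesn’t accept the fullwidth letter ranges listed above.

```ruby
# Stand-in sketch using Ruby's built-in String#to_i.
spaced   = "  -0x1F".to_i(16)  # leading space, sign, and "0x" prefix are handled
grouped  = "1_000".to_i(10)    # a single LOW LINE may separate digits
doubled  = "1__000".to_i(10)   # two separators in a row end the number
base36   = "zz".to_i(36)       # letters serve as digits above 9
trailing = "12abc".to_i(10)    # reading stops at the first non-digit

# A base outside 2..36 is rejected.
bad_base_error =
  begin
    "10".to_i(1)
  rescue ArgumentError => e
    e.class
  end

p spaced, grouped, doubled, base36, trailing, bad_base_error
```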
to_str String
Returns the String representation of the receiver, inheriting any taint and untrust, encoded as UTF-8.
to_s String
This is an alias for #to_str.
b String
Returns the String representation of the receiver, inheriting any taint and untrust, encoded as ASCII-8BIT.
to_sym Symbol
Returns the Symbol representation of the receiver.
- Raises
-
- EncodingError
-
If the receiver contains an invalid UTF-8 sequence
- RuntimeError
-
If there’s no more room for a new Symbol in Ruby’s Symbol table
intern Symbol
This is an alias for #to_sym.