disu.se

String Class

Namespace

U

Ancestors
  1. Comparable

  2. Data

  3. Object

A U::String is a sequence of zero or more Unicode characters encoded as UTF-8. Its interface extends that of Ruby’s built-in String class with better Unicode support: it handles casing, width, collation, and various other Unicode properties that Ruby’s built-in String class simply doesn’t bother itself with. It also provides “backwards compatibility” with Ruby 1.8.7, so that you can use Unicode without upgrading to Ruby 2.0 (which you probably should do, though).

It differs from Ruby’s built-in String class in one other very important way: it doesn’t provide any way to change an existing object. That is, a U::String is a value object.

A U::String is most easily created from a String by calling String#u. Most U::String methods that return a stringy result will return a U::String, so you only have to do that once. You can get back a String by calling U::String#to_str.

Validation of a U::String’s content isn’t performed until the content is first accessed, at which time an ArgumentError will be raised if it isn’t valid.

U::String has a lot of methods defined upon it, so let’s break them up into categories to get a proper overview of what’s possible to do with one. Let’s begin with the interrogators. There are three kinds of interrogators: validity-checking ones, property-checking ones, and content-matching ones.

The validity-checking interrogator is #valid_encoding?, which makes sure that the UTF-8 sequence itself is valid.

The property-checking interrogators are #alnum?, #alpha?, #ascii_only?, #assigned?, #case_ignorable?, #cased?, #cntrl?, #defined?, #digit?, #graph?, #newline?, #print?, #punct?, #soft_dotted?, #space?, #title?, #valid?, #wide?, #wide_cjk?, #xdigit?, and #zero_width?. These interrogators check the corresponding Unicode property of each character in the U::String and return true if all characters have this property.
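
The “all characters” rule can be pictured with plain Ruby. The following is only a sketch under stated assumptions: it uses Ruby’s built-in String and Regexp property classes as a stand-in for U::String, and the helper name all_chars_match? is hypothetical.

```ruby
# Hypothetical helper, not part of U: true if every character of
# +string+ matches +property+ (a Regexp such as /\p{L}/).
# Note that this sketch returns true for an empty string; how U::String
# treats an empty receiver isn't stated above.
def all_chars_match?(string, property)
  string.each_char.all? { |char| char.match?(property) }
end

all_chars_match?("ab\u00E9", /\p{L}/) # every character is a letter
all_chars_match?("ab1", /\p{L}/)      # "1" is not a letter, so this is false
```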

Very close relatives to the property-checking interrogators are #folded?, #lower?, and #upper?, which check whether a string has been cased in a given way, and #normalized?, which checks whether the receiver has been normalized, optionally to a specific normalization form.

The content-matching interrogators are #==, #===, #=~, #match, #empty?, #end_with?, #eql?, #include?, #index, #rindex, and #start_with?. These interrogators check that a substring of the U::String matches another string or Regexp and return either a Boolean result, an index into the U::String where the match begins, or MatchData for full matching information.

Related to the content-matching interrogators are #<=>, #casecmp, and #collation_key, all of which compare a U::String against another for ordering.

Related to the property-checking interrogators are #canonical_combining_class, #general_category, #grapheme_break, #line_break, #script, and #word_break, which return the value of the Unicode property in question, the general category being the one often interrogated.

There are a couple of other “interrogators” in #bytesize, #length, #size, and #width that return integer properties of the U::String as a whole, where #length and #width are probably the most useful.

Beyond interrogators there are quite a few methods for iterating over the content of a U::String, each viewing it in its own way: #each_byte, #each_char, #each_codepoint, #each_grapheme_cluster, #each_line, and #each_word. They all have respective methods (#bytes, #chars, #codepoints, #grapheme_clusters, #lines, #words) that return an Array instead of yielding each result.
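
The different views can give different counts for the same content. A small illustration, assuming Ruby’s built-in String as a stand-in (String#grapheme_clusters needs Ruby 2.5 or later); the U::String methods of the same names are expected to behave along the same lines, but that isn’t verified here:

```ruby
# One user-perceived character: "e" followed by U+0301 COMBINING ACUTE ACCENT.
string = "e\u0301"

byte_count      = string.bytes.length             # UTF-8 bytes
char_count      = string.chars.length             # code-point-sized characters
codepoint_count = string.codepoints.length        # code points
cluster_count   = string.grapheme_clusters.length # grapheme clusters

puts [byte_count, char_count, codepoint_count, cluster_count].inspect
```

The byte view sees three units, the character and code point views see two, and the grapheme cluster view sees one.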

Quite a few methods are devoted to extracting a substring of a U::String, namely #[], #slice, #byteslice, #chomp, #chop, #chr, #getbyte, #lstrip, #ord, #rstrip, and #strip.

There are a few methods for case-shifting: #downcase, #foldcase, #titlecase, and #upcase. Then there are #mirror, #normalize, and #reverse, which alter the string in other ways.

The methods #center, #ljust, and #rjust pad a U::String to make it a certain number of cells wide.

Then there are a couple of methods that are more related in the arguments they take than in function: #count, #delete, #squeeze, #tr, and #tr_s. These methods all take specifications of character/code point ranges that should be counted, deleted, squeezed, and translated (plus squeezed).

Deconstructing a U::String can be done with #partition and #rpartition, which split it around a divider, #scan, which extracts matches to a pattern, and #split, which splits it on a divider.

Substitution of all matches to a pattern can be made with #gsub and of the first match to a pattern with #sub.

Creating larger U::Strings from smaller ones is done with #+, which concatenates two of them, and #*, which concatenates a U::String to itself a number of times.

A U::String can also be used as a specification as to how to format a number of values via #% (and its alias #format) into a new U::String, much like snprintf(3) in C.

The content of a U::String can be #dumped and #inspected to make it reader-friendly, but also debugger-friendly.

Finally, a U::String has a few methods to turn its content into other values: #hash, which turns it into a hash value to be used for hashing, #hex, #oct, #to_i, which turn it into an Integer, #to_str, #to_s, #b, which turn it into a String, and #to_sym (and its alias #intern), which turns it into a Symbol.

Note that some methods defined on String are missing. #capitalize doesn’t exist, as capitalization isn’t a Unicode concept. #sum doesn’t exist, as a U::String generally doesn’t contain content that you need a checksum of. #crypt doesn’t exist for similar reasons. #swapcase isn’t useful on a String and it certainly isn’t useful in a Unicode context. As a U::String doesn’t contain arbitrary data, #unpack is left to String. #next/#succ would perhaps be implementable, but haven’t been, as a satisfactory implementation hasn’t been thought of.

Constructor

initialize(string String? = nil)

Sets up a U::String wrapping string after encoding it as UTF-8 and freezing it.

Instance Methods

u self

Returns the receiver; mostly for completeness, but allows you to always call #u on something that’s either a String or a U::String.

valid_encoding? Boolean

Returns true if the receiver contains only valid UTF-8 sequences.
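
What a valid and an invalid UTF-8 sequence look like can be shown with Ruby’s built-in String#valid_encoding?, used here as a stand-in on the assumption that U::String applies the same notion of validity to its bytes:

```ruby
# U+20AC EURO SIGN is encoded as the three bytes E2 82 AC in UTF-8.
euro       = [0xE2, 0x82, 0xAC].pack("C*").force_encoding("UTF-8")
# Dropping the last byte leaves an incomplete sequence.
incomplete = [0xE2, 0x82].pack("C*").force_encoding("UTF-8")

euro.valid_encoding?       # => true
incomplete.valid_encoding? # => false
```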

alnum? Boolean

Returns true if the receiver contains only characters in the general categories Letter and Number.

alpha? Boolean

Returns true if the receiver contains only characters in the general category Letter.

ascii_only? Boolean

Returns true if the receiver contains only characters in the ASCII region, that is, U+0000 through U+007F.

assigned? Boolean

Returns true if the receiver contains only code points that have been assigned a code value.

case_ignorable? Boolean

Returns true if the receiver contains only “case ignorable” characters, that is, characters in the general categories

  • Other, format (Cf)

  • Letter, modifier (Lm)

  • Mark, enclosing (Me)

  • Mark, nonspacing (Mn)

  • Symbol, modifier (Sk)

and the characters

  • U+0027 APOSTROPHE

  • U+00AD SOFT HYPHEN

  • U+2019 RIGHT SINGLE QUOTATION MARK.

See Also

Unicode Standard Annex #21: Case Mappings

cased? Boolean

Returns true if the receiver only contains characters in the general categories

  • Letter, uppercase (Lu)

  • Letter, lowercase (Ll)

  • Letter, titlecase (Lt)

or that have the derived properties Other_Uppercase or Other_Lowercase.

cntrl? Boolean

Returns true if the receiver contains only characters in the general category Other, control (Cc).

defined? Boolean

Returns true if the receiver contains only characters not in the general categories Other, not assigned (Cn) and Other, surrogate (Cs).

digit? Boolean

Returns true if the receiver contains only characters in the general category Number, decimal digit (Nd).

folded?(locale #to_str = ENV['LC_CTYPE']) Boolean

Returns true if the receiver has been case-folded according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, that is, if a = a#foldcase(locale), where a = #normalize(:nfd).
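
The “a = a#foldcase, where a = #normalize(:nfd)” check can be approximated in plain Ruby. This is a rough sketch, not U’s implementation: it assumes Ruby’s built-in String#unicode_normalize and String#downcase(:fold) (Ruby 2.4 or later) as stand-ins, ignores the locale argument entirely, and the name folded_like? is hypothetical.

```ruby
# Hypothetical stand-in for the check described above: decompose first,
# then see whether case folding changes anything.
def folded_like?(string)
  decomposed = string.unicode_normalize(:nfd)
  decomposed == decomposed.downcase(:fold)
end

folded_like?("\u00E9cole") # "école": folding changes nothing
folded_like?("\u00C9cole") # "École": folding would lowercase the "É"
```

The same pattern, with the casing operation swapped, would apply to #lower? and #upper?.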

graph? Boolean

Returns true if the receiver contains only non-space “printable” characters.

Non-space “printable” characters are those not in the general categories Other or Space, separator (Zs):

  • Other, control (Cc)

  • Other, format (Cf)

  • Other, not assigned (Cn)

  • Other, surrogate (Cs)

  • Space, separator (Zs)

lower?(locale #to_str = ENV['LC_CTYPE']) Boolean

Returns true if the receiver has been downcased according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, that is, if a = a#downcase(locale), where a = #normalize(:nfd).

newline? Boolean

Returns true if the receiver contains only “newline” characters. A character is a “newline” character if it is any of the following characters:

  • U+000A (LINE FEED (LF))

  • U+000C (FORM FEED (FF))

  • U+000D (CARRIAGE RETURN (CR))

  • U+0085 (NEXT LINE)

  • U+2028 (LINE SEPARATOR)

  • U+2029 (PARAGRAPH SEPARATOR)

punct? Boolean

Returns true if the receiver contains only characters in the general categories Punctuation and Symbol.

soft_dotted? Boolean

Returns true if this U::String only contains soft-dotted characters.

Note

Soft-dotted characters have the soft-dotted property and thus lose their dot if an accent is applied to them, for example, ‘i’ and ‘j’.

See Also

Unicode Public Review Issue #11

space? Boolean

Returns true if the receiver contains only “space” characters. Space characters are those in the general category Separator:

  • Separator, space (Zs)

  • Separator, line (Zl)

  • Separator, paragraph (Zp)

such as ‘ ’, or a control character acting as such, namely

  • U+0009 CHARACTER TABULATION (HT)

  • U+000A LINE FEED (LF)

  • U+000C FORM FEED (FF)

  • U+000D CARRIAGE RETURN (CR)

title? Boolean

Returns true if the receiver contains only characters in the general category Letter, titlecase (Lt).

upper?(locale #to_str = ENV['LC_CTYPE']) Boolean

Returns true if the receiver has been upcased according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, that is, if a = a#upcase(locale), where a = #normalize(:nfd).

valid? Boolean

Returns true if the receiver contains only valid Unicode characters.

wide? Boolean

Returns true if the receiver contains only “wide” characters. Wide characters are those that have their East_Asian_Width property set to Wide or Fullwidth.

This is mostly useful for determining how many “cells” a character will take up on a terminal or similar cell-based display.

wide_cjk? Boolean

Returns true if the receiver contains only “wide” and “ambiguously wide” characters. Wide and ambiguously wide characters are those that have their East_Asian_Width property set to Ambiguous, Wide or Fullwidth.

This is mostly useful for determining how many “cells” a character will take up on a terminal or similar cell-based display.

xdigit? Boolean

Returns true if the receiver contains only characters that are in the general category Number, decimal digit (Nd) or that are lower- or uppercase letters between ‘a’ and ‘f’. Specifically, any character that

  • Belongs to the general category Number, decimal digit (Nd)

  • Falls in the range U+0041 (LATIN CAPITAL LETTER A) through U+0046 (LATIN CAPITAL LETTER F)

  • Falls in the range U+0061 (LATIN SMALL LETTER A) through U+0066 (LATIN SMALL LETTER F)

  • Falls in the range U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A) through U+FF26 (FULLWIDTH LATIN CAPITAL LETTER F)

  • Falls in the range U+FF41 (FULLWIDTH LATIN SMALL LETTER A) through U+FF46 (FULLWIDTH LATIN SMALL LETTER F)

will do.
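
The list above translates fairly directly into code. A minimal sketch in plain Ruby, assuming built-in String and Regexp as stand-ins for U::String; the constant and method names are hypothetical:

```ruby
# The four letter ranges listed above, as code point ranges.
HEX_LETTER_RANGES = [
  0x0041..0x0046, # LATIN CAPITAL LETTER A..F
  0x0061..0x0066, # LATIN SMALL LETTER A..F
  0xFF21..0xFF26, # FULLWIDTH LATIN CAPITAL LETTER A..F
  0xFF41..0xFF46  # FULLWIDTH LATIN SMALL LETTER A..F
].freeze

# True if +char+ is a decimal digit (Nd) or falls in one of the ranges.
def hex_digit_char?(char)
  char.match?(/\p{Nd}/) ||
    HEX_LETTER_RANGES.any? { |range| range.cover?(char.ord) }
end

# True if every character of +string+ passes the check.
def hex_digits_only?(string)
  string.each_char.all? { |char| hex_digit_char?(char) }
end

hex_digits_only?("1aF")    # ASCII digits and letters
hex_digits_only?("\uFF26") # FULLWIDTH LATIN CAPITAL LETTER F
hex_digits_only?("g")      # outside every range
```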

zero_width? Boolean

Returns true if the receiver contains only “zero-width” characters. A zero-width character is defined as a character in the general categories Mark, nonspacing (Mn), Mark, enclosing (Me) or Other, format (Cf), excluding the character U+00AD (SOFT HYPHEN), or a Hangul character between U+1160 and U+1200, or U+200B (ZERO WIDTH SPACE).

normalized?(mode #to_sym = :default) Boolean

Returns true if it can be determined that the receiver is normalized according to mode.

See #normalize for a discussion on normalization and a list of the possible normalization modes.

See Also

Unicode Standard Annex #15: Unicode Normalization Forms

==(other U::String, #to_str) Boolean

Returns true if the receiver’s bytes equal those of other.

===(other U::String, #to_str) Boolean

This is an alias for #==.

=~(other Regexp, #=~) Numeric?

Returns the result of other#=~(self), that is, the index of the first character of the match of other in the receiver, if one exists.

Raises
TypeError

If other is a U::String or String

match(pattern Regexp, #to_str, index #to_int = 0) MatchData?

Returns the result of r#match(self, index), that is, the match data of the first match of r in the receiver, inheriting any taint and untrust from both the receiver and from pattern, if one exists, where r = pattern, if pattern is a Regexp, r = Regexp.new(pattern) otherwise.

match(pattern Regexp, #to_str, index #to_int = 0){ |matchdata MatchData| … } Object?

Returns the result of calling the given block with the result of r#match(self, index), that is, the match data of the first match of r in the receiver, inheriting any taint and untrust from both the receiver and from pattern, if one exists, where r = pattern, if pattern is a Regexp, r = Regexp.new(pattern) otherwise.
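
The “r = pattern if pattern is a Regexp, r = Regexp.new(pattern) otherwise” rule means that a string pattern is compiled as a regular expression rather than matched literally. A sketch of that rule using Ruby’s built-in Regexp#match against a plain String; the helper name pattern_to_regexp is hypothetical and U’s own conversion isn’t shown here:

```ruby
# Hypothetical helper mirroring the rule described above.
def pattern_to_regexp(pattern)
  pattern.is_a?(Regexp) ? pattern : Regexp.new(pattern)
end

# A string pattern is compiled, so "l+" matches one or more "l"s.
matchdata = pattern_to_regexp("l+").match("hello", 1)
matchdata[0] # => "ll"

# With a block, the block's result is returned instead of the MatchData.
pattern_to_regexp(/l+/).match("hello") { |m| m.begin(0) } # => 2
```

If a string pattern should be matched literally, escaping it first (for example with Regexp.escape) is the usual approach.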

empty? Boolean

Returns true if #bytesize = 0.

end_with?(*suffixes Array) Boolean

Returns true if any element of suffixes that responds to #to_str is a byte-level suffix of the receiver.

eql?(other U::String) Boolean

Returns true if the receiver’s bytes equal those of other.

include?(substring #to_str) Boolean

Returns true if #index(substring) ≠ nil.

index(pattern Regexp, #to_str, offset #to_int = 0) Integer?

Returns the minimal index of the receiver where pattern matches, equal to or greater than i, where i = offset if offset ≥ 0, i = #length - abs(offset) otherwise, or nil if there is no match.

If pattern is a Regexp, the Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

If pattern responds to #to_str, the matching is performed by byte comparison.
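
The offset rule (i = offset if offset ≥ 0, i = #length - abs(offset) otherwise) is the usual “negative offsets count from the end” convention. A small sketch, with a hypothetical helper name and Ruby’s built-in String#index standing in for the U::String method:

```ruby
# Hypothetical helper: turn a possibly negative offset into the
# effective start index i described above.
def effective_offset(length, offset)
  offset >= 0 ? offset : length - offset.abs
end

text  = "hello"
start = effective_offset(text.length, -2) # => 3

text.index("l", start) # => 3
text.index("l", -2)    # => 3, the built-in applies the same rule
```

The same formula gives the upper bound used by #rindex.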

See Also

#rindex

rindex(pattern Regexp, #to_str, offset #to_int = -1) Integer?

Returns the maximal index of the receiver where pattern matches, equal to or less than i, where i = offset if offset ≥ 0, i = #length - abs(offset) otherwise, or nil if there is no match.

If pattern is a Regexp, the Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

If pattern responds to #to_str, the matching is performed by a byte comparison.

See Also

#index

start_with?(*prefixes Array) Boolean

Returns true if any element of prefixes that responds to #to_str is a byte-level prefix of the receiver.

<=>(other U::String, #to_str, locale #to_str = ENV['LC_COLLATE']) Fixnum

Returns the comparison of the receiver and other using the linguistically correct rules of locale. The locale must be given as a language, region, and encoding, for example, “en_US.UTF-8”.

This operation is known as “collation” and you can find more information about the collation algorithm employed in the Unicode Technical Standard #10, see http://unicode.org/reports/tr10/.

Raises
Errno::EILSEQ

If a character in the receiver can’t be converted into the encoding of the locale

casecmp(other U::String, #to_str, locale #to_str = ENV['LC_COLLATE']) Fixnum

Returns the comparison of #foldcase to other#foldcase using the linguistically correct rules of locale. This is, however, only an approximation of a case-insensitive comparison. The locale must be given as a language, region, and encoding, for example, “en_US.UTF-8”.

This operation is known as “collation” and you can find more information about the collation algorithm employed in the Unicode Technical Standard #10, see http://unicode.org/reports/tr10/.

collation_key(locale) U::String

Returns the locale-dependent collation key of the receiver in locale, inheriting any taint and untrust.

Note
  • Use the collation key when comparing U::Strings to each other repeatedly, as occurs when, for example, sorting a list of U::Strings.

  • The locale must be given as a language, region, and encoding, for example, “en_US.UTF-8”.
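
The reason a key helps when sorting is that it is computed once per element instead of once per comparison. The sketch below shows the pattern with Enumerable#sort_by; the key function is a hypothetical stand-in (normalization plus downcasing, which is not real collation), and with U::String the block would presumably call #collation_key with a locale instead.

```ruby
# Hypothetical stand-in for an expensive, locale-aware key function.
def stand_in_key(string)
  string.unicode_normalize(:nfkd).downcase
end

# Sorts by a key computed once per element and reports how many
# times the key function ran.
def sort_with_keys(strings)
  key_calls = 0
  sorted = strings.sort_by do |string|
    key_calls += 1
    stand_in_key(string)
  end
  [sorted, key_calls]
end

sorted, key_calls = sort_with_keys(%w[pear Banana apple])
puts sorted.inspect # ["apple", "Banana", "pear"]
puts key_calls      # 3, one per element
```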

Raises
Errno::EILSEQ

If a character in the receiver can’t be converted into the encoding of the locale

canonical_combining_class Fixnum

Returns the canonical combining class of the characters of the receiver.

The canonical combining class of a character is a number in the range [0, 254]. The canonical combining class is used when generating a canonical ordering of the characters in a string.

The empty string has a canonical combining class of 0.

Raises
ArgumentError

If the receiver contains two characters belonging to different combining classes

ArgumentError

If the receiver contains an incomplete UTF-8 sequence

ArgumentError

If the receiver contains an invalid UTF-8 sequence

general_category Symbol

Returns the general category of the characters of the receiver.

The general category identifies what kind of symbol the character is.

Category Major, minor       Unicode Value  Ruby Value
Other, control              Cc             :other_control
Other, format               Cf             :other_format
Other, not assigned         Cn             :other_not_assigned
Other, private use          Co             :other_private_use
Other, surrogate            Cs             :other_surrogate
Letter, lowercase           Ll             :letter_lowercase
Letter, modifier            Lm             :letter_modifier
Letter, other               Lo             :letter_other
Letter, titlecase           Lt             :letter_titlecase
Letter, uppercase           Lu             :letter_uppercase
Mark, spacing combining     Mc             :mark_spacing_combining
Mark, enclosing             Me             :mark_enclosing
Mark, nonspacing            Mn             :mark_non_spacing
Number, decimal digit       Nd             :number_decimal
Number, letter              Nl             :number_letter
Number, other               No             :number_other
Punctuation, connector      Pc             :punctuation_connector
Punctuation, dash           Pd             :punctuation_dash
Punctuation, close          Pe             :punctuation_close
Punctuation, final quote    Pf             :punctuation_final_quote
Punctuation, initial quote  Pi             :punctuation_initial_quote
Punctuation, other          Po             :punctuation_other
Punctuation, open           Ps             :punctuation_open
Symbol, currency            Sc             :symbol_currency
Symbol, modifier            Sk             :symbol_modifier
Symbol, math                Sm             :symbol_math
Symbol, other               So             :symbol_other
Separator, line             Zl             :separator_line
Separator, paragraph        Zp             :separator_paragraph
Separator, space            Zs             :separator_space
Raises
ArgumentError

If the receiver contains two characters belonging to different general categories

ArgumentError

If the receiver contains an incomplete UTF-8 sequence

ArgumentError

If the receiver contains an invalid UTF-8 sequence

See Also

Unicode Technical Note #36: A Categorization of Unicode Characters

grapheme_break Symbol

Returns the grapheme break property value of the characters of the receiver.

The possible break values are

  • :control

  • :cr

  • :extend

  • :l

  • :lf

  • :lv

  • :lvt

  • :other

  • :prepend

  • :regional_indicator

  • :spacingmark

  • :t

  • :v

Raises
ArgumentError

If the string consists of more than one break type

See Also

Unicode Standard Annex #29: Unicode Text Segmentation

line_break Symbol

Returns the line break property value of the characters of the receiver.

The possible break values are

  • :after

  • :alphabetic

  • :ambiguous

  • :before

  • :before_and_after

  • :carriage_return

  • :close_parenthesis

  • :close_punctuation

  • :combining_mark

  • :complex_context

  • :conditional_japanese_starter

  • :contingent

  • :exclamation

  • :hangul_l_jamo

  • :hangul_lv_syllable

  • :hangul_lvt_syllable

  • :hangul_t_jamo

  • :hangul_v_jamo

  • :hebrew_letter

  • :hyphen

  • :ideographic

  • :infix_separator

  • :inseparable

  • :line_feed

  • :mandatory

  • :next_line

  • :non_breaking_glue

  • :non_starter

  • :numeric

  • :open_punctuation

  • :postfix

  • :prefix

  • :quotation

  • :regional_indicator

  • :space

  • :surrogate

  • :symbol

  • :unknown

  • :word_joiner

  • :zero_width_space

Raises
ArgumentError

If the string consists of more than one break type

See Also

Unicode Standard Annex #14: Unicode Line Breaking Algorithm

script Symbol

Returns the script of the characters of the receiver.

The script of a character identifies the primary writing system that uses the character.

Script Description
:arabic Arabic
:armenian Armenian
:avestan Avestan
:balinese Balinese
:bamum Bamum
:batak Batak
:bengali Bengali
:bopomofo Bopomofo
:brahmi Brahmi
:braille Braille
:buginese Buginese
:buhid Buhid
:canadian_aboriginal Canadian Aboriginal
:carian Carian
:chakma Chakma
:cham Cham
:cherokee Cherokee
:common For other characters that may be used with multiple scripts
:coptic Coptic
:cuneiform Cuneiform
:cypriot Cypriot
:cyrillic Cyrillic
:deseret Deseret
:devanagari Devanagari
:egyptian_hieroglyphs Egyptian Hieroglyphs
:ethiopic Ethiopic
:georgian Georgian
:glagolitic Glagolitic
:gothic Gothic
:greek Greek
:gujarati Gujarati
:gurmukhi Gurmukhi
:han Han
:hangul Hangul
:hanunoo Hanunoo
:hebrew Hebrew
:hiragana Hiragana
:imperial_aramaic Imperial Aramaic
:inherited For characters that may be used with multiple scripts, and that inherit their script from the preceding characters; these include nonspacing marks, enclosing marks, and the zero-width joiner/non-joiner characters
:inscriptional_pahlavi Inscriptional Pahlavi
:inscriptional_parthian Inscriptional Parthian
:javanese Javanese
:kaithi Kaithi
:kannada Kannada
:katakana Katakana
:kayah_li Kayah Li
:kharoshthi Kharoshthi
:khmer Khmer
:lao Lao
:latin Latin
:lepcha Lepcha
:limbu Limbu
:linear_b Linear B
:lisu Lisu
:lycian Lycian
:lydian Lydian
:malayalam Malayalam
:mandaic Mandaic
:meetei_mayek Meetei Mayek
:meroitic_hieroglyphs Meroitic Hieroglyphs
:meroitic_cursive Meroitic Cursive
:miao Miao
:mongolian Mongolian
:myanmar Myanmar
:new_tai_lue New Tai Lue
:nko N'Ko
:ogham Ogham
:old_italic Old Italic
:old_persian Old Persian
:old_south_arabian Old South Arabian
:old_turkic Old Turkic
:ol_chiki Ol Chiki
:oriya Oriya
:osmanya Osmanya
:phags_pa Phags-pa
:phoenician Phoenician
:rejang Rejang
:runic Runic
:samaritan Samaritan
:saurashtra Saurashtra
:sharada Sharada
:shavian Shavian
:sinhala Sinhala
:sora_sompeng Sora Sompeng
:sundanese Sundanese
:syloti_nagri Syloti Nagri
:syriac Syriac
:tagalog Tagalog
:tagbanwa Tagbanwa
:tai_le Tai Le
:tai_tham Tai Tham
:tai_viet Tai Viet
:takri Takri
:tamil Tamil
:telugu Telugu
:thaana Thaana
:thai Thai
:tibetan Tibetan
:tifinagh Tifinagh
:ugaritic Ugaritic
:unknown For not assigned, private-use, non-character, and surrogate code points
:vai Vai
:yi Yi
Raises
ArgumentError

If the receiver contains two characters belonging to different scripts

ArgumentError

If the receiver contains an incomplete UTF-8 sequence

ArgumentError

If the receiver contains an invalid UTF-8 sequence

See Also

Unicode Standard Annex #24: Unicode Script Property

word_break Symbol

Returns the word break property value of the characters of the receiver.

The possible word break values are

  • :aletter

  • :cr

  • :extend

  • :extendnumlet

  • :format

  • :katakana

  • :lf

  • :midletter

  • :midnum

  • :midnumlet

  • :newline

  • :numeric

  • :other

  • :regional_indicator

Raises
ArgumentError

If the string consists of more than one break type

See Also

Unicode Standard Annex #29: Unicode Text Segmentation

bytesize Integer

Returns the number of bytes required to represent the receiver.

length Integer

Returns the number of characters in the receiver.

size Integer

This is an alias for #length.

width Integer

Returns the width of the receiver. The width is defined as the sum of the number of “cells” on a terminal or similar cell-based display that the characters in the string will require.

Characters that are #wide? have a width of 2. Characters that are #zero_width? have a width of 0. Other characters have a width of 1.
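
The width rule is a sum over characters. The sketch below shows the shape of that calculation in plain Ruby under heavy simplifications: the table of wide ranges is a deliberately tiny, hypothetical sample (the real East_Asian_Width data is far larger), zero width is approximated by the categories Mn and Me plus U+200B, and none of the names come from U.

```ruby
# Hypothetical, heavily abbreviated sample of "wide" code point ranges.
WIDE_RANGES = [
  0x3040..0x30FF, # Hiragana and Katakana
  0x4E00..0x9FFF, # CJK Unified Ideographs
  0xFF01..0xFF60  # Fullwidth forms
].freeze

# Sums 0, 1 or 2 cells per character, following the rule above.
def cell_width(string)
  string.each_char.sum do |char|
    if char.match?(/[\p{Mn}\p{Me}\u200B]/)
      0 # approximated zero-width characters
    elsif WIDE_RANGES.any? { |range| range.cover?(char.ord) }
      2 # wide characters
    else
      1 # everything else
    end
  end
end

cell_width("abc")          # three narrow characters
cell_width("\u65E5\u672C") # two wide characters
cell_width("e\u0301")      # a letter plus a zero-width combining mark
```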

See Also

Unicode Standard Annex #11: East Asian Width

each_byte{ |byte Fixnum| … } self

Enumerates the bytes in the receiver.

each_byte Enumerator

Returns an Enumerator over the bytes in the receiver.

bytes Array<Fixnum>

Returns the bytes of the receiver.

each_char{ |char U::String| … } self

Enumerates the characters in the receiver, each inheriting any taint and untrust.

each_char Enumerator

Returns an Enumerator over the characters in the receiver.

chars Array<U::String>

Returns the characters of the receiver, each inheriting any taint and untrust.

each_codepoint{ |codepoint Integer| … } self

Enumerates the code points of the receiver.

each_codepoint Enumerator

Returns an Enumerator over the code points of the receiver.

codepoints Array<Integer>

Returns the code points of the receiver.

each_grapheme_cluster{ |cluster U::String| … } self

Enumerates the grapheme clusters in the receiver, each inheriting any taint and untrust.

See Also

Unicode Standard Annex #29: Unicode Text Segmentation

each_grapheme_cluster Enumerator

Returns an Enumerator over the grapheme clusters in the receiver.

See Also

Unicode Standard Annex #29: Unicode Text Segmentation

grapheme_clusters{ |cluster U::String| … } self

This is an alias for #each_grapheme_cluster.

grapheme_clusters Enumerator

This is an alias for #each_grapheme_cluster.

each_line(separator U::String, #to_str = $/){ |lp U::String, self| … } self

Enumerates the lines of the receiver, inheriting any taint and untrust.

If separator is nil, yields self. If separator is #empty?, separates each line (paragraph) by two or more U+000A LINE FEED characters.

each_line(separator U::String, #to_str = $/) Enumerator

Returns an Enumerator over the lines of the receiver.

If separator is nil, self will be yielded. If separator is #empty?, separates each line (paragraph) by two or more U+000A LINE FEED characters.

lines(separator U::String, #to_str = $/) Array<U::String>

Returns the lines of the receiver, inheriting any taint and untrust.

If separator is nil, yields self. If separator is #empty?, separates each line (paragraph) by two or more U+000A LINE FEED characters.
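
The three separator cases can be tried out with Ruby’s built-in String#each_line, which follows the same conventions; it’s used here as a stand-in, on the assumption that U::String mirrors it as described above:

```ruby
text = "first\nsecond\n\n\nthird\n"

lines      = text.each_line.to_a      # default separator $/, normally "\n"
paragraphs = text.each_line("").to_a  # empty separator: paragraph mode
whole      = text.each_line(nil).to_a # nil separator: the whole string

puts lines.length      # 5, the two empty lines count as lines
puts paragraphs.length # 2, runs of line feeds end a paragraph
puts whole.length      # 1
```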

each_word{ |word U::String| … } self

Enumerates the words in the receiver, each inheriting any taint and untrust.

See Also

Unicode Standard Annex #29: Unicode Text Segmentation

each_word Enumerator

Returns an Enumerator over the words in the receiver.

See Also

Unicode Standard Annex #29: Unicode Text Segmentation

words{ |word U::String| … } self

This is an alias for #each_word.

words Enumerator

This is an alias for #each_word.

[](index #to_int) U::String?

Returns the substring [max(i, 0), min(#length, i + 1)], where i = index if index ≥ 0, i = #length - abs(index) otherwise, inheriting any taint and untrust, or nil if this substring is empty.

[](index #to_int, length #to_int) U::String?

Returns the substring [max(i, 0), min(#length, i + length)], where i = index if index ≥ 0, i = #length - abs(index) otherwise, inheriting any taint and untrust, or nil if length < 0.
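
The bounds formula can be written out as a small helper. This sketch assumes the interval is half-open (the end index is excluded), uses Ruby’s built-in String for the actual slicing, and the name substring_bounds is hypothetical:

```ruby
# Hypothetical helper computing [max(i, 0), min(length, i + count)]
# with i derived from a possibly negative index, as described above.
def substring_bounds(length, index, count)
  i = index >= 0 ? index : length - index.abs
  [[i, 0].max, [length, i + count].min]
end

first, last = substring_bounds("hello".length, -3, 2) # => [2, 4]

"hello"[first...last] # => "ll"
"hello"[-3, 2]        # => "ll", the built-in form of the same request
```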

[](range Range) U::String?

Returns the result of #[i, j - k], where i = range#begin if range#begin ≥ 0, i = #length - abs(range#begin) otherwise, j = range#end if range#end ≥ 0, j = #length - abs(range#end) otherwise, and k = 1 if range#exclude_end?, k = 0 otherwise, or nil if j - k < 0.

[](regexp Regexp, reference #to_int, #to_str, Symbol = 0) U::String?

Returns the submatch reference from the first match of regexp in the receiver, inheriting any taint and untrust from both the receiver and from regexp, or nil if there is no match or if the submatch isn’t part of the overall match.

Raises
IndexError

If reference doesn’t refer to a submatch

[](string U::String, ::String) U::String?

Returns the substring string, inheriting any taint and untrust from string, if string is a substring of the receiver.

[](object Object) nil

Returns nil for any object that doesn’t satisfy the other cases.

slice(index #to_int) U::String?

This is an alias for #[].

slice(index #to_int, length #to_int) U::String?

This is an alias for #[].

slice(range Range) U::String?

This is an alias for #[].

slice(regexp Regexp, reference #to_int, #to_str, Symbol = 0) U::String?

This is an alias for #[].

slice(string U::String, ::String) U::String?

This is an alias for #[].

slice(object Object) nil

This is an alias for #[].

byteslice(index #to_int) U::String?

Returns the byte-index-based substring [max(i, 0), min(#bytesize, i + 1)], where i = index if index ≥ 0, i = #bytesize - abs(index) otherwise, inheriting any taint and untrust, or nil if this substring is empty.

byteslice(index #to_int, length #to_int) U::String?

Returns the byte-index-based substring [max(i, 0), min(#bytesize, i + length)], where i = index if index ≥ 0, i = #bytesize - abs(index) otherwise, inheriting any taint and untrust, or nil if length < 0.

byteslice(range Range) U::String?

Returns the result of #byteslice(i, j - k), where i = range#begin if range#begin ≥ 0, i = #bytesize - abs(range#begin) otherwise, j = range#end if range#end ≥ 0, j = #bytesize - abs(range#end) otherwise, and k = 1 if range#exclude_end?, k = 0 otherwise, or nil if j - k < 0.

byteslice(object Object) nil

Returns nil for any object that doesn’t satisfy the other cases.

chomp(separator U::String, #to_str, nil = $/) U::String, self, nil

Returns the receiver, minus any separator suffix, inheriting any taint and untrust, unless #length = 0, in which case nil is returned. If separator is nil or invalidly encoded, the receiver is returned.

If separator is $/ and $/ has its default value or if separator is U+000A LINE FEED, the longest suffix consisting of any of

  • U+000A LINE FEED

  • U+000D CARRIAGE RETURN

  • U+000D CARRIAGE RETURN, U+000A LINE FEED

will be removed. If no such suffix exists and the last character is a #newline?, it will be removed instead.

If separator is #empty?, remove the longest #newline? suffix.
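
The ASCII part of these rules matches Ruby’s built-in String#chomp, which can serve as a quick illustration. The built-in is only a stand-in here: unlike what is described above, it doesn’t know about the other #newline? characters such as U+0085 or U+2028.

```ruby
"line\n".chomp        # => "line"
"line\r".chomp        # => "line"
"line\r\n".chomp      # => "line", CR LF is removed as one unit

"line\n\n".chomp("")  # => "line", an empty separator removes all trailing newlines
"line\n".chomp(nil)   # => "line\n", a nil separator removes nothing
```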

chop U::String

Returns the receiver, minus its last character, inheriting any taint and untrust, unless the receiver is #empty? or if the last character is invalidly encoded, in which case the receiver is returned.

If the last character is U+000A LINE FEED and the second-to-last character is U+000D CARRIAGE RETURN, both characters are removed.

chr U::String

Returns the substring [0, min(#length, 1)], inheriting any taint and untrust.

getbyte(index #to_int) Fixnum?

Returns the byte at byte-index i, where i = index if index ≥ 0, i = #bytesize - abs(index) otherwise, or nil if i lies outside of [0, #bytesize].

lstrip U::String

Returns the receiver with its maximum #space? prefix removed, inheriting any taint and untrust.

ord Integer

Returns the code point of the first character of the receiver.

rstrip U::String

Returns the receiver with its maximum #space? suffix removed, inheriting any taint and untrust from the receiver.

strip U::String

Returns the receiver with its maximum #space? prefix and suffix removed, inheriting any taint and untrust.

downcase(locale #to_str = ENV['LC_CTYPE']) U::String

Returns the downcasing of the receiver according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, inheriting any taint and untrust.

foldcase(locale #to_str = ENV['LC_CTYPE']) U::String

Returns the case-folding of the receiver according to the rules of the language of locale, which may be empty to specifically use the default rules, inheriting any taint and untrust.

titlecase(locale #to_str = ENV['LC_CTYPE']) U::String

Returns the title-casing of the receiver according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, inheriting any taint and untrust.

upcase(locale #to_str = ENV['LC_CTYPE']) U::String

Returns the upcasing of the receiver according to the rules of the language of locale, which may be empty to specifically use the default, language-independent, rules, inheriting any taint and untrust.

mirrorU::String#

Returns the mirroring of the receiver, inheriting any taint and untrust.

Mirroring is done by replacing characters in the string with their horizontal mirror image, if any, in text that is laid out from right to left. For example, ‘(’ becomes ‘)’ and ‘)’ becomes ‘(’.
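The idea of mirroring can be sketched with a small lookup table. The helper mirror_sketch and the MIRROR_PAIRS table are hypothetical and cover only a handful of ASCII brackets; the real mapping is derived from Unicode's mirroring data and is far larger.

```ruby
# Hypothetical, deliberately tiny mirror table (not the library's data).
MIRROR_PAIRS = {
  '(' => ')', ')' => '(',
  '[' => ']', ']' => '[',
  '{' => '}', '}' => '{',
  '<' => '>', '>' => '<'
}.freeze

# Replaces each character that has a mirror image; others pass through.
def mirror_sketch(string)
  string.each_char.map { |c| MIRROR_PAIRS.fetch(c, c) }.join
end

mirrored = mirror_sketch("(a < b)")
p mirrored  # ")a > b("
```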

See Also

Unicode Standard Annex #9: Unicode Bidirectional Algorithm

normalize(form#to_sym = :nfd)U::String#

Returns the receiver normalized into form, inheriting any taint and untrust.

Normalization is the process of converting characters and sequences of characters in a string into a canonical form. This process includes dealing with whether characters are represented by a composed character or a base character and combining marks, such as accents.

The possible normalization forms are

Form Description
:nfd Normalizes characters to their maximally decomposed form, ordering accents and so on according to their combining class
:nfc Normalizes according to :nfd, then composes any decomposed characters
:nfkd Normalizes according to :nfd and also normalizes “compatibility” characters, such as replacing U+00B3 SUPERSCRIPT THREE with U+0033 DIGIT THREE
:nfkc Normalizes according to :nfkd, then composes any decomposed characters

See Also

Unicode Standard Annex #15: Unicode Normalization Forms
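The four forms can be observed with the built-in String#unicode_normalize, used here as a stand-in on the assumption that both implementations follow Unicode Standard Annex #15. One difference to keep in mind is that the built-in method defaults to :nfc, while the method documented here defaults to :nfd.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
composed   = "\u00E9"                             # é as a single code point
decomposed = composed.unicode_normalize(:nfd)     # e + combining acute accent

nfd_points = decomposed.codepoints                # [0x65, 0x301]
recomposed = decomposed.unicode_normalize(:nfc)   # back to U+00E9

# Compatibility forms also replace characters such as SUPERSCRIPT THREE.
canonical     = "x\u00B3".unicode_normalize(:nfd)   # unchanged
compatibility = "x\u00B3".unicode_normalize(:nfkd)  # "x3"

p nfd_points, recomposed == composed, canonical, compatibility
```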

reverseU::String#

Returns the reversal of the receiver, inheriting any taint and untrust from the receiver.

Note

This doesn’t take into account proper handling of combining marks, direction indicators, and similarly relevant characters, so this method is mostly useful when you know the contents of the string are simple and the result isn’t intended for display.
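The caveat about combining marks is easy to reproduce. The sketch below uses the built-in String#reverse, which is assumed to reverse code point by code point in the same way; the grapheme-aware variant is shown only for contrast and is not something this documentation describes.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
s = "e\u0301a"                       # "e" + combining acute accent, then "a"

naive = s.reverse
naive_points = naive.codepoints      # [0x61, 0x301, 0x65]: the accent now follows "a"

# For contrast: reversing whole grapheme clusters keeps the accent on "e".
clustered = s.grapheme_clusters.reverse.join
clustered_points = clustered.codepoints  # [0x61, 0x65, 0x301]

p naive_points, clustered_points
```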

center(width#to_int, paddingU::String, #to_str = ' ')U::String#

Returns the receiver padded as evenly as possible on both sides with padding to make it max(#length, width) wide, inheriting any taint and untrust from the receiver and also from padding if padding is used.

Raises
ArgumentError

If padding#width = 0

ArgumentError

If characters inside padding that should be used for round-off padding are too wide

ljust(width#to_int, paddingU::String, #to_str = ' ')U::String#

Returns the receiver padded on the right with padding to make it max(#length, width) wide, inheriting any taint and untrust from the receiver and also from padding if padding is used.

Raises
ArgumentError

If padding#width = 0

ArgumentError

If characters inside padding that should be used for round-off padding are too wide

rjust(width#to_int, paddingU::String, #to_str = ' ')U::String#

Returns the receiver padded on the left with padding to make it max(#length, width) wide, inheriting any taint and untrust from the receiver and also from padding if padding is used.

Raises
ArgumentError

If padding#width = 0

ArgumentError

If characters inside padding that should be used for round-off padding are too wide
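The padding methods center, ljust, and rjust can be compared with a short sketch. It uses the built-in String as a stand-in; the built-in methods count characters, whereas the methods documented here also consider the width of padding, so the two are only assumed to agree for narrow ASCII input like this.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
centered = "abc".center(8, "12")   # the odd leftover cell goes to the right
left     = "abc".ljust(6, "12")
right    = "abc".rjust(6, "12")
short    = "abc".center(2)         # width smaller than the string: unchanged

# Zero-width padding is rejected.
error = begin
  "abc".center(8, "")
rescue ArgumentError => e
  e.class
end

p [centered, left, right, short, error]
# ["12abc121", "abc121", "121abc", "abc", ArgumentError]
```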

count(setU::String, #to_str, *setsArray<U::String, #to_str>)Integer#

Returns the number of characters in the receiver that are included in the intersection of set and any additional sets of characters.

The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).

Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
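Intersection, complement, and ranges can be seen in a few calls. The sketch uses the built-in String#count, assumed to follow the same set rules.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
a = "hello world"

plain        = a.count("lo")           # l or o
intersection = a.count("lo", "o")      # only o is in both sets
complement   = a.count("hello", "^l")  # h, e, or o, but not l
range        = a.count("ej-m")         # e, or anything from j through m

p [plain, intersection, complement, range]  # [5, 2, 4, 4]
```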

delete(setU::String, #to_str, *setsArray<U::String, #to_str>)U::String#

Returns the receiver, minus any characters that are included in the intersection of set and any additional sets of characters, inheriting any taint and untrust.

The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).

Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
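The same set syntax applies when deleting. A brief sketch with the built-in String#delete, assumed to behave the same way:

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
intersection = "hello".delete("l", "lo")      # only l is in both sets
plain        = "hello".delete("lo")
complement   = "hello".delete("aeiou", "^e")  # vowels except e
range        = "hello".delete("ej-m")

p [intersection, plain, complement, range]  # ["heo", "he", "hell", "ho"]
```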

squeeze(*setsArray<U::String, #to_str>)U::String#

Returns the receiver, replacing any substrings of #length > 1 consisting of the same character c with c, where c is a member of the intersection of the character sets in sets, inheriting any taint and untrust.

If sets is empty, then the set of all Unicode characters is used.

The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).

Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
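Squeezing with and without sets might look as follows, again using the built-in String#squeeze as an assumed equivalent.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
all_chars = "yellow moon".squeeze                  # no sets: every run is squeezed
spaces    = "  now   is  the".squeeze(" ")         # only runs of spaces
range     = "putters shoot balls".squeeze("m-z")   # "ll" is outside m-z and stays

p [all_chars, spaces, range]
# ["yelow mon", " now is the", "puters shot balls"]
```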

tr(from#to_str, to#to_str)U::String#

Returns the receiver, translating characters in from to their equivalent character, by index, in to, inheriting any taint and untrust. If to#length < from#length, to[-1] will be used for any index i ≥ to#length.

The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).

Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.

tr_s(from#to_str, to#to_str)U::String#

Returns the receiver, translating characters in from to their equivalent character, by index, in to and then squeezing any substrings of #length > 1 consisting of the same character c with c, inheriting any taint and untrust. If to#length < from#length, to[-1] will be used for any index i ≥ to#length.

The complement of all Unicode characters and a given set of characters may be specified by prefixing a non-empty set with ‘^’ (U+005E CIRCUMFLEX ACCENT).

Any sequence of characters a-b inside a set will expand to also include all characters whose code points lie between those of a and b.
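Translation by index, the reuse of the last character of to, and the extra squeezing step of tr_s can be illustrated as below. The built-in String#tr and String#tr_s are used on the assumption that they follow the same rules.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
by_index   = "hello".tr("el", "ip")       # e -> i, l -> p
short_to   = "hello".tr("aeiou", "*")     # last character of to is reused
range      = "hello".tr("a-y", "b-z")     # shift each letter by one
complement = "hello".tr("^aeiou", "*")    # everything except vowels

# tr_s squeezes runs of translated characters.
squeezed_one = "hello".tr_s("l", "r")
squeezed_two = "hello".tr_s("el", "*")

p [by_index, short_to, range, complement, squeezed_one, squeezed_two]
# ["hippo", "h*ll*", "ifmmp", "*e**o", "hero", "h*o"]
```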

partition(separatorRegexp, #to_str)Array<U::String>#

Returns the receiver split into s₁ = #slice(0, i), s₂ = #slice(i, n), s₃ = #slice(i+n, -1), where i = j if j ≠ nil, i = #length otherwise, j = #index(separator), n = separator#length, where s₁ and s₃ inherit any taint and untrust from the receiver and s₂ inherits any taint and untrust from separator and also from the receiver if separator is a Regexp.

See Also

#rpartition

rpartition(separatorRegexp, #to_str)Array<U::String>#

Returns the receiver split into s₁ = #slice(0, i), s₂ = #slice(i, n), s₃ = #slice(i + n, -1), where i = j if j ≠ nil, i = 0 otherwise, j = #rindex(separator), n = separator#length, where s₁ and s₃ inherit any taint and untrust from the receiver and s₂ inherits any taint and untrust from separator and also from the receiver if separator is a Regexp.

See Also

#partition
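The difference between partition and rpartition, including what happens when the separator isn't found, can be sketched with the built-in String methods, assumed to return the same three-part split.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
first_match = "hello".partition("l")     # split at the first "l"
last_match  = "hello".rpartition("l")    # split at the last "l"
regexp      = "hello".partition(/.l/)    # a Regexp separator

# No match: partition keeps everything in the first part,
# rpartition keeps everything in the last part.
no_match_left  = "hello".partition("x")
no_match_right = "hello".rpartition("x")

p first_match, last_match, regexp, no_match_left, no_match_right
```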

scan(patternRegexp)Array<U::String>+#

Returns all matches of pattern in the receiver, or, if pattern contains groups, the sub-matches of each match, each inheriting any taint and untrust from both the receiver and from pattern.

Note

The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

scan(pattern#to_str)Array<U::String>#

Returns all matches of pattern in the receiver, each inheriting any taint and untrust from the receiver.

scan(patternRegexp){ |submatchesArray<U::String>| … }self#

Enumerates the sub-matches of matches of pattern in the receiver, each inheriting any taint and untrust from both the receiver and from pattern.

Note

The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

scan(pattern#to_str){ |matchU::String| … }self#

Enumerates the matches of pattern in the receiver, each inheriting any taint and untrust from the receiver.
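The four forms of scan differ mainly in whether groups are present and whether a block is given. A short sketch with the built-in String#scan, assumed to match the behavior described:

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
a = "cruel world"

words  = a.scan(/\w+/)        # no groups: an array of matches
pairs  = a.scan(/(..)(..)/)   # groups: an array of sub-match arrays
string = a.scan("l")          # string pattern: literal matches

# With a block, each match (or array of sub-matches) is yielded instead.
lengths = []
a.scan(/\w+/) { |word| lengths << word.length }

p words, pairs, string, lengths
```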

split(patternRegexp, #to_str = $;, limit#to_int = 0)Array<U::String>#

Returns the receiver split into limit substrings separated by pattern, each inheriting any taint and untrust.

If pattern = $; = nil or pattern = ' ', splits according to AWK rules, that is, any #space? prefix is skipped, then substrings are separated by non-empty #space? substrings.

If limit < 0, then no limit is imposed and trailing #empty? substrings aren’t removed.

If limit = 0, then no limit is imposed and trailing #empty? substrings are removed.

If limit = 1, then, if #length = 0, the result will be empty, otherwise it will consist of the receiver only.

If limit > 1, then the receiver is split into at most limit substrings.
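The effect of the different limit values can be checked with the built-in String#split, used here on the assumption that it follows the same rules.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
awk = " now's  the time".split(" ")        # AWK rules: leading space skipped

csv = "1,2,,3,4,,"
zero     = csv.split(",")                  # limit = 0: trailing empties removed
negative = csv.split(",", -1)              # limit < 0: trailing empties kept
capped   = csv.split(",", 4)               # limit > 1: at most 4 substrings

one_empty = "".split(",", 1)               # limit = 1 on an empty receiver
one_full  = "abc".split(",", 1)            # limit = 1 otherwise

p awk, zero, negative, capped, one_empty, one_full
```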

gsub(patternRegexp, #to_str, replacement#to_str)U::String#

Returns the receiver with all matches of pattern replaced by replacement, inheriting any taint and untrust from the receiver and from replacement.

The replacement is used as a specification for what to replace matches with:

Specification Replacement
\1, \2, …, \n Numbered sub-match n
\k<name> Named sub-match name

The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

gsub(patternRegexp, #to_str, replacements#to_hash)U::String#

Returns the receiver with all matches of pattern replaced by replacements#[match], where match is the matched substring, inheriting any taint and untrust from the receiver and from the replacements#[match]es, as well as any taint on replacements.

The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

Raises
RuntimeError

If any replacement is the result being constructed

Exception

Any error raised by replacements#default, if it gets called

gsub(patternRegexp, #to_str){ |matchU::String|#to_str … }U::String#

Returns the receiver with all matches of pattern replaced by the results of the given block, inheriting any taint and untrust from the receiver and from the results of the given block.

The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

gsub(patternRegexp, #to_str)Enumerator#

Returns an Enumerator over the matches of pattern in the receiver.

The Regexp special variables $&, $', $`, $1, $2, …, $n will be updated accordingly.
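The four gsub forms can be put next to each other in a small sketch. It relies on the built-in String#gsub, which is assumed to treat replacement strings, hashes, and blocks the same way.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
s = "hello"

literal  = s.gsub(/[aeiou]/, "*")
numbered = s.gsub(/([aeiou])/, '<\1>')                 # numbered sub-match
named    = s.gsub(/(?<vowel>[aeiou])/, '{\k<vowel>}')  # named sub-match
hashed   = s.gsub(/[eo]/, { "e" => "3", "o" => "*" })  # lookup by matched text
blocked  = s.gsub(/l/) { |match| match.upcase }        # block result is used

# Without a replacement or a block, an Enumerator over the matches is returned.
matches = s.gsub(/l/).to_a

p literal, numbered, named, hashed, blocked, matches
```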

sub(patternRegexp, #to_str, replacement#to_str)U::String?#

Returns the receiver with the first match of pattern replaced by replacement, inheriting any taint and untrust from the receiver and from replacement, or nil if there’s no match.

The replacement is used as a specification for what to replace matches with:

Specification Replacement
\1, \2, …, \n Numbered sub-match n
\k<name> Named sub-match name

The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

sub(patternRegexp, #to_str, replacements#to_hash)U::String?#

Returns the receiver with the first match of pattern replaced by replacements#[match], where match is the matched substring, inheriting any taint and untrust from the receiver, replacements, and replacements#[match], or nil if there’s no match.

The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

Raises
Exception

Any error raised by replacements#default, if it gets called

sub(patternRegexp, #to_str){ |matchU::String|#to_str … }U::String?#

Returns the receiver with the first match of pattern replaced by the result of the given block, inheriting any taint and untrust from the receiver and from the result of the given block, or nil if there’s no match.

The Regexp special variables $&, $', $`, $1, $2, …, $n are updated accordingly.

+(otherU::String, #to_str)U::String#

Returns the concatenation of other to the receiver, inheriting any taint on either.

Raises
ArgumentError

If #bytesize + other#bytesize > LONG_MAX

*(n#to_int)U::String#

Returns the concatenation of n copies of the receiver, inheriting any taint and untrust.

Raises
ArgumentError

If n < 0

ArgumentError

If n > 0 and n × #bytesize > LONG_MAX

%(value)U::String#

Returns a formatted string of the values in Array(value) by treating the receiver as a format specification of this formatted string.

A format specification is a string consisting of sequences of normal characters that are copied verbatim and field specifiers. A field specifier consists of a %, followed by any optional flags, an optional width, an optional precision, and a directive:

%[flags][width][.[precision]]directive

Note that this means that a lone % at the end of the string is simply copied verbatim as it, by this definition, isn’t a field specifier.

The directive determines how this field should be formatted. The flags, width, and precision modify this interpretation.

The field often takes a value from value and formats it according to a given set of rules, which depend on the flags, width, and precision, but can also output other, hardwired, values.

The directives that don’t take a value are

Directive Description
% Outputs ‘%’.
\n Outputs “%\n”.
\0 Outputs “%\0”.

None of these directives take any flags, width, or precision.

All of the following directives allow you to specify a width. The width only ever limits the minimum width of the field, that is, at least width cells will be filled by the field, but perhaps more will actually be required in the end.

c

Outputs

[left-padding]character[right-padding]

If a width w has been specified and the ‘-’ flag hasn’t been given, left-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.

Character is the result of #to_str#chr on the argument, if it responds to #to_str, otherwise it’s the result of #to_int turned into a string containing the character at that code point. A precision isn’t allowed. The #width of the character is used in any width calculations.

If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.

s

Outputs

[left-padding]string[right-padding]

Left-padding and right-padding are the same as for the ‘c’ directive described above.

String is a substring of the result of #to_s on the argument that is w cells wide, where w = precision, if a precision has been specified, w = #width otherwise.

p

Outputs

[left-padding]inspect[right-padding]

Left-padding and right-padding are the same as for the ‘c’ directive described above.

Inspect is a substring of the result of #inspect on the argument that is w cells wide, where w = precision, if a precision has been specified, w = #width otherwise.
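The ‘c’, ‘s’, and ‘p’ directives can be tried out with Kernel#format, used here as a stand-in on the assumption that it pads and truncates the same way for narrow ASCII input. The square brackets are only there to make the padding visible.

```ruby
# Kernel#format is used as a stand-in for U::String#% (assumption).
right_aligned = format("[%5s]", "ab")       # width pads on the left
left_aligned  = format("[%-5s]", "ab")      # '-' flag pads on the right
truncated     = format("[%.3s]", "abcdef")  # precision limits the string

from_code_point = format("[%c]", 0x41)      # integer argument: code point
from_string     = format("[%3c]", "x")      # string argument, with a width

inspected = format("[%p]", "ab")            # uses #inspect on the argument

p right_aligned, left_aligned, truncated, from_code_point, from_string, inspected
```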

d
i
u

Outputs

[left-padding][prefix/sign][zeroes]
   [precision-filler]digits[right-padding]

If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.

Prefix/sign is “-” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given, otherwise it’s empty.

If a width w has been specified and the ‘0’ flag has been given and neither the ‘-’ flag has been given nor a precision has been specified, zeroes consists of enough zeroes to make the whole field at least w cells wide, otherwise it’s empty.

If a precision p has been specified, precision-filler consists of enough zeroes to make for p digits of output, otherwise it’s empty.

Digits consists of the digits in base 10 that represent the result of calling Integer with the argument as its argument.

If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.

Flag Description
(Space) Add a “ ” prefix to non-negative numbers
+ Add a “+” sign to non-negative numbers; overrides the ‘ ’ flag
0 Use ‘0’ for any width padding; ignored when a precision has been specified
- Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag
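How the flags, width, and precision interact for the ‘d’ directive is easiest to see with concrete values. This sketch uses Kernel#format, assumed to produce the same output for these cases.

```ruby
# Kernel#format is used as a stand-in for U::String#% (assumption).
results = {
  width:      format("%5d", 42),     # spaces on the left
  left:       format("%-5d|", 42),   # spaces on the right
  zero:       format("%05d", 42),    # zeroes instead of spaces
  zero_neg:   format("%05d", -42),   # sign comes before the zeroes
  plus:       format("%+d", 42),
  space:      format("% d", 42),
  precision:  format("%.4d", 42),    # at least four digits
  both:       format("%8.4d", 42)    # precision first, then width padding
}

results.each { |name, text| puts "#{name}: #{text.inspect}" }
```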

o

Outputs

[left-padding][prefix/sign][zeroes/sevens]
   [precision-filler]octal-digits[right-padding]

If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.

Prefix/sign is “-” if the argument is negative and the ‘+’ or ‘ ’ flag was given, “..” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given, otherwise it’s empty.

If a width w has been specified and the ‘0’ flag has been given and neither the ‘-’ flag has been given nor a precision has been specified, zeroes/sevens consists of enough zeroes, if the argument is non-negative or if the ‘+’ or ‘ ’ flag has been specified, sevens otherwise, to make the whole field at least w cells wide, otherwise it’s empty.

If a precision p has been specified, precision-filler consists of enough zeroes, if the argument is non-negative or if the ‘+’ or ‘ ’ flag has been specified, sevens otherwise, to make for p digits of output, otherwise it’s empty.

Octal-digits consists of the digits in base 8 that represent the result of #to_int on the argument, using ‘0’ through ‘7’. A negative value will be output as a two’s complement value.

If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.

Flag Description
(Space) Add a “ ” prefix to non-negative numbers and don’t output negative numbers as two’s complement values
+ Add a “+” sign to non-negative numbers and don’t output negative numbers as two’s complement values; overrides the ‘ ’ flag
0 Use ‘0’ for any width padding; ignored when a precision has been specified
- Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag
# Increase precision to include as many digits as necessary to make the first digit ‘0’, but don’t include the ‘0’ itself

x

Outputs

[left-padding][sign][base-prefix][prefix][zeroes/fs]
   [precision-filler]hexadecimal-digits[right-padding]

Left-padding and right-padding are the same as for the ‘o’ directive described above. Zeroes/fs is the same as zeroes/sevens for the ‘o’ directive, except that it uses ‘f’ characters instead of sevens. The same goes for precision-filler.

Sign is “-” if the argument is negative and the ‘+’ or ‘ ’ flag was given, “+” if the argument is non-negative and the ‘+’ flag was given, and “ ” if the argument is non-negative and the ‘ ’ flag was given, otherwise it’s empty.

Base-prefix is “0x” if the ‘#’ flag was given and the result of #to_int on the argument is non-zero.

Prefix is “..” if the argument is negative and neither the ‘+’ nor the ‘ ’ flag was given.

Hexadecimal-digits consists of the digits in base 16 that represent the result of #to_int on the argument, using ‘0’ through ‘9’ and ‘a’ through ‘f’. A negative value will be output as a two’s complement value.

Flag Description
(Space) Same as for ‘o’
+ Same as for ‘o’
0 Same as for ‘o’
- Same as for ‘o’
# Prefix non-zero values with “0x”

X

Same as ‘x’, except that it uses uppercase letters instead.

b

Outputs

[left-padding][sign][base-prefix][prefix][zeroes/ones]
   [precision-filler]binary-digits[right-padding]

Left-padding and right-padding are the same as for the ‘o’ directive described above. Sign, base-prefix, and prefix are the same as for the ‘x’ directive, except that base-prefix outputs “0b”. Zeroes/ones is the same as zeroes/fs for the ‘x’ directive, except that it uses ones instead of ‘f’ characters. The same goes for precision-filler.

Binary-digits consists of the digits in base 2 that represent the result of #to_int on the argument, using ‘0’ and ‘1’. A negative value will be output as a two’s complement value.

Flag Description
(Space) Same as for ‘o’
+ Same as for ‘o’
0 Same as for ‘o’
- Same as for ‘o’
# Prefix non-zero values with “0b”

B

Same as ‘b’, except that it uses a “0B” prefix for the ‘#’ flag.
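The two's complement output and the effect of the ‘+’ and ‘#’ flags on the ‘o’, ‘x’, and ‘b’ directives can be seen in the sketch below. It uses Kernel#format, which is assumed to format these values the same way.

```ruby
# Kernel#format is used as a stand-in for U::String#% (assumption).
octal  = format("%o", 0o123)
hex    = format("%x", 1234)
upper  = format("%X", 1234)
binary = format("%b", 123)

# '#' adds a base prefix.
hex_prefixed    = format("%#x", 1234)
binary_prefixed = format("%#b", 123)

# Negative values: ".." marks a two's complement value, '+' switches to a sign.
hex_negative    = format("%x", -1234)
hex_signed      = format("%+x", -1234)
hex_both        = format("%#x", -1234)
octal_negative  = format("%o", -0o123)
binary_negative = format("%b", -123)

p octal, hex, upper, binary, hex_prefixed, binary_prefixed
p hex_negative, hex_signed, hex_both, octal_negative, binary_negative
```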

f

Outputs

[left-padding][prefix/sign][zeroes]
   integer-part[decimal-point][fractional-part][right-padding]

If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.

Prefix/sign is “-” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given, otherwise it’s empty.

If a width w has been specified and the ‘0’ flag has been given and the ‘-’ flag has not been given, zeroes consists of enough zeroes to make the whole field at least w cells wide, otherwise it’s empty.

Integer-part consists of the digits in base 10 that represent the integer part of the result of calling Float with the argument as its argument.

Decimal-point is “.” if the precision isn’t 0 or if the ‘#’ flag has been given.

Fractional-part consists of p digits in base 10 that represent the fractional part of the result of calling Float with the argument as its argument, where p = precision, if one has been specified, p = 6 otherwise.

If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w cells wide, otherwise it’s empty.

Flag Description
(Space) Add a “ ” prefix to non-negative numbers
+ Add a “+” sign to non-negative numbers; overrides the ‘ ’ flag
0 Use ‘0’ for any width padding; ignored when a precision has been specified
- Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag
# Output a decimal point, even if no fractional part follows

e

Outputs

[left-padding][prefix/sign][zeroes]
   digit[decimal-point][fractional-part]exponent[right-padding]

If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w + e cells wide, where e ≥ 4 is the width of the exponent, otherwise it’s empty.

Prefix/sign is “-” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given, otherwise it’s empty.

If a width w has been specified and the ‘0’ flag has been given and the ‘-’ flag has not been given, zeroes consists of enough zeroes to make the whole field w + e cells wide, where e ≥ 4 is the width of the exponent, otherwise it’s empty.

Digit consists of one digit in base 10 that represents the most significant digit of the result of calling Float with the argument as its argument.

Decimal-point is “.” if the precision isn’t 0 or if the ‘#’ flag has been given.

Fractional-part consists of p digits in base 10 that represent all but the most significant digit of the result of calling Float with the argument as its argument, where p = precision, if one has been specified, p = 6 otherwise.

Exponent consists of “e” followed by the exponent in base 10 required to turn the result of calling Float with the argument as its argument into a decimal fraction with one non-zero digit in the integer part. If the exponent is 0, “+00” will be output.

If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w + e cells wide, where e ≥ 4 is the width of the exponent, otherwise it’s empty.

Flag Description
(Space) Add a “ ” prefix to non-negative numbers
+ Add a “+” sign to non-negative numbers; overrides the ‘ ’ flag
0 Use ‘0’ for any width padding; ignored when a precision has been specified
- Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag
# Output a decimal point, even if no fractional part follows

E

Same as ‘e’, except that it uses an uppercase ‘E’ for the exponent separator.

g

Same as ‘e’ if the exponent is less than -4 or if the exponent is greater than or equal to the precision, otherwise ‘f’ is used. The precision defaults to 6 and a precision of 0 is treated as a precision of 1. Trailing zeros are removed from the fractional part of the result.

G

Same as ‘g’, except that it uses an uppercase ‘E’ for the exponent separator.
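The floating-point directives can be compared on a few values. The sketch uses Kernel#format, assumed to give the same output as the directives described above.

```ruby
# Kernel#format is used as a stand-in for U::String#% (assumption).
fixed       = format("%.2f", 3.14159)
fixed_wide  = format("%8.3f", 3.14159)
fixed_zero  = format("%08.3f", -3.14159)   # zeroes go after the sign
fixed_point = format("%#.0f", 3.0)         # '#' keeps the decimal point

exponent       = format("%e", 12345.678)   # six fractional digits by default
exponent_upper = format("%E", 12345.678)

# 'g' picks 'e' or 'f' depending on the exponent and drops trailing zeros.
general_small = format("%g", 0.00001234)
general_large = format("%g", 1234567.0)
general_plain = format("%g", 100.0)

p fixed, fixed_wide, fixed_zero, fixed_point
p exponent, exponent_upper, general_small, general_large, general_plain
```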

a

Outputs

[left-padding][prefix/sign][zeroes]
   digit[hexadecimal-point][fractional-part]exponent[right-padding]

If a width w has been specified and neither the ‘-’ nor the ‘0’ flag has been given, left-padding consists of enough spaces to make the whole field at least w + e cells wide, where e ≥ 3 is the width of the exponent, otherwise it’s empty.

Prefix/sign is “-” if the argument is negative, “+” if the ‘+’ flag was given, and “ ” if the ‘ ’ flag was given, otherwise it’s empty.

If a width w has been specified and the ‘0’ flag has been given and the ‘-’ flag has not been given, zeroes consists of enough zeroes to make the whole field w + e cells wide, where e ≥ 3 is the width of the exponent, otherwise it’s empty.

Digit consists of one digit in base 16 that represents the most significant digit of the result of calling Float with the argument as its argument, using ‘0’ through ‘9’ and ‘a’ through ‘f’.

Hexadecimal-point is “.” if the precision isn’t 0 or if the ‘#’ flag has been given.

Fractional-part consists of p digits in base 16 that represent all but the most significant digit of the result of calling Float with the argument as its argument, where p = precision, if one has been specified, p = q, where q is the number of digits required to represent the number exactly, otherwise. Digits are output using ‘0’ through ‘9’ and ‘a’ through ‘f’.

Exponent consists of “p” followed by the exponent of 2 in base 10 required to turn the result of calling Float with the argument as its argument into a hexadecimal fraction with one non-zero digit in the integer part. If the exponent is 0, “+0” will be output.

If a width w has been specified and the ‘-’ flag has been given, right-padding consists of enough spaces to make the whole field at least w + e cells wide, where e ≥ 3 is the width of the exponent, otherwise it’s empty.

Flag Description
(Space) Add a “ ” prefix to non-negative numbers
+ Add a “+” sign to non-negative numbers; overrides the ‘ ’ flag
0 Use ‘0’ for any width padding; ignored when a precision has been specified
- Left justify the output with ‘ ’ as padding; overrides the ‘0’ flag
# Output a decimal point, even if no fractional part follows

A

Same as ‘a’, except that it uses uppercase letters instead.

A warning is issued if the ‘0’ flag is given when the ‘-’ flag has also been given to the ‘d’, ‘i’, ‘u’, ‘o’, ‘x’, ‘X’, ‘b’, or ‘B’ directives.

A warning is issued if the ‘0’ flag is given when a precision has been specified for the ‘d’, ‘i’, ‘u’, ‘o’, ‘x’, ‘X’, ‘b’, or ‘B’ directives.

A warning is issued if the ‘ ’ flag is given when the ‘+’ flag has also been given to the ‘d’, ‘i’, ‘u’, ‘o’, ‘x’, ‘X’, ‘b’, or ‘B’ directives.

A warning is issued if the ‘0’ flag is given when the ‘o’, ‘x’, ‘X’, ‘b’, or ‘B’ directives have been given a negative argument.

A warning is issued if the ‘#’ flag is given when the ‘o’ directive has been given a negative argument.

Any taint on the receiver and any taint on arguments to any ‘s’ and ‘p’ directives is inherited by the result.

Raises
ArgumentError

If the receiver isn’t a valid format specification

ArgumentError

If any flags are given to the ‘%’, ‘\n’, or ‘\0’ directives

ArgumentError

If an argument is given to the ‘%’, ‘\n’, or ‘\0’ directives

ArgumentError

If a width is specified for the ‘%’, ‘\n’, or ‘\0’ directives

ArgumentError

If a precision is specified for the ‘%’, ‘\n’, ‘\0’, or ‘c’ directives

ArgumentError

If any of the flags ‘ ’, ‘+’, ‘0’, or ‘#’ are given to the ‘c’, ‘s’, or ‘p’ directives

ArgumentError

If the ‘#’ flag is given to the ‘d’, ‘i’, or ‘u’ directives

ArgumentError

If the argument to the ‘c’ directive doesn’t respond to #to_str or #to_int

format(value)U::String#

This is an alias for #%.

dumpU::String#

Returns the receiver in a reader-friendly format, inheriting any taint and untrust.

The reader-friendly format looks like “"…".u”. Inside the “…”, any #print? characters in the ASCII range are output as-is, except that the following special characters are escaped:

Character Dumped Sequence
U+0022 QUOTATION MARK \"
U+005C REVERSE SOLIDUS \\
U+000A LINE FEED (LF) \n
U+000D CARRIAGE RETURN (CR) \r
U+0009 CHARACTER TABULATION \t
U+000C FORM FEED (FF) \f
U+000B LINE TABULATION \v
U+0008 BACKSPACE \b
U+0007 BELL \a
U+001B ESCAPE \e

The following special sequences are also escaped:

Character Dumped Sequence
#$ \#$
#@ \#@
#{ \#{

Any other valid UTF-8 byte sequence is output as “\u{n}”, where n is the lowercase hexadecimal representation of the code point encoded by the UTF-8 sequence, and any other byte is output as “\xn”, where n is the two-digit uppercase hexadecimal representation of the byte’s value.
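The escaping rules above can be sketched in plain Ruby. The helper dump_sketch and the DUMP_ESCAPES table are hypothetical names introduced for illustration; this is one reading of the rules as stated, not the library's implementation, and it assumes that ASCII #print? characters are exactly U+0020 through U+007E.

```ruby
# Hypothetical table mirroring the special characters listed above.
DUMP_ESCAPES = {
  '"' => '\\"', '\\' => '\\\\', "\n" => '\\n', "\r" => '\\r', "\t" => '\\t',
  "\f" => '\\f', "\v" => '\\v', "\b" => '\\b', "\a" => '\\a', "\e" => '\\e'
}.freeze

# Hypothetical helper: applies the documented rules character by character.
def dump_sketch(string)
  chars = string.each_char.to_a
  body = chars.each_with_index.map do |c, i|
    if !c.valid_encoding?
      # Bytes that aren't part of a valid UTF-8 sequence: \xNN, uppercase.
      c.bytes.map { |b| format('\\x%02X', b) }.join
    elsif DUMP_ESCAPES.key?(c)
      DUMP_ESCAPES[c]
    elsif c == '#' && ['$', '@', '{'].include?(chars[i + 1])
      '\\#'                            # break up #$, #@, and #{
    elsif c.ord.between?(0x20, 0x7E)
      c                                # printable ASCII is output as-is
    else
      format('\\u{%x}', c.ord)         # other code points: \u{n}, lowercase
    end
  end.join
  %("#{body}".u)
end

puts dump_sketch("caf\u00E9\n")
```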

inspectString#

Returns the receiver in a reader-friendly inspectable format, inheriting any taint and untrust, encoded using UTF-8.

The reader-friendly inspectable format looks like “"…".u”. Inside the “…”, any #print? characters are output as-is, except that the following special characters are escaped:

Character Dumped Sequence
U+0022 QUOTATION MARK \"
U+005C REVERSE SOLIDUS \\
U+000A LINE FEED (LF) \n
U+000D CARRIAGE RETURN (CR) \r
U+0009 CHARACTER TABULATION \t
U+000C FORM FEED (FF) \f
U+000B LINE TABULATION \v
U+0008 BACKSPACE \b
U+0007 BELL \a
U+001B ESCAPE \e

The following special sequences are also escaped:

Character Dumped Sequence
#$ \#$
#@ \#@
#{ \#{

Valid UTF-8 byte sequences representing code points < 0x10000 are output as \un, where n is the four-digit uppercase hexadecimal representation of the code point.

Valid UTF-8 byte sequences representing code points ≥ 0x10000 are output as \u{n}, where n is the uppercase hexadecimal representation of the code point.

Any other byte is output as \xn, where n is the two-digit uppercase hexadecimal representation of the byte’s value.

hashFixnum#

Returns the hash value of the receiver’s content.

hexInteger#

Returns the result of #to_i(16).

octInteger#

Returns the result of #to_i(8), but with the added provision that any leading base specification in the receiver will override the suggested octal (8) base, that is, '0b11'.u#oct = 3, not 9.
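The way a leading base specification overrides the octal default can be checked with the built-in String#oct and String#hex, which are assumed to behave like the methods described here.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
plain_octal  = "11".oct       # digits read in base 8
binary_wins  = "0b11".oct     # the "0b" prefix overrides the octal base
hex_wins     = "0x1f".oct     # so does "0x"

plain_hex    = "ff".hex
prefixed_hex = "0xff".hex

p [plain_octal, binary_wins, hex_wins, plain_hex, prefixed_hex]
# [9, 3, 31, 255, 255]
```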

to_i(base#to_int = 10)Integer#

Returns the Integer value that results from treating the receiver as a string of digits in base.

The conversion algorithm is

  1. Skip any leading #space?s

  2. Check for an optional sign, ‘+’ or ‘-’

  3. If base is 2, skip an optional “0b” or “0B” prefix

  4. If base is 8, skip an optional “0o” or “0O” prefix

  5. If base is 10, skip an optional “0d” or “0D” prefix

  6. If base is 16, skip an optional “0x” or “0X” prefix

  7. Skip any ‘0’s

  8. Read as long a sequence of digits in base as possible, separated by optional U+005F LOW LINE characters, using letters in the following ranges of characters for digits or the character’s digit value, if any

    • U+0041 LATIN CAPITAL LETTER A through U+005A LATIN CAPITAL LETTER Z

    • U+0061 LATIN SMALL LETTER A through U+007A LATIN SMALL LETTER Z

    • U+FF21 FULLWIDTH LATIN CAPITAL LETTER A through U+FF3A FULLWIDTH LATIN CAPITAL LETTER Z

    • U+FF41 FULLWIDTH LATIN SMALL LETTER A through U+FF5A FULLWIDTH LATIN SMALL LETTER Z

    Note that only one separator is allowed in a row.

Raises
ArgumentError

Unless 2 ≤ base ≤ 36
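Most steps of the conversion algorithm can be observed with the built-in String#to_i, used here as a stand-in. It is assumed to follow the same steps for ASCII input; the fullwidth letter ranges listed above are not something the built-in method is known to accept, so they are left out.

```ruby
# Built-in String is used as a stand-in for U::String (assumption).
signed      = "  -42abc".to_i      # leading space skipped, sign read, stops at "a"
separated   = "1_000".to_i         # single LOW LINE separators are allowed
double_sep  = "1__000".to_i        # two separators in a row end the number

binary      = "0b101".to_i(2)      # prefix matching the base is skipped
octal       = "0o17".to_i(8)
decimal     = "0d12".to_i(10)
hexadecimal = "0x1A".to_i(16)
base36      = "zz".to_i(36)        # letters serve as digits above 9

# Bases outside 2..36 are rejected.
error = begin
  "12".to_i(37)
rescue ArgumentError => e
  e.class
end

p signed, separated, double_sep, binary, octal, decimal, hexadecimal, base36, error
```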

to_strString#

Returns the String representation of the receiver, inheriting any taint and untrust, encoded as UTF-8.

to_sString#

This is an alias for #to_str.

bString#

Returns the String representation of the receiver, inheriting any taint and untrust, encoded as ASCII-8BIT.

to_symSymbol#

Returns the Symbol representation of the receiver.

Raises
EncodingError

If the receiver contains an invalid UTF-8 sequence

RuntimeError

If there’s no more room for a new Symbol in Ruby’s Symbol table

internSymbol#

This is an alias for #to_sym.