Useful Rubyย ๐Ÿ’Ž Methods: A Short Guide – Scan, Inject With Performance Analysis

#scan Method

Finds all occurrences of a pattern in a string and returns them as an array.

The scan method in Ruby is a powerful string method that allows you to find all occurrences of a pattern in a string. It returns an array of matches, making it extremely useful for text processing and data extraction tasks.

Basic Syntax

string.scan(pattern) โ†’ array
string.scan(pattern) { |match| block } โ†’ string

Examples

Simple Word matching:

text = "hello world hello ruby"
matches = text.scan(/hello/)
puts matches.inspect
# Output: ["hello", "hello"]

Matching Multiple Patterns

text = "The quick brown fox jumps over the lazy dog"
matches = text.scan(/\b\w{3}\b/)  # Find all 3-letter words
puts matches.inspect
# Output: ["The", "fox", "the", "dog"]

1. Find All Matches

"hello world".scan(/\w+/) # => ["hello", "world"]  

2. Extract Numbers

"Age: 25, Price: $50".scan(/\d+/) # => ["25", "50"]  

3. Matching All Characters

"hello".scan(/./) { |c| puts c }
# Output:
# h
# e
# l
# l
# o

3. Capture Groups (Returns Arrays)

"Name: Alice, Age: 30".scan(/(\w+): (\w+)/)  
# => [["Name", "Alice"], ["Age", "30"]]  

When you use parentheses in your regex, scan returns arrays of captures:

text = "John: 30, Jane: 25, Alex: 40"
matches = text.scan(/(\w+): (\d+)/)
puts matches.inspect
# Output: [["John", "30"], ["Jane", "25"], ["Alex", "40"]]

4. Iterate with a Block

"a1 b2 c3".scan(/(\w)(\d)/) { |letter, num| puts "#{letter} -> #{num}" }  
# Output:  
# a -> 1  
# b -> 2  
# c -> 3  
text = "Prices: $10, $20, $30"
total = 0
text.scan(/\$(\d+)/) { |match| total += match[0].to_i }
puts total
# Output: 60

5. Case-Insensitive Search

"Ruby is COOL!".scan(/cool/i) # => ["COOL"]  

6. Extract Email Addresses

"Email me at test@mail.com".scan(/\S+@\S+/) # => ["test@mail.com"]  
text = "Contact us at support@example.com or sales@company.org"
emails = text.scan(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/)
puts emails.inspect
# Output: ["support@example.com", "sales@company.org"]

Performance Characteristics: Ruby’s #scan Method

The #scan method is generally efficient for most common string processing tasks, but its performance depends on several factors:

  1. String length – Larger strings take longer to process
  2. Pattern complexity – Simple patterns are faster than complex regex
  3. Number of matches – More matches mean more memory allocation

Performance Considerations

1. Time Complexity ๐Ÿงฎ

  • Best case: O(n) where n is string length
  • Worst case: O(n*m) for complex regex patterns (with backtracking)

2. Memory Usage ๐Ÿง 

  • Creates an array with all matches
  • Each match is a new string object (memory intensive for large results)

Benchmark ๐Ÿ“ˆ Examples

require 'benchmark'

large_text = "Lorem ipsum " * 10_000

# Simple word matching
Benchmark.bm do |x|
  x.report("simple scan:") { large_text.scan(/\w+/) }
  x.report("complex scan:") { large_text.scan(/(?:^|\s)(\w+)(?=\s|$)/) }
end

Typical results:

              user     system      total        real
simple scan:  0.020000   0.000000   0.020000 (  0.018123)
complex scan: 0.050000   0.010000   0.060000 (  0.054678)

Optimization Tips ๐Ÿ’ก

  1. Use simpler patterns when possible:
   # Slower
   text.scan(/(?:^|\s)(\w+)(?=\s|$)/)

   # Faster equivalent
   text.scan(/\b\w+\b/)
  1. Avoid capture groups if you don’t need them:
   # Slower (creates match groups)
   text.scan(/(\w+)/)

   # Faster
   text.scan(/\w+/)
  1. Use blocks to avoid large arrays:
   # Stores all matches in memory
   matches = text.scan(pattern)

   # Processes matches without storing
   text.scan(pattern) { |m| process(m) }
  1. Consider alternatives for very large strings:
   # For simple splits, String#split might be faster
   words = text.split

   # For streaming processing, use StringIO

When to Be Cautious โš ๏ธ

  • Processing multi-megabyte strings
  • Using highly complex regular expressions
  • When you only need the first few matches (consider #match instead)

The #scan method is optimized for most common cases, but for performance-critical applications with large inputs, consider benchmarking alternatives.


#injectย Method (aka #reduce)

Enumerable#injectย takes two arguments: a base case and a block.

Each item of theย Enumerableย is passed to the block, and the result of the block is fed into the block again and iterate next item.

In a way the inject function injects the function between the elements of the enumerable. inject is aliased as reduce. You use it when you want to reduce a collection to a single value.

For example:

    product = [ 2, 3, 4 ].inject(1) do |result, next_value|
      result * next_value
    end
    product #=> 24

Purpose

  • Accumulates values by applying an operation to each element in a collection
  • Can produce a single aggregated result or a compound value

Basic Syntax

collection.inject(initial) { |memo, element| block } โ†’ object
collection.inject { |memo, element| block } โ†’ object
collection.inject(symbol) โ†’ object
collection.inject(initial, symbol) โ†’ object

Key Features

  1. Takes an optional initial value
  2. The block receives the memo (accumulator) and current element
  3. Returns the final value of the memo

๐Ÿ‘‰ Examples

1. Summing Numbers

[1, 2, 3].inject(0) { |sum, n| sum + n } # => 6

2. Finding Maximum Value

[4, 2, 7, 1].inject { |max, n| max > n ? max : n } # => 7

3. Building a Hash

[:a, :b, :c].inject({}) { |h, k| h[k] = k.to_s; h }
# => {:a=>"a", :b=>""b"", :c=>"c"}

4. Symbol Shorthand (Ruby 1.9+)

[1, 2, 3].inject(:+) # => 6 (same as sum)
[1, 2, 3].inject(2, :*) # => 12 (2 * 1 * 2 * 3)

5. String Concatenation

%w[r u b y].inject("") { |s, c| s + c.upcase } # => "RUBY"

6. Counting Occurrences

words = %w[apple banana apple cherry apple]
words.inject(Hash.new(0)) { |h, w| h[w] += 1; h }
# => {"apple"=>3, "banana"=>1, "cherry"=>1}

Performance Characteristics: Ruby’s #inject Method

  1. Time Complexity: O(n) – processes each element exactly once
  2. Memory Usage:
  • Generally creates only one accumulator object
  • Avoids intermediate arrays (unlike chained map + reduce)

Benchmark ๐Ÿ“ˆ Examples

require 'benchmark'

large_array = (1..1_000_000).to_a

Benchmark.bm do |x|
  x.report("inject:") { large_array.inject(0, :+) }
  x.report("each + var:") do
    sum = 0
    large_array.each { |n| sum += n }
    sum
  end
end

Typical results show inject is slightly slower than explicit iteration but more concise:

              user     system      total        real
inject:      0.040000   0.000000   0.040000 (  0.042317)
each + var:  0.030000   0.000000   0.030000 (  0.037894)

Optimization Tips ๐Ÿ’ก

  1. Use symbol shorthand when possible (faster than blocks):
   # Faster
   array.inject(:+)

   # Slower
   array.inject { |sum, n| sum + n }
  1. Preallocate mutable objects when building structures:
   # Good for hashes
   items.inject({}) { |h, (k,v)| h[k] = v; h }

   # Better for arrays
   items.inject([]) { |a, e| a << e.transform; a }
  1. Avoid unnecessary object creation in blocks:
   # Bad - creates new string each time
   strings.inject("") { |s, x| s + x.upcase }

   # Good - mutates original string
   strings.inject("") { |s, x| s << x.upcase }
  1. Consider alternatives for simple cases:
   # For simple sums
   array.sum # (Ruby 2.4+) is faster than inject(:+)

   # For concatenation
   array.join is faster than inject(:+)

When to Be Cautious โš ๏ธ

  • With extremely large collections where memory matters
  • When the block operations are very simple (explicit loop may be faster)
  • When building complex nested structures (consider each_with_object)

The inject method provides excellent readability with generally good performance for most use cases.