Useful Ruby 💎 Methods: A Short Guide – Scan, Inject With Performance Analysis

#scan Method

Finds all occurrences of a pattern in a string and returns them as an array.

The pattern can be a Regexp or a plain String. Matches are non-overlapping and returned in the order they appear, which makes scan handy for text processing and data extraction tasks.

Basic Syntax

string.scan(pattern) → array
string.scan(pattern) { |match| block } → string
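
As a small sketch of the two call forms above: without a block, scan returns a new array; with a block, it yields each match and returns the string it was called on. The variable names here are only illustrative.

```ruby
text = "one two three"

# Without a block: returns an array of the matches.
words = text.scan(/\w+/)

# With a block: yields each match in turn and returns the receiver.
seen = []
result = text.scan(/\w+/) { |word| seen << word.upcase }

p words                # => ["one", "two", "three"]
p seen                 # => ["ONE", "TWO", "THREE"]
p result.equal?(text)  # => true
```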

Examples

Simple Word Matching

text = "hello world hello ruby"
matches = text.scan(/hello/)
puts matches.inspect
# Output: ["hello", "hello"]

Matching Words by Length

text = "The quick brown fox jumps over the lazy dog"
matches = text.scan(/\b\w{3}\b/)  # Find all 3-letter words
puts matches.inspect
# Output: ["The", "fox", "the", "dog"]

1. Find All Matches

"hello world".scan(/\w+/) # => ["hello", "world"]  

2. Extract Numbers

"Age: 25, Price: $50".scan(/\d+/) # => ["25", "50"]  

3. Matching All Characters

"hello".scan(/./) { |c| puts c }
# Output:
# h
# e
# l
# l
# o
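
One detail that may be worth a quick sketch: scan also accepts a plain String as the pattern, and a String is matched literally rather than as a regex. The example below contrasts that with /./, which matches any character except a newline. The sample text is made up for illustration.

```ruby
text = "a.b\nc"

any_char = text.scan(/./)  # Regexp: "." matches any character except newline
literal  = text.scan(".")  # String: matched literally, so only the dot itself
all      = text.chars      # every character, including the newline

p any_char  # => ["a", ".", "b", "c"]
p literal   # => ["."]
p all       # => ["a", ".", "b", "\n", "c"]
```

If you really want every character, String#chars is the more direct tool.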

4. Capture Groups (Returns Arrays)

When the pattern contains capture groups (parentheses), scan returns an array of arrays, with one inner array of captures per match:

"Name: Alice, Age: 30".scan(/(\w+): (\w+)/)
# => [["Name", "Alice"], ["Age", "30"]]

text = "John: 30, Jane: 25, Alex: 40"
matches = text.scan(/(\w+): (\d+)/)
puts matches.inspect
# Output: [["John", "30"], ["Jane", "25"], ["Alex", "40"]]

5. Iterate with a Block

"a1 b2 c3".scan(/(\w)(\d)/) { |letter, num| puts "#{letter} -> #{num}" }
# Output:
# a -> 1
# b -> 2
# c -> 3

With a single block parameter, the captures arrive as one array, so match[0] below is the first group:

text = "Prices: $10, $20, $30"
total = 0
text.scan(/\$(\d+)/) { |match| total += match[0].to_i }
puts total
# Output: 60

6. Case-Insensitive Search

"Ruby is COOL!".scan(/cool/i) # => ["COOL"]

7. Extract Email Addresses

A loose pattern is enough for quick extraction:

"Email me at test@mail.com".scan(/\S+@\S+/) # => ["test@mail.com"]

A stricter pattern avoids picking up surrounding punctuation:

text = "Contact us at support@example.com or sales@company.org"
emails = text.scan(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/)
puts emails.inspect
# Output: ["support@example.com", "sales@company.org"]
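
Building on the capture-group examples, here is a short sketch of two follow-ups: turning scan's pairs into a hash, and using a non-capturing group (?:...) when you need grouping but still want flat strings back. It assumes Ruby 2.6+ for to_h with a block; the sample strings are invented.

```ruby
text = "John: 30, Jane: 25, Alex: 40"

# Each match is a [name, age] pair, so the result converts directly to a hash.
ages = text.scan(/(\w+): (\d+)/).to_h { |name, age| [name, age.to_i] }

dates = "Due 2024-01-15, renewed 2025-12-31"

# A capturing group makes scan return only the captured part of each match...
months = dates.scan(/\d{4}-(0[1-9]|1[0-2])-\d{2}/)

# ...while a non-capturing group keeps the whole match as a plain string.
full_dates = dates.scan(/\d{4}-(?:0[1-9]|1[0-2])-\d{2}/)

p ages["Jane"]  # => 25
p months        # => [["01"], ["12"]]
p full_dates    # => ["2024-01-15", "2025-12-31"]
```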

Performance Characteristics: Ruby’s #scan Method

The #scan method is generally efficient for most common string processing tasks, but its performance depends on several factors:

  1. String length – Larger strings take longer to process
  2. Pattern complexity – Simple patterns are faster than complex regex
  3. Number of matches – More matches mean more memory allocation

Performance Considerations

1. Time Complexity 🧮

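  • Best case: O(n), where n is the string length
  • Worst case: much worse than linear. Ruby's regex engine backtracks, so patterns with nested or ambiguous quantifiers can trigger catastrophic backtracking, where the time grows exponentially with input length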

2. Memory Usage 🧠

  • Creates an array with all matches
  • Each match is a new string object (memory intensive for large results)

Benchmark 📈 Examples

require 'benchmark'

large_text = "Lorem ipsum " * 10_000

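# Compare a simple pattern with one that adds a group and a lookahead.
# The argument to bm is the label width, which keeps the columns aligned.
Benchmark.bm(13) do |x|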
  x.report("simple scan:") { large_text.scan(/\w+/) }
  x.report("complex scan:") { large_text.scan(/(?:^|\s)(\w+)(?=\s|$)/) }
end

Sample results from one run (absolute timings vary with hardware and Ruby version):

                    user     system      total        real
simple scan:    0.020000   0.000000   0.020000 (  0.018123)
complex scan:   0.050000   0.010000   0.060000 (  0.054678)
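
Before comparing timings, it may help to check that the two benchmarked patterns are not interchangeable. The sketch below, using made-up sample strings, shows that they return differently shaped results and diverge as soon as words are not separated purely by whitespace.

```ruby
simple_pattern  = /\w+/
complex_pattern = /(?:^|\s)(\w+)(?=\s|$)/

sample = "Lorem ipsum dolor"
simple  = sample.scan(simple_pattern)   # plain strings
complex = sample.scan(complex_pattern)  # one-element capture arrays

p simple   # => ["Lorem", "ipsum", "dolor"]
p complex  # => [["Lorem"], ["ipsum"], ["dolor"]]

# With punctuation between words, the stricter pattern skips "foo" and "bar".
hyphenated = "foo-bar baz"
p hyphenated.scan(simple_pattern)   # => ["foo", "bar", "baz"]
p hyphenated.scan(complex_pattern)  # => [["baz"]]
```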

Optimization Tips 💡

  1. Use simpler patterns when possible:
   # Slower
   text.scan(/(?:^|\s)(\w+)(?=\s|$)/)

   # Faster; finds the same words when they are separated only by
   # whitespace, but returns plain strings instead of one-element arrays
   text.scan(/\b\w+\b/)
  2. Avoid capture groups if you don’t need them:
   # Slower (creates match groups)
   text.scan(/(\w+)/)

   # Faster
   text.scan(/\w+/)
  3. Use blocks to avoid large arrays:
   # Stores all matches in memory
   matches = text.scan(pattern)

   # Processes matches without storing
   text.scan(pattern) { |m| process(m) }
  4. Consider alternatives for very large strings:
   # For simple whitespace splits, String#split might be faster
   words = text.split

   # For input too large to hold in memory, read it in chunks
   # (for example line by line from an IO or StringIO) and scan each chunk
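
A minimal sketch of the streaming idea in the last tip, assuming the input can be processed line by line (that is, no match spans a newline). StringIO stands in for a File or socket here, and count_words_streaming is a hypothetical helper name.

```ruby
require "stringio"

# Hypothetical helper: counts word matches one line at a time, so only the
# current line and a running total are held, never an array of all matches.
def count_words_streaming(io)
  total = 0
  io.each_line do |line|
    line.scan(/\w+/) { total += 1 }
  end
  total
end

io = StringIO.new("Lorem ipsum\ndolor sit amet\n")
count = count_words_streaming(io)
p count  # => 5
```

The same method should work unchanged with File.open(path), since each_line is part of the IO interface.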

When to Be Cautious ⚠️

  • Processing multi-megabyte strings
  • Using highly complex regular expressions
  • When you only need the first match (use #match or String#[] instead) or just the first few matches (stop early rather than scanning the whole string)
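
For the last point, here is one possible sketch of stopping early. String#[] returns just the first match, and wrapping scan's block form in an enumerator lets first(n) stop the scan once enough matches have been collected. The log string is invented for illustration.

```ruby
log = "id=1 id=22 id=333 id=4444"

# Only the first match: String#[] with a regex stops at the first hit.
first_id = log[/\d+/]

# First few matches: enum_for wraps scan's block form, and `first`
# breaks out of the scan after collecting two results.
first_two = log.enum_for(:scan, /\d+/).first(2)

p first_id   # => "1"
p first_two  # => ["1", "22"]
```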

The #scan method is optimized for most common cases, but for performance-critical applications with large inputs, consider benchmarking alternatives.


#inject Method (aka #reduce)

Enumerable#inject takes an optional initial value and a block (or, instead of a block, the name of a method as a symbol).

The block is called once per element with the accumulator and the current element. Whatever the block returns becomes the accumulator for the next element, and the final accumulator is the result.

In a way, inject inserts an operation between the elements of the enumerable: [2, 3, 4].inject(:*) computes 2 * 3 * 4. inject is aliased as reduce. You use it when you want to reduce a collection to a single value.

For example:

    product = [ 2, 3, 4 ].inject(1) do |result, next_value|
      result * next_value
    end
    product #=> 24
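
To make the accumulation visible, the sketch below records each pair of values the block receives for the same product example. The steps array exists only for illustration.

```ruby
steps = []

product = [2, 3, 4].inject(1) do |result, next_value|
  steps << [result, next_value]  # record what the block was given
  result * next_value            # becomes `result` on the next call
end

p steps    # => [[1, 2], [2, 3], [6, 4]]
p product  # => 24
```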

Purpose

  • Accumulates values by applying an operation to each element in a collection
  • Can produce a single aggregated result or a compound value

Basic Syntax

collection.inject(initial) { |memo, element| block } → object
collection.inject { |memo, element| block } → object
collection.inject(symbol) → object
collection.inject(initial, symbol) → object
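
The four call forms above, sketched side by side on the same array (the variable names are just labels for each form):

```ruby
numbers = [1, 2, 3, 4]

with_initial_and_block  = numbers.inject(10) { |memo, n| memo + n }  # 10 + 1 + 2 + 3 + 4
block_only              = numbers.inject { |memo, n| memo + n }      # memo starts as 1
symbol_only             = numbers.inject(:+)                         # same as block_only
with_initial_and_symbol = numbers.inject(10, :+)                     # same as the first form

p with_initial_and_block   # => 20
p block_only               # => 10
p symbol_only              # => 10
p with_initial_and_symbol  # => 20
```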

Key Features

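  1. Takes an optional initial value; if omitted, the first element becomes the initial memo and iteration starts from the second element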
  2. The block receives the memo (accumulator) and current element
  3. Returns the final value of the memo
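
The optional initial value matters most at the edges. A short sketch of what happens with no initial value, an empty collection, and a single element:

```ruby
# No initial value: the first element seeds the memo, so the block runs
# once for each remaining element.
calls = 0
total = [5, 6, 7].inject { |memo, n| calls += 1; memo + n }

# Empty collection: nil without an initial value, the initial value with one.
empty_without = [].inject(:+)
empty_with    = [].inject(0, :+)

# Single element and no initial value: returned as-is, block never called.
single = [42].inject { |_memo, _n| raise "never reached" }

p total          # => 18
p calls          # => 2
p empty_without  # => nil
p empty_with     # => 0
p single         # => 42
```

Passing an explicit initial value is therefore the safer choice whenever the collection might be empty.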

👉 Examples

1. Summing Numbers

[1, 2, 3].inject(0) { |sum, n| sum + n } # => 6

2. Finding Maximum Value

[4, 2, 7, 1].inject { |max, n| max > n ? max : n } # => 7

3. Building a Hash

[:a, :b, :c].inject({}) { |h, k| h[k] = k.to_s; h }
# => {:a=>"a", :b=>"b", :c=>"c"}

4. Symbol Shorthand (Ruby 1.9+)

[1, 2, 3].inject(:+) # => 6 (same as sum)
[1, 2, 3].inject(2, :*) # => 12 (2 * 1 * 2 * 3)

5. String Concatenation

%w[r u b y].inject("") { |s, c| s + c.upcase } # => "RUBY"

6. Counting Occurrences

words = %w[apple banana apple cherry apple]
words.inject(Hash.new(0)) { |h, w| h[w] += 1; h }
# => {"apple"=>3, "banana"=>1, "cherry"=>1}

Performance Characteristics: Ruby’s #inject Method

  1. Time Complexity: O(n) block calls – each element is processed exactly once, so the total is O(n) as long as the block itself runs in constant time
  2. Memory Usage:
     • Generally creates only one accumulator object
     • Avoids intermediate arrays (unlike chained map + reduce)
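
As a sketch of the intermediate-array point: the chained version below builds a temporary array before reducing it, while the single inject does the transformation inside the block. The numbers are arbitrary, and sum with a block assumes Ruby 2.4+.

```ruby
values = [3, 5, 8]

# Two passes: map allocates an intermediate array of squares first.
two_pass = values.map { |n| n * n }.inject(0, :+)

# One pass: squaring happens inside the inject block, no temporary array.
one_pass = values.inject(0) { |sum, n| sum + n * n }

# sum with a block is another single-pass option.
with_sum = values.sum { |n| n * n }

p two_pass  # => 98
p one_pass  # => 98
p with_sum  # => 98
```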

Benchmark 📈 Examples

require 'benchmark'

large_array = (1..1_000_000).to_a

Benchmark.bm(11) do |x|  # 11 = label width, keeps the columns aligned
  x.report("inject:") { large_array.inject(0, :+) }
  x.report("each + var:") do
    sum = 0
    large_array.each { |n| sum += n }
    sum
  end
end

In this sample run, inject was slightly slower than explicit iteration while being more concise. The gap is small and can change with Ruby version and hardware, so measure on your own setup:

                  user     system      total        real
inject:       0.040000   0.000000   0.040000 (  0.042317)
each + var:   0.030000   0.000000   0.030000 (  0.037894)

Optimization Tips 💡

  1. Use symbol shorthand when possible (faster than blocks):
   # Faster
   array.inject(:+)

   # Slower
   array.inject { |sum, n| sum + n }
  2. Pass a mutable accumulator and update it in place when building structures:
   # Hashes
   items.inject({}) { |h, (k, v)| h[k] = v; h }

   # Arrays (transform is a placeholder for your own method)
   items.inject([]) { |a, e| a << e.transform; a }
  3. Avoid unnecessary object creation in blocks:
   # Bad - creates a new string on every iteration
   strings.inject("") { |s, x| s + x.upcase }

   # Good - appends to the accumulator in place
   # (+"" gives an unfrozen string even under frozen_string_literal)
   strings.inject(+"") { |s, x| s << x.upcase }
  4. Consider alternatives for simple cases:
   # For simple sums (Ruby 2.4+), faster than inject(:+)
   array.sum

   # For concatenation, faster than inject(:+)
   array.join
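
One caveat with in-place appending is a frozen accumulator, which is what a "" literal becomes in files that use the frozen_string_literal magic comment. The sketch below simulates that with an explicit freeze, then shows the unfrozen accumulator and the join alternative producing the same result.

```ruby
letters = %w[r u b y]

# A frozen accumulator cannot be appended to in place.
frozen_failed =
  begin
    letters.inject("".freeze) { |s, c| s << c.upcase }
    false
  rescue FrozenError
    true
  end

# Unary plus returns an unfrozen string, so << is safe.
built = letters.inject(+"") { |s, c| s << c.upcase }

# For plain concatenation, join avoids the accumulator entirely.
joined = letters.map(&:upcase).join

p frozen_failed  # => true
p built          # => "RUBY"
p joined         # => "RUBY"
```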

When to Be Cautious ⚠️

  • With extremely large collections where memory matters
  • When the block operations are very simple (explicit loop may be faster)
  • When building complex nested structures (consider each_with_object)
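
For the last point, a brief sketch of how each_with_object compares with inject when the accumulator is a mutable structure, reusing the word-count example:

```ruby
words = %w[apple banana apple cherry apple]

# inject: the block must return the accumulator, hence the trailing `; h`.
with_inject = words.inject(Hash.new(0)) { |h, w| h[w] += 1; h }

# each_with_object: the same object is passed to every call, the block's
# return value is ignored, and the argument order is (element, memo).
with_ewo = words.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }

p with_inject == with_ewo  # => true
p with_ewo["apple"]        # => 3
```

Because each_with_object always passes the same object, it only suits mutable accumulators such as hashes, arrays, and unfrozen strings. It cannot replace inject for sums or other immutable values.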

The inject method provides excellent readability with generally good performance for most use cases.