#scan Method
Finds all occurrences of a pattern in a string and returns them as an array.
String#scan walks the string from left to right and collects every non-overlapping match of a pattern, which can be a Regexp or a String. It returns the matches as an array, which makes it useful for text processing and data extraction tasks.
Basic Syntax
string.scan(pattern) → array
string.scan(pattern) { |match| block } → string
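The two signatures differ mainly in what they return. The sketch below runs the same pattern through both forms; the helper name `scan_both_ways` is made up for illustration and is not part of Ruby.

```ruby
# Illustrative helper (not part of Ruby): run the same pattern through
# both forms of String#scan and report what each call returns.
def scan_both_ways(text, pattern)
  yielded = []
  array_form = text.scan(pattern)                      # array of matches
  block_form = text.scan(pattern) { |m| yielded << m } # the receiver itself
  { array_form: array_form, block_form: block_form, yielded: yielded }
end

result = scan_both_ways("a1 b22 c333", /\d+/)
p result[:array_form] # ["1", "22", "333"]
p result[:block_form] # "a1 b22 c333"
p result[:yielded]    # ["1", "22", "333"]
```

The block form is handy when the matches are only needed one at a time, since no result array is built.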
Examples
Simple Word matching:
text = "hello world hello ruby"
matches = text.scan(/hello/)
puts matches.inspect
# Output: ["hello", "hello"]
Matching Words of a Fixed Length
text = "The quick brown fox jumps over the lazy dog"
matches = text.scan(/\b\w{3}\b/) # Find all 3-letter words
puts matches.inspect
# Output: ["The", "fox", "the", "dog"]
1. Find All Matches
"hello world".scan(/\w+/) # => ["hello", "world"]
2. Extract Numbers
"Age: 25, Price: $50".scan(/\d+/) # => ["25", "50"]
3. Matching All Characters
"hello".scan(/./) { |c| puts c }
# Output:
# h
# e
# l
# l
# o
3. Capture Groups (Returns Arrays)
"Name: Alice, Age: 30".scan(/(\w+): (\w+)/)
# => [["Name", "Alice"], ["Age", "30"]]
When you use parentheses in your regex, scan returns arrays of captures:
text = "John: 30, Jane: 25, Alex: 40"
matches = text.scan(/(\w+): (\d+)/)
puts matches.inspect
# Output: [["John", "30"], ["Jane", "25"], ["Alex", "40"]]
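Because each match with capture groups comes back as an array of captures, the result can be reshaped with ordinary Array methods. The following is a minimal sketch, assuming the "Name: number" input format used above; `ages_by_name` is a hypothetical helper name.

```ruby
# Illustrative helper: turn "Name: number" pairs into a Hash of integers.
def ages_by_name(text)
  text.scan(/(\w+): (\d+)/)                 # [["John", "30"], ...]
      .map { |name, age| [name, age.to_i] } # convert the captured digits
      .to_h
end

ages = ages_by_name("John: 30, Jane: 25, Alex: 40")
p ages["Jane"] # 25

# A non-capturing group (?:...) groups without changing the result shape,
# so scan still returns plain strings here.
p "cat cats dog".scan(/cat(?:s)?/) # ["cat", "cats"]
```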
4. Iterate with a Block
"a1 b2 c3".scan(/(\w)(\d)/) { |letter, num| puts "#{letter} -> #{num}" }
# Output:
# a -> 1
# b -> 2
# c -> 3
text = "Prices: $10, $20, $30"
total = 0
text.scan(/\$(\d+)/) { |match| total += match[0].to_i }
puts total
# Output: 60
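Inside the block, the MatchData for the current match is also available through `Regexp.last_match` (or `$~`), which gives access to details such as the match position. A small sketch, with `match_offsets` as a hypothetical helper name:

```ruby
# Illustrative helper: collect the starting index of every match.
def match_offsets(text, pattern)
  offsets = []
  # Inside the block, Regexp.last_match refers to the current match.
  text.scan(pattern) { offsets << Regexp.last_match.begin(0) }
  offsets
end

p match_offsets("hello world hello ruby", /hello/) # [0, 12]
```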
5. Case-Insensitive Search
"Ruby is COOL!".scan(/cool/i) # => ["COOL"]
6. Extract Email Addresses
"Email me at test@mail.com".scan(/\S+@\S+/) # => ["test@mail.com"]
text = "Contact us at support@example.com or sales@company.org"
emails = text.scan(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/)
puts emails.inspect
# Output: ["support@example.com", "sales@company.org"]
Performance Characteristics: Ruby’s #scan Method
The #scan method is generally efficient for most common string processing tasks, but its performance depends on several factors:
- String length – Larger strings take longer to process
- Pattern complexity – Simple patterns are faster than complex regex
- Number of matches – More matches mean more memory allocation
Performance Considerations
1. Time Complexity 🧮
- Best case: O(n) where n is string length
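- Worst case: super-linear for patterns that backtrack heavily (for example nested quantifiers such as (a+)+); in a backtracking engine this can grow exponentially with input length, although Ruby 3.2+ memoizes many such patterns to keep matching linear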
2. Memory Usage 🧠
- Creates an array with all matches
- Each match is a new string object (memory intensive for large results)
Benchmark 📈 Examples
require 'benchmark'
large_text = "Lorem ipsum " * 10_000
# Simple word matching
Benchmark.bm do |x|
x.report("simple scan:") { large_text.scan(/\w+/) }
x.report("complex scan:") { large_text.scan(/(?:^|\s)(\w+)(?=\s|$)/) }
end
Typical results:
user system total real
simple scan: 0.020000 0.000000 0.020000 ( 0.018123)
complex scan: 0.050000 0.010000 0.060000 ( 0.054678)
Optimization Tips 💡
- Use simpler patterns when possible:
# Slower
text.scan(/(?:^|\s)(\w+)(?=\s|$)/)
# Faster, and similar for whitespace-separated words (returns plain strings, not one-element arrays)
text.scan(/\b\w+\b/)
- Avoid capture groups if you don’t need them:
# Slower (creates match groups)
text.scan(/(\w+)/)
# Faster
text.scan(/\w+/)
- Use blocks to avoid large arrays:
# Stores all matches in memory
matches = text.scan(pattern)
# Processes matches without storing
text.scan(pattern) { |m| process(m) }
- Consider alternatives for very large strings:
# For simple splits, String#split might be faster
words = text.split
# For streaming processing, use StringIO
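The last two tips can be combined: read the input line by line and use the block form so that neither the whole text nor an array of matches has to sit in memory. This is only a sketch; `count_words` is a hypothetical helper, and the StringIO here stands in for any IO-like object such as an open file.

```ruby
require "stringio"

# Illustrative helper: count words from any IO-like object line by line.
def count_words(io)
  count = 0
  io.each_line do |line|
    line.scan(/\w+/) { count += 1 } # block form: no result array is built
  end
  count
end

p count_words(StringIO.new("one two\nthree four five\n")) # 5
```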
When to Be Cautious ⚠️
- Processing multi-megabyte strings
- Using highly complex regular expressions
- When you only need the first match or the first few matches (consider #match for a single match)
The #scan method is optimized for most common cases, but for performance-critical applications with large inputs, consider benchmarking alternatives.
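When only the first few matches matter, one option is to leave the block form early with `break`. The sketch below assumes a hypothetical helper named `first_matches`; for a single match, `String#[]` or `#match` with the same pattern is simpler.

```ruby
# Illustrative helper: return at most `limit` matches, stopping the scan
# as soon as enough have been collected.
def first_matches(text, pattern, limit)
  found = []
  return found if limit <= 0

  text.scan(pattern) do |m|
    found << m
    break if found.size >= limit # leaves scan early
  end
  found
end

p first_matches("a1 b2 c3 d4", /[a-z]\d/, 2) # ["a1", "b2"]
p "a1 b2 c3 d4"[/[a-z]\d/]                   # "a1" (first match only)
```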
#inject Method (aka #reduce)
Enumerable#inject takes an optional initial value (the base case) and a block. Each item of the Enumerable is passed to the block together with the accumulated result, and the block's return value becomes the accumulator for the next item.
In a way, inject "injects" the operation between the elements of the enumerable: [2, 3, 4].inject(:*) computes 2 * 3 * 4. inject is aliased as reduce. You use it when you want to reduce a collection to a single value.
For example:
product = [ 2, 3, 4 ].inject(1) do |result, next_value|
result * next_value
end
product #=> 24
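To see how the result is fed back into the block, the sketch below records every (memo, element) pair the block receives for the same product. `trace_inject` is a hypothetical helper written only for this illustration.

```ruby
# Illustrative helper: record each (memo, element) pair the block receives.
def trace_inject(values, initial)
  steps = []
  result = values.inject(initial) do |memo, value|
    steps << [memo, value]
    memo * value # becomes the memo for the next element
  end
  [result, steps]
end

result, steps = trace_inject([2, 3, 4], 1)
p steps  # [[1, 2], [2, 3], [6, 4]]
p result # 24
```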
Purpose
- Accumulates values by applying an operation to each element in a collection
- Can produce a single aggregated result or a compound value
Basic Syntax
collection.inject(initial) { |memo, element| block } → object
collection.inject { |memo, element| block } → object
collection.inject(symbol) → object
collection.inject(initial, symbol) → object
Key Features
- Takes an optional initial value
- The block receives the memo (accumulator) and current element
- Returns the final value of the memo
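The optional initial value matters most for empty collections: without it, the first element seeds the memo, so an empty collection yields nil. A minimal sketch, with `sums` as a hypothetical helper name:

```ruby
# Illustrative helper: sum with and without an explicit initial value.
def sums(values)
  {
    with_initial: values.inject(0, :+),  # memo starts at 0
    without_initial: values.inject(:+)   # memo starts at the first element
  }
end

p sums([1, 2, 3]) # both entries are 6
p sums([])        # with_initial is 0, without_initial is nil
```

Passing an initial value is therefore the safer choice when the collection may be empty.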
👉 Examples
1. Summing Numbers
[1, 2, 3].inject(0) { |sum, n| sum + n } # => 6
2. Finding Maximum Value
[4, 2, 7, 1].inject { |max, n| max > n ? max : n } # => 7
3. Building a Hash
[:a, :b, :c].inject({}) { |h, k| h[k] = k.to_s; h }
# => {:a=>"a", :b=>"b", :c=>"c"}
4. Symbol Shorthand (Ruby 1.9+)
[1, 2, 3].inject(:+) # => 6 (same as sum)
[1, 2, 3].inject(2, :*) # => 12 (2 * 1 * 2 * 3)
5. String Concatenation
%w[r u b y].inject("") { |s, c| s + c.upcase } # => "RUBY"
6. Counting Occurrences
words = %w[apple banana apple cherry apple]
words.inject(Hash.new(0)) { |h, w| h[w] += 1; h }
# => {"apple"=>3, "banana"=>1, "cherry"=>1}
Performance Characteristics: Ruby’s #inject Method
- Time Complexity: O(n) – processes each element exactly once
- Memory Usage:
  - Generally creates only one accumulator object
  - Avoids intermediate arrays (unlike chained map + reduce)
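The point about intermediate arrays can be illustrated with a sum of squares computed two ways. Both helper names below are hypothetical; the results are the same, but the chained version allocates a temporary array of squares first.

```ruby
# Illustrative helpers: two ways to compute a sum of squares.
def sum_of_squares_chained(values)
  values.map { |n| n * n }.reduce(0, :+) # builds a temporary array first
end

def sum_of_squares_single_pass(values)
  values.inject(0) { |total, n| total + n * n } # no temporary array
end

p sum_of_squares_chained([1, 2, 3])     # 14
p sum_of_squares_single_pass([1, 2, 3]) # 14
```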
Benchmark 📈 Examples
require 'benchmark'
large_array = (1..1_000_000).to_a
Benchmark.bm do |x|
x.report("inject:") { large_array.inject(0, :+) }
x.report("each + var:") do
sum = 0
large_array.each { |n| sum += n }
sum
end
end
Typical results show inject is slightly slower than explicit iteration but more concise:
user system total real
inject: 0.040000 0.000000 0.040000 ( 0.042317)
each + var: 0.030000 0.000000 0.030000 ( 0.037894)
Optimization Tips 💡
- Use symbol shorthand when possible (faster than blocks):
# Faster
array.inject(:+)
# Slower
array.inject { |sum, n| sum + n }
- Pass a mutable accumulator and modify it in place when building structures:
# Building a hash from key-value pairs
items.inject({}) { |h, (k, v)| h[k] = v; h }
# Building an array (transform is a placeholder for your own method)
items.inject([]) { |a, e| a << e.transform; a }
- Avoid unnecessary object creation in blocks:
# Bad - creates new string each time
strings.inject("") { |s, x| s + x.upcase }
# Good - appends to one mutable string (String.new avoids a frozen literal)
strings.inject(String.new) { |s, x| s << x.upcase }
- Consider alternatives for simple cases:
# For simple sums
array.sum # (Ruby 2.4+) is faster than inject(:+)
# For concatenation
array.join # faster than inject(:+) for arrays of strings
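The built-in alternatives are drop-in replacements for these simple cases. The sketch below only checks that the results agree; `same_results?` is a hypothetical helper, and it makes no claim about timings.

```ruby
# Illustrative helper: compare inject with the dedicated built-ins.
def same_results?(numbers, strings)
  numbers.inject(0, :+) == numbers.sum &&
    strings.inject("", :+) == strings.join
end

p same_results?([1, 2, 3], %w[r u b y]) # true
```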
When to Be Cautious ⚠️
- With extremely large collections where memory matters
- When the block operations are very simple (explicit loop may be faster)
- When building complex nested structures (consider each_with_object)
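For the nested-structure case, here is a minimal sketch comparing the two approaches. The helper names are hypothetical. With each_with_object the block's return value is ignored, so the accumulator does not have to be returned explicitly.

```ruby
# Illustrative helpers: group words by their first letter two ways.
def group_with_inject(words)
  words.inject({}) do |groups, word|
    (groups[word[0]] ||= []) << word
    groups # the block must return the accumulator
  end
end

def group_with_each_with_object(words)
  # Note the argument order: element first, accumulator second.
  words.each_with_object({}) do |word, groups|
    (groups[word[0]] ||= []) << word
  end
end

words = %w[apple avocado banana cherry cranberry]
p group_with_inject(words) == group_with_each_with_object(words) # true
```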
The inject method provides excellent readability with generally good performance for most use cases.