Regular expressions (regex) are powerful tools for pattern matching and text manipulation. In Ruby, they’re implemented through the Regexp class. Let’s start with the basics and gradually build up to more complex patterns.
1. Basic Matching
Literal Characters
The simplest regex matches exact text:
"hello".match(/hello/) #=> #<MatchData "hello">
Special Characters
Some characters have special meaning and need escaping with \:
# Matching a literal dot
"file.txt".match(/file\.txt/) #=> #<MatchData "file.txt">
2. Character Classes
Simple Character Sets
Match any one character from a set:
# Match either 'a', 'b', or 'c'
"bat".match(/[abc]/) #=> #<MatchData "b">
Ranges
Match any character in a range:
# Match any lowercase letter
"hello".match(/[a-z]/) #=> #<MatchData "h">
# Match any digit
"Room 101".match(/[0-9]/) #=> #<MatchData "1">
Negated Character Sets
Match any character NOT in the set:
# Match any character that's not a vowel
"hello".match(/[^aeiou]/) #=> #<MatchData "h">
3. Shorthand Character Classes
Ruby provides shortcuts for common character classes:
\d # Any digit (0-9)
\D # Any non-digit
\w # Word character (letter, digit, underscore)
\W # Non-word character
\s # Whitespace (space, tab, newline)
\S # Non-whitespace
Examples:
"Price: $100".match(/\d+/) #=> #<MatchData "100">
"hello_world".match(/\w+/) #=> #<MatchData "hello_world">
4. Quantifiers
Control how many times a pattern should match:
? # 0 or 1 times
* # 0 or more times
+ # 1 or more times
{n} # Exactly n times
{n,} # n or more times
{n,m} # Between n and m times
Examples:
# Match between 3 and 5 digits
"12345".match(/\d{3,5}/) #=> #<MatchData "12345">
# Match 'color' or 'colour'
"colour".match(/colou?r/) #=> #<MatchData "colour">
5. Anchors
Match positions rather than characters:
^ # Start of line
$ # End of line
\A # Start of string
\Z # End of string (or just before a final newline)
\z # Absolute end of string
\b # Word boundary
Examples:
# Check if a line starts with 'Hello' (use \A to anchor to the start of the whole string)
"Hello world".match(/^Hello/) #=> #<MatchData "Hello">
# Check if string ends with 'world'
"Hello world".match(/world$/) #=> #<MatchData "world">
6. Grouping and Capturing
Parentheses create groups and capture matches:
# Capture date components
match = "2023-05-18".match(/(\d{4})-(\d{2})-(\d{2})/)
match[1] #=> "2023" (year)
match[2] #=> "05" (month)
match[3] #=> "18" (day)
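Beyond indexing, MatchData offers a few conveniences. The sketch below reuses the date pattern and also shows a non-capturing group, (?:...), which groups without storing a capture; the variable names are illustrative.

```ruby
m = "2023-05-18".match(/(\d{4})-(\d{2})-(\d{2})/)

whole = m[0]                      # index 0 is the entire match
year, month, day = m.captures     # all groups as an array, destructured

# A capturing group under a quantifier keeps only its last repetition.
repeated = "ababab!".match(/(ab)+/)
# A non-capturing group repeats the same way but stores nothing.
grouped_only = "ababab!".match(/(?:ab)+/)

puts whole                  # 2023-05-18
puts year, month, day       # 2023, 05, 18 (one per line)
p repeated.captures         # ["ab"]
p grouped_only[0]           # "ababab"
p grouped_only.captures     # []
```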
7. Alternation
The pipe | acts like an OR operator:
# Match 'cat' or 'dog'
"dog".match(/cat|dog/) #=> #<MatchData "dog">
8. Modifiers
Change how the regex works:
i # Case insensitive
m # Multiline mode (dot matches newline)
x # Extended mode: ignore whitespace and allow # comments (for readability)
Examples:
# Case insensitive match
"HELLO".match(/hello/i) #=> #<MatchData "HELLO">
9. Lookarounds
Assert that a pattern does or does not appear ahead of or behind the current position, without including it in the match:
(?=pattern) # Positive lookahead
(?!pattern) # Negative lookahead
(?<=pattern) # Positive lookbehind
(?<!pattern) # Negative lookbehind
Example:
# Match 'q' not followed by 'u'
"qat".match(/q(?!u)/) #=> #<MatchData "q">
10. Ruby-Specific Features
Named Captures
match = "2023-05-18".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)
match[:year] #=> "2023"
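Named captures can also be read as a hash, and a regex literal on the left side of =~ assigns them to local variables. A brief sketch; the time string and variable names are only examples.

```ruby
date_pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
m = "2023-05-18".match(date_pattern)

parts = m.named_captures   # hash keyed by group name (string keys)
names = m.names            # group names in order

# With a regex LITERAL on the left of =~, each named group
# becomes a local variable of the same name.
/(?<hour>\d{2}):(?<minute>\d{2})/ =~ "09:30"

p parts       # {"year"=>"2023", "month"=>"05", "day"=>"18"}
p names       # ["year", "month", "day"]
puts hour     # 09
puts minute   # 30
```

The local-variable form works only when the literal is on the left and contains no interpolation.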
%r Notation
Alternative syntax for regex literals:
%r{http://example\.com} # Same as /http:\/\/example\.com/
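The benefit of %r shows most clearly on patterns full of slashes. In this sketch, both forms extract the same prefix from a made-up URL.

```ruby
url = "https://example.com/docs/index.html"

# With slash delimiters, every '/' in the pattern needs a backslash.
with_slashes = url[/https?:\/\/[^\/]+/]
# With %r{...}, slashes can be written as-is.
with_percent = url[%r{https?://[^/]+}]

puts with_slashes  # https://example.com
puts with_percent  # https://example.com
```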
String Methods Using Regex
Ruby strings have many regex methods:
"hello".gsub(/[aeiou]/, '*') #=> "h*ll*"
"a,b,c".split(/,/) #=> ["a", "b", "c"]
"hello".scan(/./) #=> ["h", "e", "l", "l", "o"]
Practical Examples
- Email Validation:
email_regex = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
"test@example.com".match?(email_regex) #=> true
- Extracting Phone Numbers:
text = "Call me at 555-1234 or (555) 987-6543"
text.scan(/\(\d{3}\) \d{3}-\d{4}|\d{3}-\d{4}/) #=> ["555-1234", "(555) 987-6543"]
- HTML Tag Extraction:
html = "<p>Hello</p><div>World</div>"
html.scan(/<(\w+)>(.*?)<\/\1>/) #=> [["p", "Hello"], ["div", "World"]]
Tips for Effective Regex in Ruby
- Use Regexp.escape when matching literal strings:
Returns a new string that escapes any characters that have special meaning in a regular expression:
s = Regexp.escape('\*?{}.') # => "\\\\\\*\\?\\{\\}\\."
Regexp.escape("file.txt") #=> "file\\.txt"
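A typical use is building a pattern from text you do not control. The sketch below assumes a hypothetical user_input string and compares interpolating it raw versus escaped.

```ruby
user_input = "1+1=2 (really?)"   # hypothetical search text containing metacharacters
haystack   = "Is 1+1=2 (really?) true"

# Interpolated raw, '+', '(', ')' and '?' are read as regex syntax.
naive = /#{user_input}/
# Escaped first, every character is matched literally.
safe = /#{Regexp.escape(user_input)}/

naive_found = haystack.match?(naive)
safe_found  = haystack.match?(safe)

puts naive_found  # false
puts safe_found   # true
```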
- For complex patterns, use the x modifier for readability:
regex = /
\A # Start of string
[\w+\-.]+ # Local part
@ # @ symbol
[a-z\d\-]+ # Domain
(\.[a-z]+)* # Subdomains
\.[a-z]+\z # TLD
/xi
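One thing to watch for with x is that a plain space in the pattern no longer matches a space. The following sketch, using a made-up phone pattern, shows the difference; note that a '/' inside an x-mode comment would end the literal, so the comments avoid it.

```ruby
# Whitespace that should match must be spelled out, here with \s.
phone = /
  \(\d{3}\)     # area code in parentheses
  \s            # whitespace must be written explicitly in x mode
  \d{3}-\d{4}   # local number
/x

# Here the literal space is ignored because of x.
spaced = /\(\d{3}\) \d{3}-\d{4}/x

explicit_hit = "(555) 987-6543".match?(phone)
ignored_miss = "(555) 987-6543".match?(spaced)
ignored_hit  = "(555)987-6543".match?(spaced)

puts explicit_hit  # true
puts ignored_miss  # false
puts ignored_hit   # true
```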
- Consider using Rubular (https://rubular.com/) for testing your Ruby regular expressions.
Regular expressions can become complex, but starting with these fundamentals will give you a solid foundation for text processing in Ruby.
Happy Ruby Coding!