Regular Expression – The Rails Drop

Regular expressions (regex) are powerful tools for pattern matching and text manipulation. In Ruby, they’re implemented through the Regexp class. Let’s start with the basics and gradually build up to more complex patterns.

1. Basic Matching

Literal Characters

The simplest regex matches exact text:

"hello".match(/hello/)  #=> #<MatchData "hello">

Special Characters

Some characters have special meaning and need escaping with \:

# Matching a literal dot
"file.txt".match(/file\.txt/)  #=> #<MatchData "file.txt">

2. Character Classes

Simple Character Sets

Match any one character from a set:

# Match either 'a', 'b', or 'c'
"bat".match(/[abc]/)  #=> #<MatchData "b">

Ranges

Match any character in a range:

# Match any lowercase letter
"hello".match(/[a-z]/)  #=> #<MatchData "h">

# Match any digit
"Room 101".match(/[0-9]/)  #=> #<MatchData "1">

Negated Character Sets

Match any character NOT in the set:

# Match any character that's not a vowel
"hello".match(/[^aeiou]/)  #=> #<MatchData "h">

3. Shorthand Character Classes

Ruby provides shortcuts for common character classes:

\d  # Any digit (0-9)
\D  # Any non-digit
\w  # Word character (letter, digit, underscore)
\W  # Non-word character
\s  # Whitespace (space, tab, newline)
\S  # Non-whitespace

Examples:

"Price: $100".match(/\d+/)  #=> #<MatchData "100">
"hello_world".match(/\w+/)  #=> #<MatchData "hello_world">

4. Quantifiers

Control how many times a pattern should match:

?     # 0 or 1 times
*     # 0 or more times
+     # 1 or more times
{n}   # Exactly n times
{n,}  # n or more times
{n,m} # Between n and m times

Examples:

# Match between 3 and 5 digits
"12345".match(/\d{3,5}/)  #=> #<MatchData "12345">

# Match 'color' or 'colour'
"colour".match(/colou?r/)  #=> #<MatchData "colour">

5. Anchors

Match positions rather than characters:

^  # Start of line
$  # End of line
\A # Start of string
\Z # End of string
\b # Word boundary

Examples:

# Check if string starts with 'Hello'
"Hello world".match(/^Hello/)  #=> #<MatchData "Hello">

# Check if string ends with 'world'
"Hello world".match(/world$/)  #=> #<MatchData "world">

6. Grouping and Capturing

Parentheses create groups and capture matches:

# Capture date components
match = "2023-05-18".match(/(\d{4})-(\d{2})-(\d{2})/)
match[1]  #=> "2023" (year)
match[2]  #=> "05"   (month)
match[3]  #=> "18"   (day)

7. Alternation

The pipe | acts like an OR operator:

# Match 'cat' or 'dog'
"dog".match(/cat|dog/)  #=> #<MatchData "dog">

8. Modifiers

Change how the regex works:

i  # Case insensitive
m  # Multiline mode (dot matches newline)
x  # Ignore whitespace (for readability)

Examples:

# Case insensitive match
"HELLO".match(/hello/i)  #=> #<MatchData "HELLO">

9. Lookarounds

Assert that a pattern is or isn’t ahead/behind:

(?=pattern)  # Positive lookahead
(?!pattern)  # Negative lookahead
(?<=pattern) # Positive lookbehind
(?<!pattern) # Negative lookbehind

Example:

# Match 'q' not followed by 'u'
"qat".match(/q(?!u)/)  #=> #<MatchData "q">

10. Ruby-Specific Features

Named Captures

match = "2023-05-18".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)
match[:year]  #=> "2023"

%r Notation

Alternative syntax for regex literals:

%r{http://example\.com}  # Same as /http:\/\/example\.com/

String Methods Using Regex

Ruby strings have many regex methods:

"hello".gsub(/[aeiou]/, '*')  #=> "h*ll*"
"a,b,c".split(/,/)            #=> ["a", "b", "c"]
"hello".scan(/./)             #=> ["h", "e", "l", "l", "o"]

Practical Examples

Email Validation:

email_regex = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
"test@example.com".match?(email_regex)  #=> true

Extracting Phone Numbers:

text = "Call me at 555-1234 or (555) 987-6543"
text.scan(/(\(\d{3}\) \d{3}-\d{4}|\d{3}-\d{4})/)  #=> ["555-1234", "(555) 987-6543"]

HTML Tag Extraction:

html = "<p>Hello</p><div>World</div>"
html.scan(/<(\w+)>(.*?)<\/\1>/)  #=> [["p", "Hello"], ["div", "World"]]

Tips for Effective Regex in Ruby

Use Regexp.escape when matching literal strings:

Returns a new string that escapes any characters that have special meaning in a regular expression:

s = Regexp.escape('\*?{}.')      # => "\\\\\\*\\?\\{\\}\\."

   Regexp.escape("file.txt")  #=> "file\\.txt"

For complex patterns, use the x modifier for readability:

   regex = /
     \A             # Start of string
     [\w+\-.]+      # Local part
     @              # @ symbol
     [a-z\d\-]+     # Domain
     (\.[a-z]+)*    # Subdomains
     \.[a-z]+\z     # TLD
   /xi

Consider using Rubular (https://rubular.com/) for testing your Ruby regular expressions.

Regular expressions can become complex, but starting with these fundamentals will give you a solid foundation for text processing in Ruby.

Happy Ruby Coding! 🚀

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Tag: Regular Expression

Regular Expressions 🎰 in Ruby: A Step-by-Step Guide

1. Basic Matching

Literal Characters

Special Characters

2. Character Classes

Simple Character Sets

Ranges

Negated Character Sets

3. Shorthand Character Classes

4. Quantifiers

5. Anchors

6. Grouping and Capturing

7. Alternation

8. Modifiers

9. Lookarounds

10. Ruby-Specific Features

Named Captures

%r Notation

String Methods Using Regex

Practical Examples

Tips for Effective Regex in Ruby