Regular Expressions ๐ŸŽฐ in Ruby: A Step-by-Step Guide

Regular expressions (regex) are powerful tools for pattern matching and text manipulation. In Ruby, they’re implemented through the Regexp class. Let’s start with the basics and gradually build up to more complex patterns.

1. Basic Matching

Literal Characters

The simplest regex matches exact text:

"hello".match(/hello/)  #=> #<MatchData "hello">

Special Characters

Some characters have special meaning and need escaping with \:

# Matching a literal dot
"file.txt".match(/file\.txt/)  #=> #<MatchData "file.txt">

2. Character Classes

Simple Character Sets

Match any one character from a set:

# Match either 'a', 'b', or 'c'
"bat".match(/[abc]/)  #=> #<MatchData "b">

Ranges

Match any character in a range:

# Match any lowercase letter
"hello".match(/[a-z]/)  #=> #<MatchData "h">

# Match any digit
"Room 101".match(/[0-9]/)  #=> #<MatchData "1">

Negated Character Sets

Match any character NOT in the set:

# Match any character that's not a vowel
"hello".match(/[^aeiou]/)  #=> #<MatchData "h">

3. Shorthand Character Classes

Ruby provides shortcuts for common character classes:

\d  # Any digit (0-9)
\D  # Any non-digit
\w  # Word character (letter, digit, underscore)
\W  # Non-word character
\s  # Whitespace (space, tab, newline)
\S  # Non-whitespace

Examples:

"Price: $100".match(/\d+/)  #=> #<MatchData "100">
"hello_world".match(/\w+/)  #=> #<MatchData "hello_world">

4. Quantifiers

Control how many times a pattern should match:

?     # 0 or 1 times
*     # 0 or more times
+     # 1 or more times
{n}   # Exactly n times
{n,}  # n or more times
{n,m} # Between n and m times

Examples:

# Match between 3 and 5 digits
"12345".match(/\d{3,5}/)  #=> #<MatchData "12345">

# Match 'color' or 'colour'
"colour".match(/colou?r/)  #=> #<MatchData "colour">

5. Anchors

Match positions rather than characters:

^  # Start of line
$  # End of line
\A # Start of string
\Z # End of string
\b # Word boundary

Examples:

# Check if string starts with 'Hello'
"Hello world".match(/^Hello/)  #=> #<MatchData "Hello">

# Check if string ends with 'world'
"Hello world".match(/world$/)  #=> #<MatchData "world">

6. Grouping and Capturing

Parentheses create groups and capture matches:

# Capture date components
match = "2023-05-18".match(/(\d{4})-(\d{2})-(\d{2})/)
match[1]  #=> "2023" (year)
match[2]  #=> "05"   (month)
match[3]  #=> "18"   (day)

7. Alternation

The pipe | acts like an OR operator:

# Match 'cat' or 'dog'
"dog".match(/cat|dog/)  #=> #<MatchData "dog">

8. Modifiers

Change how the regex works:

i  # Case insensitive
m  # Multiline mode (dot matches newline)
x  # Ignore whitespace (for readability)

Examples:

# Case insensitive match
"HELLO".match(/hello/i)  #=> #<MatchData "HELLO">

9. Lookarounds

Assert that a pattern is or isn’t ahead/behind:

(?=pattern)  # Positive lookahead
(?!pattern)  # Negative lookahead
(?<=pattern) # Positive lookbehind
(?<!pattern) # Negative lookbehind

Example:

# Match 'q' not followed by 'u'
"qat".match(/q(?!u)/)  #=> #<MatchData "q">

10. Ruby-Specific Features

Named Captures

match = "2023-05-18".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)
match[:year]  #=> "2023"

%r Notation

Alternative syntax for regex literals:

%r{http://example\.com}  # Same as /http:\/\/example\.com/

String Methods Using Regex

Ruby strings have many regex methods:

"hello".gsub(/[aeiou]/, '*')  #=> "h*ll*"
"a,b,c".split(/,/)            #=> ["a", "b", "c"]
"hello".scan(/./)             #=> ["h", "e", "l", "l", "o"]

Practical Examples

  1. Email Validation:
email_regex = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
"test@example.com".match?(email_regex)  #=> true
  1. Extracting Phone Numbers:
text = "Call me at 555-1234 or (555) 987-6543"
text.scan(/(\(\d{3}\) \d{3}-\d{4}|\d{3}-\d{4})/)  #=> ["555-1234", "(555) 987-6543"]
  1. HTML Tag Extraction:
html = "<p>Hello</p><div>World</div>"
html.scan(/<(\w+)>(.*?)<\/\1>/)  #=> [["p", "Hello"], ["div", "World"]]

Tips for Effective Regex in Ruby

  1. Use Regexp.escape when matching literal strings:

Returns a new string that escapes any characters that have special meaning in a regular expression:

s = Regexp.escape('\*?{}.')      # => "\\\\\\*\\?\\{\\}\\."
   Regexp.escape("file.txt")  #=> "file\\.txt"
  1. For complex patterns, use the x modifier for readability:
   regex = /
     \A             # Start of string
     [\w+\-.]+      # Local part
     @              # @ symbol
     [a-z\d\-]+     # Domain
     (\.[a-z]+)*    # Subdomains
     \.[a-z]+\z     # TLD
   /xi
  1. Consider using Rubular (https://rubular.com/) for testing your Ruby regular expressions.

Regular expressions can become complex, but starting with these fundamentals will give you a solid foundation for text processing in Ruby.

Happy Ruby Coding! ๐Ÿš€

Unknown's avatar

Author: Abhilash

Hi, Iโ€™m Abhilash! A seasoned web developer with 15 years of experience specializing in Ruby and Ruby on Rails. Since 2010, Iโ€™ve built scalable, robust web applications and worked with frameworks like Angular, Sinatra, Laravel, Node.js, Vue and React. Passionate about clean, maintainable code and continuous learning, I share insights, tutorials, and experiences here. Letโ€™s explore the ever-evolving world of web development together!

Leave a comment