Enumerating Enumerable: Enumerable#grep

by Joey deVilla on August 21, 2008

Enumerating Enumerable

Once again, it’s Enumerating Enumerable time! This is the latest in my series of articles where I set out to make better documentation for Ruby’s Enumerable module than Ruby-Doc.org’s. In this installment, I cover the grep method.

In case you missed any of the previous articles, they’re listed and linked below:

  1. all?
  2. any?
  3. collect / map
  4. count
  5. cycle
  6. detect / find
  7. drop
  8. drop_while
  9. each_cons
  10. each_slice
  11. each_with_index
  12. entries / to_a
  13. find_all / select
  14. find_index
  15. first

Enumerable#grep Quick Summary

Graphic representation of the "grep" method in Ruby's "Enumerable" module

In the simplest possible terms: Which items in the collection are a === match for a given value?
Ruby version: 1.8 and 1.9
Expects:
  • An argument against which every object in the collection will be compared using the === operator.
  • (Optional) A block to be used in a map operation on the resulting array.
Returns:
  • If no block is given, an array containing the items in the collection that were a === match for the given argument.
  • If a block is given, an array containing the items in the collection that were a === match for the given argument, each passed through the block (as in a map operation).
RubyDoc.org’s entry: Enumerable#grep

Enumerable#grep, Regular Expressions and Arrays

The grep method’s name implies regular expressions, and that’s one of its uses. When given a regular expression as an argument and used without a block, grep returns an array containing the items in the original array that match the given regular expression.

# Here's a list of countries, some of them with "stan" in their names.
#
# I'm including Stan Lee, creator of many wonderful superhero comics, simply because
# he's cool enough to be his own country.
countries = ["Afghanistan", "Burkina Faso", "Kazakhstan", "France", "Tajikistan",
"Iceland", "Uzbekistan", "Australia", "Stan Lee"]
=> ["Afghanistan", "Burkina Faso", "Kazakhstan", "France", "Tajikistan",
"Iceland", "Uzbekistan", "Australia", "Stan Lee"]

# Which countries have the string "stan" in their names?
countries.grep(/stan/)
=> ["Afghanistan", "Kazakhstan", "Tajikistan", "Uzbekistan"]

# Note that "Stan Lee" wasn't included in that list. "Stan" and "stan" aren't the
# same thing, but that's easy to fix:
countries.grep(/[Ss]tan/)
=> ["Afghanistan", "Kazakhstan", "Tajikistan", "Uzbekistan", "Stan Lee"]
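A note on character classes like [S|s]: inside square brackets, | is an ordinary character rather than "or", so /[S|s]tan/ matches "|tan" as well as "Stan" and "stan". Here's a small sketch comparing it with /[Ss]tan/ and the case-insensitive /stan/i; the samples array is made up purely for illustration.

```ruby
countries = ["Afghanistan", "Burkina Faso", "Kazakhstan", "France", "Tajikistan",
             "Iceland", "Uzbekistan", "Australia", "Stan Lee"]

# Inside [ ], "|" is just another character, so /[S|s]tan/ matches
# "Stan", "stan", and also "|tan".
samples = ["Stan", "stan", "|tan", "tan"]
p samples.grep(/[S|s]tan/)   # => ["Stan", "stan", "|tan"]

# Without the pipe, the class matches only "S" or "s":
p samples.grep(/[Ss]tan/)    # => ["Stan", "stan"]

# A case-insensitive regexp is another option. It is broader, since it
# ignores case for every letter ("STAN" would match too), but for this
# list of countries it gives the same result.
p countries.grep(/stan/i) == countries.grep(/[Ss]tan/)   # => true
```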

When a block is used with grep, the contents of the result array are passed through the block and the resulting array is returned. Think of it as grep followed by a collect/map operation.

# Let's take another look at the countries with "Stan" or "stan" in their names:
countries.grep(/[Ss]tan/)
=> ["Afghanistan", "Kazakhstan", "Tajikistan", "Uzbekistan", "Stan Lee"]

# Let's get the lengths of the names of those countries:
countries.grep(/[Ss]tan/) {|country| country.length}
=> [11, 10, 10, 10, 8]

# It's a slightly shorter version of this:
countries.grep(/[Ss]tan/).map {|country| country.length}
=> [11, 10, 10, 10, 8]

# This time, let's find all the "stans" and uppercase them
countries.grep(/[Ss]tan/) {|country| country.upcase}
=> ["AFGHANISTAN", "KAZAKHSTAN", "TAJIKISTAN", "UZBEKISTAN", "STAN LEE"]

# And here's the version that uses map:
countries.grep(/[Ss]tan/).map {|country| country.upcase}
=> ["AFGHANISTAN", "KAZAKHSTAN", "TAJIKISTAN", "UZBEKISTAN", "STAN LEE"]

What Enumerable#grep Really Does: The === Operator

Here’s grep’s secret: it takes each item in the collection, compares it against the given argument using Ruby’s === (the “triple equals”) operator, and returns an array of the items for which the comparison returns true. The argument sits on the left side of the comparison: for each item, grep evaluates argument === item.
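To make that concrete, here's a minimal sketch of the same behaviour written with select and ===. The name my_grep is hypothetical, and this is an illustration rather than Ruby's actual implementation (the built-in method doesn't need to build an intermediate array before applying the block).

```ruby
# my_grep is a hypothetical helper written for illustration; it is not part
# of Ruby. The pattern is the receiver of ===, which is why the argument's
# class decides what counts as a match.
def my_grep(collection, pattern)
  matches = collection.select { |item| pattern === item }
  # With a block, map the matches through it; without one, return them as-is.
  block_given? ? matches.map { |item| yield item } : matches
end

countries = ["Afghanistan", "Burkina Faso", "Kazakhstan", "France", "Tajikistan",
             "Iceland", "Uzbekistan", "Australia", "Stan Lee"]

p my_grep(countries, /stan/)
# => ["Afghanistan", "Kazakhstan", "Tajikistan", "Uzbekistan"]

p my_grep(countries, /[Ss]tan/) { |country| country.length }
# => [11, 10, 10, 10, 8]

p my_grep(countries, /stan/) == countries.grep(/stan/)   # => true
```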

For regular expressions, the === operator behaves like grep: the expression r === s returns true if there is a match for regular expression r in string s.
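Here's a quick way to try Regexp#=== on its own and to see that the order of the operands matters. String#=== is ordinary string equality, so putting the string on the left asks a different question.

```ruby
pattern = /stan/

p pattern === "Afghanistan"   # => true  (the regexp finds "stan" in the string)
p pattern === "France"        # => false (no match)
p pattern === 42              # => false (Regexp#=== returns false for non-strings)

# Operand order matters: this asks whether the string "Afghanistan"
# equals the regexp, which it doesn't.
p "Afghanistan" === pattern   # => false
```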

Different classes implement === differently. For example, in the Range class, === is used to see if an item is within the range. The expression r === x returns true if x is in range r. Here’s grep in action when its argument is a range:

# These are some of the years in which the band Radiohead released an album
radiohead_album_years = [1993, 1995, 1997, 2000, 2003, 2007]
=> [1993, 1995, 1997, 2000, 2003, 2007]

# And these are the years from that list that fall between 1996 and
# 2002 inclusive
radiohead_album_years.grep((1996..2002))
=> [1997, 2000]

Generally speaking, collection.grep(thing_to_compare) compares thing_to_compare with each item in collection using the === operator as defined for thing_to_compare’s class. It returns an array of the items in the collection for which the comparison returned true.
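To illustrate that the argument's class is what decides the behaviour, here's a short sketch with three other kinds of argument: a class, a range, and a lambda. The sample arrays are made up for illustration, and the lambda example assumes Ruby 1.9 or later, where Proc#=== calls the proc with the item.

```ruby
# Class#=== asks "is this object an instance of me (or of a subclass)?"
mixed = [1, "two", 3.0, :four, "five"]
p mixed.grep(Numeric)   # => [1, 3.0]
p mixed.grep(String)    # => ["two", "five"]

# Range#=== asks "is this value inside the range?"
p [1, 5, 10, 15].grep(4..12)   # => [5, 10]

# Proc#=== calls the proc with the item, so a lambda can serve as an
# arbitrary test (Ruby 1.9 and later).
is_even = lambda { |n| n.even? }
p [1, 2, 3, 4].grep(is_even)   # => [2, 4]
```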

Don’t forget that extra processing, in the form of a map operation, comes “free” if you provide grep with a block:

radiohead_album_years = [1993, 1995, 1997, 2000, 2003, 2007]
=> [1993, 1995, 1997, 2000, 2003, 2007]

# Adding a block performs a map operation on grep's initial results
radiohead_album_years.grep((1996..2002)) {|year| year % 2 == 1 ? "odd" : "even" }
=> ["odd", "even"]

Enumerable#grep and Hashes

I’ll put it simply: Enumerable#grep isn’t terribly useful with hashes. Like most methods of Enumerable, grep treats a hash as a collection of key-value pairs: as it iterates through the hash, it converts each pair into a two-element array whose first element is the key and whose second element is the corresponding value.
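One way to see that conversion for yourself, assuming Ruby 1.9 or later (where hashes keep insertion order), is to pass the Array class to grep. Class#=== checks whether each item is an Array, and since every key-value pair arrives as one, all of them come back. The small areas hash is a made-up sample.

```ruby
areas = {"Afghanistan" => 647_500, "France" => 547_030}

# Every key-value pair is handed to === as a two-element array,
# so grep(Array) returns all of them.
pairs = areas.grep(Array)
p pairs                 # => [["Afghanistan", 647500], ["France", 547030]]
p pairs == areas.to_a   # => true
```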

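As I mentioned earlier, grep uses the === operator to do its comparison. Arrays don’t define their own version of ===; they inherit Object#===, which behaves like ==. So for arrays, === returns true only when both arrays contain equal elements in the same order: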

# Identical arrays
[1, 2] === [1, 2]
=> true

# How about the first array as a subset of the second?
[1] === [1, 2]
=> false

# How about the first array as a superset of the second?
[1, 2, 3] === [1, 2]
=> false

# How about one array as a permutation of the other?
[2, 1] === [1, 2]
=> false

The practical upshot of all this is that for hashes, grep will return the empty array [] for most arguments, with the notable exception of an argument that is a two-element array matching one of the hash’s key-value pairs.

That was a bit wordy, but an example should clear things right up:

# These are countries and their total areas (not counting outside territories)
# in square kilometres.
total_country_areas = {"Afghanistan"  => 647_500,
                       "Burkina Faso" => 274_200,
                       "Kazakhstan"   => 2_717_300,
                       "France"       => 547_030}
=> {"Afghanistan"=>647500, "Burkina Faso"=>274200, "Kazakhstan"=>2717300, "France"=>547030}

# Is there a '"Burkina Faso" => 274200' item in the hash?
total_country_areas.grep(["Burkina Faso", 274_200])
=> [["Burkina Faso", 274200]]

# That worked because the array argument we provided was an exact match
# for one of the hash's key-value pairs, once that pair is converted into an array.

# Is there a '"Burkina Faso" => 0' item in the hash?
total_country_areas.grep(["Burkina Faso", 0])
=> []

# That didn't work because the array argument didn't correspond to any of the items
# in the hash.

Making Hashes grep-able

If you need to find which keys in a hash pattern-match a given value, use the Hash#keys method (which returns an array of the hash’s keys) and grep that:

# Again with the countries and the areas...
total_country_areas = {"Afghanistan"  => 647_500,
                       "Burkina Faso" => 274_200,
                       "Kazakhstan"   => 2_717_300,
                       "France"       => 547_030}
=> {"Afghanistan"=>647500, "Burkina Faso"=>274200, "Kazakhstan"=>2717300, "France"=>547030}

# Which ones are the "stans"?
total_country_areas.keys.grep(/stan/)
=> ["Afghanistan", "Kazakhstan"]

If you need to find which values in a hash pattern-match a given value, use the Hash#values method (which returns an array of the hash’s values) and grep that:

# Of the countries' total areas, which ones are between
# 500,000 and 1 million square km?
total_country_areas.values.grep((500_000..1_000_000))
=> [647500, 547030]

What if you want to find key-value pairs where either the key or the value is a === match for a given argument? There’s a way to do that, and I’ll cover it when we get to the Enumerable#inject method. It’ll be soon, I promise!
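Until then, here's one alternative way to do it, sketched with Hash#select rather than inject. It assumes Ruby 1.9 or later, where Hash#select yields each key and value and returns a new hash (in Ruby 1.8 it returns an array of pairs instead). The helper name grep_pairs is hypothetical.

```ruby
total_country_areas = {"Afghanistan"  => 647_500,
                       "Burkina Faso" => 274_200,
                       "Kazakhstan"   => 2_717_300,
                       "France"       => 547_030}

# grep_pairs is a hypothetical helper, not part of Ruby. It keeps the
# pairs whose key or value is a === match for the pattern.
def grep_pairs(hash, pattern)
  hash.select { |key, value| pattern === key || pattern === value }
end

# Matching on keys: the "stans"
p grep_pairs(total_country_areas, /stan/).keys
# => ["Afghanistan", "Kazakhstan"]

# Matching on values: areas between 500,000 and 1 million square km.
# (The range simply returns false when compared with a string key.)
p grep_pairs(total_country_areas, 500_000..1_000_000).keys
# => ["Afghanistan", "France"]
```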
