Homework 4
In 2015, Jeb Bush released all the email he received as governor. That email contained lots of personal information, including Social Security numbers, street addresses, and more.
Oops.
You're an investigative reporter who knows Perl! What regular expressions would you use to find:
- Social Security numbers (they take the form xxx-xx-xxxx)
- Email addresses (this doesn't have to be perfect; don't copy one from online, make your own)
- Street addresses
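For the first pattern, a minimal Perl sketch might look like this, assuming the numbers appear exactly as xxx-xx-xxxx (digits separated by dashes). The sample text is made up; none of the numbers in it are real.

```perl
use strict;
use warnings;

# Made-up sample text; the numbers below are not real SSNs.
my $text = "Applicant SSN: 123-45-6789. Call 555-123-4567 or see case 12-34-5678.";

# \d{3}-\d{2}-\d{4} matches three digits, a dash, two digits, a dash, four digits.
# The \b word boundaries keep it from matching inside a longer run of digits.
my @ssns = $text =~ /\b(\d{3}-\d{2}-\d{4})\b/g;

print "$_\n" for @ssns;
```

It skips the phone number because the dashes fall in the wrong places, but it would still match any other number with the same shape, such as an account number. That is the kind of false positive to expect.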
Copy this text into the text box at the Regex Tester and see what your patterns find. Put each pattern in parentheses to get an actual list of the matches on the right side of the tool. Add the "g" after the last / to see all matches in the text, not just the first one.
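Parentheses and the "g" flag work the same way in a Perl script as in the Regex Tester. In this sketch (the sample text is made up), a /g match in list context returns every captured group at once, while a /g match in a while loop steps through the matches one at a time, leaving each one in $1.

```perl
use strict;
use warnings;

my $text = "Reach me at 10 Main St or 22 Oak Ave.";

# List context: /g returns every capture at once.
my @numbers = $text =~ /(\d+)/g;
print join(",", @numbers), "\n";    # prints 10,22

# Scalar context in a loop: each /g match resumes after the previous one.
my @seen;
while ($text =~ /(\d+)/g) {
    push @seen, $1;
}
print join(",", @seen), "\n";       # prints 10,22
```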
It's also fine to use the "i" option to make your patterns case insensitive.
Don't worry: your patterns won't be perfect. They will match some things they shouldn't and miss some things they should.
Just make them as good as you can.
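The "i" option is written the same way in Perl, after the last /. This sketch (made-up text) runs one pattern with and without it:

```perl
use strict;
use warnings;

my $text = "Lives on Elm STREET, works on Pine Street.";

# Without /i, only the exact spelling "Street" matches.
my @exact = $text =~ /(\w+ Street)/g;

# With /i, "STREET" and "Street" both match.
my @any_case = $text =~ /(\w+ street)/gi;

print join("|", @exact), "\n";      # prints Pine Street
print join("|", @any_case), "\n";   # prints Elm STREET|Pine Street
```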
For the exercises, use a local copy of the file, saved under the same name as the file online.
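Working through the local file in Perl is usually a loop over its lines. This is a minimal sketch: kenlay.txt is a placeholder file name (use the real one), and the helper lines_with_at is hypothetical. So that it runs anywhere, it reads from an in-memory string standing in for the file.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line from a filehandle that contains an "@".
sub lines_with_at {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        push @found, $line if $line =~ /\@/;
    }
    return @found;
}

# For the exercises you would open your local copy instead, e.g.
#   open(my $fh, '<', 'kenlay.txt') or die "Can't open kenlay.txt: $!";
# ('kenlay.txt' is a placeholder name.) Here a string stands in for the file.
my $sample = "From: bob\@example.com\nno address here\nTo: joe\@example.com\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!";
my @hits = lines_with_at($fh);
close($fh);

print "$_\n" for @hits;
```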
Exercise 1
We searched through Ken Lay's emails for addresses and phone numbers. Now search them for email addresses. You can start with a simple pattern of non-space characters, but refine it as you see matches that don't make sense. As you refine, be sure to allow for dots, dashes, and underscores, but not other punctuation. Print out the addresses as you find them.
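The refinement described above might start like this minimal Perl sketch. The first pattern is the "non-space characters" starting point; the second limits each side of the @ to letters, digits, underscores, dots, and dashes. The addresses are made up, and this is one possible first refinement, not a finished pattern.

```perl
use strict;
use warnings;

my $text = "Send it to <jane.doe\@example.com>, or (cc: mary_o-brien\@example.com).";

# Starting point: one or more non-space characters on each side of the @.
# This also grabs the surrounding brackets and punctuation.
my @rough = $text =~ /(\S+\@\S+)/g;

# Refined: [\w.-] allows letters, digits, underscores (\w), dots, and dashes,
# but no other punctuation, so the brackets and commas are left off.
my @refined = $text =~ /([\w.-]+\@[\w.-]+)/g;

print "rough:   $_\n" for @rough;
print "refined: $_\n" for @refined;
```

One remaining flaw: an address at the very end of a sentence, such as bob@example.com followed by a period, still picks up that final period, so there is more refining to do.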
Exercise 2
Going through the Ken Lay emails, you will see that each message starts with a From: line and a To: line. Build a list like the one I supplied last week where, for each email, you print one line that lists the sender and the recipient, separated by a comma:
bob@example.com,joe@example.com
alice@example.com,eve@example.com
joe@example.com,eve@example.com
For now, if there are multiple recipients, just use the first one. However, it is worth thinking about how you would print a line for every recipient.
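For the multiple-recipient case, one approach is to split the To: line on commas. This is a minimal sketch assuming the recipients on a To: line are separated by commas; check the actual file, since that format is an assumption here, and the addresses are made up.

```perl
use strict;
use warnings;

my $from = 'bob@example.com';
my $line = "To: joe\@example.com, eve\@example.com, alice\@example.com";

# Copy the line, strip the "To:" label, then split on commas and nearby spaces.
(my $list = $line) =~ s/^To:\s*//;
my @recipients = split /\s*,\s*/, $list;

# For now, use only the first recipient...
my $first = $recipients[0];
print "$from,$first\n";

# ...but printing one line per recipient is just a loop.
print "$from,$_\n" for @recipients;
```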
Hint: The To: always comes after the From:. You will need a variable that keeps track of whether you have already seen the From: of the message and are looking for the To:, or have seen the To: and are now looking for the next From:.
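The hint describes a small state machine. Here is a minimal Perl sketch of it, assuming the header lines begin with "From:" and "To:" as described; the input lines are made up, and the pattern [^\s,]+ simply takes everything up to the first space or comma, which gives the first recipient.

```perl
use strict;
use warnings;

# Made-up input; in the exercise these lines would come from the Ken Lay file.
my @lines = (
    "From: bob\@example.com",
    "Subject: lunch",
    "To: joe\@example.com, eve\@example.com",
    "some body text",
    "From: alice\@example.com",
    "To: eve\@example.com",
);

# $from is undefined while we are looking for a From:, and holds the sender
# once we have seen one and are waiting for the matching To:.
my $from;
my @pairs;
for my $line (@lines) {
    if (!defined $from && $line =~ /^From:\s*([^\s,]+)/) {
        $from = $1;                  # now look for the To:
    }
    elsif (defined $from && $line =~ /^To:\s*([^\s,]+)/) {
        push @pairs, "$from,$1";     # first recipient only
        undef $from;                 # back to looking for a From:
    }
}

print "$_\n" for @pairs;
```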