Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.

GitHub Repository: pierian-data/complete-python-3-bootcamp
Path: blob/master/15-PDFs-and-Spreadsheets/03-PDFs-Spreadsheets-Puzzle-Solution.ipynb
Kernel: Python 3 (ipykernel)

Content Copyright by Pierian Data

PDFs and Spreadsheets Puzzle Exercise

You will need to work with two files for this exercise and solve the following tasks:

Task One: Grab the Google Drive link from the .csv file. (Hint: It's along the diagonal.)
Task Two: Download the PDF from the Google Drive link (we already downloaded it for you just in case you can't download from Google Drive) and find the phone number that is in the document. Note: There are different ways of formatting a phone number!

Task One: Grab the Google Drive Link from .csv File

In [1]:

import csv

Grab all the lines of data.

In [2]:

with open('Exercise_Files/find_the_link.csv',encoding="utf-8") as data:
    csv_data = csv.reader(data)
    data_lines = list(csv_data)  # read every row before the file is closed

We can see it's along the diagonal, which means each value sits at the index position that matches its row number. So the 1st letter is the 1st item in the 1st row, the 2nd letter is the 2nd item in the 2nd row, the 3rd letter is the 3rd item in the 3rd row, and so on. We can use enumerate to track the row number and simply index into each row of data_lines.

Method One

In [3]:

link_list = []
for row_num,data in enumerate(data_lines):
    link_list.append(data[row_num])

In [4]:

''.join(link_list)

Out[4]:

'https://drive.google.com/open?id=1G6SEgg018UB4_4xsAJJ5TdzrhmXipr4Q'

Method Two

In [5]:

link_str = ''
for row_num,data in enumerate(data_lines):
    link_str+=data[row_num]

In [6]:

link_str

Out[6]:

'https://drive.google.com/open?id=1G6SEgg018UB4_4xsAJJ5TdzrhmXipr4Q'
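Both methods can also be condensed into a single expression. The following is a minimal sketch on a small made-up grid (the `sample_csv` text and its contents are assumptions for illustration, not the exercise file), showing the same diagonal indexing with a generator expression:

```python
import csv
import io

# Hypothetical 3x3 CSV whose diagonal spells "abc" (illustration only,
# not the contents of find_the_link.csv).
sample_csv = "a,x,x\nx,b,x\nx,x,c\n"

# csv.reader accepts any iterable of lines, so an in-memory buffer works.
rows = list(csv.reader(io.StringIO(sample_csv)))

# Item i of row i lies on the diagonal.
diagonal = ''.join(row[i] for i, row in enumerate(rows))
print(diagonal)
```

Passing the generator straight to `join` avoids building the intermediate list or repeatedly concatenating strings.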

Task Two: Download the PDF from the Google Drive link and find the phone number that is in the document.

In [7]:

import PyPDF2

In [8]:

f = open('Exercise_Files/Find_the_Phone_Number.pdf','rb')

In [9]:

pdf = PyPDF2.PdfReader(f)

In [10]:

len(pdf.pages)

Out[10]:

17

Phone Number Matching

Lots of ways to do this, but you had to figure out that the phone number was in the format ###.###.####

Hint: https://stackoverflow.com/questions/4697882/how-can-i-find-all-matches-to-a-regular-expression-in-python

In [11]:

import re

In [12]:

pattern = r'\d{3}'

In [13]:

all_text = ''

for n in range(len(pdf.pages)):
    
    page = pdf.pages[n]
    page_text = page.extract_text()
    
    all_text = all_text+' '+page_text

In [14]:

for match in re.finditer(pattern,all_text):
    print(match)

Out[14]:

<re.Match object; span=(650, 653), match='000'>
<re.Match object; span=(18270, 18273), match='000'>
<re.Match object; span=(35890, 35893), match='000'>
<re.Match object; span=(42919, 42922), match='505'>
<re.Match object; span=(42923, 42926), match='503'>
<re.Match object; span=(42927, 42930), match='445'>
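The three matches clustered at nearly adjacent positions are the clue: the regex engine scans left to right without overlapping, so a number written as ###.###.#### shows up as three consecutive three-digit chunks. A small sketch of that behavior, using a made-up `sample_text` (an assumption for illustration, not text from the PDF):

```python
import re

# Hypothetical text standing in for the extracted PDF contents.
sample_text = "Order 000 shipped. Call 505.503.4455 today."

matches = list(re.finditer(r'\d{3}', sample_text))
chunks = [m.group() for m in matches]
print(chunks)

# The last three matches sit one character apart. Slicing from the start
# of the first of them to one character past the last recovers the number.
first, last = matches[1], matches[-1]
context = sample_text[first.start():last.end() + 1]
print(context)
```

Inspecting the text around closely spaced spans like this is one way to discover the separator before writing the full pattern.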

Once you know the correct pattern:

In [15]:

import re

In [16]:

pattern = r'\d{3}.\d{3}.\d{4}'

In [17]:

for n in range(len(pdf.pages)):
    
    page = pdf.pages[n]
    page_text = page.extract_text()
    match = re.search(pattern,page_text)
    
    if match:
        print(match.group())

Out[17]:

505.503.4455
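Note that an unescaped `.` in a regular expression matches any character, so the pattern above accepts any single separator between the digit groups, not only a literal dot. That leniency may be useful given that phone numbers can be formatted in different ways, but it can also produce false positives. The sketch below compares it with two stricter variants; the sample strings and the `strict` and `flexible` patterns are assumptions for illustration, not part of the original solution:

```python
import re

# Hypothetical text: one real-looking number and one look-alike code.
sample_text = "Call 505.503.4455 or quote ref 123a456b7890."

loose = r'\d{3}.\d{3}.\d{4}'                 # '.' matches any character
strict = r'\d{3}\.\d{3}\.\d{4}'              # '\.' matches a literal dot only
flexible = r'\d{3}[-.\s]\d{3}[-.\s]\d{4}'    # dot, hyphen, or whitespace

loose_hits = re.findall(loose, sample_text)
strict_hits = re.findall(strict, sample_text)
flexible_hits = re.findall(flexible, "505-503-4455 and 505 503 4455")

print(loose_hits)
print(strict_hits)
print(flexible_hits)
```

Which variant fits best depends on how much is known about the document's formatting in advance.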

Great Job! Information on this phone number:
