COSC - Project 02

Kyle Anderson

Introduction

The goal of this project is to develop tools for performing basic text analysis tasks, such as processing text read from a text file, tokenizing the text, and performing word counts.

In [3]:
%run -i word_count.py

"Testing the process_word() Function

We will testing the function using a few test strings.

In [4]:
print(process_word('Test!'))
print(process_word('WHAT?:?:?'))
print(process_word(':_(:;hEl\'lo,)123'))
test 
what     
     hel lo     

Testing the process_line() Function

We will use a test string to test the process_line() function.

In [5]:
print(process_line('This is a test string". It\'s fifty-eight characters long!'))
['this', 'is', 'a', 'test', 'string  ', 'it s', 'fifty', 'eight', 'characters', 'long ']

Processing the File

We will now use the process_file() function to read and process the contents of the file tale_of_two_cities.txt.

In [6]:
words = (process_file("tale_of_two_cities.txt"))
print("There are {} words contained in the file.".format(len(words)))
There are 137235 words contained in the file.
In [7]:
print(words[:20])
['a', 'tale', 'of', 'two', 'cities', 'a', 'story', 'of', 'the', 'french', 'revolution', 'by', 'charles', 'dickens', 'book', 'the', 'first', 'recalled', 'to', 'life']

Unique Words

We will now determine the number of unique words in the novel.

In [20]:
find_unique(words)
print('There are {} unique words contained in the file.' .format(len(words)))
There are 137235 unique words contained in the file.

Word Frequency

We will create a dictionary containing word counts for the words in the novel.

In [9]:
words_list = ['random'] * 210
words_list = words_list + ['letters'] * 25 + ['love'] * 479 + ['meditation'] * 999 + ['extras'] * 1234 + ['travel'] * 679 + ['explore'] * 358

words_freq_dict = find_frequency(words_list)

words_100_1000 = []

for word, count in words_freq_dict.items():
    if count >= 100 and count < 1000:
        words_100_1000.append(word)

final_list_four_strings = words_100_1000[:4]

for word in final_list_four_strings:
    print(f'The word "{word}" appears {words_freq_dict[word]} times in the file.')
The word "random" appears 210 times in the file.
The word "love" appears 479 times in the file.
The word "meditation" appears 999 times in the file.
The word "travel" appears 679 times in the file.

Most Common Words

We will find and display a list of the 20 most common words found in A Tale of Two Cities.

In [22]:
most_common(freq_dict,20)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/COSC-130/Projects/Project 02/word_count.py in <cell line: 1>()
----> 1 most_common(freq_dict,20)

~/COSC-130/Projects/Project 02/word_count.py in most_common(freq_dict, n)
     52 def most_common(freq_dict,n):
     53         freq_list = []
---> 54         for item in list(freq_dict.items()):
     55                 val = (item[1],item[0])
     56                 freq_list.append(val)

AttributeError: 'list' object has no attribute 'items'

Stop Words

We will create a list of commonly occurring "stop words" that will be removed from the words list.

In [11]:
stop = (process_file('stopwords.txt'))
print('There are {} words in our list of stop words.'.format(len(stop)))
There are 668 words in our list of stop words.
In [12]:
print(stop[:50])
['a', 'able', 'about', 'above', 'abst', 'accordance', 'according', 'accordingly', 'across', 'act', 'actually', 'added', 'adj', 'affected', 'affecting', 'affects', 'after', 'afterwards', 'again', 'against', 'ah', 'all', 'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'an', 'and', 'announce', 'another', 'any', 'anybody', 'anyhow', 'anymore', 'anyone', 'anything', 'anyway', 'anyways', 'anywhere', 'apparently', 'approximately', 'are', 'aren', 'arent']

Counting Non-Stop Words

We will determine the number of non-stop words and the number of unique non-stop words found in the novel.

In [13]:
words_ns = remove_stop(words,stop)
unique_ns = remove_stop(words,stop)
print('There are {} non-stop words contained in the file.'.format(len(words_ns)))
print('There are {} unique non-stop words contained in the file'.format(len(unique_ns)))
There are 60053 non-stop words contained in the file.
There are 60053 unique non-stop words contained in the file
In [14]:
print(stop[:50])
['a', 'able', 'about', 'above', 'abst', 'accordance', 'according', 'accordingly', 'across', 'act', 'actually', 'added', 'adj', 'affected', 'affecting', 'affects', 'after', 'afterwards', 'again', 'against', 'ah', 'all', 'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'an', 'and', 'announce', 'another', 'any', 'anybody', 'anyhow', 'anymore', 'anyone', 'anything', 'anyway', 'anyways', 'anywhere', 'apparently', 'approximately', 'are', 'aren', 'arent']

Most Common Non-Stop Words

We will display the 20 most commonly occurring non-stop words.

In [15]:
words_ns = find_frequency(words)
most_common(words_ns,20)
Word        Count
----------------
the         7943
and         4841
of          3969
to          3520
a           2909
in          2507
his         1985
he          1762
that        1709
was         1686
i           1471
it          1426
with        1281
had         1279
as          1128
at          1005
you          952
for          888
her          859
on           846

Counting Words By Length

We will display information concerning the distribution of lengths of unique words found in the novel.

In [16]:
count_by_length(words)
Length      Count
----------------
17             4
16            18
15            29
14           154
13           380
12           740
11          1389
10          2587
9           4411
8           6736
7           9524
6           12253
5           16153
4           24366
3           31461
2           22559
1           4471

Longest Words

We will display the longest several words found in the novel.

In [17]:
words = (process_file("tale_of_two_cities.txt"))
sorted(words, key=len, reverse=True)
Out[17]:
['incommodiousness ',
 'acknowledgments  ',
 'undistinguishable',
 'enthusiastically ',
 'incomprehensible',
 'disinterestedly ',
 'accomplishments ',
 'characteristics ',
 'disinterestedly ',
 'dissatisfaction ',
 ' extermination  ',
 'correspondence  ',
 'congratulations ',
 'congratulations ',
 'crystallisation ',
 'representations ',
 'retrospectively ',
 'disproportionate',
 ' transparently  ',
 'superintendence ',
 'transformations ',
 'disappointment  ',
 ' supernaturally',
 'characteristic ',
 'unconsciousness',
 'contemporaries ',
 'understanding  ',
 ' nevertheless  ',
 'identification ',
 'accomplishments',
 'notwithstanding',
 'convulsionists ',
 ' comparatively ',
 'correspondingly',
 'disrespectfully',
 'corroborations ',
 'notwithstanding',
 'demonstrations ',
 'circumstances  ',
 'affectionately ',
 'professionally ',
 'inconsistencies',
 'counterweight  ',
 'contemptuously ',
 'correspondingly',
 'straightforward',
 'uncompromising ',
 '  nevertheless ',
 'extermination  ',
 'circumstances  ',
 'countersigned  ',
 'communications',
 'circumstances ',
 'contradicting ',
 'distrustfully ',
 'consideration ',
 'individuality ',
 'contradictory ',
 'congratulated ',
 'simultaneously',
 'demonstrations',
 'circumstances ',
 'establishment ',
 'transformation',
 'uncontrollable',
 'insupportable ',
 'neighbourhood ',
 'unquestionably',
 'objectionable ',
 'embellishment ',
 'objectionable ',
 'inconvenience ',
 'communications',
 'legislation s ',
 'establishment ',
 'neighbourhood ',
 'aggerawayter  ',
 'inconsistency ',
 'systematically',
 'disappointment',
 'correspondence',
 'correspondence',
 'unwillingness ',
 'imprisonment  ',
 'strengthened  ',
 'acknowledgment',
 'congratulatory',
 'congratulating',
 'conversations ',
 'acknowledgment',
 'satisfaction  ',
 'eccentricities',
 'characteristic',
 'circumstances ',
 'unquestionably',
 'understanding ',
 'embarrassments',
 'ecclesiastics ',
 'circumference ',
 'circumference ',
 'circumstances ',
 'inconvenience ',
 'obsequiousness',
 'indefinitely  ',
 'consideration ',
 'inconvenience ',
 'mismanagement ',
 'communication ',
 'conventionally',
 'undergraduates',
 'deferentially ',
 'apprehensions ',
 'responsibility',
 'circumstances ',
 'incorrigible  ',
 'incorrigible  ',
 'uncomfortable ',
 'circumstances ',
 'confidentially',
 'neighbourhood ',
 'apostrophising',
 'superstitious ',
 'perseveringly ',
 'commencement  ',
 'perpendicular ',
 ' magnificent  ',
 ' magnificent  ',
 'superciliously',
 'affectionately',
 'pronunciation ',
 'neighbourhood ',
 'unimpeachable ',
 'susceptibility',
 'neighbourhood ',
 'remembrances  ',
 'identification',
 'disappearance ',
 'distinctions  ',
 ' comparatively',
 'consideration ',
 'accompaniment ',
 'affectionately',
 'unfortunately ',
 'worthlessness ',
 'unaccountable ',
 'conflagration ',
 'sardanapalus s',
 'dissimulation ',
 'inconveniences',
 'impossibility ',
 'youthfulness  ',
 'systematically',
 'incompleteness',
 'considerations',
 'identification',
 'unprecedented ',
 'encouragement ',
 'metempsychosis',
 'precipitation ',
 'responsibility',
 'respectability',
 'unintelligible',
 'considerations',
 'entertainments',
 'unnecessarily ',
 'determination ',
 'individually  ',
 'predominating ',
 'consternation ',
 'tergiversation',
 'ostentatiously',
 'inscrutability',
 'conciergerie  ',
 'conciergerie  ',
 'disappointment',
 'inquisitively ',
 'encouragement ',
 'circumstantial',
 'handkerchiefs ',
 'circumstantial',
 'communications',
 'circumstances ',
 'circumstances ',
 'consciousness ',
 'establishment ',
 ' magnificent  ',
 ' extermination',
 'contemptuously',
 'extermination ',
 'unquestionably',
 'responsibility',
 'forgetfulness ',
 'unintelligible',
 'inarticulately',
 'gesticulation ',
 'contemptuously',
 'commencement  ',
 'determination ',
 'circumstances ',
 'consideration ',
 'expectations  ',
 'apprehensions ',
 ' nevertheless ',
 'englishwoman  ',
 'unintelligible',
 'upholsterers ',
 'blunderbusses',
 'miscellaneous',
 'expeditiously',
 'supplementary',
 'consolidation',
 'stubbornness ',
 'consciousness',
 'destinations ',
 'establishment',
 'consequently ',
 'accommodation',
 'unaccountably',
 'neighbourhood',
 'argumentative',
 'confidential ',
 'determination',
 'compassionate',
 'extraordinary',
 'restoratives ',
 'handkerchiefs',
 'companionship',
 'illustrations',
 'significance ',
 'expostulation',
 'accidentally ',
 'deliberately ',
 'decomposition',
 'methodically ',
 'accompaniment',
 'construction ',
 'information  ',
 'intermission ',
 'endeavouring ',
 'interruption ',
 'bewilderment ',
 'instinctively',
 'incommodious ',
 'spectacularly',
 'exasperation ',
 'aggerawayter ',
 'aggerawayter ',
 'circumstance ',
 'establishment',
 'establishment',
 'entertainment',
 'traitorously ',
 'adverbiously ',
 'satisfaction ',
 'understanding',
 'indescribable',
 'observations ',
 'undiscovered ',
 'consciousness',
 'unimpeachable',
 'determination',
 'disparagement',
 'unimpeachable',
 'understanding',
 'conversation ',
 'unconsciously',
 'imprisonment ',
 'consideration',
 'nevertheless ',
 'commiseration',
 'communication',
 'uncomfortably',
 'indifference ',
 'circumstance ',
 'disconcerted ',
 'exaggeration ',
 'propensities ',
 'unscrupulous ',
 'despondency  ',
 'botheration  ',
 'perseverance ',
 'occasionally ',
 'eccentricity ',
 'arrangements ',
 'restlessness ',
 'handkerchief ',
 'marvellously ',
 'contrivances ',
 'conversation ',
 'communicated ',
 'stipulations ',
 'monseigneur s',
 'monseigneur  ',
 'uncomfortable',
 'transmutation',
 'unfashionable',
 'monseigneur s',
 'demonstration',
 'monseigneur s',
 'circumference',
 'compressions ',
 'consideration',
 'extraordinary',
 'ecclesiastic ',
 'steadfastness',
 'appointments ',
 ' monseigneur ',
 ' monseigneur ',
 ' monseigneur ',
 ' monseigneur ',
 ' monseigneur ',
 ' monseigneur ',
 ' monseigneur ',
 'inexperienced',
 ' monseigneur ',
 ' monseigneur ',
 ' monseigneur ',
 ' monseigneur ',
 'monseigneur  ',
 'extinguished ',
 ' monseigneur ',
 'illustrations',
 'nevertheless ',
 ' monseigneur ',
 ' monseigneur ',
 'instructions ',
 'imperturbable',
 'conversation ',
 'circumstances',
 'disadvantage ',
 'regeneration ',
 'concentration',
 'indifference ',
 'supposition  ',
 'indifferently',
 'circumstances',
 'compassionate',
 'assassination',
 'unwillingness',
 'consideration',
 'respectfully ',
 'circumstances',
 'distractions ',
 'occasionally ',
 'immediately  ',
 'sensitiveness',
 'understanding',
 'preliminaries',
 'unaccountably',
 'perpendicular',
 'remonstrance ',
 'magnificently',
 'appreciative ',
 'disrespectful',
 'characterised',
 'disappointed ',
 'unattainable ',
 'embarrassment',
 'profligates  ',
 'apprehension ',
 'companionship',
 'significance ',
 'nevertheless ',
 'handkerchief ',
 'satisfaction ',
 'entertainment',
 'distinguished',
 'apprehension ',
 'disadvantage ',
 'indispensable',
 'entertainment',
 'accomplished ',
 'circumstances',
 'indispensable',
 'unwillingness',
 'authoritative',
 'destruction  ',
 'embarrassment',
 'disconcerting',
 'disconcerting',
 'intoxication ',
 'reflectively ',
 'confirmation ',
 'littlenesses ',
 'intelligences',
 'handkerchief ',
 'acknowledged ',
 'thoughtfully ',
 'perquisitions',
 'complimented ',
 'revolutionary',
 'satisfaction ',
 'inhabitants  ',
 'indifference ',
 'objectionable',
 'unselfishness',
 'perpendicular',
 'imprisonment ',
 'determination',
 'reconcilement',
 'consideration',
 'demonstrative',
 'astonishment ',
 'corresponding',
 'intelligence ',
 'unenlightened',
 'extraordinary',
 'circumstances',
 'extraordinary',
 'circumstances',
 'illustration ',
 'consistently ',
 'substituting ',
 'illustration ',
 'unornamental ',
 'unsubstantial',
 'recklessness ',
 'consideration',
 'sensibilities',
 'unreasonable ',
 'displacements',
 'impracticable',
 'bewilderment ',
 'incoherences ',
 'occasionally ',
 'contemplating',
 'complimentary',
 'nevertheless ',
 'nevertheless ',
 'arrangements ',
 'disappearance',
 'intelligible ',
 'mechanically ',
 'illuminating ',
 'functionary s',
 'temperament  ',
 'functionaries',
 'functionaries',
 'successfully ',
 'confiscation ',
 'intelligence ',
 'exterminating',
 'accomplishing',
 'extraordinary',
 'contamination',
 'sequestration',
 'reproachfully',
 'circumstances',
 'instructions ',
 'unsuspicious ',
 'accomplished ',
 'anticipation ',
 'communication',
 'inappropriate',
 'extraordinary',
 'subordinates ',
 'embroidering ',
 'inappropriate',
 'extravagantly',
 'commiseration',
 'compassionate',
 'unwholesomely',
 'monseigneur s',
 'sequestrated ',
 'monseigneur s',
 'circumstances',
 'indescribable',
 'irrepressible',
 'consciousness',
 'consideration',
 'acquiescence ',
 'instinctively',
 'inconsistency',
 'contradiction',
 'revolutionary',
 'revolutionary',
 'indispensable',
 'uncertainties',
 'recognition  ',
 'inappropriate',
 'conciergerie ',
 'conciergerie ',
 'intoxication ',
 'boastfulness ',
 'circumstances',
 'disapproving ',
 'anticipating ',
 'precipitating',
 'instructions ',
 'imprisonment ',
 'circumstances',
 'extraordinary',
 'extraordinary',
 'remonstrated ',
 'compassionate',
 'imprisonment ',
 'remonstrated ',
 'emphatically ',
 'conciergerie ',
 'indispensable',
 'disappointing',
 'interrupting ',
 'condescension',
 'contemplating',
 'conversation ',
 'satisfaction ',
 'reassurances ',
 'relationship ',
 'communication',
 'acknowledged ',
 'conciergerie ',
 'imprisonment ',
 'circumstances',
 'discreditable',
 'contemplating',
 ' provincial  ',
 'extraordinary',
 'denunciation ',
 'communication',
 ' agicultooral',
 'nevertheless ',
 'prevaricate  ',
 'unexpectedly ',
 'thoroughfares',
 'thoroughfare ',
 'consideration',
 'proscription ',
 'imprisonment ',
 'commendations',
 '  gentlemen  ',
 '  gentlemen  ',
 'contemplation',
 'indifferently',
 'encouragement',
 'tranquillised',
 'compassionate',
 'indifference ',
 'determination',
 'extraordinary',
 'consciousness',
 'communication',
 'consideration',
 'consideration',
 'conversation ',
 'compassionate',
 'anathematised',
 'denunciation ',
 'conciergerie ',
 'demonstration',
 'grandchildren',
 'demonstration',
 'consequences ',
 'conversation ',
 'inquisitively',
 'conversation ',
 'compassionate',
 'conciergerie ',
 'indifference ',
 'condemnation ',
 'nevertheless ',
 'consideration',
 'condemnation ',
 'extinguished ',
 'imprisonment ',
 'relinquished ',
 'unaccountably',
 'supernatural ',
 'accomplished ',
 'contemplating',
 'astonishment ',
 'intelligibly ',
 'revolutionary',
 'exterminated ',
 'annihilation ',
 'protestations',
 'contemplation',
 'demonstrative',
 'consultation ',
 'arrangements ',
 'perturbation ',
 'irrepressible',
 'disfigurement',
 'unchangeable ',
 'uncomplaining',
 'disfigurement',
 'foolishness ',
 'incredulity ',
 'arrangements',
 'westminster ',
 'originality ',
 'achievements',
 'unceasingly ',
 'ammunition  ',
 'requisition ',
 'housebreaker',
 'greatnesses ',
 'combination ',
 'confidential',
 'blunderbuss ',
 'communicated',
 'expectation ',
 'floundering ',
 'blunderbuss ',
 'completeness',
 'occasionally',
 'coincidence ',
 'unfathomable',
 'perpetuation',
 'personality ',
 'inheritance ',
 'inscrutables',
 'underground ',
 'lamentation ',
 'understand  ',
 'successfully',
 'congratulate',
 'disagreeable',
 'neighbouring',
 'confidential',
 'flourishing ',
 'destruction ',
 'particularly',
 'lamplighter ',
 'satisfaction',
 'immediately ',
 'convenience ',
 'desperation ',
 ' remembering',
 'intelligence',
 ' naturally  ',
 'thoughtfully',
 'acquirements',
 'unnecessary ',
 'supplicatory',
 'disappeared ',
 'collectedly ',
 'encouraging ',
 'communicated',
 'particularly',
 'credentials ',
 'comprehended',
 'disconcerted',
 'earthenware ',
 'embankments ',
 'playfulness ',
 'countenance ',
 'confidential',
 'lamplighter ',
 'obliterating',
 'temperament ',
 'acknowledged',
 ' gentlemen  ',
 'accessories ',
 'unaccustomed',
 'uncorrupted ',
 'aspirations ',
 'reassurance ',
 'underground ',
 'transparent ',
 'intelligence',
 'attentively ',
 'concentrated',
 'undisturbed ',
 'intelligence',
 'occasionally',
 'recollection',
 'travellers  ',
 'particulars ',
 'respectable ',
 'disinherited',
 'improvements',
 'respectable ',
 'necessitated',
 'neighbouring',
 'extemporised',
 'professions ',
 'accordingly ',
 'whitefriars ',
 'counterpane ',
 'circumstance',
 'countermined',
 'circumwented',
 'indignation ',
 'considerable',
 'reversionary',
 'deliberately',
 'superscribed',
 'quartering  ',
 ' barbarous  ',
 'destination ',
 'continually ',
 'institution ',
 'institution ',
 'transactions',
 'illustration',
 'consequence ',
 'concentrated',
 'fascination ',
 'illustrious ',
 'illustrious ',
 'illustrious ',
 'illustrious ',
 'circuitously',
 'reflections ',
 'immediately ',
 ' witnesses  ',
 'prisoner s  ',
 'benefactors ',
 'countenances',
 'communicated',
 'preparation ',
 'handwriting ',
 'prosecution ',
 'precautions ',
 'asseveration',
 'anticipation',
 'insinuation ',
 'information ',
 'coincidence ',
 'particularly',
 'coincidence ',
 'coincidences',
 'passengers  ',
 'passengers  ',
 ' happening  ',
 'conversation',
 'conversation',
 'particular  ',
 'accordingly ',
 'conversation',
 'circumstance',
 'information ',
 'sufficiently',
 'illustration',
 'politenesses',
 'disreputable',
 'earnestness ',
 'imprisonment',
 'refreshment ',
 ' acquitted  ',
 'intellectual',
 'unacquainted',
 'prosecution ',
 'extinguished',
 'interchanged',
 'proceedings ',
 'appearances ',
 'impediments ',
 'deliberating',
 'disagreeable',
 'particularly',
 'affirmative ',
 'disappointed',
 'particularly',
 'particularly',
 'commiserated',
 'consolation ',
 'bacchanalian',
 'continuously',
 'conferences ',
 'occasionally',
 'administered',
 'apostrophise',
 'shrewsbury  ',
 'unfavourable',
 'consequence ',
 'experiments ',
 'communicated',
 'acquaintance',
 ' dissociated',
 'arrangements',
 'immeasurably',
 'compunction ',
 'remembrance ',
 'confidence  ',
 'imagination ',
 'monotonously',
 'resoundingly',
 'arrangements',
 'impoverished',
 'cinderella s',
 'exceedingly ',
 'replenished ',
 'unfrequently',
 'alterations ',
 'inscriptions',
 'inscription ',
 'clerkenwell ',
 'clerkenwell ',
 'monseigneur ',
 'sanctuaries ',
 'monseigneur ',
 'monseigneur ',
 'represented ',
 'monseigneur ',
 'circumstance',
 'consequently',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'philosophers',
 'monseigneur ',
 'indifference',
 'monseigneur ',
 'notabilities',
 'monseigneur ',
 'intelligible',
 'accordingly ',
 'artificially',
 'scarecrows  ',
 'executioner ',
 'humiliation ',
 'occasionally',
 'countenance ',
 'recklessness',
 'difficulties',
 'desperation ',
 'watchfulness',
 'philosopher ',
 'accidentally',
 'sufficiently',
 'contemptuous',
 'conspicuous ',
 'circumstance',
 'unswallowed ',
 'superstition',
 'monseigneur ',
 'monseigneur ',
 'suffocated  ',
 'monseigneur ',
 'felicitously',
 'accompanying',
 'examination ',
 'precipitated',
 'monseigneur ',
 'monseigneur ',
 'unchangeable',
 'monseigneur ',
 'monseigneur ',
 'caressingly ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'impartially ',
 'balustrades ',
 'sufficiently',
 'remonstrance',
 'extinguisher',
 'preparation ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'monseigneur ',
 'understand  ',
 'overshadowed',
 'importunity ',
 ' detestation',
 'compliment  ',
 'thoughtfully',
 ' meanwhile  ',
 'perpetuating',
 'impenitently',
 'particularly',
 'authorities ',
 'perseverance',
 'expectation ',
 'overshadowed',
 'trustfulness',
 'restoration ',
 'possibility ',
 'oppressions ',
 'intermediate',
 'atmospheric ',
 'application ',
 'intelligible',
 'disagreeable',
 'intentions  ',
 'announcement',
 'ostentatious',
 'friendliness',
 'designation ',
 'distinction ',
 'astonished  ',
 'astonished  ',
 'accordingly ',
 'perspective ',
 'prosperous  ',
 'prosperous  ',
 'crestfallen ',
 'forensically',
 'everything  ',
 'overbearing ',
 'deliberately',
 'characterise',
 'representing',
 'dissatisfied',
 'afterwards  ',
 'accordingly ',
 'conversation',
 'forbearance ',
 'overshadowed',
 'architecture',
 'purposeless ',
 'compassion  ',
 'confidence  ',
 'undeserving ',
 'attributable',
 'conversation',
 'supplication',
 'supplication',
 'processions ',
 'unprosperous',
 'opportunity ',
 'vociferating',
 'acclamation ',
 'caricaturing',
 'accomplished',
 'undertakers ',
 'neighbouring',
 'altogether  ',
 'unfrequently',
 'conversation',
 'conversation',
 'reflections ',
 'injunctions ',
 'extinguished',
 'inconsistent',
 'circumstance',
 'calculations',
 'resurrection',
 'resurrection',
 'bacchanalian',
 'distribution',
 'appointment ',
 'unreasonable',
 'performance ',
 'interrupted ',
 'encountered ',
 'consequently',
 'countryman s',
 'monseigneur ',
 'perspiration',
 'registered  ',
 ' judiciously',
 'particularly',
 'additionally',
 'additionally',
 'concentrated',
 'composition ',
 'commissioned',
 'correctness ',
 'complacently',
 'interfering ',
 'earthquake  ',
 'consolation ',
 'opportunity ',
 'handkerchief',
 'assiduously ',
 'promenading ',
 'purposeless ',
 'unfortunate ',
 'embarrassing',
 'occasionally',
 'associations',
 'intelligence',
 'trustworthy ',
 'handkerchief',
 'incomplete  ',
 'cheerfulness',
 'unbearable  ',
 'remembrances',
 'sufficiently',
 'collection  ',
 'unhandsomely',
 'warwickshire',
 'thereabouts ',
 'neighbouring',
 'preparations',
 'inquiringly ',
 'interrupted ',
 'mechanically',
 'practicable ',
 'attentively ',
 'arrangements',
 'overstepping',
 'attentively ',
 'particularly',
 'understand  ',
 'collectedly ',
 'originally  ',
 'thoughtful  ',
 'affectionate',
 'apprehension',
 'impossible  ',
 ...]

Counting Words by First Letter

We will display the number of unique words with each possible first letter.

In [18]:
count_by_first(words)
Letter      Count
----------------
z              2
y           2134
x             21
w           9409
v            698
u           1551
t           20484
s           9444
r           2793
q            304
p           3657
o           8163
n           3029
m           6236
l           4068
k            858
j            467
i           8877
h           11605
g           2183
f           4854
e           2274
d           4683
c           4975
b           5798
a           15435
            3233
In [0]: