Udacity CS 101: Making an Index

We’ve mentioned data structures before. In this case the question is how to structure our index so that it is quick to look up a particular keyword and retrieve where that keyword appears (in the form of a URL). To help build it, we are introduced to a new method, split():

find = 'This is a sentence!'

print(find.split())  # prints ['This', 'is', 'a', 'sentence!']

This method is useful for splitting text on whitespace, but as you can see, it does not separate words from punctuation. In the example above, ‘sentence!’ is treated as a word. If we base our keyword search on the list returned by this method, a search for the keyword ‘sentence’ will miss it, because the indexed entry carries the exclamation mark and the search term does not. Yet the writer clearly meant the word ‘sentence’.
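One way around the punctuation problem can be sketched as follows. This is only a minimal illustration, not the course's own solution: the helper names split_words and add_page_to_index, the use of a dictionary for the index, and the example.com URL are all assumptions made for this example.

```python
import string

def split_words(text):
    """Split on whitespace, then strip punctuation from both ends of each piece."""
    words = []
    for piece in text.split():
        word = piece.strip(string.punctuation)
        if word:  # skip pieces that were nothing but punctuation
            words.append(word)
    return words

def add_page_to_index(index, url, content):
    """Record url under every keyword found in content (index is a plain dict here)."""
    for word in split_words(content):
        urls = index.setdefault(word, [])
        if url not in urls:  # avoid listing the same page twice for one keyword
            urls.append(url)

index = {}
add_page_to_index(index, 'http://example.com/page', 'This is a sentence!')
print(split_words('This is a sentence!'))  # ['This', 'is', 'a', 'sentence']
print(index['sentence'])                   # ['http://example.com/page']
```

With the punctuation stripped before indexing, a lookup for ‘sentence’ finds the page. Note that strip() only removes punctuation at the ends of a piece, so something like ‘don't’ stays intact.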

We also meet a new construct, triple quotes ("""), which let you write a string that spans multiple lines:

longText = """
This is a really
long string which is spread out
over a few lines
"""
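To see how a triple-quoted string behaves with split(), here is a small sketch. The variable names long_text and words are made up for this example. The line breaks are stored in the string as newline characters, and split() treats them as whitespace just like spaces.

```python
long_text = """
This is a really
long string which is spread out
over a few lines
"""

# The newlines are part of the string; the first line is empty because
# the text starts right after the opening triple quotes.
print(long_text.splitlines()[0] == '')  # True

# split() with no arguments splits on any whitespace, including newlines.
words = long_text.split()
print(words[:4])   # ['This', 'is', 'a', 'really']
print(len(words))  # 14
```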


The next few chapters are about the Internet. It’s really funny how we use the Internet every day, and yet we have so little idea how it came about and how it works. So pay attention to those valuable lessons and appreciate the really smart people who invented this technology.

This Udacity post references Udacity CS 101 Unit 4 Chapters 1, 2, 3, 4, 5, 6 and 7.
