Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Image: ubuntu2204
Analyse de données
Assignment Queries SQL et MapReduce
Introduction
In this set of exercises is composed of two parts:
A) a Relational conceptual model
B) a set of SQL queries for which you will continue using the database installed during the classes.
C) a set of MapReduce exercises
#A - Tourist tours
We need a database for a company that offers sightseeing experiences around the world. Each tourist attraction (e.g., a museum) has a name, a description and a location (city, country and geographical coordinates). The opening hours and the entry fee must be stored as well, when applicable. There might be some facilities at a tourist attraction, in which case the type of each (e.g., cafe, restroom, parking) must be specified. Tourists can book as many guided tours as they want; a tour takes place on a specific date and time, it has a fixed duration and a price. A tour might include one or several attractions. For example, a tourist may book a tour that includes a visit to the Eiffel Tower and to the Louvre. A tour is typically led by one or multiple tour guides, of which the company keeps their first and last names, contact information (phone and email address), and the languages they speak. When a tourist books a tour, the company needs to store the tourist personal information: first and last name, phone, email address, nationality, and the languages spoken. Optionally, tourists can write a review on the tours that they attended.
Exercise A.1 - Conceptual model
(a) Draw a conceptual model (entity-relationship diagram) of this database.
(b) Derive a logical model (collection of relational tables) from your conceptual model. Specify the name of each table, as well as the names of its columns, the primary key and the foreign keys.
#B - SQL queries
For each question return the query and 10 lines of the result (when the result has multiple lines).
B.1 Show the customers sorted by last name
B.2 Return the countries where there is at least one store
B.3 Return the city having the max number of customers
B.4 Return the first name and last name of the customers that live in Valencia
B.5 Return the customers (if any) that have never rented
B.6 Return the films that are most present in the stores
C. Count world repeated in the same sentence
Suppose that we have a text (one example is provided in the data .zip file) and we want to develop an algorithm that focuses on the most repeated worlds in the sentences.
At first we define what is (for us) the most repeated word
.
We want to ignore the stop words (the, and, etc.). A file that contains the words that are considered stop words is provided.
Look at this text:
The cat is on the cat table.
The table is red.
The wooden table is broken.
The cat and the other cat are with a cat.
The cat is white.
The apple has an apple color.
For this set of sentences the most repeated word is cat. The times the world is repeated in the sentences is 5 (2 times in the first sentence and 3 times in the 4th sentence).
Exercise C.1 - Count repeated word
Provide an implementation of an algorithm following the MapReduce paradigm that counts the repeated words ans shows them. The result must be presented in decreasing order (at first the words that have the most repetitions).
Describe in detail each step of the Map-Reduce algorithm.
Exercise C.2 Repetition rate
Show, using a MapRedice algorithm the repetition rate of the words. For our example the the repetition rate
of the word cat
is 2,5:
(2 repetitions + 3 repetitions) / number of sentences where repetition is present
(2 + 3) / 2 = 2,5
2 comes from sentence 1 and 3 from sentence 4. Two sentences contain repetitions for cat then the division is by 2.