Kernel: Python 3 (system-wide)

Analyzing Data from the App Store - Which App Genre has the Most User Engagements?

By Alexander Fernandez

We will analyze which kinds of free, English-language apps attract the most users. The goal is to find the kind of app that gets the most user engagement so that we can build a successful app of our own. We will use data from the iOS App Store and the Google Play store.

open('AppleStore.csv')
<_io.TextIOWrapper name='AppleStore.csv' mode='r' encoding='UTF-8'>
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

from csv import reader

opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]
explore_data(android, 2, 5)
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']

['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']
explore_data(android, 2, 5, True)
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']

['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']

Number of rows: 10841
Number of columns: 13

opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
explore_data(ios, 1, 7)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']

['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']

['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1']

['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']
explore_data(ios, 2, 5, True)
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']

['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']

Number of rows: 7197
Number of columns: 16

print(ios_header)
print("\n")
print(android_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
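The cells above open each file without closing it. As a minimal sketch, and not part of the original analysis, the same loading step could be wrapped in a helper that closes the file automatically. The name `load_dataset` and the default `utf-8` encoding are assumptions introduced here for illustration.

```python
from csv import reader


def load_dataset(path, encoding='utf-8'):
    """Read a CSV file and return (header, rows).

    Hypothetical helper: the `with` block closes the file as soon as
    all rows have been read into memory.
    """
    with open(path, encoding=encoding, newline='') as f:
        rows = list(reader(f))
    return rows[0], rows[1:]


# Possible usage with the files from this notebook:
# android_header, android = load_dataset('googleplaystore.csv')
# ios_header, ios = load_dataset('AppleStore.csv')
```

Returning the header separately mirrors what the cells above do by hand with `android[0]` and `android[1:]`.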

DATA CLEANING

Because the data contains duplicate, non-English, and paid apps, we need to clean it and remove those apps, since our focus is on free, English-language apps. In addition, according to the discussion of this dataset, there is an error in row 10472: as the output below shows, that row has no value in the Category column, so it has only 12 values instead of 13 and everything after the app name is shifted one column to the left. We will simply delete it.

print(android[10472]) # incorrect row
print('\n')
print(android_header) # header
print('\n')
print(android[0])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
del android[10472]
print(android[10472])
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
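Instead of relying on a row number taken from a discussion thread, we could also look for malformed rows directly. The sketch below is an illustration rather than the method used in this notebook: it flags every row whose number of columns differs from the header's. The name `find_malformed_rows` and the tiny sample data are hypothetical.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Tiny made-up sample: the second row is missing its Category value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Sketch', 'ART_AND_DESIGN', '4.5'],
    ['Photo Frame', '1.9'],
    ['Pixel Draw', 'ART_AND_DESIGN', '4.3'],
]

for index, row in find_malformed_rows(sample_rows, sample_header):
    print(index, row)
```

A check like this also guards against a pitfall of `del android[10472]`: running that cell a second time silently deletes a valid row, whereas a length check finds nothing left to remove.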

Data cleaning: Duplicate entries

One example we find just by looking through the data is Instagram, which appears four times:

for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(len(duplicate_apps))
print("\n")
print(len(unique_apps))
print("\n")
print(len(android))

1181

9659

10840

We will remove the duplicates by keeping, for each app, only the entry with the highest number of reviews. Review counts grow over time, so the entry with the most reviews should be the most recently collected one.

reviews_max = {} #dictionary of {name, highest review}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))

9659

android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))

9659

duplicate_apps = []
unique_apps = []

for app in android_clean:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(len(duplicate_apps))
print("\n")
print(len(unique_apps))

0

9659

Easy peazy
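The two-pass approach above can also be written as a single pass. The following is a sketch under our own assumptions, not the code this notebook ran: it keeps, for each app name, the first row seen with the highest review count. The function name `remove_duplicates` and its default column indexes (0 for the name, 3 for the reviews, as in the Google Play data) are introduced here for illustration.

```python
def remove_duplicates(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # maps app name -> best row seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means that on a tie the earlier row is kept.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Shortened rows using the Instagram review counts shown earlier.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'],
]
cleaned = remove_duplicates(sample)
print(len(cleaned))
```

Looking names up in a dictionary avoids the repeated `name not in already_added` list searches, which get slower as the list grows. One difference to keep in mind: the rows come back in order of each name's first appearance, which may not match the order of `android_clean`.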

Data cleaning: Getting rid of non-English apps

We will create a function that will determine whether the name of an app is English or not.

def is_English(string):
    count = 0
    for i in string:
        if ord(i) > 127:
            count += 1
    if count > 3:
        return False
    return True
is_English("Poop")
True
is_English("爱奇艺PPS -《欢乐颂2》电视剧热播")
False
is_English("Instachat 😜")
True
android_english = []
for app in android_clean:
    name = app[0]
    if is_English(name):
        android_english.append(app)

ios_english = []
for app in ios:
    name = app[1]
    if is_English(name):
        ios_english.append(app)

print(len(android_english))
print("\n")
print(len(ios_english))

9614

6183
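To see why `is_English` tolerates up to three characters outside the ASCII range (code points above 127), it helps to count them for a few names. The helper below, `count_non_ascii`, is a hypothetical name used only for this illustration; it reproduces the counting step of `is_English` without the threshold.

```python
def count_non_ascii(name):
    """Count the characters in `name` whose code point is above 127."""
    return sum(1 for ch in name if ord(ch) > 127)


examples = [
    'Poop',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for name in examples:
    print(name, '->', count_non_ascii(name))
```

A name with a single emoji stays under the threshold, while a name written mostly in Chinese characters exceeds it. The filter is a heuristic: an English name with four or more emoji or special symbols would still be dropped, and a short non-English name with three or fewer such characters would still pass.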

Data cleaning: Isolating Free Apps

Now we will remove the paid apps. Luckily, each dataset has a column that gives the price of an app. We can use it to create a new list of free apps.

android_free_apps = []
for app in android_english:
    name = app[0]
    price = app[7]
    if price == '0':
        android_free_apps.append(app)

print(len(android_free_apps))

8864

ios_free_apps = []
for app in ios_english:
    name = app[1]
    price = app[4]
    if price == '0.0':
        ios_free_apps.append(app)

print(len(ios_free_apps))
3222
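The two cells above compare the price against a different string for each store (`'0'` for Google Play, `'0.0'` for the App Store). As a hedged alternative, the price could be converted to a number first so that one check works for both. The helper name `is_free` is hypothetical, and the sketch assumes that paid prices look like `'4.99'` or `'$4.99'`.

```python
def is_free(price):
    """Return True if a price string represents zero.

    Assumes prices look like '0', '0.0', '4.99' or '$4.99'; anything
    that cannot be parsed as a number is treated as not free.
    """
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False


for price in ['0', '0.0', '$4.99', '2.99']:
    print(price, '->', is_free(price))
```

With a helper like this, both filters could share the same condition, for example `if is_free(app[7])` for Android and `if is_free(app[4])` for iOS.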

Determining users' favorite app genre

Now we will create a function that gives the frequency of each genre in the app stores, expressed as a percentage. From this, we can determine which genres are the most common.

def freq_table(dataset, index):
    table = {}
    total = 0

    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage

    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
display_table(ios_free_apps, -5)
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
display_table(android_free_apps, 1)
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317

We see that for iOS the most common genre is 'Games', which accounts for 58% of the free English apps. For Android, the most common category is 'FAMILY' with 19%, followed by 'GAME' with 10%. Another key observation is that most apps tend to focus on entertainment rather than practical purposes, especially on iOS. We can use that to decide what kind of app we want to build.
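For comparison, the same kind of percentage table can be built with `collections.Counter` from the standard library. This is a sketch of an alternative, not the function used for the results above; `freq_table_counter` is a hypothetical name, and the sample rows are made up.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Made-up sample rows: (app name, genre).
sample = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]
for genre, percentage in freq_table_counter(sample, 1):
    print(genre, ':', percentage)
```

`Counter.most_common()` already returns the entries in descending order of count, so no separate sorting step is needed. Genres with equal counts may appear in a different order than in `display_table`, which breaks ties by genre name.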

We now know which genres have the most apps in the app stores, but which genres are the most popular? We will now find out which genres attract the most users per app. The App Store data has no column for downloads, so for iOS we use the total number of user ratings as a proxy; for Google Play we use the Installs column.

genres_ios = freq_table(ios_free_apps, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free_apps:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0

categories_android = freq_table(android_free_apps, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free_apps:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_MAGAZINES : 9549178.467741935
MAPS_AND_NAVIGATION : 4056941.7741935486
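The nested loops above scan the whole dataset once per genre. As a sketch of an alternative, not the code that produced the numbers above, the averages can be computed in a single pass and sorted so the most-used groups come first. The names `parse_installs` and `average_by_group` are hypothetical, and the sample rows are made up.

```python
from collections import defaultdict


def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to the integer 1000000."""
    return int(text.replace(',', '').replace('+', ''))


def average_by_group(dataset, group_index, value_index, parse=float):
    """Return (group, average) pairs sorted from highest to lowest average."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += parse(row[value_index])
        counts[group] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows: (app name, category, installs).
sample = [
    ['App A', 'TOOLS', '1,000+'],
    ['App B', 'TOOLS', '3,000+'],
    ['App C', 'SOCIAL', '10,000+'],
]
for category, avg in average_by_group(sample, 1, 2, parse=parse_installs):
    print(category, ':', avg)
```

Note that a value such as `'1,000+'` is a lower bound rather than an exact count, so these averages are rough. A handful of very large apps can also dominate a category's average.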

Conclusion

In this project, we looked through the App Store and Google Play data to find which genres of free, English-language apps are the most popular. We can use this information to come up with a popular app of our own. From our data, games appear to be oversaturated in the market, so creating one would not be a good idea. Instead, we look for genres that have a lot of users but not a lot of apps in the market. We conclude that a book-related app may be our best option. One possible idea is an app that keeps track of our favorite books and includes a social networking feature where people can talk about their favorite books and add their own comments.