Network Analytics, Homework 0

General Instrictions:

? For each problem, you will write the output and save it to a file named with the problem

label. For example, for Problem 1, your filename should be problem1.out. If your file

name deviates from this convention, our automated grader will skip your file without

grading it.

? In your code, assume that the input file is in the same directory as your script.

? Once you complete the assignment, make a compressed zip or compressed tarball (if you

are familiar with unix shell) named submission.zip or submission.tgz, containing

all the scripts, answers, and output you have produced. Submit your compressed tarball

to Gradescope. The entry code for this course is MB6ERE. Your compressed package

should contain no directories.

Python Scripting: Six degrees of Separation

Six degrees of separation is the theory that everyone is on average six steps away, by way of

introduction, from any other person in the world, so that a chain of ”a friend of a friend”

statements can be made to connect any two people in a maximum of six steps.

This theory was made popular by the game ”Six Degrees of Kevin Bacon” where the goal is to

link any actor to the famous actor Kevin Bacon through no more than six connections, where

two actors are directly connected if they have appeared in a movie or commercial together.

In this exercise we are going to build an engine that returns someone’s “Kevin Bacon Number”

in Python.

0.1 Dataset

We observed 100 college students over a period of 2 years. We recorded 500 occasions in

which students form parties to go eat in a restaurant in Ithaca, NY. Download the file

restaurants.txt from the course website, which lists these events, one per line, where each

line lists the names in the party and the associated restaurant. Each line has the following

format:

1

NYU Shanghai, Fall 2020; Homework 0 October 30, 2020

name1,name2,...,nameN;restaurant

Each party size varies from 1 to 9.

0.2 Building social networks

A social network is a construct to study relationships between individuals, groups, organizations,

or even entire societies. Social network models describe a social structure determined

by interactions and enable the understanding of social phenomena through the properties of

relations between individuals, instead of the properties of the individuals themselves.

In this assignment, we are going to investigate some forms of social networks using our

restaurant dataset.

The first model is a network built on the concept of affiliation networks. The idea behind

affiliation networks is that acquaintanceships among people often stem from one or more

common or shared affiliations –— living on the same street, working at the same place, being

fans of the same football club, etc. We define a relationship between two people as their

common affiliation to some entity. Therefore, we can model the network with two sets, one

corresponding to the population of interest, and the other to the entities that they connect

to. The entities do not have any connections among themselves. Moreover, people do not

directly connect to each other. We establish the existence of a relationship between two

people if both are affiliated to a common entity.

? problem1.out (10 pts): Write python code that builds the affiliation network of customers

to restaurants. Each line of the output should have the following format:

Restaurant: customer1 customer2 ... customerN

where Restaurant is the name of each restaurant that appears in the data, and

customer1 customer2 ... customerN is the list of people who have eaten in the

restaurant, according to our data, without repetition.

Another way to build a social network is by linking people in dyads using some definition of

friendship. We start by defining that two people are friends if they ever dined together at a

restaurant. For reference, let us call this definition dyad 1 (the number 1 refers to the fact

that the two parties were seen dinning together at least once).

? problem2.out (10 pts): Write python code to build all the dyadic relationships between

two people using the dyad 1 definition. Each line of your output should have

the following format:

2

NYU Shanghai, Fall 2020; Homework 0 October 30, 2020

name1 name2

where name1 has been to a restaurant with name2. The dyads are unordered (in networks

language, we say that the relationship is undirected), so “name1 name2” and “name2

name1” are the same pairs and should appear only once in the output.

We can think of a stricter definition of friendship where we connect two people if they have

dined together at a restaurant at least k times (where k is a positive integer). Let us name this

definition dyad k. The rationale behind this definition is that people may dine occasionally

with acquaintances, but the repeated observation of two people dinning together may establish

a stronger relationship.

? problem3.out (10 pts): Write python code to build all the dyadic relationships between

two people using the dyad 3 definition. Each line of your output should have

the same format as the preceding problem.

The degree of an actor is defined as the number of connections (or friends) it possesses.

? problem4.out (10 pts): Write python code to compute the degree of each node using

the dyad 1 definition of friendship. Each line of your output should be formatted as:

name degree

To check whether our previous solution is correct we can compare the number of dyads

produced by problem2.out with the sum of all the degrees produced by problem4.out.

? question1.txt (10 pts): Write in the first line of this file how many times each dyad

contributes to the sum of degrees. How many times the number of dyads should the

sum of degrees equals to? To test whether your outputs produced in problem2.out

and problem4.out match your answer, write the total number of dyads and the sum

of degrees in the second and third lines of this file, respectively.

0.3 Network Algorithms

We will break the task of find the network distance between two people into two simpler

steps.

Step 1: An adjacency list representation of a network is a collection of unordered lists, one

for each vertex in the graph. Each list describes the set of neighbors of its vertex. The main

operation performed by the adjacency list data structure is to report a list of the friends of

a given actor. In other words, the total time to report all of the neighbors of an actor is

proportional to its degree.

3

NYU Shanghai, Fall 2020; Homework 0 October 30, 2020

? problem5.out (20 pts) : write Python code to build and output an adjacency list from

the restaurants.txt file in Python using the dyad 1 definition.

Here is an example. If your input, e.g., restaurant.txt file has the following lines:

A,B;R1

B,C;R2

C,A;R1

your script should produce the following “Actor: Adjacent to” lines:

A: B C

B: A C

C: A B

(Hint1: use nested dictionaries to build the adjacency list. For example,

if ”Alice” and ”Hussam” are friends, then you would have the entries

{"Alice":{"Hussam": 1}, "Hussam":{"Alice": 1}} in your dictionary. If your dictionary

is called friends, then friends["Alice"]["Hussam"] returns the value 1 if Alice

and Hussam are friends. Otherwise, either "Alice" is not a key of friends or "Hussam" is

not a key of friends["Alice"], or vice-versa, and it returns a key error.)

(Hint2: If you want to access friends["Alice"]["Hussam"], remember to initialize both

friends and friends["Alice"] as dictionaries before its first use so as to avoid a key error.

It is easy to initialize friends, but it can be challenging to initialize friends["Alice"]

as you don’t know whether "Alice" will be in the dataset. You avoid this problem

by calling the function friends.setdefault("Alice", {}) before using the dictionary

friends["Alice"]. This function does nothing if friends["Alice"] is already a dictionary,

but initializes friends["Alice"] as a dictionary if it is not.) Another way to accomplish

this is to use the defaultdict library of the package collections.

Step 2: Now, we are interested in computing the distance between a given actor and all

other actors. To this end, we will use a network search algorithm called Breadth First Search

(BFS).

BFS begins with a given actor, whom we call the root, and then inspects all of his or her

friends. These friends have distance 1 from the root. Then for each of those friends in turn,

it inspects their unvisited friends. These “friends of friends“ have distance 2 from the root,

and so on.

BFS can be implemented using an adjacency list and a queue. Here are the steps the algorithm

performs:

4

NYU Shanghai, Fall 2020; Homework 0 October 30, 2020

1. Enqueue the root node, set the distance of root to zero, and mark root as visited.

2. Dequeue a node (let’s call this node n).

? enqueue all friends of n that have not yet been visited

? set the friends distance to the distance of n plus one

? mark the friends of n as visited

3. If the queue is empty, then the algorithm exits. Otherwise repeat from Step 2.

? problem6.out (30 pts) : write a Python function to compute the distance of all foodies

in restaurants.txt who are reachable from a given root. The first argument of your

function should be the name of the root, e.g., ”Beula”, and the second, the input file

name that encodes de network.

Here is an example. If your restaurants.txt file looked like

Alice,Hussam;CTB

Harsh,Hussam;Subway

Harsh,Atheendra;CTB

Harsh,Sangha;Subway

Then, if you run your function with the root ”Alice”, your output should be (not necessarily

sorted by distance):

Alice 0

Hussam 1

Harsh 2

Atheendra 3

Sangha 3

In this exercise you will output, for every person in the dataset, their average distance between

themselves and all other actors. The format of your output should be:

actor1 avg1

actor avg2

...

actorN avgN

5

NYU Shanghai, Fall 2020; Homework 0 October 30, 2020

(Hint1: The restaurants.txt dataset contains a group of students who attend the same college

in a small town area. Therefore, their distance to one another is expected to be small.)

(Hint2: Use a list to implement the queue. You might find these two functions useful.

? list.append(x) : inserts element x at the end of a list.

? list.pop(i) : returns and deletes element at index i from list.)

(Hint3: Use a dictionary to keep track of the actors that have been visited by BFS and their

distance from the root, i.e., dist= {"Alice": 0, "Hussam" : 1})

Important: After you finish this assignment, please fill out the course survey. The link is in

the calendar section of the course website.

6

版权所有：留学生程序辅导网 2019 All Rights Reserved 联系方式：QQ:99515681 电子信箱：99515681@qq.com

免责声明：本站部分内容从网络整理而来，只供参考！如有版权问题可联系本站删除。