This assessment is CONFIDENTIAL. ? University of Sydney. You must not share your assessment, even after

submission, until the marks are returned

Assignment 5 - Camellia sinensis

Due: 11:59PM Friday 4th June 2021 Sydney time

This assignment is worth 15% of your final assessment

Task description

In this assignment you will be implementing a simple encrypted key-value store. Your code will be a library of

functions that can be called by other programs, rather than a standalone executable. Your library will also be

assessed for performance and thread safety. Concurrent requests may be made to your dictionary, and you may

also wish to implement concurrency within your library itself.

You will already be familiar with the dictionary abstract data type which maps unique keys to corresponding

values, and supports insertion, deletion and searching. You will implement your dictionary using the B tree data

structure and encrypt stored values using a modified version of the Tiny Encryption Algorithm (TEA).

B trees

The B tree is a self-balancing tree that stores key-value pairs in its nodes. When we refer to “keys” below, we

implicitly also mean the value that each key is linked to. Comparisons on keys only operate on keys, not their

linked values.

Each node may have a varying number of children, which is dictated by the branching factor b, a positive integer

such that b ≥ 2. b is a fixed property of the entire tree at creation. Each node obeys the following rules:

? The value n of every node obeys ?b/2? ≤ n ≤ b and for internal nodes, n is the number of children they

have.

? The exception is the root node. If it is a leaf, it obeys n ≤ b. If it is not a leaf, it obeys 2 ≤ n ≤ b .

? Every node has n - 1 keys. This rule also applies to leaf nodes; they do not have n children, but they still

contain n - 1 keys where n satisfies the rules above.

Every key exists in exactly one node in the tree. The keys in each node are ordered and for internal nodes,

partition their children into an ordering. Suppose the n - 1 keys of an internal node satisfy the order k0 < k1 < …

< kn-2 , and suppose the n children are C0

, C1 … Cn-1 . Every key in a child node Ci

is > ki-1 and < ki

. Every key

in the first child (C0

) is < k0

and every key in the last child (Cn-1) is > kn-2 . We will use the terminology that the

key ki

separates the children Ci

and Ci+1 and that the child Ci

is separated by the keys ki-1 and ki

(or only one

key if it is the first or last child).

Searching a B tree

To locate a key K in a B tree, follow this algorithm:

1. Starting from the root node, if the key is present in this node, the key-value pair is returned and we are

done. If it is not present, identify the correct child to consider next by comparing K with the node’s keys.

If ki-1 < K < ki

, then the correct child is Ci

.

2. Move to Ci

, and repeat step 1 until the node containing K is found.

3. If a leaf node is reached and K is not found, return key not found.

1

Example of B tree and searching

Below is a B tree with branching factor 4. Keys are presented ordered, note how the keys in each node separate

the children according to the keys in the children (separators indicated by the source of each edge at the parent).

Each key is linked to its value; this is not shown in the diagram.

To search for the value 17 in this tree, start from the root. 17 > 7, so move to the child with keys 13 and 19. 13

< 17 < 19, so move to the second child. Here we find the key 17.

Inserting into a B tree

To insert a key K with value V into a B tree, follow this algorithm:

1. First, follow the searching algorithm to search for K in the tree. It is an error if K already exists in the tree.

Identify the leaf node that would contain K.

2. Insert K and V into the node, maintaining the ordering of keys

3. If the new number of keys in the node does not exceed the limit b - 1, the insertion algorithm is complete

4. If the number of keys exceeds the limit b - 1, identify the median of the keys including the new key, Kmedian.

If the total number of keys is even, choose the smaller of the two middle keys as Kmedian. Split the node

into two new sibling nodes of the same parent with the new left node containing all keys < Kmedian and the

new right node containing all keys > Kmedian. Move Kmedian into the parent node’s keys. Note that Kmedian

now separates the two new nodes.

5. If the parent node’s number of keys also exceeds the limit, then repeat step 4 recursively up the tree. As

internal nodes are split, their children are also relinked as appropriate to the new sibling nodes.

6. If the recursive splitting reaches the root node, it is split into two siblings, and the Kmedian value is inserted

into a new root as the only key. The new root has the two new nodes as its only children. The height of

the entire tree therefore increases by 1.

Example of insertion into a B tree

We want to insert the key 21 into the B tree below with branching factor 4. A search for 21 reveals the target

node for insertion is the one containing the keys 17, 19, 20. Insertion of 21 would cause this node to overflow.

Kmedian is 19 (note the even number of keys, so choose the smaller of the 2 middle keys). 19 is moved to the

parent node, and the remaining keys are split into two siblings (node with 17, and node with 20, 21).

The key 19 is moved to the parent, the root, which also overflows and needs to split. The Kmedian, 7, is moved up

to become a new root, separating the new siblings (node with 3, and node with 13, 19).

2

Deletion from a B tree

To delete a key K from a B tree, follow this algorithm:

1. First, follow the searching algorithm to search for K in the tree. It is an error if K does not exist in the tree.

2. If the node containing K is an internal node, suppose that C is the left of the 2 child nodes that K separates.

Swap K with the largest key (Knew) in the entire subtree rooted at C. Note that Knew must reside in a leaf

node, so after the swap K now resides in this leaf node. Note that now Knew also correctly separates the

same 2 child nodes as K did.

3. If K was in an internal node, note that K is now in a leaf node and this is identical to the case where K was

originally in a leaf node. Delete K from the leaf node. If the leaf node still satisfies the minimum number

of keys, the deletion algorithm is complete.

4. After deletion, the leaf node (call this the target node) may now have less than the minimum number of

keys permitted. Suppose that the target node is separated by the keys Kleft and Kright in the parent. Firstly,

check if the immediate left sibling of the target node (by order) has more than the minimum number of

keys. If it does, in the parent replace Kleft with the largest key in the left sibling (call this Kleftchild), and

move Kleft into the target node so that it has the minimum number of keys required. If the immediate

left sibling is unsuitable, if the immediate right sibling has more than the minimum number of keys, in

the parent replace Kright with the smallest key in the right sibling (call this Krightchild), and move Kright into

the target node. If the target node is the leftmost or rightmost child of the parent, skip the check of the

immediate left or immediate right siblings respectively. If a suitable sibling was identified and the keys

moved according to this step, the deletion algorithm is complete. If a suitable sibling is not identified,

continue to step 6.

5. For future recursive steps, step 4 may need to be performed on a target node that is internal as follows. If

the left sibling has more than the minimum number of keys, let Cleft be its “largest” child, which is separated

in the left sibling by Kleftchild. Note that, due to the insertion algorithm, all keys in Cleft are also < Kleft. Then,

according to step 4, Kleft is moved into the target node, and Kleftchild is moved into the parent. If the target

node is internal, also move the child (and subtree rooted at) Cleft so that it is now the leftmost child of

the target node separated by the new key Kleft. Likewise if we are considering the right sibling, let Cright

be the “smallest” child of the right sibling, which is separated by Krightchild. Kright is moved into the target

node, Krightchild is moved to the parent, and Cright (and its subtree) is moved to become the rightmost child

of the target node, separated by Kright. Again, if this step completes successfully, the deletion algorithm

is complete.

6. If there is no immediate sibling of the target node that has more than the minimum number of keys,

merge the target node with an immediate sibling. Always pick the left sibling, unless the target node is

the leftmost child of the parent, in which case pick the right sibling to merge with. Move all keys and

children from the sibling to the target node. Additionally, move the parent key that separates the target

node and the merged sibling, into the target node. That is, if we are merging the target node with its left

sibling, then Kleft must be moved from the parent into the new merged node. Similarly, move Kright into

the new merged node if required. The parent has now lost a key. If it has less than the minimum number,

3

repeat steps 4-6 recursively up the tree. If it satisfies the minimum number of keys, the algorithm is

complete.

7. If the root node is reached and has less than the minimum number of keys (recall the minimum number

of keys for the root specifically is 1), simply delete the root node. The new root is the merged child node

produced from step 6. The height of the entire tree therefore decreases by 1.

Example of deletion from a B tree

We want to delete the key 2 from the B tree below with branching factor 4. Removal of the key 2 from its leaf node

causes the node to underflow. The only immediate sibling, node containing 5, is also at its minimum number

of keys. Therefore by step 6, we merge with the right sibling containing 5 and the key 3 from the parent, as 2

was the leftmost child. The parent node has now underflowed (referred to from here on as the target node). Its

right sibling, node containing 13, 19, has more than the minimum number of keys. This is an internal node, so

following step 5, we move Kright (7) into the target node, Krightchild (13) into the parent of the target node, and

Cright (containing the node with key 11) to become the rightmost child of the target node. The target node now

has key 7 which separates node containing 3, 5 and node containing 11.

Tiny Encryption Algorithm (modified)

Encryption is simply a function (“cipher”) which transforms input data to be encrypted (“plaintext”) into encrypted

output data (“ciphertext”) under the control of some additional data (“key”). The operation can be

reversed, which for our purposes takes the ciphertext and applies a decryption function with the same key data,

to produce the same original plaintext. TEA is a 64-bit block cipher with 128-bit key size. This means the encryption

function always operates on a fixed size block of 64 bits of plaintext and 128 bits of key and outputs 64 bits

of ciphertext. The decryption function accepts 64 bits of ciphertext and 128 bits of key and outputs 64 bits of

plaintext.

The encryption function has the following pseudocode:

// plain contains the 64 bit plaintext

uint32_t plain[2] // Represent 64 bit plaintext as array of 2 uint32_t

uint32_t cipher[2] // Represent 64 bit ciphertext as array of 2 uint32_t

uint32_t key[4] // Represent 128 bit key as array of 4 uint32_t

sum = 0

delta = 0x9E3779B9

cipher[0] = plain[0]

4

cipher[1] = plain[1]

loop 1024 times:

sum = (sum + delta) mod 2^32

tmp1 = ((cipher[1] << 4) + key[0]) mod 2^32

tmp2 = (cipher[1] + sum) mod 2^32

tmp3 = ((cipher[1] >> 5) + key[1]) mod 2^32

cipher[0] = (cipher[0] + (tmp1 XOR tmp2 XOR tmp3)) mod 2^32

tmp4 = ((cipher[0] << 4) + key[2]) mod 2^32

tmp5 = (cipher[0] + sum) mod 2^32

tmp6 = ((cipher[0] >> 5) + key[3]) mod 2^32

cipher[1] = (cipher[1] + (tmp4 XOR tmp5 XOR tmp6)) mod 2^32

// cipher now contains the 64 bit ciphertext

The decryption function has the following pseudocode:

uint32_t cipher[2] // Represent 64 bit ciphertext as array of 2 uint32_t

uint32_t plain[2] // Represent 64 bit plaintext as array of 2 uint32_t

uint32_t key[4] // Represent 128 bit key as array of 4 uint32_t

sum = 0xDDE6E400

delta = 0x9E3779B9

loop 1024 times:

tmp4 = ((cipher[0] << 4) + key[2]) mod 2^32

tmp5 = (cipher[0] + sum) mod 2^32

tmp6 = ((cipher[0] >> 5) + key[3]) mod 2^32

cipher[1] = (cipher[1] - (tmp4 XOR tmp5 XOR tmp6)) mod 2^32

tmp1 = ((cipher[1] << 4) + key[0]) mod 2^32

tmp2 = (cipher[1] + sum) mod 2^32

tmp3 = ((cipher[1] >> 5) + key[1]) mod 2^32

cipher[0] = (cipher[0] - (tmp1 XOR tmp2 XOR tmp3)) mod 2^32

sum = (sum - delta) mod 2^32

plain[0] = cipher[0]

plain[1] = cipher[1]

// plain now contains the 64 bit plaintext

Treat all integers in the TEA algorithms above as little endian.

Encrypting arbitrary data with TEA

For this assignment, you will extend the basic TEA algorithm, which operates on 64 bit blocks of data, to encrypt

arbitrary sized data. You will do this by using the “counter mode” of block cipher operation (“TEA-CTR”). Suppose

that the basic TEA encryption function is TEA_encrypt(plaintext, key), which accepts 64 bit plaintext, 128

bit key, and returns 64 bit ciphertext. The inputs are P (data to be encrypted of an arbitrary number of bytes

in size), key (128 bit key), and nonce (64 bits of data which is provided by the caller). First, divide P into 64

bit chunks: P[0], P[1], … ; the final chunk may contain less than 64 bits of plaintext. For the purposes of the

algorithm, fill the remainder of the final chunk with zero bytes.

The output of the algorithm is C, an array of 64 bit chunks C[0], C[1], … of the same number of chunks as P.

Truncate the final chunk of C such that it is of the same length as the original final chunk of P. Therefore, the final

C is exactly the same size as the original input P.

The following pseudocode describes TEA-CTR encryption:

5

uint64_t P[number of chunks] // Data to be encrypted as 64 bit chunks

uint32_t key[4] // 128 bit key which is provided

uint64_t nonce // 64 bit "nonce" data which is provided

for i in 0 ... number of chunks:

tmp1 = i XOR nonce

tmp2 = TEA_encrypt(tmp1, key)

C[i] = P[i] XOR tmp2

Decryption for TEA-CTR is extremely similar to the encryption algorithm and can be quickly deduced from the

above.

Dictionary interface

The keys of your dictionary are unsigned 32-bit integers. Each value corresponding to a key will contain the

following elements:

? Size of the actual data for this value in bytes (the maximum size is 232-1 bytes)

? The encryption key

? The “nonce” value used for TEA-CTR

? The actual encrypted data for this value

These values need to be associated with their key, but how and where exactly you store this data is up to you.

The implementation of the main B tree is also up to you, and you may use any other data structures you wish.

There are functions you need to implement that will inspect the correctness of your B tree structure.

Functions to implement

Implement the following functions for your library. Do not write any main() function. Other programs will directly

call your functions.

void * init_store(uint16_t branching, uint8_t n_processors);

This function will be called exactly once before any other functions are called.

Initialise any data structures and perform any preparation you wish. Return a pointer (void *) to a memory

area where you store data you wish to use for other functions. For all other functions, this pointer will be passed

in for you to use as the void * helper parameter. The branching parameter gives the branching factor b

for your B tree. The n_processrs parameter gives you an indication of the number of physical processors you

have access to. You have the opportunity to setup any concurrency at this point to use in further operations.

void close_store(void * helper);

This function will be called exactly once after the end of all other functions. You need to perform any cleanup

required of your library, including deallocating dynamic memory.

int btree_insert(uint32_t key, void * plaintext, size_t count, uint32_t encryption_key[4],

uint64_t nonce, void * helper);

Insert key into your B tree with the given plaintext data of size count bytes. You must encrypt the contents

of plaintext using a single instance of the TEA-CTR algorithm with the parameters given in encryption_key

and nonce and store the encrypted result in your tree. You cannot store the pointer plaintext itself in your

tree, it may not remain valid. Return 0 if the insertion is successful. Return 1 if the key already exists. For error

cases, the B tree should remain unchanged and ready for further requests.

6

int btree_retrieve(uint32_t key, struct info * found, void * helper);

Search for key in the B tree and fill its value into the struct info pointed to by found. For the data field,

return a pointer to the encrypted data, do not decrypt it. If successful, return 0, otherwise, return 1. For error

cases, the B tree should remain unchanged and ready for further requests. struct info is the following:

struct info {

uint32_t size; // Size of actual stored data in bytes

uint32_t key[4]; // Encryption key

uint64_t nonce; // Nonce data

void * data; // Pointer to the encrypted stored data

};

int btree_decrypt(uint32_t key, void * output, void * helper);

Search for key in the B tree and decrypt its data, copying the plaintext to output, which you can assume has

sufficient space for the size of the data stored for this key. The data in the tree must remain encrypted. If

successful, return 0, otherwise return 1. For error cases the B tree should remain unchanged and ready for

further requests.

int btree_delete(uint32_t key, void * helper);

Delete the key and its value from the B tree. If successful, return 0, otherwise return 1. For error cases the B

tree should remain unchanged and ready for further requests.

uint64_t btree_export(void * helper, struct node ** list);

Export the B tree as an array of struct node (described below). The array must contain the nodes in your B

tree, ordered as a preorder traversal. Each node in your tree corresponds to a struct node which contains the

number of keys it has and an ordered array of the keys. Your function must cause *list to point to the first

element of your array. Your array must be dynamically allocated; the caller will call free() on your array. The

return value must be the number of nodes in your B tree. If the B tree has no nodes, then *list can be NULL

and free() will not be called.

struct node {

uint16_t num_keys; // The number of keys in this node

uint32_t * keys; // The keys in this node, ordered

};

An example of how btree_export would be tested follows:

// This is testing code

struct node * list = NULL;

uint64_t num = btree_export(helper, &list);

// num and elements of list e.g. list[i] will be inspected

free(list); // expected that list will point to dynamic memory

void encrypt_tea(uint32_t plain[2], uint32_t cipher[2], uint32_t key[4]);

Using TEA, encrypt the 64 bit plaintext provided in plain, using the 128 bit key provided in key. Write the 64

bit ciphertext output to cipher.

void decrypt_tea(uint32_t cipher[2], uint32_t plain[2], uint32_t key[4]);

Using TEA, decrypt the 64 bit ciphertext provided in cipher, using the 128 bit key provided in key. Write the

64 bit plaintext output to plain.

7

void encrypt_tea_ctr(uint64_t * plain, uint32_t key[4], uint64_t nonce, uint64_t * cipher,

uint32_t num_blocks);

num_blocks blocks of 64 bit input plaintext are provided in plain. Using TEA-CTR, encrypt this input using the

128 bit key provided in key and the 64 bit nonce. Write the output ciphertext blocks to cipher which you can

assume has enough space.

void decrypt_tea_ctr(uint64_t * cipher, uint32_t key[4], uint64_t nonce, uint64_t * plain,

uint32_t num_blocks);

num_blocks blocks of 64 bit input ciphertext are provided in cipher. Using TEA-CTR, decrypt this input using

the 128 bit key provided in key and the 64 bit nonce. Write the output plaintext blocks to plain which you can

assume has enough space.

Note on synchronisation

The tests performed on your library functions may be multithreaded; after initialisation, any functions may be

called concurrently. For example, an insertion may be requested while a retrieval is proceeding in another

testing thread. All your library functions are expected to behave atomically. You must implement synchronisation

primitives to ensure concurrent requests are handled correctly and not interleaved. It is the caller’s responsibility

to ensure that they order their calls correctly to achieve the desired result, but it is your responsibility to ensure

that concurrent operations behave sequentially and atomically. For performance reasons, you may also wish to

multithread internally within your own functions. This is separate from your code being tested concurrently.

Test Cases

You must write your own test cases for this assignment. Details on execution and marking of your test cases are

included below.

Code Submission and Marking Criteria

Only the last submission will be graded. Late submissions will incur the late penalty described in the Unit of

Study outline. Submit your assignment on Ed via git. It must compile and run on the Ed submission system.

For Ed testing, your code will be compiled to a static library (libbtreestore.a) with the Makefile rule make

correctness; a scaffold is provided.

For Ed testing, your code will compile with these options:

-O0 -Werror=vla -std=gnu11 -g -fsanitize=address -pthread -lrt -lm

Performance testing will also be performed, for which your code will compile with the Makefile rule make

performance, with these options:

-O0 -march=native -Werror=vla -std=gnu11 -pthread -lrt -lm

Your library must declare its functions in the header btreestore.h (provided in scaffold). The scaffold contains

tests.c which includes sample main() code that calls your library functions, which you can optionally use for your

testing. For testing purposes, make tests will compile your library. Your test cases are expected to link your

library for testing. Test cases you write must be executed by make run_tests, which should report back on

their results in a human readable format. You can modify your Makefile, but you cannot change the compiler or

remove compilation flags for make correctness or make tests. You cannot change any compilation flags for

make performance. Remember that all testing will be performed against the static library libbtreestore.a.

8

You may not use variable length arrays or alloca.

Failing to adhere to these rules may cause you to receive 0.

Marking criteria follow (15 marks total):

? 8 marks for correctness (passing Ed automatic test cases). Some test cases will be hidden and not available

before or after the deadline. The total number of tests and the proportion of public versus hidden tests

are not specified.

? 5 marks for performance. You are only eligible for these marks if you pass all the correctness test cases.

Submissions will be compared against defined cutoffs, with increasing amounts of credit available for

better speed of execution against a range of test cases.

– Your code will be assessed for performance by recompiling and running on a physical computer with

an Intel Core i7-8700 CPU and 16GB of RAM with minimal other processes running. The operating

system will be a Debian Linux 10.9 installation, with the following system software versions: Linux

kernel 4.19, GNU C library 2.28, GCC 8.3. More details about performance marking will be provided

at a later date.

? 2 marks for your own automated test cases, with the library built with make tests and your test cases

built and/or executed with make run_tests. The proportion of marks for this section will be capped to

the proportion of marks that you achieve out of 8 for the correctness portion.

? There is no oral session

Warning: Any attempts to deceive or disrupt the marking system will result in an immediate zero for the entire

assignment. Negative marks can be assigned if you do not follow the assignment problem description or if your

code is unnecessarily or deliberately obfuscated.

Hints

? You are allowed to use all features of the specified CPU, including SIMD instructions. Please note for

correctness your code must still compile and run on Ed.

Academic Declaration

By submitting this assignment you declare the following:

I declare that I have read and understood the University of Sydney Student Plagiarism: Coursework Policy and

Procedure, and except where specifically acknowledged, the work contained in this assignment/project is my

own work, and has not been copied from other sources or been previously submitted for award or assessment.

I understand that failure to comply with the Student Plagiarism: Coursework Policy and Procedure can lead to

severe penalties as outlined under Chapter 8 of the University of Sydney By-Law 1999 (as amended). These

penalties may be imposed in cases where any significant portion of my submitted work has been copied without

proper acknowledgment from other sources, including published works, the Internet, existing programs, the work

of other students, or work previously submitted for other awards or assessments.

I realise that I may be asked to identify those portions of the work contributed by me and required to demonstrate

my knowledge of the relevant material by answering oral questions or by undertaking supplementary work, either

written or in the laboratory, in order to arrive at the final assessment mark.

I acknowledge that the School of Computer Science, in assessing this assignment, may reproduce it entirely, may

provide a copy to another member of faculty, and/or communicate a copy of this assignment to a plagiarism

checking service or in-house computer program, and that a copy of the assignment may be maintained by the

service or the School of Computer Science for the purpose of future plagiarism checking.

9

版权所有：留学生程序辅导网 2020,All Rights Reserved 联系方式：QQ:99515681 电子信箱：99515681@qq.com

免责声明：本站部分内容从网络整理而来，只供参考！如有版权问题可联系本站删除。