Optimal allocation of students to modules

The problem

One of my tasks as head of our final undergraduate year on our Psychology MA/BSc programme is to allocate students to elective modules. We have in the order of 80-100 undergraduates in our final year, and ignoring their undergraduate dissertation, most of them must take 3 modules. There are exceptions however for joint honours students. One very easy solution to this allocation task would be to simply allocate students to their top 3 modules. However, because our faculty numbers are on the (ahem) low side, and that some modules are much more popular than others, this would result in some modules with very large numbers of students, and others without enough students in them to be feasible. Because of various workload concerns and our particular situation, we have decided to enforce roughly equally sized modules. So the task of allocating students to modules becomes quite tricky.

Previous solutions

Before I took over this job, about 8 years ago, we operated a heuristic allocation strategy which was essentially: by hand, give students 2 modules that they have ranked highly, but satisfy the constraints by giving each student a relatively low ranked module. This was dubbed the “equal-unhappiness” approach.

When I took over the job, I tried to aim a bit higher in giving students more preferred modules. I still used a heuristic strategy however, and this is described in a short piece (Allocating students to elective modules [pdf]) which I wrote in The Higher Education Academy Psychology Newsletter. Essentially, we got an excel spreadsheet of students, along with their stated module rankings. We sorted students from highest average performance in 3rd year. To allocate the first module, we went from the top of the list, allocating them to their highest ranked module that was not yet full. Because we do module 1 allocations for all students, then move on to module 2 allocations etc, this meant that we have some reward to students who had done well in their 3rd year, but everyone on the list stood a good chance of getting a high ranked modules. We did not give the top performing students their top choices, then proceed to give the others their choices; that would have been pretty unfair.

This allocation system worked reasonably well for many years. I should stress that many students did extremely well out of this approach, with some getting their 1st, 2nd, 3rd choices, some getting 2nd, 4th, 6th, and so on. So this process was basically a success and I was very happy with that approach. There were however a few small problems.

Because of the uneven module popularity, there were always just a small group of students who ended up getting allocated to quite low ranked modules. This seemed an inevitable by-product of operating under constraints, but as time went on this became more of an issue.

The new, optimal, solution

I made a bit of a breakthrough when I realised that this module allocation task is not a unique problem, but is an example of a more general class of problems called optimisation under constraints. If I was a mathematician or an engineer then this would have been immediately apparent, but I’m a Cognitive Scientist, so it wasn’t. Regardless, recognising this meant that I could tap into sophisticated methods for solving these types of problems.

This is a completely new approach. Rather than slightly refining heuristic allocation procedures, the approach is to just flat out find the optimal solution. The art of this is figuring out how to pose your problem in a general manner so that a computer can find the optimal solution.

First, we define what we want to optimise. For this problem our task is to construct a binary allocation matrix which has rows equal to the number of students, and columns equal to the number of modules. This matrix is full of either zeros or ones, indicating whether the student (row) is not or is taking a module (column), respectively.

We are going to illustrate this using Matlab code. This first line defines the allocation matrix of the correct size, and this is our optimisation variable.

allocation = optimvar('allocation', [n_students, n_modules],...
    'Type','integer', 'LowerBound',0, 'Upperbound',1);

Now we define our objective function – this is what we want to optimise. First, we have another matrix of rankings, with rows equal to the number of students, and columns equal to the number of modules. This ranking data is collected from the students. A ranking of 1 is their most favoured, down to 13-14, depending on the number of modules available. Therefore we want to find a set of allocations such that the total rankings of allocations are as low as possible. We can do this with the following line which describes an objective.

problem = optimproblem('ObjectiveSense', 'minimize',...
    'Objective', sum(sum(allocation.*rankings)) );

The fact that the allocation matrix is full of zeros (not allocated) and ones (allocated) is very handy. It means we can do an element-wise multiplication which will give us the set of rankings of modules for which allocations have been made. Taking the sum of this then gives a total score (or objective) which we want to minimise.

Before we try to find the optimal allocation matrix, we have to define our constraints. This was quite a fun process in fact, and felt like there was a bit of an art to formulating constraints in a way that you can express mathematically. Our constraints were:

  1. All students need to take the required number of modules. This is a vector with an entry for each student, imported from excel, and called n_allocs_needed.
  2. No module should have more than the maximum number of students specified for that module. This is now a vector max_class_size with one entry per module. We only have a couple of modules with specific class size caps, because they involve student placements etc, so the rest of the max class sizes are set at a value that works given the number of students and modules that given year.
  3. Equally, no module should have fewer than the minimum number of students. This is important as running modules with very few students is not feasible. Again this is a vector of values imported from excel, called min_class_size.
  4. No student should be allocated to 3 modules which run in the same semester. This is to avoid one semester being devoid of studies (except their dissertation) and the other being very busy. This took a while to figure out how to express mathematically, but I ended up with a good solution. We have another vector, semester_run, with an entry for each module with values either 1 or 2, corresponding to the semesters.
  5. All students must have a set of allocations such that their total rankings for those allocated modules is less than a certain threshold. This constraint is put in in order to avoid the potential for a globally better ranking solution at the expense of a few students getting very low ranked modules.
  6. No student can take both PY40018 and PY40029. These modules are anti-requisites, so this constraint was included. In order to do this we define PY40018 and PY40029 as indices into the appropriate columns of the allocation matrix.

These constraints were added to the optimisation problem in the following lines.

% must be allocated to the correct number of modules
problem.Constraints.constr1 = sum(allocation,2) == n_allocs_needed;

% must not have more than a certain number of students in each module
problem.Constraints.constr2 = sum(allocation,1) <= max_class_size;

% all classes have more than the minimum number of students
problem.Constraints.constr3 = sum(allocation,1) >= min_class_size;

% all students should have total ranking sum <=11
problem.Constraints.constr4 = sum(allocation.*rankings,2) <= 12;

% NOT all modules in the same semester
problem.Constraints.constr5 = sum(allocation(:,semester_run==1),2) <=2;
problem.Constraints.constr6 = sum(allocation(:,semester_run==2),2) <=2;

% PY40018 and PY40029 are anti-requisites
problem.Constraints.constr7 = sum(allocation(:,[PY40018,PY40029]),2) <=1;

The final step is to solve the optimisation problem. I was expecting this to take 10’s of minutes, perhaps hours, as the number of possible allocation matrices is very large. However this computes in something ridiculous like 0.1 second.

[solution, fval, exitflag, output] = solve(problem);

So we can check that a solution was actually found by looking at the exitflag. This is very important. It is entirely possible to define a set of constraints such that there is actually no solution. But assuming this as all worked, then we have our optimal binary allocation matrix in the variable solution.allocation.

Evaluating the approach

Before jumping in to using this for the 2018-19 module allocations, it is important to check whether this actually works. To do this I constructed an overall score for each student. This is the sum of the ranks of the modules allocated to, and you want this to be as low as possible, representing allocation to higher ranked modules. The lowest score is either 3 (ie 1+2) for students taking 2 modules, or 6 (ie 1+2+3) for the majority of students taking 3 modules. I then plotted the histogram of these scores over all the students for the actual set of allocations I made in 2017-18, and the hypothetical allocations if the new procedure were used.

The end result is really very encouraging. We can see that the actual allocations* involved a very broad range of scores, thus levels of happiness. The new procedure however entirely takes out the bottom half of this distribution, meaning that a vast number of students will be much happier with their module allocations under this new procedure. The distribution under the new method also has much less variance, meaning much less inequality in allocations and thus happiness.

Summary

So in short, we will be moving away from a heuristic method of allocating students to elective modules. Instead, we can treat this task as optimisation under constraints. As such, we can leverage powerful algorithms to produce an optimal set of module allocations. Let’s say that again… using this approach allows us to produce the best possible set of allocations, under the constraints we are working with. This is pretty satisfying, both on an intellectual level, but also because this will have a real impact on student satisfaction. If there is an effect of module desirability on grades, then we may well expect to see a positive shift in our grade distribution for 2018-19 onwards.

* actual allocations should be taken with a pinch of salt, as this was the data before some manual reallocation of students with a poor set of allocations.

Leave a comment

Leave a Reply