Lean Manufacturing: Limiting innovation?

I’ve been running a “lean” business for 6 months now, and I’ve noticed that Lean Manufacturing principles, applied to software development, can lead to bad business decisions. Let me explain:

Primarily, my concern centers around the lean manufacturing principle of waste reduction. The constant drive to reduce waste makes sense on an industrial production line, but does it make sense in a startup or exploratory environment?

An example

Let’s say there are 2 possible features you could work on. Feature 1 has a 90% chance of delivering £1 of value, and Feature 2 has a 10% chance of delivering £100 of value.

Let’s say that the features take the same time to develop. From the maths, the expected values are:

Feature 1 “value” = 90% of £1 = £0.90

Feature 2 “value” = 10% of £100 = £10

And yet, even with this understanding, the implicit risk and waste aversion of lean leads us to say “there’s a 90% chance Feature 2 will be wasteful, whereas Feature 1 is almost sure not to be wasteful, therefore Feature 1 is the better idea”.

Good outcomes, but not as good as they could be

The waste reduction aspect of lean manufacturing gives us a local optimisation, much like gradient descent. Imagine a ball on a hill, which rolls downhill to find the bottom. This is ok, and it will find a bottom (the floor of its own valley), but maybe not the bottom (of the whole world: the Mariana Trench, say). In that sense it is locally good, but not globally optimal.

The way mathematicians sometimes get round this is by repeatedly restarting the ball from random places: think a large variety of lat-longs. You then save each result and take the best one. That way you are much more likely to have found the global optimum.
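To make the analogy concrete, here is a minimal Python sketch of local search with random restarts. The bumpy “value landscape”, the step sizes and all the numbers are made up purely for illustration (and I’ve flipped it to maximising value rather than rolling downhill, to match the business framing):

```python
import math
import random

def value(x):
    # A made-up, bumpy "value landscape" with several local peaks.
    return math.sin(3 * x) + 0.5 * math.sin(7 * x) - 0.05 * (x - 2) ** 2

def hill_climb(x, step=0.05, iterations=2000):
    """Greedy local search: only ever accept moves that improve value."""
    for _ in range(iterations):
        candidate = x + random.uniform(-step, step)
        if value(candidate) > value(x):
            x = candidate
    return x

# A single start settles on the nearest peak, like pure waste-minimisation.
single = hill_climb(random.uniform(-5, 5))

# Random restarts: drop the ball in lots of places and keep the best result.
restarts = [hill_climb(random.uniform(-5, 5)) for _ in range(20)]
best = max(restarts, key=value)

print(f"single start:        x = {single:.2f}, value = {value(single):.2f}")
print(f"best of 20 restarts: x = {best:.2f}, value = {value(best):.2f}")
```

The restart strategy “wastes” 19 of its 20 runs, yet usually ends up with a better answer than the single careful run.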

So I’m wondering about whether this kind of random restarting makes sense in a startup world too. I guess we do see it in things like Google’s acquisitions of startups, Project Loon, etc. Perhaps we/I should be doing more off-the-wall things.

Closing commentary

Perhaps it isn’t so odd that Lean Manufacturing has “reduce waste” as a principle… In a production line environment, reduction of waste is the same as increasing value.

Still, if the optimisation problem is “maximise value” this leads to different outcomes than “minimise waste”. I would argue we should, in almost every case, be focusing on maximising value instead.

As we’ve seen with teams that follow the rituals of agile rather than its philosophy and mindset, it is beneficial to actually think about what we’re doing rather than applying things without understanding them.

Comments below please, I know this may be a bit controversial…

 

Climate Change

In the past week, I’ve been to some excellent talks. The first was on Biomarkers at the Manchester Literary and Philosophical Society, and the second was Misinformation in Climate Change at Manchester Statistical Society. And both of these followed the IMA’s Early Career Mathematicians conference at Warwick, which had some excellent chat and food for thought around Big Data and effective teaching in particular.

Whilst I could share what I learnt about biomarkers for personalised medicine, which makes a lot of sense and which I do believe will help the world, I will instead focus on the climate change talk. It was aimed at a more advanced audience and had some excellent content; thanks, Stephan Lewandowski!

There are a few key messages I’d like to share.

Climate is different to weather

This is worth being clear on: climate is weather over a relatively long period of time. Weather stations very near to one another can have very different absolute (temperature) readings. Rather than comparing absolute values, if you instead look at the changes in temperature (the anomalies) you will find strong correlations between stations. It is these that give us graphs such as:

climate1
Note it is variation rather than absolute
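As a rough illustration of why anomalies are used, here is a minimal Python sketch with made-up readings for two hypothetical nearby stations; the numbers are invented for illustration, not real data:

```python
import numpy as np

# Hypothetical annual mean temperatures (degrees C) for two nearby stations.
station_a = np.array([9.1, 9.3, 9.0, 9.6, 9.8, 9.7, 10.1, 10.0])
station_b = np.array([11.4, 11.6, 11.2, 11.9, 12.0, 12.0, 12.3, 12.4])

# The absolute readings disagree by a couple of degrees (different siting,
# altitude, instrumentation), but the anomalies - each station's departure
# from its own baseline mean - tell a consistent warming story.
anom_a = station_a - station_a.mean()
anom_b = station_b - station_b.mean()

print("mean gap between absolute readings:", np.abs(station_a - station_b).mean())
print("mean gap between anomalies:        ", np.abs(anom_a - anom_b).mean())
print("station A anomalies:", np.round(anom_a, 2))
print("station B anomalies:", np.round(anom_b, 2))
```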

Misinformation

Given any climate time series, it is possible to find local regions where the absolute temperature trend goes down, particularly if you are free to pick the time window.

Interestingly, Stephan’s research has shown that belief in other conspiracy theories, such as that the FBI was responsible for the assassination of Martin Luther King, Jr., was associated with being more likely to endorse climate change denial. Presumably(?) this effect is related to Confirmation Bias. If you’re interested in learning more, take a look at the Debunking Handbook.

Prediction is different to projection

According to Stephan, most climate change models are projections. That is, they use the historical data to project forward what is likely to happen. There are also some climate change models which are predictions, in that they are physics models which take the latest physical inputs and use them to predict future climate. These are often much more complex…

Climate change is hard to forecast

I also hadn’t appreciated how difficult El Niño is to forecast. El Niño is a warming of the eastern tropical Pacific Ocean, the opposite (cooling) effect being called La Niña. Reliable forecasts of El Niño are only available around 6 months ahead, which, given the huge changes that follow from it, I find astonishing. The immediate consequences are pretty severe:

El-Nino.jpg
Source: welt hunger hilfe

As you can see from the above infographic, it turns out that El Niño massively influences global temperatures. Scientists are trying to work out if there is a link between this and climate change (eg in Nature). Given how challenging this one section of global climate is, it is no wonder that global climate change is extremely difficult to forecast. Understanding this seems key to understanding how the climate is changing.

The future

In any case, our climate matters. In as little as 30 years (2047), we could be experiencing climatically extreme weather. Unfortunately, since CO2 takes a relatively long time to be removed from the atmosphere, even if we stopped emitting CO2 today we would still see these extreme events by 2069. Basically, I think we need new tech.

 

 

Open Data

In a previous post, several months ago, we talked about Chaos and the Mandelbrot Set: an innovation brought about by the advent of computers.

In this post, we’ll talk about a present-day innovation that is promising similar levels of disruption: Open Data.

Open Data is data that is, well, open, in the sense that it is accessible and usable by anyone. More precisely, the Open Definition states:

A piece of data is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and/or share-alike

The point of this post is to share some of the cool resources I’ve found, so the reader can take a look for themselves. In a subsequent post, I’ll be sharing some of the insights I’ve found by looking at a small portion of this data. Others are doing lots of cool things too, especially visualisations such as those found on http://www.informationisbeautiful.net/ and https://www.reddit.com/r/dataisbeautiful/.

Sources

One of my go-to’s is data.gov.uk. This includes lots of government-level data, of varying quality. By quality, I mean usability and usefulness. For example, a lat-long might be useful for some things, a postcode or address for others, and an administrative boundary for yet others. This means it can be very hard to “join” datasets together, because something like “location” is stored in many different ways. I often find myself using intermediate tables that map lat-longs onto postcodes etc., which takes time and effort (and lines of code).
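As a sketch of what I mean, here is roughly what that intermediate-table juggling looks like in pandas. All the data, columns and postcodes below are made up purely to show the shape of the problem:

```python
import pandas as pd

# Two open datasets that "should" join, but describe location differently.
air_quality = pd.DataFrame({
    "lat": [53.4781, 53.4832], "lon": [-2.2430, -2.2550], "no2": [41.0, 35.5],
})
deprivation = pd.DataFrame({
    "postcode": ["M1 1AA", "M3 4LZ"], "imd_decile": [2, 5],
})

# An intermediate lookup table mapping lat-longs onto postcodes.
lookup = pd.DataFrame({
    "lat": [53.478, 53.483], "lon": [-2.243, -2.255],
    "postcode": ["M1 1AA", "M3 4LZ"],
})

# Round both sides to a common resolution so the join keys actually match...
for df in (air_quality, lookup):
    df[["lat", "lon"]] = df[["lat", "lon"]].round(3)

# ...then go lat-long -> postcode -> deprivation in two merges.
joined = (
    air_quality
    .merge(lookup, on=["lat", "lon"], how="left")
    .merge(deprivation, on="postcode", how="left")
)
print(joined)
```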

Another nice meta-source of datasets is Reddit, especially the datasets subreddit. There is a huge variety of data there, and people happy to chat about it.

For sample datasets, I use the ones that come with R, listed here. The big advantage of these is that they are neat and tidy: no missing values, and nicely formatted. This makes them very easy to work with. They are ideal for trying out new techniques, and are often used in the worked examples of methods that you can find online.

Similarly useful are the kaggle datasets, which cover loads of things from US election polls to video game sales. If you are so inclined, they also run competitions which can help structure your exploration.

A particularly awesome thing if you’re into social data is the European Social Survey. This dataset is collected through a sampled survey across Europe, and is well established: it has been conducted every 2 years since 2002, and contains loads of cool stuff, from TV watching habits to whether people voted. It is very wide (ie lots of different variables) and reasonably long (around 170,000 respondents), so great fun to play with. They also have an online analysis tool so you can do some playing without downloading the dataset (it does require signing up by email for a free login).

Why is Open Data disruptive?

Thinking back to the start of the “information age”, the bottleneck was processing. Those with fast computers could do things no one else could. Technology has since made it possible for many people to get access to substantial processing power very cheaply.

Today the bottleneck is access to data. Google has built its business around mastering the world’s data. Facebook and Twitter are able to exist precisely because they (in some sense) own data. By making data open, we start to be able to do really cool things, joining together seemingly unrelated datasets and empowering anyone who is interested. Not only this, but in the public sector, open data means citizens can better hold government officials to account: no bad thing. There is a more polished sales pitch on why open data matters at the Open Data Institute (and they also do some cool work supporting Open Data businesses).

Some dodgy stuff

There are obviously concerns around sharing personal data. Deepmind, essentially a branch of Google at this point, has very suspect access to unanonymised patient data. Google also recently changed its rules, making internet browsing personally identifiable:

We may combine personal information from one service with information, including personal information, from other Google services – for example to make it easier to share things with people you know. Depending on your account settings, your activity on other sites and apps may be associated with your personal information in order to improve Google’s services and the ads delivered by Google.

Source: https://www.google.com/policies/privacy/

We’ve got to watch out, and as ever be mindful about who and what we allow our data to be shared with. Sure, this usage of data makes life easier… but at what privacy cost?

allyourdataarebelongtous
PRISM

DNA sequencing: Creating personal stories

Data matters. A great example of a smart use of data is genetic sequencing. The human genome involves around 3 billion base pairs, although scientists only know what around 1% of them do. Arguably the most important ones are those involved in creating proteins. By looking at people with particular traits, diseases or ancestry, scientists have been able to pick out the sets of genes which seem to match those attributes. For example, breast cancer risk is 5 times higher if you have a mutation in either of the tumour-suppressing BRCA1 and BRCA2 genes.

Thanks to this science, there are now commercial providers of DNA sequencing, such as 23andme. They market it as a way to discover more about your ancestry and any genetic health traits you might want to watch out for. To try this out, I bought a kit to see how they surface the data in an understandable way. The process itself is really easy: you just give them money and post them a tube of your spit.

23andme.jpg
nice…

After a few weeks’ wait for them to process it, you can look at your results. Firstly, you have your actual genetic sequence. This is perhaps really only of interest (or any use) to geneticists. As part of their service, 23andme pull out the “interesting” parts of the DNA which have been shown (through maths and biology) to correspond to particular traits or ancestry.

They separate this out into:

  • Health:
    • Genetic risks
    • Inherited conditions
    • Drug response
    • Traits (eg hair colour or lactose tolerance)
  • Ancestry:
    • Neanderthal composition
    • Global ancestry (together with a configurable level of “speculativeness”)
    • Family tree (to find relatives who have used the service too)

Part of what is smart about this service is that while it uses DNA as the underlying data, it almost entirely hides this from the end user. Instead, they see the outcome for them. They have realised that people don’t care about a sequence like “agaaggttttagctcacctgacttaccgctggaatcgctgtttgatgacgt”, but they do care about whether they have a higher risk of Alzheimer’s. Because some of these results are probabilistic, they also give each one a 1- to 4-star “Confidence” rating: again, easy to read at a glance. It isn’t very engaging, but it looks something like this:

Screen Shot 2016-09-26 at 16.18.08.png
Examples

Perhaps more visually interesting is the ancestry stuff. Apologies that my ancestry isn’t very exciting:

Screen Shot 2016-09-26 at 16.20.14.png
Ancestry, set to “standard” speculation levels (75% confidence)

 

I hope this has been interesting. Commercial DNA sequencing is a real success story not just for biochemistry and genetics, but also for the industrialisation of these processes and the mathematics and software that makes it possible. The thing that is especially cool, according to me at least, is the ability to make something as complex as genetics accessible, understandable and useful.

Proof: Little’s Law (why to Limit WIP)

Little’s Law states that:

The average number of customers in a queuing system = ( the rate at which customers enter the system ) x (the average time spent in the system)

Typically this might be applied to things like shoppers in a supermarket, but here we will focus on its application to software development. In a software development context, we often write the same statement with different words, thinking in terms of tasks:

Average Work in Progress = Average Throughput x Average Leadtime

Little’s law is beautifully general. It is “not influenced by the arrival process distribution, the service distribution, the service order, or practically anything else”[1]. This almost makes it self-evident, and since it is a mathematical theorem, in a sense it is: it is true in and of itself. Despite being so simple to state, however, the simplest generalised proof I have been able to find (and which we will not tackle here) is trickier, since it requires a solid grasp of limits and infinitesimals. Instead, we will consider a restricted case, suitable for most practical and management purposes: the same equation, with the condition that every task starts and finishes within a single finite time window. The mathematical way of saying this is that the system is empty at time t = 0 and at time t = T, where 0 < T < ∞. A diagram of such a system might look something like this:

wip
Tasks all starting and finishing between t=0 and t=T

Proof

For our proof, we start with some definitions

n(t) = the number of items in the system at time t

N = the number of items that arrive between t = 0 and t = T

λ = the average arrival rate of items between t = 0 and t = T. The arrival rate is equal to the departure rate (sometimes called throughput), since the system is empty at the beginning and the end.

L = the average number of items in the system between t = 0 and t = T. This is sometimes called “average Work in Progress (WIP)”

W = the average time items spend in the system between t = 0 and t = T. This is called W as a shorthand for wait time, but in software development we might call this leadtime

A = the area under n(t) between t = 0 and t = T. This is the sum of all the time every item has spent in the system.

Using this notation, Little’s law becomes

L = λ x W

which we will now prove. The following equations follow directly from the definitions; we will use them to assemble Little’s Law.

  1. L = A/T (average number of items in the system = sum of time spent / total time)
  2. λ = N/T (average arrival rate = total number of items / total time, since every item leaves before t=T)
  3. W = A/N (average time in system = sum of all time spent / number of items)

We can now use these three equations to prove Little’s Law:

L = A/T                  from (1)
  = (A/T) x (N/N)        since N/N = 1
  = (N/T) x (A/N)        by rearranging fractions
  = λ x W                from (2) and (3)

This is what we wanted, so the proof is complete.
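As a sanity check, here is a tiny Python sketch of the restricted case, using a handful of made-up task arrival and departure times, all within [0, T]:

```python
# A numerical check of L = lambda x W for the restricted case above:
# a few invented tasks, all starting and finishing between t = 0 and t = T.
tasks = [(0, 3), (1, 4), (2, 7), (5, 6), (6, 9)]   # (arrival, departure) times
T = 10.0
N = len(tasks)

A = sum(dep - arr for arr, dep in tasks)   # total time items spend in the system
L = A / T                                  # average number of items in the system
lam = N / T                                # average arrival (= departure) rate
W = A / N                                  # average time an item spends in the system

print(f"L = {L}, lambda x W = {lam * W}")  # both come out the same
assert abs(L - lam * W) < 1e-9
```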

What does this mean?

A trick to getting good outcomes from Little’s Law is being clear about which system we want to understand.

If we consider our queuing system to be our software development team, our system is requirements coming in, then being worked on and finished. In this case, W is the development time, and each item is a feature or bug fix, say.

To have a quicker time to market, and to be able to respond to change more quickly, we would love for our so-called “cycle time” W to be lower. If the number of new features coming into our system is the same, then we can achieve that by lowering L, the average work in progress. This is part of why Kanban advocates “limiting work in progress”.

Alternatively, we can consider our queuing system to be the whole cycle of software requirement generation, delivery, testing and deployment. In this case, W might be the time between a customer needing a software feature and that feature being in their hands. By measuring this, we get a true picture of time to market (our new W, which is the true measure of “lead time”), and with some additional measurements we would be able to discover the true cost of the time spent delivering the feature (since our new A now means the total time invested).

Outside of the development side of software, we can apply Little’s Law to support tickets. We can, for example, state how long a customer will on average have to wait for their query to be closed, by looking at the arrival rate of tickets and the number of items in the system. If there are on average 10 items in the queue and items arrive at 5 per hour, the average wait time will be 2 hours, since the rearrangement of Little’s Law to  L/λ = W gives us 10 / 5 = 2.

I hope that was interesting, if you would like me to explain the proof in the general case, let me know in the comments. I think it would be about 10 pages for me to explain, so in the spirit of lean I will only do this if there is a demand for it.

Thoughts on Quantum Computing and engaging people into science

Recently, the Prime Minister of Canada, Justin Trudeau, amazed journalists by giving a short explanation of quantum computing. In the weeks after, many articles about quantum computing were written commending the Prime Minister for so eloquently explaining this seemingly impenetrable new science. And while it is very exciting to have so many people engage with what is considered the next great frontier of computing, some of the explanations were rather disappointing; because of the (perceived) complexity of quantum computing, it is very easy to give the impression that it is even more magical and mysterious than it really is.

The biggest misconception that propagated was that the miracle of quantum computing is down to the wave-particle duality of fundamental particles such as the proton, neutron and electron. The key to quantum computing in fact lies in the superposition of states. Take for example a proton: it has a property called spin (which we don’t need to go into, but which you can read all about), and when spin is measured in a laboratory it always comes out as either “up” or “down”. The fact that a stream of protons can also act as a wave is not the most relevant fact here.

So, just quickly, what of the fact that spin is always measured as one of two states? The point is that the only way of mathematically modelling the spin of a proton involves inherent randomness. It is possible to put the proton into an equal superposition of up and down, with the binary result of an experiment only being resolved when it is actually measured, up and down being equally likely outcomes. Until then, the spin state genuinely is both up and down equally. But the very meaning of the word “quantum” implies the existence of distinct values, and therefore no measurement can actually reveal the superposition we know is there. Rather, in a sense, we force the universe into making a decision at the last possible moment!

So Justin Trudeau and various journalists are right to say that a bit is binary while a qubit (quantum bit) can hold multiple values at the same time; it is just that wave-particle duality is not what underlies this phenomenon.
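To make “holding multiple values at the same time” a little more concrete, here is a minimal numpy sketch of a qubit as a pair of amplitudes, with measurement forced to give one of two outcomes. This is a toy illustration under the standard textbook model, not a real quantum computing library:

```python
import numpy as np

# A qubit state is a normalised pair of (complex) amplitudes over |0> and |1>.
# An equal superposition, like the "spin up / spin down" example above:
state = np.array([1, 1], dtype=complex) / np.sqrt(2)

# Born rule: measurement probabilities are the squared magnitudes.
probs = np.abs(state) ** 2            # -> [0.5, 0.5]

# Measuring forces a definite outcome; the superposition is never seen directly.
rng = np.random.default_rng()
outcomes = rng.choice([0, 1], size=1000, p=probs)

print("P(0), P(1):", probs)
print("observed frequencies:", np.bincount(outcomes) / len(outcomes))
```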

Quantum computing is obviously far too complex a subject to go into further, but I strongly encourage you to do your own research. It is fantastic that this subject has been given more media attention, but any subject that captures your interest always deserves further research beyond journalistic simplifications. This post isn’t so much about quantum computing itself, but a reminder that initial engagement is just the first step!

Written by William A. Lebreton.

More trees, please!

A few weeks ago, I attended the Presidential Address at the Institute of Mathematics and its Applications (IMA). In it, the IMA President Chris Linton argued for bridging the divide between “Pure” and “Applied” mathematics. Without giving too many spoilers, he also put the case that the “success path” for people at university is often seen as further academic study, eventually leading to a professorship. Progressing into being a teacher, an actuary, or any other job is, culturally perhaps, seen as a failure. In summary, I took Chris’ Address as a call to action: to change the culture of success in mathematics towards a better balance across academic, industrial, educational and other options. If you’re interested in mathematics, I would recommend going to one of the branch meetings where Chris will be repeating this talk.

It is this same belief that led me to start working full time on Fuza a little over 3 months ago. You can see an example of maths applied to industry in this post. Indeed, perhaps more broadly, I believe we should be making use of the science we already know:

fields.jpg
British greenery: Sunshine not often included (Letchworth Garden City)
  • It has been shown recently that spending 30 minutes visiting green spaces reduces instances of depression and high blood pressure [source].
  • In another study, countryside walks were associated with reduced rumination (associated again with depression). [source]
  • There is a weight of science supporting the benefits of urban trees [source]

To me, it seems logical and worthwhile that some city should conduct a practical experiment to see if we could reduce depression, perhaps by simply planting some trees. It seems sensible to conduct a small trial, in order to get some fast feedback. The alternative to a local trial is either to do nothing, or wait for a central government policy to implement it everywhere.

In my town of Letchworth Garden City (UK), we are lucky to have a heritage foundation who do lots of good work, for example converting one of our 100-year-old houses into an eco-home [source]. Experiments do happen, but I would love to see more.

morefields.jpg
More countryside (Bristol – you can see the suspension bridge in the distance)

As a society we are, according to me at least, not implementing enough of these experiments. I have written this blog in an attempt to inspire some of my local decision makers to try more things for the benefit of their fellow citizens.

This is of course just one piece of science that I think it would be interesting to implement. Perhaps there are things you care about that we should be experimenting/implementing within our communities or businesses. I hope we can all do our bit to help get science used in reality. In the meantime, perhaps I will go out for a countryside walk.

 

Pretty maths

Bear with this post as it goes through some equations at the beginning, but it is worth it. We’ll be doing some of the calculations to get this picture:

Mandel_zoom_00_mandelbrot_set
The Mandelbrot Set

This is the set of numbers c such that the sequence z_{n+1} = z_n^2 + c, starting from z_0 = 0, stays bounded. These z are complex numbers, which we’ll ignore for now. It is much easier to understand if we look at some examples:

Let’s say c = -1.

We start with z_0 = 0. Then z_1 = 0^2 - 1 = -1, z_2 = (-1)^2 - 1 = 0, z_3 = -1, z_4 = 0, and so on.

This is repeating (0, -1, 0, -1, …), and the numbers are bounded.

 

Let’s now try c = 0.5.

We start with z_0 = 0. Then z_1 = 0.5, z_2 = 0.75, z_3 = 1.0625, z_4 ≈ 1.63, z_5 ≈ 3.15, z_6 ≈ 10.4, and so on.

We can see that these numbers are getting bigger and bigger: the sequence is not bounded.

One more: c = -1.9

Starting again from z_0 = 0, the sequence goes -1.9, 1.71, 1.02, -0.85, -1.18, -0.52, -1.63, 0.76, … It bounces around a lot, never getting very big and never settling down, so it is bounded. It is quite fun to sit with a calculator and try this.
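If you would rather not do it by hand, here is a short Python sketch of the same calculator exercise, iterating z_{n+1} = z_n^2 + c from z_0 = 0 and using the standard |z| > 2 escape test:

```python
def iterate(c, steps=20, bailout=2.0):
    """Iterate z -> z*z + c from z = 0 and report whether it stays bounded."""
    z = 0
    history = []
    for _ in range(steps):
        z = z * z + c
        history.append(round(z, 4))
        if abs(z) > bailout:           # once |z| > 2 the sequence must escape
            return history, False
    return history, True

for c in (-1, 0.5, -1.9):
    history, bounded = iterate(c)
    print(f"c = {c}: {'bounded' if bounded else 'unbounded'}, first terms {history[:6]}")
```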

Mathematicians call this kind of system “chaos”, as it is very sensitive to the starting conditions. Sometimes this is called the butterfly effect. Note that chaotic is not the same as random: in chaotic systems if you know everything about the initial conditions you know what will happen, whereas in random systems even if you knew everything about the initial conditions you wouldn’t know what was going to happen.

Benoit Mandelbrot was one of the first mathematicians to have access to a computer, and hopefully you can now see why he needed one: he repeated this calculation for lots and lots of values of c. The pretty picture we started with is really a plot of the set of those c for which the sequence stays bounded (the Mandelbrot set), where the colours indicate what happens to the sequence (eg how quickly it escapes, if it does).

Mandelbrot Set with Axes

 

You can zoom into the colourised picture here to see just how complex it is. Lots of people (me included) think it is pretty cool. It is really worth taking a look to appreciate the complexity.

Other than being pretty, why does this matter?

Stepping back: this picture is made from the formula z_{n+1} = z_n^2 + c. This is so simple, and yet gives rise to infinite complexity. In the words of Jonathan Coulton,

Infinite complexity can be defined by simple rules

Benoit Mandelbrot went on to apply this to the behaviour of economic markets, among other things. Since then, people have applied it to fluid dynamics (video), medicine, engineering, and many other areas. Apparently there is even a Society for Chaos Theory in Psychology & Life Sciences!

Further reading

This article is good for more explanation of the maths.

Apologies to any Pure mathematicians for the simplifications in this article.

 

Should I buy wine?

Orley Ashenfelter, an economist at Princeton, wanted to predict the prices that different vintages of Bordeaux wine would fetch. Such a prediction is most useful at the time of picking, so that investors can buy the young wine and allow it to come of age. In his own words:

The goal in this paper is to study how the price of mature wines may be predicted from data available when the grapes are picked, and then to explore the effect that this has on the initial and final prices of the wines.

For those of you not so au-fait with wine, prices vary a lot. At auction in 1991, a dozen bottles from Lafite vineyard were bought for:

  • $649 for a 1964 vintage
  • $190 for a 1965 vintage
  • $1274 for a 1966 vintage

Wines from the same location can vary in price by a factor of 10 between different years. Before Ashenfelter’s paper, wine quality was predicted by experts, who tasted the young wine and then guessed how good it would be in future. Ashenfelter’s great achievement was to bring some simple science to this otherwise untapped field (no pun intended).

He started by using the things that were “common knowledge”: in particular that weather affects quality and thus selling price. He checked this by looking at the historical data:

In general, high quality vintages for Bordeaux wines correspond to the years in which August and September are dry, the growing season is warm, and the previous winter has been wet.

Ashenfelter showed that 80% of price variation could be down to weather, and the remaining 20% down to age. With the given inputs, the model he built was:

log(Price) = Constant + 0.238 x Age + 0.616 x Average growing season temperature (April-September) -0.00386 x August rainfall + 0.001173 x Prior rainfall (October-March)
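Purely as an illustration, here is the equation above wrapped in a small Python function. The intercept (“Constant”) isn’t given in the post, so it is left as a parameter; the units (°C and mm) and the example inputs are my own assumptions, so only relative prices are meaningful:

```python
import math

def predicted_price(age, growing_temp, august_rain, prior_rain, constant=0.0):
    """Ashenfelter-style model, using the coefficients as quoted in the post.
    'constant' stands in for the unreported intercept, so absolute prices
    here are only meaningful relative to one another."""
    log_price = (
        constant
        + 0.238 * age
        + 0.616 * growing_temp      # average growing season temperature (assumed deg C)
        - 0.00386 * august_rain     # August (harvest) rainfall (assumed mm)
        + 0.001173 * prior_rain     # prior October-March rainfall (assumed mm)
    )
    return math.exp(log_price)

# Invented vintages: a warm, dry-harvest year vs a cool, wet-harvest year.
good_year = predicted_price(age=10, growing_temp=17.5, august_rain=40, prior_rain=600)
poor_year = predicted_price(age=10, growing_temp=15.5, august_rain=180, prior_rain=600)
print(f"relative price, good year vs poor year: {good_year / poor_year:.1f}x")
```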

As it turned out, this simple model was better at guessing quality than the “wine expert”: a success for science over pure intuition. The smart part of his approach was getting insight into the things people felt mattered (the weather) and then checking that wisdom against the data. Here, he showed that yes, it is quite appropriate to use weather and age to model wine prices.

Through the age variable, it also implies an average 2-3% annual return on investment [1] (note this analysis is pre-2008, so it is unlikely to behave like this today [2]).

Should I buy wine? Quite possibly, as long as I don’t drink it all.

 

Source: http://www.wine-economics.org/workingpapers/AAWE_WP04.pdf

[1] http://onlinelibrary.wiley.com/doi/10.1111/j.1468-0297.2008.02148.x/abstract

[2] http://www.wineinvestmentfund.com/latest-figures/performance-glance.aspx