Background

I am loosely interested in all aspects of what we might call the data → information → knowledge → ‘wisdom’ lifecycle: information extraction, storage and retrieval.

In my past employment in industry, this meant answering questions from data to provide, for example, information that helped management drive the business more effectively. My view now is that to answer specific questions from data, we must try to build models of the underlying phenomena that gave rise to the data. Machine learning is partly about this: using the data as evidence to draw conclusions about the real world and to build useful models of it.

Process Mining

My Ph.D. focussed on (Business) Process Mining: learning and analysing business process models from event data obtained from the log files of businesses’ information systems. Similar techniques can be applied to software processes, operating system processes, network data flow, enterprise backup system traffic, networked storage, understanding IT infrastructure/support environment interactions, robotic interactions …
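
To make this concrete, here is a minimal sketch (in Python, with an invented event log and invented activity names) of the kind of ‘directly-follows’ counts that algorithms such as the Alpha and Heuristics Miner derive from an event log before constructing a process model:

    # Minimal illustration only: count how often one activity directly
    # follows another across the traces of a (hypothetical) event log.
    from collections import Counter

    # Each trace is the ordered list of activities recorded for one case
    # (e.g. one order or one support ticket). Names are invented.
    event_log = [
        ["register", "check stock", "ship", "invoice"],
        ["register", "check stock", "invoice", "ship"],
        ["register", "reject"],
    ]

    directly_follows = Counter(
        (a, b)
        for trace in event_log
        for a, b in zip(trace, trace[1:])
    )

    for (a, b), count in directly_follows.most_common():
        print(f"{a} -> {count and b}: {count}" if False else f"{a} -> {b}: {count}")

From counts like these, a mining algorithm decides which orderings, choices and parallel branches to include in the mined model.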

The goal of process mining is to capture the ‘reality’ of the (business) process by looking at what is actually happening, and to compare it with the ‘believed’ process held by management or analysts. Analysis of the process flow can include comparison between models, identification of bottlenecks, possible improvements, how decisions are made, and so on. Other ‘perspectives’ can also be mined, such as social or organisational interactions.

My Thesis [1]

I developed a probabilistic framework for the analysis and comparison of process mining algorithms: How do different algorithms learn? How much data should we use? What does ‘noise’ mean and what should we do about it? What happens if the process evolves? How can we make a mined model more general or easier to understand? Such a framework could provide the basis for objectively answering some of these questions.

I applied the framework to analyses of the Alpha and Heuristics Miner process mining algorithms, and to practical applications: detecting process change and mining in the presence of noise. Using these methods, one can determine how much process data (how many ‘traces’) is needed to mine a model with a given confidence that it is correct.
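
As a loose illustration of that last kind of question (this is not the framework from the thesis, just a toy analogue with an invented trace distribution): if we assume a process whose distinct traces occur with known probabilities, we can estimate by simulation how many sampled traces are needed before every distinct behaviour has been observed at least once with a given confidence.

    # Toy analogue only -- not the thesis framework. Assume a hypothetical
    # process with a known distribution over its distinct traces, and ask:
    # after n sampled traces, how likely is it that every behaviour has
    # been seen at least once?
    import random

    trace_probs = {"A,B,C,D": 0.6, "A,C,B,D": 0.3, "A,E,D": 0.1}  # invented

    def prob_all_seen(n, probs=trace_probs, runs=10_000, seed=0):
        """Estimate P(all distinct traces observed) after n samples."""
        rng = random.Random(seed)
        traces, weights = zip(*probs.items())
        hits = sum(
            set(rng.choices(traces, weights=weights, k=n)) == set(traces)
            for _ in range(runs)
        )
        return hits / runs

    # Smallest n giving ~95% confidence of having observed every trace.
    n = 1
    while prob_all_seen(n) < 0.95:
        n += 1
    print(n, prob_all_seen(n))

The framework in the thesis addresses this kind of question in terms of confidence that the mined model is correct, rather than the simplistic ‘have we seen every trace’ criterion used in this sketch.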

This work provides a principled foundation on which the process mining activities described above can be carried out and process mining questions investigated.

[1] P. Weber. A Framework for the Analysis and Comparison of Process Mining Algorithms. PhD thesis, University of Birmingham, UK, 2014. eTheses.

More Background

I came to research after several years in industry doing systems analysis and design, development, and administration of Unix and Storage systems.

There was always too much of

  • Complexity and data overload. Distributed and networked systems are far too complex, so they just get ‘managed’, never understood: for example, enterprise backup infrastructure and networked data storage. Yet information is available in log files etc. describing what is occurring and where/when/how/why things are going wrong. How to extract useful information and act on it?
  • Information loss. The same problems are tackled and solved again and again, and any learning is lost as different people encounter them. Attempts to solve this include documentation, ‘training’, SharePoint, and bespoke scripting. How to learn, remember, and re-use the information/knowledge → wisdom?