Process Mining
Background
I am broadly interested in all aspects of what we might call the data → information → knowledge → ‘wisdom’ lifecycle: information extraction, storage and retrieval.
This might mean answering questions from data, e.g. to provide information to management to help drive the business more effectively. Going deeper, to answer specific questions from data, we must try to build models of the underlying phenomena that gave rise to the data. Machine learning is partly about this – using the data as evidence to draw conclusions about the real world and build useful models of it.
Process Mining
Business Process Mining, or simply Process Mining, is the learning and analysis of business process models from event data obtained from log files written out by business information systems. Similar techniques can be applied to software processes, operating system processes, network data flows, enterprise backup system traffic, networked storage, understanding IT infrastructure/support environment interactions, robotic interactions, and so on.
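To make the idea concrete, here is a minimal sketch (not any particular mining algorithm, and not the PM4Py API) of the first step most discovery algorithms take: extracting the directly-follows relation from an event log, where each trace is the ordered list of activities recorded for one case. The log and activity names are invented for illustration.

```python
from collections import Counter

def directly_follows(log):
    """Count directly-follows pairs (a, b): activity b immediately
    follows activity a in some trace of the event log."""
    df = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df

# A toy event log: each trace is the activity sequence for one case.
log = [
    ["register", "check", "approve", "notify"],
    ["register", "check", "reject", "notify"],
    ["register", "check", "approve", "notify"],
]

print(directly_follows(log))
```

Algorithms such as the Alpha miner build on exactly this kind of relation, deriving causal, parallel and choice relations between activities from which a Petri net model is constructed.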
For more information, see The IEEE Process Mining Task Force and the pages at processmining.org.
The goals of process mining are to capture the ‘reality’ of the (business) process by looking at what is actually happening, and to compare it with the ‘believed’ process held by management or analysts. Analysis of the process flow can include comparison between models, identification of bottlenecks and improvements, how decisions are made, and so on. Other ‘perspectives’ can also be mined, such as social or organisational interactions.
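One very simple way to contrast ‘believed’ and observed behaviour is to compare the directly-follows pairs each exhibits; this is only an illustrative sketch (real conformance checking is far richer), with invented logs and activity names.

```python
def df_pairs(log):
    """Set of directly-follows pairs appearing in an event log."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}

# The process as management believes it runs, vs. what the logs show.
believed = [["register", "check", "approve", "notify"]]
observed = [
    ["register", "check", "approve", "notify"],
    ["register", "approve", "notify"],  # in reality, "check" is sometimes skipped
]

extra = df_pairs(observed) - df_pairs(believed)    # real behaviour absent from the believed process
missing = df_pairs(believed) - df_pairs(observed)  # believed behaviour never actually seen
print(extra, missing)
```

Here the comparison reveals that cases sometimes jump straight from registration to approval, a deviation the believed model does not allow.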
The field is now well established in Europe, and growing strongly in the UK, with companies like Fluxicon, Celonis, Minit and many others providing commercial software and services. ProM has long been established as one of the leading research platforms. PM4Py is a relative newcomer providing a Python framework for process mining, now also with a GUI platform PMtk.
PhD Work
My thesis [1,2] developed a probabilistic framework for the analysis and comparison of process mining algorithms: How do different algorithms learn? How much data should we use? What does ‘noise’ mean and what should we do about it? What happens if the process evolves? How can we make it more general or easier to understand? Such a framework could provide the basis for objectively answering some of these questions.
I applied the framework to analyses of the Alpha and Heuristics Miner process mining algorithms, and to practical applications in detecting process change and mining in the presence of noise. Using our methods, one can determine how many process traces are needed to mine a model with a given confidence in its correctness.
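As a back-of-the-envelope illustration of the kind of question involved (this is a simple union-bound calculation, not the bound derived in the thesis): suppose a process generates $k$ distinct trace types, each occurring with probability at least $p_{\min}$, and we want every type to appear in the log with probability at least $1-\delta$.

```python
import math

def traces_needed(k, p_min, delta):
    """Union-bound estimate of the log size n needed so that, with
    probability at least 1 - delta, all k distinct trace types (each
    occurring with probability >= p_min) appear at least once.
    P(one type missed) <= (1 - p_min)^n <= exp(-n * p_min), so
    k * exp(-n * p_min) <= delta gives n >= ln(k / delta) / p_min."""
    return math.ceil(math.log(k / delta) / p_min)

# e.g. 10 trace types, rarest occurring 5% of the time, 99% confidence:
print(traces_needed(k=10, p_min=0.05, delta=0.01))
```

Even this crude estimate shows how rapidly the required amount of data grows as rare behaviour (small $p_{\min}$) must be captured.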
This work provides a principled foundation on which the process mining activities described above can be carried out and process mining questions investigated. However, these ideas have not (yet) been widely taken up by the research or commercial communities.
As of June 2022: I hope to restart work in this area. The present limited framework needs work to include cycles, to develop bounds connecting the amount of data used with the probability of ‘successful’ process mining, and to consider other views such as data-centric processes and the incorporation of domain knowledge.
Along with colleagues at the University of Birmingham’s Centre for Primary Care Improvement I am also interested in applying process mining in healthcare, particularly to improve delivery of service in primary care.
[1] P. Weber. A Framework for the Analysis and Comparison of Process Mining Algorithms. PhD thesis, University of Birmingham, UK, 2014. eTheses.
[2] P. Weber, B. Bordbar and P. Tiňo: A Framework for the Analysis of Process Mining Algorithms. IEEE Transactions on Systems, Man and Cybernetics: Systems, 43(2), pp. 303-317, 2013. DOI.
[3+] See my publications page (http://weberph.bitbucket.io/publications/).