Clustering of Partial Discharges at Alliander

This is an MSc thesis/internship project for a student with a strong quantitative background (math, physics, software engineering, or equivalent) and a good working knowledge of data processing with Python.

Project description

Most power failures in the medium voltage (MV) power grid occur in underground cable joints. These connections between cables are the weak spots of a cable system. Alliander uses online partial discharge (PD) monitoring to prevent power outages. Partial discharges are small sparks in the insulation of cables and joints that precede a full breakdown. By detecting such partial discharges early, discharging cable joints can be replaced before they fail and cause a power outage. Currently, around 15% of the MV grid is monitored by 2,300 sensor systems.

Because of the large quantity of data generated continuously, machine learning methods are used to process the data into useful grid warnings. These warnings can then be handled by a grid operator, who decides whether or not to replace a joint. An important step in this process is the clustering of partial discharge data. The objective of this step is to group data points that come from a common source, either PD or noise, into the same cluster for further analysis.

Figure: Partial discharge data from a cable circuit. The axes are location on cable, time, and discharge magnitude. Model clusters are distinguishable by color.

Currently we use HDBSCAN [1], a density-based clustering algorithm, for this task. A feature of this model is that it can identify clusters of very different densities. Although the model produces good results for most circuits, there are issues with certain cable circuits. A more fundamental problem is that it is difficult to define what a good clustering is; currently, quality is judged by visual inspection. In this project we want to investigate which clustering algorithm is best suited to our problem and data, and how to calibrate it optimally.

Research questions

Which clustering algorithm is best suited for grouping partial discharge data into meaningful clusters, where each cluster belongs to a single source, either partial discharges or noise?
Is it possible to find one set of model parameters that works for all circuits, or is there a procedure to quickly optimize the parameters for every circuit?
Data is generated continuously. How robust is the clustering algorithm when new data is added?
Is it possible to define a quantitative measure for the quality of a clustering of the data?
All clustering models have their strengths and weaknesses. Is it possible to combine several clusterings of the same data to obtain a better one? See [2] and [3].
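As an illustration of the last question, the sketch below combines several clusterings of the same data through a co-association matrix: for each pair of points, the fraction of base clusterings that place them in the same cluster. This is the same pairwise similarity that the CSPA heuristic in [3] builds on, but here it is cut with average-linkage hierarchical clustering instead of graph partitioning, so it should not be read as the method of [2] or [3]. The data, the k-means base clusterings, and the function names co_association and consensus_labels are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def co_association(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    labelings = np.asarray(labelings)
    n_runs, n_points = labelings.shape
    coassoc = np.zeros((n_points, n_points))
    for labels in labelings:
        # Note: a noise label such as -1 would need special handling here.
        coassoc += labels[:, None] == labels[None, :]
    return coassoc / n_runs


def consensus_labels(labelings, n_clusters):
    """Cut an average-linkage tree built on 1 - co-association into n_clusters."""
    distance = 1.0 - co_association(labelings)
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


rng = np.random.default_rng(0)

# Three well-separated synthetic sources with known labels.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
truth = np.repeat(np.arange(3), 50)
X = centers[truth] + rng.normal(scale=1.0, size=(150, 2))

# Base clusterings that deliberately disagree on the number of clusters.
base = [
    KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    for seed, k in enumerate([2, 3, 4, 5])
]

consensus = consensus_labels(base, n_clusters=3)

# Adjusted Rand index: 1.0 means identical partitions, about 0 means chance.
ari = adjusted_rand_score(truth, consensus)
print(f"ARI of consensus vs. known sources: {ari:.2f}")
```

The adjusted Rand index used for the check could also serve the robustness question: cluster a circuit before and after new data arrive and compare the two labelings on the points they share. With real PD data there is no ground truth, so it can only measure agreement between clusterings, not correctness.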

Recommended literature

[1] McInnes, L., Healy, J. and Astels, S. The HDBSCAN clustering library. (https://hdbscan.readthedocs.io/en/latest/)
[2] Boulis, C. and Ostendorf, M., 2004. Combining multiple clustering systems. In Knowledge Discovery in Databases: PKDD 2004: 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24, 2004. Proceedings 8 (pp. 63-74). Springer Berlin Heidelberg. (http://www.icsi.berkeley.edu/pubs/speech/clusteringsystems04.pdf)
[3] Strehl, A. and Ghosh, J., 2002. Cluster ensembles---a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3(Dec), pp. 583-617. (https://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf)

Contact

Supervisors at Alliander: Sander Rieken, Simon Bleuzé

Supervisor at Radboud: Yuliya Shapovalova

Department of Data Science
