PDKit: an open-source toolkit for digital Parkinson's assessment

Smartphones and wearables can now measure Parkinson’s symptoms at home, continuously, far more often than a clinic visit allows. But there has been a quiet problem behind the promise: every research group builds its own way of turning raw sensor data into a number, and those numbers cannot be compared across studies, centres, or devices. PDKit is the answer — an open-source toolkit that gives the whole field a shared, inspectable pipeline from sensor signal to clinical score.

The problem

Parkinson’s is the second most common neurodegenerative disease, with as many as ten million patients worldwide. Because there is no cure, care is a life-long process of managing symptoms and adjusting medication — and because the disease progresses differently in different people, that depends on frequent, objective monitoring. The gold standard, the MDS-UPDRS rating scale applied by a specialist clinician, is too coarse-grained, too time-consuming, and too subjective to capture the fine-grained, day-to-day variation that matters.

Digital methods — smartphone apps and wearables — promise exactly that finer signal. Yet despite a decade of rapid proposals, remarkably little has translated into the kind of robust, generalisable digital endpoints that medical regulators would accept for clinical trials. The paper is direct about why. The research landscape is fragmented: small study samples, differences in sensor placement and calibration, and a lack of clarity in the analytical techniques used mean results routinely fail to replicate. Studies often fail to account for inter-rater variability, so machine learning models end up learning a clinician’s subjective bias rather than removing it. And when many candidate features or algorithms are tested against limited data, the result is feature-selection bias — over-optimistic numbers that look impressive but do not hold up.

Underlying all of this is a single root cause: a lack of algorithmic and model transparency. When every group rolls its own private pipeline, nobody can inspect, reproduce, or fairly compare what was done. That is the gap PDKit was built to close.

PDKit turns raw wearable and smartphone signals into transparent biomarkers and clinical scores.

Figure 1. PDKit: from raw sensor streams to standardised, comparable measures.

What we built

PDKit is a comprehensive, open-source software toolkit for managing and processing patient data captured either continuously by wearables (passive monitoring) or by frequently used smartphone apps (active monitoring). It is implemented in Python, the de facto standard for modern data science, and is released as free software under the permissive MIT license, which allows use, modification, and redistribution, including commercially, provided the copyright and licence notice are retained. The source lives on GitHub and installs as a package from PyPI; since version 1.0 it has been downloaded over 75,000 times.

The design takes deliberate inspiration from open initiatives that transformed other areas of digital healthcare — the ADNI initiative in Alzheimer’s imaging and the SPM toolkit for brain-imaging analysis — which achieved breakthroughs precisely by being open and shared. Adopting that posture for Parkinson’s, the paper argues, brings concrete advantages: it lets researchers develop and openly share standardised methods that make results comparable across centres and hardware; it packages hard-won expertise in signal processing and machine learning so groups need not rebuild it; it raises confidence in results because the code is tested by a large community; and — unlike proprietary software — it lets anyone inspect the algorithms and their implementation, and therefore scrutinise any clinical inference drawn from them.

The technical heart of PDKit is a single organising idea: the information-processing pipeline abstraction. This well-established data-science design pattern is tailored specifically to Parkinson’s assessment, so that every computational step — from raw signal to clinical score — is captured explicitly, in detail, and in a form anyone can read.

The PDKit information-processing pipeline: ingestion, quality augmentation, feature extraction, biomarker estimation, scoring.

Figure 2. A standard pipeline anyone can inspect, reuse, and extend.

How it works

A PDKit pipeline runs raw sensor data through five sequential stages, each typically implemented as a distinct Python class that can import and export its intermediate results — so a pipeline can be run in stages, inspected partway through, or stopped at whatever point suits a given study.
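The idea of stages that can export and re-import their intermediate results can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not PDKit's actual API: the names Stage, Pipeline, export_result and import_result are hypothetical, and the two toy stages merely stand in for real quality and feature-extraction steps.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    """One pipeline step: a name and a function from input data to output data."""
    name: str
    fn: Callable[[list], list]


class Pipeline:
    """Hypothetical container that runs stages in order (not a PDKit class)."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def run(self, data: list, start: int = 0, stop: Optional[int] = None) -> list:
        """Run stages[start:stop] in order and return the (intermediate) result."""
        for stage in self.stages[start:stop]:
            data = stage.fn(data)
        return data


def export_result(data: list) -> str:
    """Serialise an intermediate result so it can be inspected or stored."""
    return json.dumps(data)


def import_result(text: str) -> list:
    """Load a previously exported intermediate result."""
    return json.loads(text)


pipeline = Pipeline([
    # Toy quality stage: drop missing samples.
    Stage("quality", lambda xs: [x for x in xs if x is not None]),
    # Toy feature stage: mean and peak-to-peak range.
    Stage("features", lambda xs: [sum(xs) / len(xs), max(xs) - min(xs)]),
])

raw = [1.0, None, 4.0, 2.0, None, 3.0]
checkpoint = export_result(pipeline.run(raw, stop=1))       # stop after stage 1
resumed = pipeline.run(import_result(checkpoint), start=1)  # resume from stage 2
print(checkpoint, resumed)
```

Because the checkpoint is ordinary serialised data, resuming from it gives the same result as running the whole pipeline in one go, which is what makes partway inspection safe.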

1. Data ingestion. The first stage consumes wearable and smartphone measurements in a wide variety of formats. There is no universal standard for encoding this data, so PDKit handles the diversity directly: active-monitoring apps such as cloudUPDRS, mPower and Hopkins PD each use their own schema, while passive monitoring streams data from wearables over low-power wireless links, typically using publish-subscribe protocols such as MQTT. Whatever the source, raw data is converted into standardised, symptom-specific internal representations built on Pandas, for example TremorTimeSeries and FingerTappingTimeSeries.

2. Quality of information. Before any analysis, PDKit assesses and, where needed, improves the data. This covers integrity checks for missing, out-of-range or outlier values caused by transmission errors or sensor faults; resampling to normalise irregular sampling (a prerequisite for techniques like the Fast Fourier Transform); and relevance improvements such as trimming the start-up and cool-down of a test, verifying that an unsupervised movement was performed correctly, and signal segmentation and augmentation.

3. Feature extraction. The pipeline then computes distinctive features for each symptom type; for a typical active-monitoring session, PDKit can calculate over 800 different features. Crucially, it caters to both schools of thought at once: clinically inspired, bio-inspired features grounded in medical intuition (most of the standard ones from the PD literature are implemented), and purely data-driven features (drawing on established tools such as the TSFRESH Python library for time series and Praat for voice). It is built to be extensible, so new techniques can be added.

4. Biomarker estimation. Features are distilled into digital biomarkers — indicators with strong inferential properties. PDKit supports two kinds: standard biomarkers, a snapshot feature vector for one moment in time; and the more powerful longitudinal biomarkers, which accumulate features from repeated measurements over an extended period. Rather than a single snapshot, a longitudinal biomarker captures the statistical distribution of a symptom over, say, a week — a more consistent and sensitive way to characterise a disease as heterogeneous as Parkinson’s.

5. Clinical scoring. The final stage maps biomarkers onto a standard clinical rating scale. Again two routes are offered: a data-driven clustering approach when labelled data is scarce, and a supervised machine-learning approach (ClinicalUPDRS) when clinician-labelled data is available. The payoff is that new sensor measurements can be converted fully automatically into an MDS-UPDRS score without a human rater — enabling end-to-end automatic assessment for tracking disease progression, monitoring response to medication, and patient stratification. MDS-UPDRS is supported because it is the only scale recognised by the FDA and EMA for clinical studies, but the extensible design accommodates others.
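To make stages 2 and 3 above more concrete, here is a minimal sketch of an integrity check, resampling onto a uniform grid, and one frequency-domain feature. It uses only the Python standard library and a naive discrete Fourier transform for readability, whereas PDKit itself builds on Pandas. The function names, the 50 Hz target rate, the out-of-range limit and the synthetic 5 Hz signal are all assumptions chosen for illustration, not PDKit's API or defaults.

```python
import cmath
import math


def drop_out_of_range(times, values, limit):
    """Integrity check: discard samples whose magnitude exceeds a plausible limit."""
    kept = [(t, v) for t, v in zip(times, values) if abs(v) <= limit]
    return [t for t, _ in kept], [v for _, v in kept]


def resample_uniform(times, values, rate_hz):
    """Linearly interpolate irregular samples onto a uniform grid.

    Assumes at least two samples with strictly increasing timestamps.
    """
    n = int(math.floor((times[-1] - times[0]) * rate_hz + 1e-9)) + 1
    out, j = [], 0
    for i in range(n):
        t = times[0] + i / rate_hz
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def dominant_frequency(samples, rate_hz):
    """Frequency (Hz) of the strongest non-DC component, via a naive DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * rate_hz / n


# Synthetic 5 Hz oscillation sampled at roughly 100 Hz with timing jitter.
times = [0.01 * i + 0.003 * math.sin(i) for i in range(201)]
values = [math.sin(2 * math.pi * 5 * t) for t in times]
values[50] = 99.0  # simulated sensor glitch

clean_t, clean_v = drop_out_of_range(times, values, limit=10.0)
uniform = resample_uniform(clean_t, clean_v, rate_hz=50.0)
print(len(uniform), dominant_frequency(uniform, rate_hz=50.0))
```

The order matters: the Fourier step assumes evenly spaced samples, so the glitch is removed and the signal resampled before any spectral feature is computed. A production pipeline would use a proper FFT rather than this quadratic-time transform.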

Across these stages, PDKit implements the standard battery of Parkinson’s motor tests — tremor, finger-tapping, bradykinesia (pronation-supination and leg-agility movements), and gait — alongside reaction and voice assessments.
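Taking one of these tests as an example, the longitudinal biomarker described in stage 4 can be sketched as a summary of how a single feature is distributed over a recent window of sessions. This is an illustrative sketch only: the function name, the session dictionary layout, the feature name tremor_hz and the choice of summary statistics are assumptions, not PDKit's implementation.

```python
from statistics import quantiles


def longitudinal_biomarker(sessions, feature, window_days=7):
    """Summarise the distribution of one feature over the most recent window.

    sessions: list of dicts with a "day" number and a value for `feature`.
    Needs at least two sessions inside the window to compute quartiles.
    """
    last_day = max(s["day"] for s in sessions)
    window = sorted(
        s[feature] for s in sessions if last_day - s["day"] < window_days
    )
    q1, q2, q3 = quantiles(window, n=4)
    return {
        "n": len(window),
        "median": q2,
        "iqr": q3 - q1,
        "min": window[0],
        "max": window[-1],
    }


# Nine daily sessions; only the last seven days fall inside the window.
sessions = [{"day": d, "tremor_hz": float(d - 2)} for d in range(3, 10)]
sessions += [{"day": 1, "tremor_hz": 100.0}, {"day": 2, "tremor_hz": 100.0}]

snapshot = sessions[0]["tremor_hz"]  # a standard biomarker: one moment in time
weekly = longitudinal_biomarker(sessions, "tremor_hz")
print(snapshot, weekly)
```

A single snapshot says nothing about spread, while the windowed summary exposes both the typical value and its variability across the week.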

The toolkit’s second core ingredient is a choice of two programming models exposed through one API. Developers can use a plain Python interface for a low barrier to entry, or an alternative dataflow programming model for high-performance, horizontally scalable computation, in which the same code runs unchanged on a laptop or scaled out across cloud infrastructure. This matters because deploying digital assessment at population scale means processing large volumes of data generated concurrently and out of order. PDKit’s dataflow model is implemented using Apache Beam, a unified programming model that represents a program as a directed graph of data flowing between operations and can run on distributed backends such as Apache Flink, Apache Spark and Google Cloud Dataflow.
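The core of the dataflow idea, the same transforms executed by interchangeable runners with results that do not depend on arrival order, can be illustrated without any distributed infrastructure. The sketch below deliberately uses only the Python standard library and is not Apache Beam's API; the function names, the record layout and the toy range feature are assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def extract(record):
    """Per-record transform: (patient, timestamp, samples) -> keyed feature."""
    patient, timestamp, samples = record
    return patient, timestamp, max(samples) - min(samples)


def combine(keyed):
    """Group features by patient and order them by timestamp.

    Sorting inside each group makes the result independent of arrival order.
    """
    groups = defaultdict(list)
    for patient, timestamp, feature in keyed:
        groups[patient].append((timestamp, feature))
    return {p: [f for _, f in sorted(items)] for p, items in groups.items()}


def run_local(records):
    """Runner 1: plain sequential execution."""
    return combine(map(extract, records))


def run_parallel(records, workers=4):
    """Runner 2: the same transforms, executed on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return combine(pool.map(extract, records))


# Records arrive out of order and interleaved across patients.
records = [
    ("p2", 2, [0, 3]),
    ("p1", 2, [1, 5]),
    ("p1", 1, [2, 4]),
    ("p2", 1, [0, 1]),
]
local_result = run_local(records)
parallel_result = run_parallel(records)
print(local_result, parallel_result)
```

Only the runner changes between the two calls; extract and combine are untouched. A framework such as Beam applies the same separation at much larger scale, adding distribution, windowing and fault tolerance that this toy omits.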

PDKit is open source, used in a real clinical trial, and makes results comparable across centres.

Figure 3. Why an open, standard toolkit matters.

Why it matters

The case for PDKit is not a headline accuracy number — it is a change in how the field works. When methods are open, standardised, and inspectable, science gets better: results become comparable across centres and hardware, findings become reproducible, the duplicated cost of every group rebuilding its own pipeline disappears, and clinical trials can move faster toward the robust digital endpoints regulators need. By making the exact algorithms visible, PDKit also directly attacks the over-optimism and hidden bias that have held digital assessment back — a transparency goal shared by initiatives such as the Critical Path for Parkinson’s, with which the team has collaborated.

This is not a paper proposal but working infrastructure. PDKit was developed with support from the Michael J. Fox Foundation, has been openly available under the MIT license since its first release in 2018, and is known to be used independently in Parkinson’s clinical studies by universities and companies across Europe and the US. Its practical use was demonstrated in the CUSSP clinical trial in the UK, where a study-specific PDKit pipeline was used to analyse 990 smartphone tests against thousands of blinded clinical ratings. Just as importantly, the pipeline made it straightforward to run both a strict pre-specified analysis and a broad exploratory one over many features and classifiers.

That is exactly the kind of MedTech stm.ai is built around: open, evidence-linked, reproducible AI that keeps clinical judgement and the patient at the centre. PDKit does not replace the clinician or hide its workings behind a proprietary wall. It does the opposite — it makes every step from raw signal to clinical score something the whole research community can inspect, reuse, and improve. In medicine, that transparency is not a nice-to-have; it is the precondition for trust.

C. Stamate, J. Saez Pons, D. Weston, G. Roussos. “PDKit: A data science toolkit for the digital assessment of Parkinson’s Disease”. PLoS Computational Biology (2021).