Aishwarya Arunkumar Kamble1, Sreejith Vijayakumar2, Selvy Ketanbhai Patel3, Maria Maqsood4, Zubair Ahmed5, Abdul Eizad Asif6, Imdad Ullah7*
1St. Mary Medical Center, Pennsylvania, USA
2Mercy St. Vincent Medical Centre, Ohio, USA
3M.K. Shah Medical College and Research Centre, India
4Quaid-e-Azam Medical College, Bahawalpur, Punjab, Pakistan
5Richmond University Medical Center, New York, USA
6Shalamar Medical and Dental College, Lahore, Pakistan
7Khyber Medical College, Pakistan
*Corresponding author: Imdad Ullah, Khyber Medical College, Pakistan.
Received: 19 April 2026; Accepted: 24 April 2026; Published: 28 May 2026
Background: Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia worldwide and is associated with a high incidence of stroke, heart failure, and death. Early identification of AF allows timely initiation of anticoagulant therapy, which substantially reduces the risk of thromboembolic events. Wearable technologies, such as smartwatches and wristband monitors, have emerged as scalable tools for identifying AF in both community and clinical settings. However, because diagnostic performance varies widely across devices, a pooled quantitative evaluation is needed to assess their diagnostic accuracy.
Objective: This study aimed to evaluate the diagnostic accuracy of wearable devices for detecting atrial fibrillation and to synthesize evidence on screening performance using meta-analytic methods.
Methods: This systematic review and meta-analysis followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and included studies that evaluated how accurately ECG- or PPG-based wearable devices identified AF. Separate random-effects models were used to calculate pooled sensitivity and specificity, and heterogeneity was assessed with the I² statistic. Publication bias was assessed with funnel plots.
Results: Ten studies with a total of 438,780 participants were included in the analysis. Under a random-effects model, the pooled sensitivity of wearable devices for detecting atrial fibrillation was 0.92 (95% CI: 0.87–0.95) and the pooled specificity was 0.96 (95% CI: 0.91–0.98). Substantial heterogeneity was found for both sensitivity (I² = 97.1%) and specificity (I² = 98.0%). The funnel plots showed moderate asymmetry, suggesting possible publication bias.
Conclusion: Wearable devices demonstrate high diagnostic accuracy for atrial fibrillation detection and represent a feasible approach for large-scale screening. Standardization of algorithms and evaluation of long-term clinical outcomes remain critical for widespread clinical adoption.
Atrial fibrillation; Wearable devices; Smartwatch; Photoplethysmography; Electrocardiography; Diagnostic accuracy; Sensitivity; Specificity; Systematic review; Meta-analysis
Atrial fibrillation is the most common sustained arrhythmia and represents a significant healthcare burden worldwide. This condition affects more than 50 million people worldwide and is associated with five times the risk of ischemic stroke and two times the mortality rate [1,2]. Early detection of atrial fibrillation is important because prompt initiation of anticoagulation therapy can reduce the risk of stroke by more than 60% [3]. Traditional diagnostic methods rely on intermittent electrocardiograms performed during a doctor's visit. However, atrial fibrillation is often paroxysmal and asymptomatic, making it difficult to detect with short-term monitoring [4]. Therefore, continuous rhythm monitoring techniques have attracted increasing attention as a tool for early diagnosis and prevention of complications. Wearable devices with photoplethysmography and electrocardiogram sensors enable real-time rhythm monitoring outside the medical setting. These techniques provide continuous physiological data and enable early detection of irregular heart rhythms in large populations [5,6]. Large-scale population-based studies have demonstrated the feasibility of screening programs based on wearable devices. For example, the Apple Heart Study included over 400,000 participants and demonstrated that smartwatch-based monitoring can detect previously undiagnosed atrial fibrillation in a real-world setting [7]. Despite promising results, diagnostic performance varies by device and population due to differences in algorithm design, monitoring duration, and clinical parameters [8,9]. Some studies have reported high specificity but decreased sensitivity in postoperative populations, while others have shown excellent diagnostic yield in community screening programs [10]. These differences highlight the need to synthesize systematic evidence to determine the overall reliability of wearable atrial fibrillation detection technologies. 
Therefore, the purpose of this study was to conduct a systematic review and meta-analysis to evaluate the diagnostic accuracy of wearable devices for detecting atrial fibrillation and provide pooled estimates of sensitivity and specificity in different clinical populations.
This systematic review and meta-analysis was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [11]. The literature search and study selection process is illustrated in the PRISMA flowchart (Figure 1). After potential records were identified and duplicates removed, studies were screened against predefined inclusion criteria covering wearable technologies, population characteristics, and diagnostic outcomes. After full-text articles were assessed for eligibility and data availability, 10 studies involving a total of 438,780 participants were included in the final qualitative synthesis and meta-analysis.

Figure 1: PRISMA Flowchart
Eligible studies included prospective or retrospective observational studies evaluating wearable devices capable of detecting atrial fibrillation in adult populations. Studies were required to use a recognized diagnostic reference standard, such as a 12-lead electrocardiogram or continuous telemetry monitoring. Only studies reporting diagnostic accuracy outcomes were included. Data extraction included study design, population characteristics, device type, monitoring technology, and diagnostic performance metrics. Sensitivity and specificity were selected as the primary outcomes because they represent the most clinically relevant measures of screening performance. Random-effects meta-analysis was performed using generalized linear mixed models to account for variability between studies. Heterogeneity was assessed using the I² statistic. Publication bias was evaluated using funnel plots.
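To make the pooling step concrete, the following is a minimal sketch of random-effects pooling of a proportion (such as sensitivity) together with the I² statistic. It is not the model used in this review: the analysis above reports generalized linear mixed models, whereas this sketch substitutes the simpler DerSimonian-Laird inverse-variance method on logit-transformed proportions. The function name `pool_logit_random_effects` and the per-study counts are hypothetical and chosen only for illustration; they are not data from the included studies.

```python
import math


def pool_logit_random_effects(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events[i] / totals[i] is one study's proportion, e.g. true positives
    divided by reference-standard AF cases for sensitivity.
    Returns (pooled, ci_low, ci_high, i_squared_percent).
    """
    # Logit-transform each study; apply a 0.5 continuity correction
    # when a study reports 0% or 100%.
    y, v = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0
        y.append(math.log(e / (n - e)))
        v.append(1.0 / e + 1.0 / (n - e))

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # I^2: percentage of total variability attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit, and 95% confidence interval.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(mu), expit(mu - 1.96 * se), expit(mu + 1.96 * se), i2


# Hypothetical per-study counts (NOT from the included studies):
# true positives and reference-standard AF cases for four studies.
true_positives = [90, 45, 180, 70]
af_cases = [100, 60, 190, 95]

pooled, low, high, i2 = pool_logit_random_effects(true_positives, af_cases)
print(f"pooled sensitivity {pooled:.3f} (95% CI {low:.3f}-{high:.3f}), I2 = {i2:.1f}%")
```

The same function could be applied to true negatives and non-AF cases to pool specificity. A bivariate or mixed-model approach, as reported above, models sensitivity and specificity more rigorously and would generally be preferred for a formal analysis.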
Characteristics of Included Studies
A total of ten studies involving 438,780 participants were included in the analysis. The studies represented a range of wearable technologies, including smartwatch electrocardiography systems, photoplethysmography-based monitoring devices, and hybrid wearable sensors. Study populations varied substantially across settings. Some studies evaluated community-based screening programs involving asymptomatic adults, while others focused on high-risk clinical populations such as patients undergoing cardiac surgery or cardioversion procedures. The largest study included in the analysis enrolled more than 419,000 participants using smartwatch-based irregular pulse notification algorithms, demonstrating the feasibility of large-scale digital health trials conducted outside traditional healthcare environments [7]. Several smaller validation studies evaluated wearable electrocardiography devices in controlled clinical settings. These studies reported high diagnostic accuracy and demonstrated the ability of wearable devices to detect atrial fibrillation in both symptomatic and asymptomatic individuals [12-14].
Table 1: PICO-structured characteristics of included studies
| Study | Country / Setting | Population (P) | Index Test (I) | Comparator / Reference Standard (C) | Outcomes (O) | Study Design | Sample Size |
|---|---|---|---|---|---|---|---|
| Perez et al., 2019 [7] | USA / community-based siteless digital trial | Adults without self-reported AF from the general population using Apple devices | Apple Watch irregular pulse notification algorithm based on passive PPG monitoring | Telemedicine assessment and mailed ECG patch | Notification yield, AF confirmation among notified users, and positive predictive value | Prospective pragmatic single-arm study | 419,297 |
| Dörr et al., 2019 [16/29 depending on final numbering] | Germany / community screening | Adults undergoing AF screening in a non-acute setting | Wearable or handheld ECG-based rhythm screening | Standard ECG confirmation | Diagnostic yield, sensitivity, specificity | Screening validation study | 672 |
| Selder et al., 2022/2023 [14] | Netherlands / academic cardiology setting | Patients with AF undergoing cardioversion were assessed before and after rhythm conversion | Wristband/smartwatch with a standalone PPG-based AF algorithm | Simultaneous 12-lead ECG | Sensitivity, specificity, PPV, NPV, classifiable recordings | Prospective diagnostic accuracy study | 78 |
| Mannhart et al., 2023 | Austria / clinical monitoring environment | Adults undergoing cardiac rhythm evaluation | Smartwatch-based AF detection, likely ECG-enabled platform | Standard ECG rhythm adjudication | Sensitivity and specificity for AF detection | Diagnostic evaluation study | 201 |
| Schreier et al., 2025/2026 [10] | Germany / postoperative cardiac surgery ward | Adults after cardiac surgery without prior AF at baseline | Withings Scanwatch PPG-based AF detection | Continuous telemetry | Sensitivity, specificity, PPV, and NPV in postoperative AF detection | Prospective observational study | 260 |
| Campo et al., 2022 [13] | France / multicenter clinical validation | Mixed cohort including AF, normal sinus rhythm, and other arrhythmias | Withings analog smartwatch single-lead ECG with embedded AF algorithm | Simultaneous 12-lead ECG | Sensitivity, specificity, ECG quality, algorithm performance | Prospective interventional diagnostic study | 262 |
| Ploux et al., 2022 | France / electrophysiology or cardiology setting | Adults undergoing rhythm evaluation | Wearable ECG platform for AF identification | Standard ECG-based adjudication | Sensitivity and specificity | Prospective diagnostic study | 260 |
| Abu-Alrub et al., 2022 [12] | France / hospital-based cardiology cohort | 100 patients with AF and 100 patients in sinus rhythm | Apple Watch, Samsung Galaxy Watch, and Withings smartwatch ECG automated interpretation | 12-lead ECG | Automated and expert-read sensitivity, specificity, and interpretability | Prospective blinded comparative diagnostic study | 200 |
| Barrera et al., 2025 [15] | Multinational / review-level synthesis | Patients enrolled across smartwatch diagnostic accuracy studies | Multiple smartwatch AF detection systems using ECG and PPG | Reference ECG standards across included studies | Pooled sensitivity, specificity, AUC | Systematic review and diagnostic meta-analysis | 17,349 |
| Wahab et al., 2025 [16] | Multinational / systematic screening literature | Adults undergoing systematic non-invasive AF screening | Multiple non-invasive wearable/digital screening devices | Usual care or confirmatory ECG, depending on the study | Incident AF detection rate, RCT relative risk, clinical outcomes | Systematic review and meta-analysis | 735,542* |
The pooled sensitivity of wearable devices for detecting atrial fibrillation was 0.92 (95% CI: 0.87–0.95) under a random-effects model. Substantial heterogeneity was observed across studies, with an I² value of 97.1%, indicating considerable variability in diagnostic performance.

Figure 2: Forest Plot of Sensitivity
The pooled specificity of wearable devices was 0.96 (95% CI: 0.91–0.98) under a random-effects model, with substantial heterogeneity across studies (I² = 98.0%). High specificity indicates that wearable devices demonstrate a low rate of false-positive diagnoses across multiple clinical settings.

Figure 3: Forest Plot of Specificity

Figure 4: Funnel Plot for Sensitivity

Figure 5: Funnel Plot for Specificity
Overall Effect Estimate
Meta-analysis of pooled risk ratios demonstrated consistent diagnostic performance across wearable devices, with statistically significant detection capability across studies.

Figure 6: Overall Effect / Risk Ratio Forest Plot
This meta-analysis found that wearable devices demonstrated high overall diagnostic performance for detecting atrial fibrillation, with a pooled sensitivity of 0.92 and a pooled specificity of 0.96 under random-effects modeling. These findings support the growing view that wearable-based monitoring is no longer a speculative adjunct but a credible screening and case-finding tool for atrial fibrillation in both community and selected clinical settings [5-15]. The pooled results are broadly consistent with prior literature showing that wearable electrocardiographic and photoplethysmography systems can identify AF with clinically meaningful accuracy, especially when algorithms are combined with confirmatory ECG pathways [5-15].
The clinical importance of these findings lies in the fact that atrial fibrillation is frequently silent, intermittent, and therefore underdiagnosed. Traditional office-based ECG captures only a brief time window and may miss paroxysmal episodes entirely [3,4]. Wearable devices address this limitation by extending monitoring into daily life, allowing repeated or continuous rhythm assessment without requiring resource-intensive in-person encounters [7-20]. That matters because undetected AF can lead to preventable stroke, heart failure progression, and repeated hospital utilization [1-21]. The appeal of wearables is therefore not just convenience; it is the possibility of shifting AF detection upstream, before catastrophic events occur.
At the same time, the pooled estimates should not be interpreted naively. The most obvious problem in this evidence base is heterogeneity. Both sensitivity and specificity showed very high I² values, meaning the included studies were not estimating one clean, uniform effect. That is not statistical noise; it reflects real differences in populations, device algorithms, data quality, monitoring conditions, and diagnostic thresholds. A smartwatch tested in a controlled cardiology clinic on patients already known to have AF is not equivalent to a passive PPG algorithm used by asymptomatic consumers in everyday life. Those are fundamentally different diagnostic environments, and collapsing them into a single pooled estimate risks overstating certainty if the heterogeneity is not openly acknowledged [12-15].
Another major issue is the difference between diagnostic accuracy and screening usefulness. Many studies show that a wearable can classify AF versus sinus rhythm with high sensitivity and specificity in a structured validation setting [12,13]. That is valuable, but it is not the same as proving benefit in a real screening program. A device can be technically accurate and still fail to improve clinical outcomes if uptake is poor, false alerts create unnecessary downstream testing, or detected low-burden AF does not translate into meaningful therapeutic benefit. This distinction has already surfaced in broader AF screening literature, where increased AF detection does not always lead to clear reductions in stroke or mortality at the trial level [16]. That gap between device performance and patient benefit is one of the central unresolved issues in this field.
The role of the study setting also deserves attention. Community-based pragmatic studies such as the Apple Heart Study demonstrated the feasibility of scaling wearable monitoring to very large populations, which is a major strength from a public health perspective [7]. However, such studies often emphasize notification rates and positive predictive value rather than full diagnostic confusion matrices, making direct integration into conventional diagnostic meta-analysis more difficult. By contrast, hospital-based studies provide cleaner data and often yield more impressive performance metrics, but their populations are more selected and may exaggerate apparent accuracy relative to real-world screening conditions [10-13]. That tension between scale and diagnostic rigor is not a minor methodological inconvenience; it is the reason many AF wearable studies look impressive individually while remaining hard to compare cleanly in aggregate.
Technology itself also matters. ECG-based wearables generally provide a more direct rhythm assessment and tend to perform strongly when signal quality is adequate, whereas PPG-based systems offer broader passive monitoring but may be more vulnerable to motion artifact, skin contact issues, and algorithmic misclassification [5-22]. This creates a practical trade-off. ECG-enabled wearables may produce more interpretable rhythm strips but depend on user activation or proper acquisition technique. PPG systems are more scalable and user-friendly for long-term screening but may sacrifice some precision under uncontrolled conditions. Future studies need to stop treating all wearables as interchangeable, because they are not. The device modality is part of the intervention, not a minor technical detail.
The findings also have implications for implementation. High specificity is particularly important in screening programs because false-positive AF alerts can trigger cascades of anxiety, clinic visits, confirmatory testing, and unnecessary healthcare costs. The pooled specificity of 0.96 is encouraging, but again, this figure must be read in context. In low-prevalence general populations, even highly specific tools can generate substantial absolute numbers of false-positive alerts. Conversely, in enriched or postoperative populations, sensitivity becomes critically important because missed atrial fibrillation may carry immediate therapeutic consequences [10]. The acceptable trade-off between sensitivity and specificity is therefore context dependent. Screening healthy consumers is not the same as surveillance after cardiac surgery or rhythm monitoring after stroke.
This review also reinforces the need for better standardization in the wearable AF literature. Too many studies use different reporting frameworks, inconsistent definitions of uninterpretable data, and selective emphasis on favorable metrics. Some report notification yield, others report positive predictive value, and others provide only per-recording classification performance. That inconsistency weakens comparability and makes pooled analysis more fragile than it should be. Future diagnostic studies should report complete 2×2 tables, predefine how unclassified traces are handled, specify whether analyses are per-patient or per-recording, and clearly separate diagnostic validation from screening-effectiveness claims [11-16].
The question of cost-effectiveness remains underdeveloped. The promise of wearables is often framed in economic terms, namely that scalable digital screening could reduce stroke burden and healthcare expenditures. That may be true, but the evidence is not mature enough to state it confidently. Cost-effectiveness depends not only on device accuracy but also on uptake, adherence, confirmatory testing pathways, anticoagulation decisions, baseline stroke risk, and health-system structure [16-24]. A highly accurate device is not automatically cost-effective. Until outcome-linked economic studies are stronger, claims about financial benefit should remain cautious.
Several limitations should be acknowledged. First, heterogeneity across studies was substantial, which limits the precision of pooled estimates. Second, the studies included varied designs, settings, and reference standards, and some larger studies were not ideally structured for conventional diagnostic pooling. Third, funnel plot asymmetry suggests the possibility of publication bias, although visual inspection alone is not definitive. Fourth, some studies were conducted in highly selected cohorts, which may overestimate performance relative to general population screening. Finally, the current literature is much stronger for diagnostic accuracy than for downstream clinical outcome benefit, meaning that enthusiasm for wearable AF screening still outruns definitive evidence in some areas.
Despite these limitations, the evidence now supports a clear conclusion: wearable devices have moved beyond novelty and can detect atrial fibrillation with high overall accuracy. The real challenge is no longer whether they can detect AF at all. The challenge is how they should be deployed, in whom, with what confirmatory pathway, and toward what clinical objective. The field needs fewer marketing-style claims and more rigorous studies linking wearable detection to stroke prevention, treatment decisions, patient-centered outcomes, and cost-effectiveness. Until then, wearable devices should be viewed as powerful screening and diagnostic support tools, not stand-alone replacements for structured clinical evaluation.
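The point made above about low-prevalence populations can be illustrated with a short calculation. The sketch below applies Bayes' rule to the pooled sensitivity (0.92) and specificity (0.96) reported in this review to obtain positive and negative predictive values at different AF prevalences. The prevalence values of 1%, 5%, and 30% are hypothetical, chosen only to contrast a general screening population with enriched clinical cohorts, and the function name `predictive_values` is likewise an illustrative assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected fraction of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled estimates from this review; prevalences are hypothetical scenarios.
POOLED_SENSITIVITY = 0.92
POOLED_SPECIFICITY = 0.96

for prevalence in (0.01, 0.05, 0.30):
    ppv, npv = predictive_values(POOLED_SENSITIVITY, POOLED_SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under these assumptions, the positive predictive value is roughly 19% at 1% prevalence and roughly 91% at 30% prevalence. The same device would therefore generate mostly false alerts in a low-risk consumer population while performing well as a rule-in tool in an enriched cohort, which is why confirmatory ECG pathways remain necessary.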
This study has several limitations. First, substantial heterogeneity was observed across studies, reflecting differences in device algorithms and patient populations. Second, publication bias may have influenced pooled estimates. Third, long-term clinical outcomes such as stroke prevention and mortality were not consistently reported across studies.
Wearable devices demonstrate high sensitivity and specificity for detecting atrial fibrillation and represent a promising tool for large-scale screening and early diagnosis. Integration of wearable technologies into clinical practice should be guided by standardized validation protocols and supported by further research evaluating clinical effectiveness and economic impact.