Nursing-Relevant Patient Outcomes and Clinical Processes in Data Science Literature: 2019 Year in Review

Mary Anne Schultz, BSN, MSN, MBA, PhD, Rachel Lane Walden, MLIS, Kenrick Cato, RN, PhD, CPHIMS, FAAN, Assistant Professor, Cynthia Peltier Coviak, PhD, RN, FNAP, Professor Emeritus, Christopher Cruz, MSHI, RN-BC, CPHIMS, Team Leader, Fabio D'Agostino, PhD, MSN, RN, Assistant Professor, Brian J. Douthit, MSN, RN-BC, Thompson Forbes, PhD, RN, Assistant Professor, Grace Gao, PhD, DNP, RN-BC, Mikyoung Angela Lee, PhD, RN, Associate Director of PhD Program and Associate Professor, Deborah Lekan, PhD, RN-BC, Ann Wieben, MS, BSN, RN-BC, and Alvin D. Jeffery, PhD, RN-BC, CCRN-K, FNP-BC, Assistant Professor

Mary Anne Schultz

California State University

Rachel Lane Walden

Vanderbilt University, Annette and Irwin Eskind Family Biomedical Library

Kenrick Cato

Columbia University School of Nursing, Department of Emergency Medicine

Cynthia Peltier Coviak

Grand Valley State University

Christopher Cruz

Global Health Technology & Informatics, Chevron, San Ramon, CA

Fabio D'Agostino

Saint Camillus International University of Health Sciences, Rome, Italy

Brian J. Douthit

Duke University School of Nursing

Thompson Forbes

East Carolina University College of Nursing

Grace Gao

St Catherine University Department of Nursing

Mikyoung Angela Lee

Texas Woman’s University College of Nursing

Deborah Lekan

University of North Carolina at Greensboro School of Nursing

Ann Wieben

University of Wisconsin School of Nursing

Alvin D. Jeffery

Vanderbilt University School of Nursing; Nurse Scientist, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs

Corresponding Author: Alvin D. Jeffery, 461 21st Ave South, Nashville, TN 37240; 615-322-4400; alvinjeffery@gmail.com

The publisher's final edited version of this article is available at Comput Inform Nurs

Associated Data

Supplement 1: Supplemental Digital Content 1. Full search strategies. GUID: 15CC6ADE-E833-4D8F-B3F2-8C462EFA362F

Supplement 2: Supplemental Digital Content 2. Number of results remaining after each review step, grouped by outcome. GUID: 6C7D7A88-0B7C-475F-A833-D7C68AFA3D38

Abstract

Data science continues to be recognized and used within healthcare due to the increased availability of large data sets and advanced analytics. It can be challenging for nurse leaders to remain apprised of this rapidly changing landscape. In this paper, we describe our findings from a scoping literature review of papers published in 2019 that use data science to explore, explain, and/or predict 15 phenomena of interest to nurses. Fourteen of the 15 phenomena were associated with at least one paper published in 2019. We identified the use of many contemporary data science methods (e.g., natural language processing, neural networks) for many of the outcomes. We found many studies exploring Readmissions and Pressure Injuries. The topics of Artificial Intelligence/Machine Learning Acceptance, Burnout, Patient Safety, and Unit Culture were poorly represented. We hope the studies described in this paper help readers: (a) understand the breadth and depth of data science’s ability to improve clinical processes and patient outcomes that are relevant to nurses and (b) identify gaps in the literature that are in need of exploration.

Keywords: Data Analytics, Artificial Intelligence, Nursing Research, Outcome and Process Assessment

INTRODUCTION

The phrase data science, along with related phrases like artificial intelligence, predictive analytics, and machine learning, is increasingly common not only in lay news and media but also in biomedical and nursing literature. One hopes the increasing use of large data sets and advanced analytics is associated with improvements in clinical care delivery and patient outcomes. Unfortunately, the ever-expanding corpus of publications and the plethora of potential clinical applications can leave many nurse leaders struggling to remain apprised of the most current evidence. In this paper, we describe a representative selection of papers published in 2019 that use data science to explore, explain, and/or predict phenomena of interest to nurses.

This project was based on interest from members of the Data Science Workgroup of the Nursing Knowledge: Big Data Science Conference 1 hosted annually by the University of Minnesota School of Nursing. Using a concept analysis paper 2 and group consensus, we identified 15 nursing-relevant patient outcomes and clinical process measures where data science techniques could be helpful. The outcomes selected for review comprise (in alphabetical order): Artificial Intelligence/Machine Learning Acceptance, Burnout, Emergency Department Visits, Falls, Healthcare-Acquired Infections, Healthcare Utilization and Costs, Hospitalization, In-Hospital Mortality, Length of Stay, Pain, Patient Safety, Pressure Injuries, Readmissions, Staffing/Scheduling/Workload, and Unit Culture.

METHODS

A scoping literature review was conducted using the PubMed and CINAHL databases in December 2019 for English-language studies published during the past year. The species filter was also used to restrict results to human studies. A single main search strategy used a combination of keywords and subject headings to find studies discussing the use of data science. The following terms were used to create that strategy: data science, data analytics, artificial intelligence, machine learning, risk assessment, decision support techniques, clinical prediction rule, natural language processing (NLP), computer-assisted image processing, along with analytic, forecast, prediction, risk, and statistical models. This main strategy was combined with an outcome-specific strategy for each of the 15 outcomes (see Table, Supplemental Digital Content 1, which presents full search strategies). Each outcome was reviewed by an individual author who is an expert in that outcome. Abstract and full-text screening were done using the Rayyan 3 web application. Inclusion/exclusion criteria were developed via group consensus with the intention of providing a representative sample of data science publications rather than an exhaustive review of all publications. Overall, 8682 abstracts were screened, and 162 studies were included in this review (see Table, Supplemental Digital Content 2, which breaks down inclusion/exclusion numbers by outcome). Each of these studies was analyzed to identify its aims, study design, data sources, sample, setting, population, operational definitions of outcomes, list of variables, and data science methods.
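The way the main strategy was paired with each outcome-specific strategy can be illustrated with a minimal sketch. It assumes a simple Boolean combination (terms joined with OR within a strategy, and the two strategies joined with AND); the function name and the outcome terms shown are hypothetical, and the strategies actually used are those in Supplemental Digital Content 1.

```python
def combine_strategies(main_terms, outcome_terms):
    """Join each term list with OR, then join the two groups with AND."""
    def or_group(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return or_group(main_terms) + " AND " + or_group(outcome_terms)

# A few of the main-strategy terms named in the Methods text.
main_terms = ["data science", "machine learning", "natural language processing"]
# Hypothetical outcome-specific terms, for illustration only.
falls_terms = ["accidental falls", "fall risk"]

query = combine_strategies(main_terms, falls_terms)
print(query)
```

Repeating the call with a different outcome term list would give one query per outcome while the main strategy stays fixed.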

RESULTS

Artificial Intelligence/Machine Learning Acceptance

Key Findings

Researchers approached the topics of Artificial Intelligence/Machine Learning (AI/ML) acceptance or credibility by measuring different outcomes. The selected papers investigated acceptance 4 , satisfaction 5 , trust 6 , and use of AI 7 . Methodologically, two of the studies were quantitative 4,7 , one qualitative 6 , and one a mixed-method approach 5 . Finally, most of the selected research focused on specific AI-based products, such as a smartphone app 6 , self-driving cars 4 , and home assistants like Amazon’s Echo 7 .

Discussion

While the work of the Shin group is not specific to nursing 5 , their conceptual model investigates algorithms based on the concepts of fairness, accountability, and transparency (FAT). These FAT principles are easily transferrable to the healthcare context. Subsequent studies should specifically investigate the relationship of the FAT concepts to credibility or acceptance in nursing-related AI/ML.

Burnout

Key Findings

All reports were cohort studies using survey-based data to predict some component of burnout. Three studies used logistic regression and one study used structural equation modeling (SEM) with path analysis. Notenbomer 8 explored absenteeism as a component of burnout, where both number of days and length of absence were reported. Bosman 9 similarly explored risk of sick leave, predicting this outcome as a binary variable within a timeframe. Oliver 10 explored subjective well-being in staff who care for those with intellectual disabilities, using the Satisfaction with Life Scale as a measure of well-being. Finally, Dutra 11 explored burnout using the Maslach Burnout Inventory, with data collected from nurses and nursing technicians to formulate a predictive model.

Discussion

It is worth noting the limited number of studies in 2019 that discussed prediction around burnout and their variant approaches to measuring this phenomenon. Only one article approached burnout directly, using an established scale as the predictor. This variance might be due to the fact that “burnout” as a term is not clearly delineated and has many aspects that could partially be defined. Here, we included both caregiver and healthcare professional burnout, but conceivably, burnout outside of the healthcare space could be examined, as it is a factor that influences individual health.

Of interest, the data contained in these studies were all collected using surveys and questionnaires. Often when considering data science methods, either real-time or historical data collected using a standardized method, e.g., the Electronic Health Record (EHR) or wearables, are used, and primary data collection is infrequent. It’s possible the data required to predict burnout is not readily available, leading to a lack of more advanced data science methodology. In this light, we should promote the regular collection of data on staff and caregiver well-being, as we would then be able to develop decision-support tools to aid in minimizing the acquisition and effects of burnout. This is especially important as we approach increasing pressures regarding staffing shortages and costs associated with job attrition.

Emergency Department Visits

Key Findings

Screening of 626 studies resulted in 17 studies that met the inclusion criteria. Researchers have attempted to predict ED diagnoses such as sepsis 12 , traumatic brain injury 13 , and non-ST-elevation myocardial infarction 14 . Additional outcomes include risk prediction for opioid overdose 15 , falls after ED discharge 16 , urgent revascularization 17 , ED readmission 18-21 , and patient severity and eventual discharge disposition 22-25 . In addition to the traditional machine learning approaches, ED researchers also used NLP and social network analysis to predict patient outcomes. In two studies, NLP was used to analyze computed tomography scan reports for prediction of subdural hematoma 26 and triage notes to predict discharge disposition 27 . Leone et al. 28 used an innovative application of social network analysis to classify women presenting to the ED due to violence exposure.

Discussion

A number of the studies highlighted the importance of nursing-collected information by exclusively or mostly using nursing triage data for prediction 12,18,22,24,25,27 .

Falls

Key Findings

Researchers studied falls using both ML methods 16,29,30 and general predictive models employing techniques such as least absolute shrinkage and selection operator (LASSO) to determine the most relevant risk factors for falls 31-34 . The grave consequences of falls in terms of morbidity, mortality, and healthcare costs have prompted many healthcare organizations to adopt fall risk assessment scales as standard assessments of patients newly admitted to their settings or client caseloads. Scales in the reviewed studies included the Morse Fall Scale (MFS), used in two studies; the Falls Risk Assessment Scoring System (FRASS), used in one study; and the Functional Independence Measure (FIM), used in one study. However, the uncertain value of these scales for classifying patients’ fall risk has motivated the use of newer, more sophisticated analysis methods to determine which assessment elements matter most for risk mitigation. These goals appear to be foundational to the studies noted above that used predictive models, which spanned a variety of venues including tertiary care, rehabilitation, subacute, ambulatory care, and home health settings.

Discussion

The need to obtain thorough and accurate descriptions of fall episodes was highlighted in the publication sample as well, with Klock et al. 29 choosing to use ML to discern a real-time scoring method based on Agency for Healthcare Research and Quality (AHRQ) rubrics that can improve the quality of actual reports of fall incidents. These enhanced accounts can then contribute to more precise identification of valuable preventive interventions and elimination of ineffective practices.

Healthcare-Acquired Infections

Key Findings

Three studies used logistic regression to develop predictive models for the risk of healthcare acquired infection (HAI)-related outcomes. Hur et al. 35 developed a risk score for Catheter-Associated Urinary Tract Infections (CAUTI) that was incorporated into the EHR patient summary screen. Jackson et al. 36 developed a mixed model with better predictive performance than nares culture in identifying risk of nursing home residents transmitting methicillin-resistant Staphylococcus aureus (MRSA) to healthcare-worker gowns. They also used decision-curve analysis to compare the clinical utility of placing patients on contact precautions under each model. Lodise et al. 37 developed a bedside tool to predict the likelihood of six phenotypes of drug-resistant pathogens among hospitalized adult patients with Gram-negative infections. Their six logistic regression models were converted to an Excel-based user interface to estimate the risk of resistance at the bedside.
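Decision-curve analysis, as used by Jackson et al. 36 , compares models by their net benefit at a chosen threshold probability. The sketch below uses the standard net-benefit definition (true positives per patient minus false positives per patient weighted by the odds of the threshold); the function names and the counts in the example are illustrative assumptions and are not taken from the cited study.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit of a decision rule at a given threshold probability.

    true_pos / false_pos: patients flagged by the rule who do / do not
    have the outcome; n: total number of patients.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    weight = threshold / (1 - threshold)  # odds of the threshold probability
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(prevalence, threshold):
    """Reference strategy: apply the intervention to every patient."""
    weight = threshold / (1 - threshold)
    return prevalence - (1 - prevalence) * weight


# Illustrative counts only: 200 residents, 30 true and 20 false positives.
nb_model = net_benefit(true_pos=30, false_pos=20, n=200, threshold=0.2)
nb_all = net_benefit_treat_all(prevalence=0.2, threshold=0.2)
print(round(nb_model, 3), round(nb_all, 3))
```

A model is usually considered clinically useful at a given threshold when its net benefit exceeds both the treat-all and treat-none (net benefit of zero) strategies.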

Four studies used more contemporary data science methods to predict HAI-related outcomes. Kocbek et al. 38 incorporated temporal data and preoperative blood tests to develop and compare four models to predict the onset of surgical site infection (SSI): several regression models and an extreme gradient boosting (XGBoost) model. Yee et al. 39 used publicly available ICU data and a data-driven approach using Bayesian networks and regression to develop a screening algorithm for progression into septic shock. Bush et al. 40 developed novel patient mobility predictors for unit-wide Clostridium difficile infection (CDI) susceptibility by using a network analysis and trace-route mapping to develop an in-hospital patient mobility network. The resulting calculated contagion centrality (CC) measure was found to be a statistically significant predictor of hospital-onset CDI cases. Liao et al. 41 used Cyranose 320 e-nose sensor breath-gas data to develop models of Ventilator-associated pneumonia (VAP) with Pseudomonas aeruginosa by applying neural network and support vector machine (SVM) methods. According to Liao and colleagues, while the combination of sensor data and ML methods shows promise, advancements in sensor performance and high-performance computing are needed to improve the accuracy of patient breathing gas detection models.

Discussion

At the system and unit level, these findings suggest that adding these types of predictors may enhance the development of more robust real-time surveillance systems for HAIs 40 . From a practice perspective, Hur et al. 35 specifically noted that a CAUTI surveillance system could reduce the nurse time and effort spent on risk assessment, which could then be redirected to the management of, and education about, indwelling urinary catheters. From a research perspective, models that perform similarly and also offer lower complexity and higher interpretability are often preferred 38 .

Healthcare Utilization and Costs

Key Findings

All studies explored different facets of costs and utilization, using several methods to do so. Five studies used NLP to explore outcomes 42-46 , six used quantitative ML methods 47-52 , two evaluated existing ML-based tools 53,54 , and two used a form of regression analysis 55,56 . Several studies examined non-traditional data sources: video data with AI 57 , deep learning with images 58 , deep learning with audio data 59 , and spatial analysis 60,61 . Reported outcomes were as disparate as the prediction of financial risk of hospitalized pediatric patients 52 , the identification of problematic opioid use 46 , and the evaluation of health literacy 43 .

Discussion

The subject of “healthcare costs and utilization” covers a wide variety of topics and methods, and this is clearly reflected in the sample of papers included in this review. It is promising to see that several non-traditional data types (audio, image, text, geospatial, and video) are being used to the benefit of patient outcomes, reducing costs and increasing healthcare access beyond that of which traditional data are capable. A mix of direct and indirect economic-based outcomes were noted as well, including the use of deep learning-based image analysis to increase diagnostic quality of lower-dose positron emission tomography (PET) images 58 , both reducing costs and advocating for patient safety. Noted throughout the studies, the purpose of cost analysis focused on some portion of patient advocacy, whether relating to cost-saving measures, treatment adherence, or clinical safety.

Hospitalization

Key Findings

A few themes emerged in these ten papers. Data sources generally originated from existing administrative, commercial claim, and hospital data. Retrospective studies were a commonly adopted study design. Predictive and associative modeling dominated the data science methods employed in these studies. Data modeling methods included risk prediction algorithm development 62 , linear regression 63 , multivariate statistical analysis using structural equation modeling 64 , multivariable logistic regression 62,63,65-68 , negative binomial-logit hurdle regression 69 , geospatial analytic methods 60 , and a network approach 40 . Ages and gender varied, as did disease conditions. Financial impacts and implications appeared to be a common interest of study.

Discussion

The abundance of results and great variety of interests in leveraging data science methods to build predictive associations and relationships among different factors and variables pertaining to hospitalization are notable. The research in this space is showing promising results in mining predictive factors and associations to improve disease prevention and management, health promotion, and detecting gaps in geographical regions that relate to the impacts associated with hospitalization.

In-Hospital Mortality

Key Findings

A number of predictive models exist for identifying patients at high risk for dying in the hospital. The majority of the works used regression (with or without additional methods) for making predictions 24,70-89 . The regression models primarily leveraged logistic regression; however, two papers applied Cox proportional hazards regression 83,87 . Ten papers noted the use of more contemporary methods for prediction: random forests 12,24,75,77,78,90 , gradient boosting 12,24,75,78,83,91 , Naïve Bayes 78 , support vector machines 12,90 , and neural networks 24,73,77,90 . Interestingly, one paper conducted a network analysis of healthcare providers and used the network characteristics to serve as predictors 81 . Another paper used regular expressions to extract features for a prediction model 77 .
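Extracting features with regular expressions, as mentioned above, can be pictured with a minimal sketch that turns free-text notes into binary indicator variables. The pattern names, the patterns themselves, and the example notes are hypothetical and are not the ones used in the cited paper; the sketch also ignores negation (e.g., "no sepsis"), which a real pipeline would need to handle.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = {
    "mentions_ventilation": re.compile(r"\b(ventilat\w*|intubat\w*)", re.IGNORECASE),
    "mentions_sepsis": re.compile(r"\b(sepsis|septic)\b", re.IGNORECASE),
}


def extract_features(note):
    """Return a 0/1 indicator per pattern for one free-text note."""
    return {name: int(bool(pattern.search(note))) for name, pattern in PATTERNS.items()}


features = extract_features("Patient intubated overnight for septic shock.")
print(features)
```

The resulting indicators could then be supplied to any prediction model alongside structured variables such as vital signs and laboratory values.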

Sample sizes ranged from 51 to 281,522. Study populations included hospitalized adults from the following countries: Australia 70 , Brazil 71 , China 86,87,89 , Israel 91 , Ireland 70 , Italy 84 , Korea 76,77 , Singapore 12 , Spain 74 , Switzerland 79 , and the United States of America 24,72,75,78,80-82 . Several studies focused on specific admission diagnoses or surgical procedures, which resulted in a trend toward better model performance compared to models including all-cause hospitalizations. Variables serving as predictors primarily comprised: demographic information, vital signs, laboratory values, and diagnoses/comorbidities/procedures. Less commonly included but notable predictor variables comprised: physical assessments 70 , physiological status scores 71,74,75,81,91 , and medication exposures 78,81,91 . One study included a nutrition score 71 , one study included census-tract-level socioeconomic status 80 , and one study included nursing diagnoses 84 .

Discussion

All papers were limited to adult populations, so there might be a need for pediatric-focused in-hospital mortality prediction models. From a nursing perspective, it was encouraging to see one paper include nursing diagnoses 84 and another include socioeconomic status 80 , although such predictors remained rare. These gaps suggest promising areas for the nurse-investigator who possesses data science methods expertise or who works on an appropriately prepared interprofessional research team.

Length of Stay

Key Findings

Data science methods such as ML models (e.g., artificial neural network, predictive regression analysis) were used in five studies to project the hospital length of stay in different patient populations: (a) surgical patients undergoing orthopedic and neurosurgical operations 48,92,93 , (b) patients who underwent surgeries as first-case in a day 94 , and (c) critical care patients 95 . In two studies, NLP was used to characterize variables using narrative clinical notes 96 or patient comments 97 in order to study their association with hospital length of stay in a population of children with psychiatric complaints and orthopedic surgical patients, respectively.

Discussion

Only in one study 96 was nursing data (specifically, clinical notes written by triage or bedside nurses) used as predictors of length of stay even though several studies have shown the predictive power of nursing data on this outcome 98 . Further studies should include nursing data in predictive analytics methods to improve the prediction for patient and process outcomes.

Pain

Key Findings

Specific to predictive modeling, some studies focused on pain management 99-102 while other studies dealt with identifying predictors of pain 103-110 . One study used brain grey-matter images as biometric data to predict pain 111 , and another investigated the role of brain patterns in pain management among ED nurses 100 . Lee et al.’s 112 work using multi-modal imaging and autonomic signals as ML predictive variables is a notable contribution in this area. Similarly, Jiang et al. 113 used multiple physiological parameters, galvanic skin response, and electromyography to predict pain level among non-verbal patients, while Lim et al. 114 used a deep learning method on photoplethysmography signals to assess pain during surgery.

Discussion

The data sources used in the studies are mostly from the EHR, registry, public database, and clinical trial database containing data on inpatient/surgical encounters, survey questionnaires, and direct observation. Data types range from patient questionnaires (reported pain using numeric scale) to discrete observation data to more advanced biomedical measurements such as imaging data (e.g., magnetic resonance imaging) and physiologic sensor data (e.g., electromyography, photoplethysmography signals). These data were collected in various settings including hospitals, ambulatory clinics, community health, residential care, dental clinics, and sports medicine centers.

Data science techniques included least absolute shrinkage and selection operator, random forest regression, linear support vector classification, NLP of non-structured text, univariate/multivariate logistic regression, analysis of variance, Cox proportional hazards regression, k-means clustering, artificial neural network, support vector machine, multiple learning kernel regression model, multi-layer perceptron neural network, and deep belief network. Validation and testing techniques included bootstrapping, cluster sampling, leave-one-subject-out cross-validation, and 10-fold cross-validation.
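Of the validation techniques listed, leave-one-subject-out cross-validation is worth a short illustration because it differs from ordinary k-fold splitting: all observations from one subject are held out together, so a model is never tested on a person it was trained on. The sketch below is a minimal, library-free version; the function name and the subject identifiers are assumptions made for the example.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) once per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Six observations (e.g., signal windows) recorded from three subjects.
subject_ids = ["A", "A", "B", "C", "C", "C"]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train, test in folds:
    print(held_out, train, test)
```

This grouping matters for physiologic sensor data, where repeated measurements from the same patient are correlated and a random split would likely overstate model performance.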

The use of data science in pain management has a major impact on medical and nursing practice. Its use can improve the ability to classify/measure pain in non-verbal patients (e.g., those on intra-operative deep sedation, those with altered levels of consciousness) and could change the way pain assessment is regarded as a purely subjective type of assessment. Advancing pain measurement or pain prediction based on various factors can greatly enhance the current pain management approach, which has the potential to reduce opioid overdose through personalized medicine. Prognostic pain models for chronic pain management can lead to more accurate opioid prescriptions for long-term opioid therapy, which may also reduce opioid overdose. Predictive models focusing on non-pharmacological interventions to pain management can improve medication safety by focusing on alternative options.

Patient Safety

Key Findings

The primary outcomes were the identification and classification of falls and fall incident reports 115-117 , safety, and predicting perspectives of patient safety on the Hospital Survey on Patient Safety Culture 118 . Studies using data science techniques primarily used NLP 29,115,119 , neural networks 29,120 , and random forest 29,116 algorithms for analysis. The data used to create models came from the EHR, narrative notes, and various surveys on hospital unit metrics. One study used online drug reference guides as part of a model to identify the potential for drug-drug interactions from EHR data 117 .

Discussion

Only one patient safety study using data science techniques included nursing data 118 . No other studies directly used nursing data or were published in nursing journals. The limited exposure of nurses to data science techniques that investigate patient safety may be due to the lack of nursing researchers with expertise in patient safety and the use of data science techniques to create understanding.

Pressure Injuries

Key Findings

Shi, Dumville, and Cullum 121 reviewed, through a systematic review and meta-analysis, clinical effects of 22 prognostic models for predicting pressure injury (PI) risk; most of the models were built by logistic regression and Cox regression.

The seven empirical studies published in 2019 used various data science methods to detect or predict PIs. Li, Lin, and Hwang 122 explored logistic regression and three data mining algorithms (decision trees, neural networks, and support vector machines) to identify the best predictive factors for the occurrence of PIs. The predictive factors, in order of importance, comprised: PI history, absence of cancer, excretion, activity/mobility, and skin condition/circulation.

Logistic regression analysis was used to compare three predictive models for PI occurrence in surgical patients 123 and to determine the utility of three different PI risk assessment scales (i.e., the Spinal Cord Injury Pressure Ulcer Scale, Braden Scale, and Functional Independence Measure [FIM]) for identifying individuals at risk for developing PI during inpatient spinal cord injury rehabilitation 124 . Park et al. 123 found that the Scott Triggers tool was the best fitting model; the estimated surgery time and serum albumin level were significant predictors of the development of PIs in surgical patients in acute care settings. Flett et al. 124 found that the FIM bed/chair transfer score could be readily determined at rehabilitation admission with minimal administrative and clinical burden.

Crane et al. 125 used nonlinear regression to explore predictors for identifying patients at high risk of PI. They recommended modifying the Glamorgan scoring system and suggested that incorporating the Pediatric Logistic Organ Dysfunction-2 score might improve the predictive value of a modified Glamorgan scoring system. Zhang, Yu, Shi, Shang, Hong, and Yu 126 used a univariate Cox regression analysis to find the prognostic factors for PI recurrence. A blood albumin level below 25 g/L on admission was the strongest predictor of recurrence, followed by multiple ulcers and the presence of a single caregiver. Old age might constitute a risk factor for pressure ulcer occurrence but not a prognostic factor for recurrence.

Duvall, Karg, Brienza, and Pearlman 127 , using a threshold-based detection algorithm and a K-nearest neighbor classification approach, investigated the feasibility of a sensor technology (i.e., the E-scale system) for detecting and classifying movements in bed (i.e., roll, turn in place, extremity movements, and assisted turn), which are relevant for PI risk assessment.

Ohura et al. 128 explored different architectures of the convolutional neural network (CNN) in image segmentation to detect and discriminate ulcer regions of PI during assessment via telemedicine. The U-Net CNN constructed using appropriately supervised data was capable of segmentation with high accuracy. The study findings suggested that eHealth wound assessment using CNNs would be of practical use in the future.
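The K-nearest neighbor classification step described for Duvall et al. 127 can be sketched in a few lines: a new movement is assigned the majority label among the k most similar labeled examples. The two features (a load-shift magnitude and a duration), their values, and the choice of k = 3 are hypothetical and chosen only for illustration; they are not the E-scale system's actual features or settings, and the threshold-based detection step is omitted.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (load shift, duration in seconds) examples per movement type.
train = [
    ((0.9, 4.0), "roll"),
    ((1.0, 3.5), "roll"),
    ((0.8, 4.2), "roll"),
    ((0.2, 1.0), "extremity movement"),
    ((0.1, 0.8), "extremity movement"),
    ((0.3, 1.2), "extremity movement"),
]

print(knn_classify(train, (0.85, 3.9)))
print(knn_classify(train, (0.15, 0.9)))
```

In practice the features would be scaled to comparable ranges before computing distances, since Euclidean distance is otherwise dominated by whichever feature has the largest units.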

Discussion

Data science methods facilitate the prediction, detection, and management of PIs via optimized assessments. The best prognostic factors derived from the studies, such as blood albumin level, mobility, skin conditions, and single caregiver, can be used to develop and improve nutrition programs or home care nursing programs. Notably, nurses can improve their real-time monitoring of high-pressure areas in the bed and their assessment of PI risk with the use of sensor technologies (e.g., E-scale system). Also, as explored by Ohura et al. 128 , the use of CNN architectures could support eHealth wound assessment systems and significantly change the management of PIs or chronic wounds. For future research, Park et al. 123 recommended the inclusion of vital signs and nursing interventions in PI predictive modeling.

Readmissions

Key Findings

The majority of papers used a readmission metric defined as unplanned readmission to an acute care hospital within 30 days of discharge 21,88,91,93,129-156 , although some studies used 90-day 157,158 , 180-day 159 , within 1 year 160 , 3 or more readmissions over 1 year 161 , and even “instantaneous hospital readmission risk over time” 162 . Readmission was also defined by urgency 133,145 and etiology (e.g., disease-specific 129,143,144,150,159-161 ). Given varying definitions, a paper by Brittan et al. 131 calculated three definitions of readmission with differing inclusion/exclusion criteria for index admissions and readmissions.
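Because the readmission definitions above differ mainly in the time window and in which return visits count, a small sketch may help make the most common one concrete. It flags index stays followed by an unplanned readmission within 30 days of discharge. The record layout, the function name, and the choice to count day 30 as inside the window are assumptions for illustration; individual studies applied their own inclusion and exclusion rules.

```python
from collections import defaultdict
from datetime import date


def index_stays_with_readmission(stays, window_days=30):
    """Return positions of stays followed by an unplanned readmission.

    stays: list of (patient_id, admit_date, discharge_date, planned) tuples.
    """
    by_patient = defaultdict(list)
    for idx, (patient_id, admit, discharge, planned) in enumerate(stays):
        by_patient[patient_id].append((admit, discharge, planned, idx))

    flagged = set()
    for visits in by_patient.values():
        visits.sort()  # chronological order by admission date
        for i, (_, discharge, _, idx) in enumerate(visits):
            for next_admit, _, next_planned, _ in visits[i + 1:]:
                gap = (next_admit - discharge).days
                if gap > window_days:
                    break  # later stays are even further away
                if gap >= 0 and not next_planned:
                    flagged.add(idx)
                    break
    return flagged


stays = [
    ("p1", date(2019, 1, 1), date(2019, 1, 5), False),
    ("p1", date(2019, 1, 20), date(2019, 1, 25), False),  # 15 days after discharge
    ("p1", date(2019, 4, 1), date(2019, 4, 3), False),    # outside the window
    ("p2", date(2019, 2, 1), date(2019, 2, 4), False),
    ("p2", date(2019, 2, 10), date(2019, 2, 12), True),   # planned, not counted
]
print(index_stays_with_readmission(stays))
```

Changing `window_days` to 90 or 180 reproduces the longer windows some studies used, which is one reason readmission rates are hard to compare across papers.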

One paper applied principal components analysis, multiple correspondence analysis, and multiple factor analysis 129 . Eleven papers applied machine learning algorithms and related techniques such as random forest, weighted decision trees, support vector machines, gradient boosting, neural networks, decision curve analysis, SMOTE 91,93,129,130,132,142-144,147,148 , and Naïve Bayes 21 . One nurse-authored paper by Kwon et al. 142 used a case study to illustrate different statistical and ML risk models and hospital readmission outcomes of patients with diabetes mellitus.

Novel predictors used in risk models with relevance to nursing included tests of physical function 132,135,137,139,145,153-156,158,160 , symptoms 132,135,139,158,159,161 , psychosocial factors 150,154,155,157,160,161 , vital signs and/or body mass index 88,91,129,135,137,138,143,144,148,152,156-158,160,161 , and frailty 88,135,137,139,145,153,155,156,158,160 . Other predictors that are infrequently applied in prevailing risk models include laboratory and/or imaging tests 88,91,129,132,138,140,142,144,148,150-153,155,157,159,161,164 and medications 91,136,138,142,146,148,150,151,154,157,160,161 . Nijhawan et al. 150 incorporated novel socio-behavioral predictors such as health literacy, adherence to medications, substance abuse, patient-provider relationship satisfaction, perceived health status, and housing and food security.

Discussion

Most models used administrative claims and EHR data for sociodemographic information and medical diagnoses. Overall, multiple papers demonstrated that risk factors such as older age, poor health, frailty, multimorbidity, certain medical diagnoses, and healthcare utilization confer high risk for readmission. While these risk factors potentially improve the predictive ability of models, nurses can make important contributions to model development by filling data gaps with nursing-relevant data pertaining to patients’ biopsychosocial health and function. Identifying and applying common data elements relevant to nursing across EHR systems in predictive models, and including standard nursing terminology in the EHR (e.g., “International Classification for Nursing Practice” codes, as suggested by Kwon et al. 142 ), would capture some nuances that provide contextual information about patient health status and thereby improve the relevance and performance of the models.

Staffing/Scheduling/Workload

Key Findings

Data science methods were used in three 2019 studies that estimated the antecedents or consequences of nurse staffing. Two studies reported predictive models built with ML methods, and one reported the use of NLP to transform clinical notes into risk assessment forecasts. The Nadkarni 165 research group applied a stepwise, iterative, object-oriented program, written with workflow and treatment processes in mind, to a sample of 343 patients with potentially life-threatening complications and 2,285 uncomplicated mothers in a Tanzanian hospital. The program was intended to give decision-makers a tool for analyzing the impact of resource limitations on maternal inpatient complications; key variables included treatment efficacy, severity distribution, number and frequency of nurse visits, nurse staffing at the shift level, deterioration rate, and maternal near-misses.

Similarly, the Lucero 34 group described a data-driven and practice-based approach to identifying factors associated with inpatient falls in a sample of 272 patients who fell and 542 who did not while hospitalized in medical-surgical units of a Florida tertiary-care hospital. Manual, semi-automated, and automated procedures deploying theoretically or practice-derived risk factors yielded a meaningful and parsimonious set of predictors for this adverse event. Skill mix, rates of nurse certification, and nurse educational levels were among the relevant staffing variables in this observational case-control study.

In the remaining study, by Menger 166 , NLP was used to transform clinical notes from the patient’s EHR into inputs for developing and validating a multivariable prediction model for the assessment of inpatient violence risk. In this prognostic study, the authors used clinician notes from the admission encounters of over 5,000 patients in one of two different psychiatric settings in the Netherlands. Model training and estimation of predictive validity were done in a nested cross-validation setup, and the outcome of interest (the manifestation of violent behavior within four weeks of admission) was successfully predicted from the admission documentation. Although a staffing variable was not explicitly or operationally stated, the availability of a nurse (or psychiatrist) to conduct the admission assessment can be inferred, because this initial encounter is the source of the nursing (and medical) language on which the model relies.
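
A nested cross-validation setup separates model tuning from performance estimation: an inner loop chooses a hyperparameter using only the training portion of each outer fold, and the outer loop scores the tuned model on data it has never seen. The sketch below illustrates that structure with a toy one-feature k-nearest-neighbor classifier. It is a generic illustration under our own assumptions, not the NLP pipeline from the study above. All function names, the candidate values in `k_grid`, and the synthetic `toy` data are hypothetical.

```python
import random


def make_folds(indices, n_folds):
    """Split a list of indices into n_folds roughly equal, disjoint folds."""
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_rows, x, k):
    """Majority vote among the k training rows whose feature is closest to x."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > len(nearest) else 0


def accuracy(train_rows, test_rows, k):
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in test_rows)
    return hits / len(test_rows)


def inner_cv_score(data, train_idx, k, n_inner):
    """Mean validation accuracy for one candidate k, using outer-training rows only."""
    scores = []
    for val_idx in make_folds(train_idx, n_inner):
        val = set(val_idx)
        fit_rows = [data[i] for i in train_idx if i not in val]
        val_rows = [data[i] for i in val_idx]
        scores.append(accuracy(fit_rows, val_rows, k))
    return sum(scores) / len(scores)


def nested_cv(data, k_grid=(1, 3, 5), n_outer=5, n_inner=3, seed=0):
    """Return one held-out accuracy per outer fold."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    outer_scores = []
    for test_idx in make_folds(idx, n_outer):
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # Inner loop: tune k without touching the outer test fold.
        best_k = max(k_grid, key=lambda k: inner_cv_score(data, train_idx, k, n_inner))
        train_rows = [data[i] for i in train_idx]
        test_rows = [data[i] for i in test_idx]
        outer_scores.append(accuracy(train_rows, test_rows, best_k))
    return outer_scores


# Synthetic rows: (risk score, outcome label), for illustration only.
toy = [(i / 59, 1 if i >= 30 else 0) for i in range(60)]
scores = nested_cv(toy)
```

Because the outer test fold plays no part in choosing `best_k`, the averaged outer scores are less optimistically biased than scores obtained by tuning and evaluating on the same folds.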

Discussion

No articles related to Scheduling or Workload were found. In the studies we reviewed, staffing variables were of two types: nurse hours relative to either all staffing or patient load, and nurse characteristics such as education and certification. Further, studies of the impact nurse staffing may have on patient outcomes should include nurse characteristics that are known or hypothesized to have an impact, such as education, training, and mentoring needs. From a systems perspective, measures of human capital resources (e.g., nurse hours per patient day or skill mix) should be explicitly stated for the relevant time partitions up to and including the time of injury, adversity, or other measurement.
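
As a minimal sketch of what stating these measures explicitly per time partition could look like, the example below computes nursing hours per patient day and RN skill mix for individual shifts. The definitions used here (total nursing hours divided by patient days, and RN hours as a share of total nursing hours) are common conventions that we assume for illustration, and the field names and example values are hypothetical.

```python
def staffing_measures(shifts):
    """Compute nursing hours per patient day and RN skill mix for each shift."""
    results = {}
    for shift in shifts:
        total_hours = shift["rn_hours"] + shift["lpn_hours"] + shift["aide_hours"]
        # Convert the shift census into patient days (census x fraction of a day).
        patient_days = shift["census"] * shift["shift_hours"] / 24
        results[shift["shift_id"]] = {
            "nursing_hours_per_patient_day": total_hours / patient_days,
            "rn_skill_mix": shift["rn_hours"] / total_hours,
        }
    return results


# Hypothetical day and night shifts on one unit.
example_shifts = [
    {"shift_id": "day", "rn_hours": 48, "lpn_hours": 12, "aide_hours": 20,
     "census": 20, "shift_hours": 12},
    {"shift_id": "night", "rn_hours": 36, "lpn_hours": 12, "aide_hours": 12,
     "census": 20, "shift_hours": 12},
]

measures = staffing_measures(example_shifts)
```

Computing the measures at the shift level, rather than as a monthly or unit-wide average, keeps the staffing exposure aligned with the time at which an adverse event occurred.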

Unit Culture

None of the 589 papers retrieved in the initial literature search satisfied the criteria for inclusion in the final analysis.

DISCUSSION

Through our literature review, we have identified and described a representative sample of publications focused on the use of data science methods relevant to nurses. All but one of the outcomes for which we searched were associated with at least one paper published in 2019. From a methodological perspective, we noted the use of many contemporary data science methods (e.g., natural language processing, neural networks, and social network analysis) across many of the outcomes.

We found a large number of studies exploring Readmissions and Pressure Injuries (PI). Risk prediction modeling for hospital readmission has increased in recent years due to the Affordable Care Act of 2010 and the subsequent Hospital Readmissions Reduction Program, which has tied financial reimbursement penalties to potentially avoidable hospital readmissions. The high number of PI studies could be attributable to two factors: (a) PI risk scores have existed for many years, so there is ample opportunity for including validated predictors within new analysis frameworks, and/or (b) PIs are regulatory quality indicators associated with malpractice litigation and excess costs.

Conversely, several topics (i.e., Artificial Intelligence/Machine Learning Acceptance, Burnout, Patient Safety, and Unit Culture) were poorly represented and may offer opportunities to leverage data science methods in nursing research. In fact, our Unit Culture search did not yield any papers that met the inclusion criteria. While this could reflect a limitation of our search strategy, it also suggests that more studies could be performed in this space. Given nurses’ long-standing attention to Burnout, Patient Safety, and Unit Culture, we are hopeful the nursing research and nursing informatics communities will apply data science methods to these problems in the coming years. Additionally, we believe identifying and applying common data elements relevant to nursing across EHR systems in predictive models and including standard nursing terminology codes in the EHR would capture some nuances that provide contextual information about patient health status. Including these codes could be a high-yield direction for future research.

Limitations of our report include the non-exhaustive nature of the literature search and the single-person review process. Given that the intent of the paper was to provide readers with a broad overview of nursing-relevant data science activities, an exhaustive literature search was beyond our purpose. For interested readers, we have published our search strategies so that others can reproduce our findings and/or perform an exhaustive literature review. The use of a single reviewer helped expedite the process of a year-in-review paper. Additionally, because we focused on high-level description rather than inferential comparisons, we believe a second reviewer would not have significantly changed our findings.

CONCLUSION

Data science has significant potential to assist healthcare providers in improving the nursing environment, clinical processes, and patient outcomes. By using data science techniques to identify care environment improvement opportunities and/or individual patient risk factors, we create new opportunities to design and implement interventions best able to mitigate risk and improve patient care. The use of data science to understand problems related to nursing and nursing care must include modern methods of investigation and understanding. We hope the studies and reports we have identified and described in this paper will help readers understand the breadth and depth of data science’s ability to improve clinical processes and patient outcomes that are relevant to nurses.