Data is available only upon formal request and subject to approval.
Approved users receive a secure institute account and work with the data exclusively in our Trusted Research Environment (TRE) via remote desktop.
Request data (Email to us)Data Integration, Cleaning, and Quality Assessment
Background
In this project, trial data from the Depression DC study were integrated and prepared for further analysis and dashboard development. The Depression DC study is a longitudinal randomized trial comprising multiple measurements across different time points. The data integration process involved systematic data understanding, preprocessing, transformation into REDCap-compatible format, and comprehensive quality checks.
Data Understanding
To ensure accurate data handling, an initial data exploration phase was conducted:
Data Preparation and Preprocessing
Data Structuring
Variable Standardization
Alignment with REDCap Standards
Data Type Validation
Categorical Variable Recoding
Handling Longitudinal and Repeated Measures
Data Integration into REDCap
A custom function was developed to automate the import of REDCap-compatible datasets into the REDCap system. This ensured consistency, reproducibility, and efficiency in the data integration process.
Data Quality Assessment (Pre-Anonymization)
Data Anonymization and Feature Derivation
To ensure compliance with data protection and sharing requirements:
Data Quality Assessment (Post-Anonymization)
Further validation was performed after anonymization:
Validation of repeated measurements in both:
Summary
The data integration process ensured that heterogeneous longitudinal trial data were transformed into a standardized, REDCap-compatible format. Through systematic preprocessing, validation, and quality checks, a high-quality dataset suitable for analysis and dashboard visualization was established.
patid# CSV to REDCap Data Integration Workflow
Automated R-based workflow for standardizing and importing CSV data into REDCap databases with validation and type conversion.
## Overview
This workflow handles the complete pipeline from raw CSV files to REDCap import, including:
- Auto-detection of CSV delimiters
- Column name standardization
- REDCap field alignment and type conversion
- Automated API import
## Prerequisites
### Required R Packages
```r
install.packages(c(
"dplyr",
"janitor",
"readr",
"stringr",
"lubridate",
"httr",
"jsonlite"
))
```
### Required Files
- **CSV data files**: Source data to be imported
- **REDCap data dictionary** (`Redcap_dictionary.csv`): Export from your REDCap project
- **REDCap API token**: Generated from your REDCap project settings
## Configuration
### 1. Set File Paths
```r
folder_path <- "PATH/TO/YOUR/CSV/FOLDER"
```
### 2. Configure API Credentials
**Security Best Practice**: Never hardcode API tokens in scripts. Use environment variables:
```r
# Set environment variable (run once in R console)
Sys.setenv(REDCAP_API_TOKEN = "your_token_here")
# Use in script
api_token <- Sys.getenv("REDCAP_API_TOKEN")
api_url <- "https://your-redcap-server/api/"
```
## Workflow Steps
### 1. Data Loading
The script automatically:
- Detects CSV delimiter (comma or semicolon)
- Handles UTF-8 encoding
- Loads all CSV files from the specified folder
### 2. Column Standardization
Automatically maps common column names to REDCap system fields:
| Source Column | REDCap Field |
|---------------|--------------|
| `subject` | `subjid_drv` |
| `repeat_number` | `redcap_repeat_instance` |
| `visitid` / `visit_id` | `redcap_event_name` |
| `site` | `redcap_data_access_group` |
**Note**: Column names are cleaned using `janitor::make_clean_names()` (lowercase, underscores, no special characters).
### 3. Data Type Alignment
The workflow reads your REDCap data dictionary and applies validation rules:
- **date_dmy**: Converts to date format (supports multiple input formats)
- **integer**: Converts to integer
- **number**: Converts to numeric
System fields are preserved and not subject to validation conversion.
### 4. REDCap Import
Uses the REDCap API to import data with:
- Flat JSON format
- Normal overwrite behavior (updates existing records)
- Count return for verification
## Usage Example
```r
# Source the script
source("data_integration_workflow.R")
# Load and standardize all CSV files
# (already done automatically when script runs)
# Apply REDCap dictionary alignment to a specific dataset
DM <- data_list[[1]] # First CSV file
DM_aligned <- align_by_text_validation(DM, redcap_dict)
# Import to REDCap
api_token <- Sys.getenv("REDCAP_API_TOKEN")
api_url <- "https://your-redcap-server/api/"
result <- import_to_redcap(DM_aligned, api_token, api_url)
```
## Data Preprocessing Features
### Special Value Handling
- Empty strings converted to `NA`
- "f.A." values converted to `NA` (common data export artifact)
### Date Format Support
The workflow supports multiple date input formats:
- `dmy`: 31/12/2024
- `dmY`: 31/12/24
- `Ymd`: 2024-12-31
- `Y-m-d`: 2024-12-31
- `d-m-Y`: 31-12-2024
## Error Handling
The import function provides:
- HTTP status code reporting
- Success/failure messages
- Return of API response for debugging
## Security Considerations
⚠️ **Important Security Practices**:
1. **Never commit API tokens to version control**
2. **Use environment variables for credentials**
3. **Restrict file permissions on scripts containing tokens**
4. **Use separate tokens for development/production**
5. **Regularly rotate API tokens**
## Troubleshooting
### Common Issues
**Import fails with validation errors**:
- Check REDCap data dictionary field names match CSV columns
- Verify date formats match expected validation
- Ensure required fields are present
**Column mapping doesn't work**:
- Column names are case-sensitive after cleaning
- Check for typos in source CSV headers
- Review `make_clean_names()` output
**Date conversion fails**:
- Verify date format in source data
- Add additional `orders` to `parse_date_time()` if needed
- Check for invalid dates (e.g., 31/02/2024)
## Workflow Customization
### Adding Custom Column Mappings
```r
# In standardize_columns() function, add:
if ("your_column" %in% names(df)) {
df <- df %>% rename(redcap_field = your_column)
}
```
### Adding Custom Validation Types
```r
# In align_by_text_validation() function, add:
else if (validation == "custom_type") {
df[[var]] <- your_conversion_function(df[[var]])
}
```
## Author
**Sowjanya Batchu**
## License
Include appropriate license information for your organization.
---
## Related Documentation
- [REDCap API Documentation](https://redcap.vanderbilt.edu/api/help/)
- [janitor Package](https://sfirke.github.io/janitor/)
- [httr Package](https://httr.r-lib.org/)
| Version | Language | Type | Relation | Author | Date |
|---|---|---|---|---|---|
| Global v1 (R v1) | R | Multi-file Archive | Initial Implementation | lakshmi.batchu | 2026-02-06 |
| Global v2 (R v2) | R | Multi-file Archive | Refinement/Bug Fix ← Global v1 | lakshmi.batchu | 2026-02-06 |
| Global v3 (R v3) default selected | R | Multi-file Archive | Refinement/Bug Fix ← Global v2 | lakshmi.batchu | 2026-03-16 |