DepressionDC Data Integration for FAIR Data Sharing

Dataset Info
Published on
2026-02-06

Variables
1

Data Access

Data is available only upon formal request and subject to approval.

Approved users receive a secure institute account and work with the data exclusively in our Trusted Research Environment (TRE) via remote desktop.

Request data (Email to us)

Reuse & Usage Terms
  • Data is not downloadable (TRE access only).
  • Approved users receive a personal institute account.
  • Tools available: RStudio, Jupyter, Python, Stata, etc.
  • Data resides in your TRE home directory.
  • Re-use/publication per Data Use Agreement (DUA).
  • No redistribution of the data.
Contact us for the DUA template and details.
Description

Data integration description here.

Available Variables (1)
Event: Screening
Coversheet
  • patid

Analysis Code
Viewing: v2 R Multi-file Archive
Viewing version: v2 (R)
Created by lakshmi.batchu · 2026-02-06 17:10 · Added README.md file
📦 Archive contents
  • data_integration.txt
    documentation · 4923 bytes
    docs
  • example.csv
    data · 3272 bytes
    file
  • DepressionDataDictionaryStruct_DataDictionary_2025-09-20 (1).csv
    data · 24852 bytes
    file
  • README.md
    documentation · 4908 bytes
    docs
🧾 README
# CSV to REDCap Data Integration Workflow

Automated R-based workflow for standardizing and importing CSV data into REDCap databases with validation and type conversion.

## Overview

This workflow handles the complete pipeline from raw CSV files to REDCap import, including:
- Auto-detection of CSV delimiters
- Column name standardization
- REDCap field alignment and type conversion
- Automated API import

## Prerequisites

### Required R Packages

```r
install.packages(c(
  "dplyr",
  "janitor", 
  "readr",
  "stringr",
  "lubridate",
  "httr",
  "jsonlite"
))
```

### Required Files

- **CSV data files**: Source data to be imported
- **REDCap data dictionary** (`Redcap_dictionary.csv`): Export from your REDCap project
- **REDCap API token**: Generated from your REDCap project settings

## Configuration

### 1. Set File Paths

```r
folder_path <- "PATH/TO/YOUR/CSV/FOLDER"
```

### 2. Configure API Credentials

**Security Best Practice**: Never hardcode API tokens in scripts. Use environment variables:

```r
# Set environment variable (run once in R console)
Sys.setenv(REDCAP_API_TOKEN = "your_token_here")

# Use in script
api_token <- Sys.getenv("REDCAP_API_TOKEN")
api_url <- "https://your-redcap-server/api/"
```

## Workflow Steps

### 1. Data Loading

The script automatically:
- Detects CSV delimiter (comma or semicolon)
- Handles UTF-8 encoding
- Loads all CSV files from the specified folder

### 2. Column Standardization

Automatically maps common column names to REDCap system fields:

| Source Column | REDCap Field |
|---------------|--------------|
| `subject` | `subjid_drv` |
| `repeat_number` | `redcap_repeat_instance` |
| `visitid` / `visit_id` | `redcap_event_name` |
| `site` | `redcap_data_access_group` |

**Note**: Column names are cleaned using `janitor::make_clean_names()` (lowercase, underscores, no special characters).

### 3. Data Type Alignment

The workflow reads your REDCap data dictionary and applies validation rules:

- **date_dmy**: Converts to date format (supports multiple input formats)
- **integer**: Converts to integer
- **number**: Converts to numeric

System fields are preserved and not subject to validation conversion.

### 4. REDCap Import

Uses the REDCap API to import data with:
- Flat JSON format
- Normal overwrite behavior (updates existing records)
- Count return for verification

## Usage Example

```r
# Source the script
source("data_integration_workflow.R")

# Load and standardize all CSV files
# (already done automatically when script runs)

# Apply REDCap dictionary alignment to a specific dataset
DM <- data_list[[1]]  # First CSV file
DM_aligned <- align_by_text_validation(DM, redcap_dict)

# Import to REDCap
api_token <- Sys.getenv("REDCAP_API_TOKEN")
api_url <- "https://your-redcap-server/api/"

result <- import_to_redcap(DM_aligned, api_token, api_url)
```

## Data Preprocessing Features

### Special Value Handling

- Empty strings converted to `NA`
- "f.A." values converted to `NA` (common data export artifact)

### Date Format Support

The workflow supports multiple date input formats:
- `dmy`: 31/12/2024
- `dmY`: 31/12/24
- `Ymd`: 2024-12-31
- `Y-m-d`: 2024-12-31
- `d-m-Y`: 31-12-2024

## Error Handling

The import function provides:
- HTTP status code reporting
- Success/failure messages
- Return of API response for debugging

## Security Considerations

⚠️ **Important Security Practices**:

1. **Never commit API tokens to version control**
2. **Use environment variables for credentials**
3. **Restrict file permissions on scripts containing tokens**
4. **Use separate tokens for development/production**
5. **Regularly rotate API tokens**

## Troubleshooting

### Common Issues

**Import fails with validation errors**:
- Check REDCap data dictionary field names match CSV columns
- Verify date formats match expected validation
- Ensure required fields are present

**Column mapping doesn't work**:
- Column names are case-sensitive after cleaning
- Check for typos in source CSV headers
- Review `make_clean_names()` output

**Date conversion fails**:
- Verify date format in source data
- Add additional `orders` to `parse_date_time()` if needed
- Check for invalid dates (e.g., 31/02/2024)

## Workflow Customization

### Adding Custom Column Mappings

```r
# In standardize_columns() function, add:
if ("your_column" %in% names(df)) {
  df <- df %>% rename(redcap_field = your_column)
}
```

### Adding Custom Validation Types

```r
# In align_by_text_validation() function, add:
else if (validation == "custom_type") {
  df[[var]] <- your_conversion_function(df[[var]])
}
```

## Author

**Sowjanya Batchu**

## License

Include appropriate license information for your organization.

---

## Related Documentation

- [REDCap API Documentation](https://redcap.vanderbilt.edu/api/help/)
- [janitor Package](https://sfirke.github.io/janitor/)
- [httr Package](https://httr.r-lib.org/)
Version Timeline (by language)
Version History (detailed)
Version Language Type Relation Author Date
Global v1 (R v1) R Multi-file Archive Initial Implementation lakshmi.batchu 2026-02-06
Global v2 (R v2) selected R Multi-file Archive Refinement/Bug Fix ← Global v1 lakshmi.batchu 2026-02-06
Contact
Lakshmi Batchu
Email
Publisher

Project
DepressionDC