FHIR-Based Healthcare Document Processing POC

This proof-of-concept application demonstrates a secure, FHIR-compliant system for processing healthcare documents with OCR and storing the extracted data via a FHIR API.

Project Structure

  • ocr_module/: OCR functionality using Tesseract for document text extraction
  • fhir_module/: FHIR API implementation using HAPI FHIR for data storage and retrieval
  • security_module/: Authentication and authorization using OAuth2/OpenID Connect
  • compliance_module/: Audit logging and compliance features
  • api/: Main application API that integrates all modules
  • docker/: Docker and docker-compose configuration files

Features

  • Document processing with OCR to extract healthcare information
  • FHIR-compliant data storage and retrieval
  • OAuth2/OpenID Connect authentication
  • Comprehensive audit logging
  • Role-based access control
  • Local deployment with Docker

Requirements

  • Docker and Docker Compose
  • Python 3.9+
  • Tesseract OCR engine (automatically installed in Docker)

Getting Started

The easiest way to run the application is using Docker Compose:

  1. Clone the repository
  2. Navigate to the project root directory
  3. Run Docker Compose:
docker-compose up

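This starts the services defined in docker-compose.yml, including the application API on http://localhost:8000 and the Keycloak authentication server on http://localhost:8181.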

Manual Setup

If you prefer to run the application without Docker:

  1. Install the Tesseract OCR engine on your system:

    • Ubuntu/Debian: sudo apt-get install tesseract-ocr
    • macOS: brew install tesseract
    • Windows: download the installer from the Tesseract GitHub page
  2. Create and activate a Python virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install the Python dependencies:

pip install -r requirements.txt

  4. Run the application:

uvicorn api.app:app --host 0.0.0.0 --port 8000
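Because the manual setup relies on a system-wide Tesseract install, a quick pre-flight check can turn a confusing OCR failure into a clear message. The sketch below uses only the standard library; the function names are hypothetical and are not part of the project.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the path of the tesseract executable on PATH, or None if absent."""
    return shutil.which("tesseract")


def tesseract_version(binary: str) -> str:
    """Return the first line of `tesseract --version` (e.g. 'tesseract 5.x')."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    # Some older Tesseract releases print the version banner to stderr.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else "unknown version"


def check_tesseract() -> str:
    """Raise RuntimeError if Tesseract is missing; otherwise describe it."""
    path = find_tesseract()
    if path is None:
        raise RuntimeError(
            "tesseract was not found on PATH; install it before starting the API"
        )
    return f"{tesseract_version(path)} ({path})"
```

Calling `check_tesseract()` once at startup, before launching uvicorn, is enough to surface a missing install.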

Usage

1. Authentication

To access the API, you need to obtain an authentication token:

curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "password"}'

This will return a JSON response with an access token:

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer"
}
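The same token request can be made from Python with only the standard library. This is a minimal sketch that assumes the endpoint accepts the JSON body shown in the curl example; `build_token_request` and `get_access_token` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the API runs locally on port 8000, as in the curl example.
BASE_URL = "http://localhost:8000"


def build_token_request(
    username: str, password: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /auth/token request with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_access_token(username: str, password: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the access_token field of the JSON response."""
    request = build_token_request(username, password, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return payload["access_token"]
```

Separating request construction from sending keeps the request easy to inspect or reuse with a different HTTP client.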

2. Processing a Document

Use the token to upload and process a document:

curl -X POST http://localhost:8000/ocr/process \
  -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  -F "file=@/path/to/document.jpg" \
  -F "process_as=insurance_card"

The API will return the OCR results and the IDs of the created FHIR resources.
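The `-F` flags in the curl command send a multipart/form-data body with a `file` part and a `process_as` field. A standard-library sketch of that upload follows; the helper names are hypothetical, and it assumes the endpoint takes exactly those two form fields.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Any, Dict, Tuple

# Assumption: the API runs locally on port 8000.
BASE_URL = "http://localhost:8000"


def encode_multipart(
    fields: Dict[str, str], file_field: str, filename: str, content: bytes
) -> Tuple[bytes, str]:
    """Encode plain fields plus one file as multipart/form-data."""
    # A random 32-character boundary is very unlikely to occur in the payload.
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
            f"Content-Type: {file_type}\r\n\r\n"
        ).encode()
    )
    parts.append(content)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_process_request(
    filename: str,
    content: bytes,
    token: str,
    process_as: str = "insurance_card",
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build the authenticated POST /ocr/process request."""
    body, content_type = encode_multipart(
        {"process_as": process_as}, "file", filename, content
    )
    return urllib.request.Request(
        f"{base_url}/ocr/process",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )


def process_document(path: str, token: str, process_as: str = "insurance_card") -> Dict[str, Any]:
    """Upload a local file and return the decoded JSON response."""
    with open(path, "rb") as handle:
        request = build_process_request(
            os.path.basename(path), handle.read(), token, process_as
        )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```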

3. Retrieving FHIR Resources

Get a patient resource:

curl -X GET http://localhost:8000/fhir/Patient/PATIENT_ID \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

Get an observation resource:

curl -X GET http://localhost:8000/fhir/Observation/OBSERVATION_ID \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"
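Retrieved resources are plain FHIR JSON. The sketch below fetches one and formats a Patient's name from the standard FHIR `HumanName` fields (`text`, `given`, `family`). It assumes the `/fhir/{type}/{id}` routes shown above return the resource body unchanged; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: the API runs locally on port 8000.
BASE_URL = "http://localhost:8000"


def resource_request(
    resource_type: str, resource_id: str, token: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build GET /fhir/{type}/{id} with the bearer token."""
    return urllib.request.Request(
        f"{base_url}/fhir/{resource_type}/{resource_id}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_resource(resource_type: str, resource_id: str, token: str) -> Dict[str, Any]:
    """Fetch a resource and decode its JSON body."""
    request = resource_request(resource_type, resource_id, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def display_name(patient: Dict[str, Any]) -> str:
    """Format the first HumanName of a FHIR Patient for display."""
    names = patient.get("name") or []
    if not names:
        return "(no name)"
    first = names[0]
    # Prefer the pre-rendered text form when the resource provides one.
    if first.get("text"):
        return first["text"]
    parts = list(first.get("given", []))
    if first.get("family"):
        parts.append(first["family"])
    return " ".join(parts) or "(no name)"
```

For example, `display_name(fetch_resource("Patient", "PATIENT_ID", token))` prints the name stored from the processed document.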

4. Testing the OCR Flow Locally

To test the OCR-to-FHIR flow without going through the API, use the provided test script:

python test_ocr_flow.py --image sample_data/your_image.jpg --output test_results
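The internals of test_ocr_flow.py are not described here, so the following is only an illustration of the kind of mapping an OCR-to-FHIR step performs: labelled lines of recognized text become fields of a minimal FHIR R4 Patient. The label patterns and the `ocr_text_to_patient` name are hypothetical; a fuller mapping would likely put an insurance member ID on a Coverage resource rather than on the Patient.

```python
import re
from typing import Any, Dict

# Hypothetical label patterns; real cards vary and need per-layout tuning.
FLAGS = re.IGNORECASE | re.MULTILINE
NAME_RE = re.compile(r"^[ \t]*(?:member[ \t]+)?name[ \t]*[:\-][ \t]*(.+?)[ \t]*$", FLAGS)
DOB_RE = re.compile(
    r"^[ \t]*(?:dob|date of birth)[ \t]*[:\-][ \t]*(\d{4}-\d{2}-\d{2})[ \t]*$", FLAGS
)
MEMBER_ID_RE = re.compile(r"^[ \t]*member[ \t]+id[ \t]*[:\-][ \t]*([A-Z0-9-]+)[ \t]*$", FLAGS)


def ocr_text_to_patient(text: str) -> Dict[str, Any]:
    """Map labelled OCR lines to a minimal FHIR R4 Patient resource."""
    patient: Dict[str, Any] = {"resourceType": "Patient"}

    name = NAME_RE.search(text)
    if name:
        tokens = name.group(1).split()
        if len(tokens) > 1:
            # Naive split: last token is the family name, the rest are given names.
            patient["name"] = [{"family": tokens[-1], "given": tokens[:-1]}]
        elif tokens:
            patient["name"] = [{"text": tokens[0]}]

    dob = DOB_RE.search(text)
    if dob:
        patient["birthDate"] = dob.group(1)  # FHIR date format: YYYY-MM-DD

    member_id = MEMBER_ID_RE.search(text)
    if member_id:
        patient["identifier"] = [{"value": member_id.group(1)}]

    return patient
```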

Keycloak Setup

The Docker Compose configuration includes a Keycloak server for authentication. For production use, you would need to:

  1. Access the Keycloak admin console at http://localhost:8181
  2. Log in with username admin and password admin
  3. Create a new realm (e.g., fhir-ocr)
  4. Create a new client (e.g., fhir-ocr-client)
  5. Configure client access type as "confidential"
  6. Add redirect URIs for your application
  7. Create roles (e.g., user, admin)
  8. Create users and assign roles
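Once the realm, client, and users exist, a client can request a token directly from Keycloak's standard OpenID Connect token endpoint. This sketch uses the password grant, which assumes "Direct access grants" is enabled on the client; the realm and client names are the examples from the steps above, and the helper names are hypothetical. Keycloak releases before 17 served these endpoints under an extra `/auth` path prefix, which the `legacy_auth_prefix` flag adds.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def keycloak_token_url(base_url: str, realm: str, legacy_auth_prefix: bool = False) -> str:
    """Return the realm's OpenID Connect token endpoint URL."""
    prefix = "/auth" if legacy_auth_prefix else ""
    realm_path = urllib.parse.quote(realm)
    return f"{base_url.rstrip('/')}{prefix}/realms/{realm_path}/protocol/openid-connect/token"


def build_password_grant_request(
    base_url: str, realm: str, client_id: str, client_secret: str, username: str, password: str
) -> urllib.request.Request:
    """Build a form-encoded password-grant request for a confidential client."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "password",
            "client_id": client_id,
            "client_secret": client_secret,
            "username": username,
            "password": password,
        }
    ).encode("ascii")
    return urllib.request.Request(
        keycloak_token_url(base_url, realm),
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request; the response includes access_token and expires_in."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```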

Security and Compliance

The application includes several security and compliance features:

  • Authentication: OAuth2/OpenID Connect with Keycloak
  • Authorization: Role-based access control
  • Audit Logging: All API calls and data access are logged
  • Privacy Filtering: Sensitive data can be masked or redacted
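Role-based access control and privacy filtering can be illustrated with two small helpers. Keycloak access tokens carry realm roles under `realm_access.roles` and client roles under `resource_access.<client_id>.roles`; the sketch assumes `claims` has already been decoded from a signature-verified token. The helper names and the masking rule (keep the last few characters) are hypothetical, not the project's actual policy.

```python
from typing import Any, Dict, Iterable, Optional, Set


def token_roles(claims: Dict[str, Any], client_id: Optional[str] = None) -> Set[str]:
    """Collect realm roles and, optionally, one client's roles from token claims."""
    roles = set(claims.get("realm_access", {}).get("roles", []))
    if client_id is not None:
        client = claims.get("resource_access", {}).get(client_id, {})
        roles.update(client.get("roles", []))
    return roles


def require_role(
    claims: Dict[str, Any], allowed: Iterable[str], client_id: Optional[str] = None
) -> None:
    """Raise PermissionError unless the token carries at least one allowed role."""
    allowed_set = set(allowed)
    if token_roles(claims, client_id).isdisjoint(allowed_set):
        raise PermissionError(f"requires one of: {sorted(allowed_set)}")


def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` characters; mask short values entirely."""
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]
```

In an API handler, `require_role(claims, ["admin"])` would run before returning data, and `mask_value` would be applied to sensitive fields for roles that should not see them in full.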

Docker Environment Variables

The following environment variables can be set in the docker-compose.yml file:

  • ENVIRONMENT: development or production
  • JWT_SECRET_KEY: Secret key for JWT token signing
  • JWT_ALGORITHM: Algorithm for JWT token signing
  • ACCESS_TOKEN_EXPIRE_MINUTES: Token expiration time in minutes
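To show how these variables fit together, the sketch below loads them into a settings object and signs and verifies an HS256 JWT with the standard library. The defaults (`HS256`, 30 minutes, a development-only secret) and all function names are assumptions rather than the project's actual values; production code would normally use a maintained library such as PyJWT instead of hand-rolled JWT handling.

```python
import base64
import hashlib
import hmac
import json
import os
import time
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    environment: str
    jwt_secret_key: str
    jwt_algorithm: str
    access_token_expire_minutes: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables; refuse to run production without a secret."""
    environment = env.get("ENVIRONMENT", "development")
    secret = env.get("JWT_SECRET_KEY")
    if not secret:
        if environment == "production":
            raise RuntimeError("JWT_SECRET_KEY must be set in production")
        secret = "dev-only-secret"  # hypothetical development fallback
    return Settings(
        environment,
        secret,
        env.get("JWT_ALGORITHM", "HS256"),
        int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
    )


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: bytes) -> bytes:
    return hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()


def create_access_token(subject: str, settings: Settings, now: Optional[float] = None) -> str:
    """Issue an HS256 JWT whose exp is ACCESS_TOKEN_EXPIRE_MINUTES after iat."""
    if settings.jwt_algorithm != "HS256":
        raise ValueError("this sketch only implements HS256")
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + settings.access_token_expire_minutes * 60}
    compact = lambda obj: _b64url(json.dumps(obj, separators=(",", ":")).encode())
    signing_input = f"{compact(header)}.{compact(payload)}"
    return f"{signing_input}.{_b64url(_sign(settings.jwt_secret_key, signing_input.encode()))}"


def verify_access_token(token: str, settings: Settings, now: Optional[float] = None) -> Dict[str, Any]:
    """Check signature, algorithm, and expiry; return the payload claims."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(settings.jwt_secret_key, f"{header_b64}.{payload_b64}".encode())
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if (time.time() if now is None else now) >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```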

API Documentation

API documentation is available at http://localhost:8000/docs when the application is running.
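The `uvicorn api.app:app` command and the `/docs` page suggest a FastAPI application; if so, the machine-readable OpenAPI schema behind that page is normally served at `/openapi.json`. Under that assumption, the sketch below lists every documented route; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

# HTTP methods that may appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


def fetch_openapi(base_url: str = "http://localhost:8000") -> Dict[str, Any]:
    """Download the OpenAPI schema (assumed to live at /openapi.json)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)
```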

License

This project uses open-source components:

  • Tesseract OCR (Apache License 2.0)
  • HAPI FHIR (Apache License 2.0)
  • Keycloak (Apache License 2.0)