NORM/FHIR_OCR_POC
jac is jake c25548a2d7 Add initial implementation of FHIR OCR POC
- Created a Docker Compose setup for the FHIR OCR application, including services for the main app, Keycloak for authentication, and a HAPI FHIR server.
- Added a README file detailing project structure, features, requirements, and setup instructions.
- Included necessary Python dependencies in requirements.txt.
- Implemented core modules for OCR processing, FHIR resource mapping, and security features.
- Developed test scripts for API security and OCR to FHIR flow.
- Established compliance and privacy modules for audit logging and data protection.
- Created sample data generation script for testing purposes.
- Set up a basic FastAPI application structure with endpoints for authentication and FHIR resource management.
1 year ago
api Add initial implementation of FHIR OCR POC 1 year ago
compliance_module Add initial implementation of FHIR OCR POC 1 year ago
docker Add initial implementation of FHIR OCR POC 1 year ago
fhir_module Add initial implementation of FHIR OCR POC 1 year ago
ocr_module Add initial implementation of FHIR OCR POC 1 year ago
sample_data Add initial implementation of FHIR OCR POC 1 year ago
security_module Add initial implementation of FHIR OCR POC 1 year ago
README.md Add initial implementation of FHIR OCR POC 1 year ago
docker-compose.yml Add initial implementation of FHIR OCR POC 1 year ago
requirements.txt Add initial implementation of FHIR OCR POC 1 year ago
test_api_security.py Add initial implementation of FHIR OCR POC 1 year ago
test_ocr_flow.py Add initial implementation of FHIR OCR POC 1 year ago

README.md

FHIR-Based Healthcare Document Processing POC

This proof-of-concept application demonstrates a secure, FHIR-compliant system for processing healthcare documents with OCR and storing the extracted data via a FHIR API.

Project Structure

  • ocr_module/: OCR functionality using Tesseract for document text extraction
  • fhir_module/: FHIR API implementation using HAPI FHIR for data storage and retrieval
  • security_module/: Authentication and authorization using OAuth2/OpenID Connect
  • compliance_module/: Audit logging and compliance features
  • api/: Main application API that integrates all modules
  • docker/: Docker and docker-compose configuration files

Features

  • Document processing with OCR to extract healthcare information
  • FHIR-compliant data storage and retrieval
  • OAuth2/OpenID Connect authentication
  • Comprehensive audit logging
  • Role-based access control
  • Local deployment with Docker

Requirements

  • Docker and Docker Compose
  • Python 3.9+
  • Tesseract OCR engine (automatically installed in Docker)

Getting Started

The easiest way to run the application is using Docker Compose:

  1. Clone the repository
  2. Navigate to the project root directory
  3. Run Docker Compose:
docker-compose up

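This starts the services defined in docker-compose.yml: the main application (API on http://localhost:8000), a Keycloak server for authentication (admin console on http://localhost:8181), and a HAPI FHIR server.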

Manual Setup

If you prefer to run the application without Docker:

  1. Install Tesseract OCR engine on your system:

    • Ubuntu/Debian: sudo apt-get install tesseract-ocr
    • macOS: brew install tesseract
    • Windows: Download the installer from the Tesseract GitHub page
  2. Create and activate a Python virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Python dependencies:
pip install -r requirements.txt
  4. Run the application:
uvicorn api.app:app --host 0.0.0.0 --port 8000

Usage

1. Authentication

To access the API, you need to obtain an authentication token:

curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "password"}'

This will return a JSON response with an access token:

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer"
}
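
The same token request can be made from Python. The following is a minimal sketch using only the standard library; the helper names (build_token_request, bearer_header, fetch_token) are hypothetical and not part of this project, and the URL, credentials, and response fields are taken from the curl example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl example


def build_token_request(username: str, password: str,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /auth/token request with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_header(token_response: dict) -> dict:
    """Turn the JSON token response into an Authorization header.

    Assumes the response carries the two fields shown above.
    """
    if token_response.get("token_type", "").lower() != "bearer":
        raise ValueError("unexpected token type")
    return {"Authorization": f"Bearer {token_response['access_token']}"}


def fetch_token(username: str, password: str) -> dict:
    """Send the request. Requires the application to be running locally."""
    request = build_token_request(username, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the application running, bearer_header(fetch_token("admin", "password")) would give the header used in the following steps.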

2. Processing a Document

Use the token to upload and process a document:

curl -X POST http://localhost:8000/ocr/process \
  -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  -F "file=@/path/to/document.jpg" \
  -F "process_as=insurance_card"

The API will return the OCR results and the IDs of the created FHIR resources.
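
For scripted uploads, the multipart request that curl builds with -F can be assembled by hand. This is a sketch under the assumption that the endpoint accepts a standard multipart/form-data body with the two fields from the curl example (file and process_as); encode_multipart and build_ocr_request are hypothetical helper names, and the response format is not shown because the README does not specify it.

```python
import mimetypes
import urllib.request
import uuid


def encode_multipart(fields: dict, file_field: str, filename: str, content: bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body, content_type) tuple.
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_ocr_request(token: str, filename: str, content: bytes,
                      process_as: str = "insurance_card",
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /ocr/process request from the curl example."""
    body, content_type = encode_multipart(
        {"process_as": process_as}, "file", filename, content
    )
    return urllib.request.Request(
        f"{base_url}/ocr/process",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends the upload; the file bytes would normally come from open(path, "rb").read().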

3. Retrieving FHIR Resources

Get a patient resource:

curl -X GET http://localhost:8000/fhir/Patient/PATIENT_ID \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

Get an observation resource:

curl -X GET http://localhost:8000/fhir/Observation/OBSERVATION_ID \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"
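
Once a resource has been fetched, its JSON can be read like any dictionary. The sketch below assumes the /fhir endpoints return standard FHIR JSON (a resourceType field, and for Patient a name list of HumanName entries with given and family); the README does not show a sample response, so treat the structure as an assumption. fhir_url and patient_display_name are hypothetical helpers.

```python
def fhir_url(resource_type: str, resource_id: str,
             base_url: str = "http://localhost:8000") -> str:
    """Build the URL used in the curl examples, e.g. /fhir/Patient/<id>."""
    return f"{base_url}/fhir/{resource_type}/{resource_id}"


def patient_display_name(patient: dict) -> str:
    """Return a readable name from a FHIR Patient resource, or "" if none."""
    if patient.get("resourceType") != "Patient":
        raise ValueError("not a Patient resource")
    names = patient.get("name") or []
    if not names:
        return ""
    first = names[0]
    if first.get("text"):
        # HumanName.text, when present, is the full name as it should be displayed.
        return first["text"]
    given = " ".join(first.get("given", []))
    return f"{given} {first.get('family', '')}".strip()


# Example input shaped like a FHIR Patient (illustrative data only).
sample_patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
}
print(fhir_url("Patient", sample_patient["id"]))
print(patient_display_name(sample_patient))
```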

4. Testing the OCR Flow Locally

For testing the OCR to FHIR flow without the API, use the provided test script:

python test_ocr_flow.py --image sample_data/your_image.jpg --output test_results

Keycloak Setup

The Docker Compose configuration includes a Keycloak server for authentication. For production use, you would need to:

  1. Access the Keycloak admin console at http://localhost:8181
  2. Log in with username admin and password admin
  3. Create a new realm (e.g., fhir-ocr)
  4. Create a new client (e.g., fhir-ocr-client)
  5. Configure client access type as "confidential"
  6. Add redirect URIs for your application
  7. Create roles (e.g., user, admin)
  8. Create users and assign roles
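
After roles are assigned, they appear in the access tokens Keycloak issues: realm roles are listed under the realm_access.roles claim. The sketch below shows how a client might inspect them. It decodes the token payload without verifying the signature, so it is suitable for debugging only, and it is an illustration rather than how security_module performs its checks. The function names are hypothetical.

```python
import base64
import json


def decode_claims_unverified(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def realm_roles(claims: dict) -> set:
    """Return the realm roles from Keycloak's realm_access.roles claim."""
    return set(claims.get("realm_access", {}).get("roles", []))


def has_realm_role(claims: dict, role: str) -> bool:
    """Check whether the token carries a given realm role, e.g. "admin"."""
    return role in realm_roles(claims)
```

A server must always validate the signature, issuer, and expiry before trusting any of these claims.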

Security and Compliance

The application includes several security and compliance features:

  • Authentication: OAuth2/OpenID Connect with Keycloak
  • Authorization: Role-based access control
  • Audit Logging: All API calls and data access are logged
  • Privacy Filtering: Sensitive data can be masked or redacted
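
As an illustration of what masking and redaction can look like, here is a small sketch. It is not the implementation in compliance_module, whose rules are not described in this README; the pattern (US Social Security number format) and the function names are assumptions chosen for the example.

```python
import re

# Matches values formatted like 123-45-6789 (assumed example of sensitive data).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str) -> str:
    """Redaction: replace each SSN-like value with a fixed placeholder."""
    return SSN_PATTERN.sub("[REDACTED]", text)


def mask_keep_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Masking: hide all but the last `visible` characters of an identifier."""
    if visible <= 0 or len(value) <= visible:
        # Too short to reveal anything safely, so mask the whole value.
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


print(redact_ssns("SSN 123-45-6789 on file"))
print(mask_keep_last("ABC123456789"))
```

Redaction removes the value entirely, while masking keeps enough of it (here the last four characters) for a person to recognize a record.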

Docker Environment Variables

The following environment variables can be set in the docker-compose.yml file:

  • ENVIRONMENT: development or production
  • JWT_SECRET_KEY: Secret key for JWT token signing
  • JWT_ALGORITHM: Algorithm for JWT token signing
  • ACCESS_TOKEN_EXPIRE_MINUTES: Token expiration time in minutes
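
A sketch of how an application might read these four variables at startup follows. The Settings class, load_settings function, the development fallback secret, and the 30-minute default are assumptions for illustration; only the variable names come from this README. The HS256 default matches the header of the sample token shown in the Authentication section.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    environment: str
    jwt_secret_key: str
    jwt_algorithm: str
    access_token_expire_minutes: int


def load_settings(env=None) -> Settings:
    """Read the documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    environment = env.get("ENVIRONMENT", "development")
    if environment not in ("development", "production"):
        raise ValueError(f"unsupported ENVIRONMENT: {environment!r}")
    secret = env.get("JWT_SECRET_KEY", "")
    if environment == "production" and not secret:
        # Refuse to start with a guessable signing key in production.
        raise ValueError("JWT_SECRET_KEY must be set in production")
    return Settings(
        environment=environment,
        jwt_secret_key=secret or "dev-only-secret",  # assumed dev fallback
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
    )
```

Accepting a plain mapping instead of reading os.environ directly keeps the function easy to test with fixed values.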

API Documentation

API documentation is available at http://localhost:8000/docs when the application is running.

License

This project uses open-source components:

  • Tesseract OCR (Apache License 2.0)
  • HAPI FHIR (Apache License 2.0)
  • Keycloak (Apache License 2.0)