FHIR-Based Healthcare Document Processing POC
This proof-of-concept application demonstrates a secure, FHIR-compliant system for processing healthcare documents with OCR and storing the extracted data via a FHIR API.
Project Structure
- ocr_module/: OCR functionality using Tesseract for document text extraction
- fhir_module/: FHIR API implementation using HAPI FHIR for data storage and retrieval
- security_module/: Authentication and authorization using OAuth2/OpenID Connect
- compliance_module/: Audit logging and compliance features
- api/: Main application API that integrates all modules
- docker/: Docker and docker-compose configuration files
- sample_data/: Sample data for testing
Features
- Document processing with OCR to extract healthcare information
- FHIR-compliant data storage and retrieval
- OAuth2/OpenID Connect authentication
- Comprehensive audit logging
- Role-based access control
- Local deployment with Docker
Requirements
- Docker and Docker Compose
- Python 3.9+
- Tesseract OCR engine (automatically installed in Docker)
Getting Started
Using Docker Compose (Recommended)
The easiest way to run the application is using Docker Compose:
- Clone the repository
- Navigate to the project root directory
- Run Docker Compose:
docker-compose up
This will start the following services:
- FHIR OCR application at http://localhost:8000
- Keycloak authentication server at http://localhost:8181
- HAPI FHIR server at http://localhost:8090 (included but not integrated in the POC)
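Once `docker-compose up` has finished starting, one quick way to confirm that the three services are listening is to probe their published ports. The following is a minimal sketch using only the Python standard library. The labels in `SERVICES` are illustrative and may not match the service names in docker-compose.yml, and any HTTP response (even 401 or 404) is treated as "up".

```python
import urllib.error
import urllib.request

# Host URLs listed above; the dictionary keys are illustrative labels only.
SERVICES = {
    "fhir-ocr-app": "http://localhost:8000",
    "keycloak": "http://localhost:8181",
    "hapi-fhir": "http://localhost:8090",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url`, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx/3xx status (e.g. 401 or 404).
        return True
    except OSError:
        # Connection refused, DNS failure, timeout or bad URL: not reachable.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:14} {url:24} {'up' if is_up(url) else 'down'}")
```

Keycloak and HAPI FHIR can take noticeably longer to start than the application itself, so a "down" result right after startup is not necessarily an error.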
Manual Setup
If you prefer to run the application without Docker:
- Install Tesseract OCR engine on your system:
  - Ubuntu/Debian:
    sudo apt-get install tesseract-ocr
  - macOS:
    brew install tesseract
  - Windows: Download installer from Tesseract GitHub page
- Create and activate a Python virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install Python dependencies:
pip install -r requirements.txt
- Run the application:
uvicorn api.app:app --host 0.0.0.0 --port 8000
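Before starting uvicorn, a short preflight check can catch the two requirements most often missing from a manual setup. This is a hypothetical helper, not part of the repository; it only checks the Python version and whether a `tesseract` executable is on `PATH`.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def preflight(
    version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems that would block a manual (non-Docker) run."""
    problems = []
    if version < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    # The OCR module needs the Tesseract binary itself, not just Python packages.
    if which("tesseract") is None:
        problems.append("tesseract binary not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real environment.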
Usage
1. Authentication
To access the API, you need to obtain an authentication token:
curl -X POST http://localhost:8000/auth/token \
-H "Content-Type: application/json" \
-d '{"username": "admin", "password": "password"}'
This will return a JSON response with an access token:
{
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer"
}
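The same login can be scripted. This is a minimal standard-library sketch that assumes the `/auth/token` endpoint and JSON body shown in the curl example; the helper names are hypothetical. `decode_jwt_segment` only base64url-decodes a token segment for inspection and does not verify the signature, so it must never be used to make access decisions.

```python
import base64
import json
import urllib.request


def build_token_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build the POST /auth/token request from the curl example."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_access_token(base_url: str, username: str, password: str) -> str:
    """Send the request and return the bearer token from the JSON response."""
    req = build_token_request(base_url, username, password)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]


def decode_jwt_segment(segment: str) -> dict:
    """Decode one base64url JWT segment (header or payload) WITHOUT verifying it."""
    padded = segment + "=" * (-len(segment) % 4)  # JWTs strip base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

For example, decoding the header segment of the sample token above, `decode_jwt_segment("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9")`, returns `{"alg": "HS256", "typ": "JWT"}`.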
2. Processing a Document
Use the token to upload and process a document:
curl -X POST http://localhost:8000/ocr/process \
-H "Authorization: Bearer YOUR_TOKEN_HERE" \
-F "file=@/path/to/document.jpg" \
-F "process_as=insurance_card"
The API will return the OCR results and the IDs of the created FHIR resources.
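The upload can also be done from Python. The standard library has no multipart encoder, so this sketch builds the `multipart/form-data` body by hand. The field names `file` and `process_as` come from the curl example; the helper names are hypothetical, and a third-party HTTP client would normally handle the encoding for you.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_ocr_request(base_url: str, token: str, file_path: str,
                      process_as: str = "insurance_card") -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl -F example."""
    path = Path(file_path)
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the file
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    lines = [
        # Text field: process_as
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="process_as"',
        b"",
        process_as.encode(),
        # File field: file (sent with its original filename)
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        path.read_bytes(),
        # Closing boundary, followed by a final CRLF
        f"--{boundary}--".encode(),
        b"",
    ]
    return urllib.request.Request(
        f"{base_url}/ocr/process",
        data=b"\r\n".join(lines),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def process_document(base_url: str, token: str, file_path: str,
                     process_as: str = "insurance_card") -> dict:
    """Upload the document and return the parsed JSON response."""
    req = build_ocr_request(base_url, token, file_path, process_as)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```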
3. Retrieving FHIR Resources
Get a patient resource:
curl -X GET http://localhost:8000/fhir/Patient/PATIENT_ID \
-H "Authorization: Bearer YOUR_TOKEN_HERE"
Get an observation resource:
curl -X GET http://localhost:8000/fhir/Observation/OBSERVATION_ID \
-H "Authorization: Bearer YOUR_TOKEN_HERE"
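A sketch of the same reads from Python, assuming the endpoints return the FHIR resource as JSON. FHIR JSON resources carry a `resourceType` element and, once stored on a server, an `id`, so the hypothetical `check_resource` helper uses them to confirm that the response matches what was requested.

```python
import json
import urllib.parse
import urllib.request


def build_read_request(base_url: str, token: str, resource_type: str,
                       resource_id: str) -> urllib.request.Request:
    """GET {base_url}/fhir/{resource_type}/{resource_id} with a bearer token."""
    url = f"{base_url}/fhir/{resource_type}/{urllib.parse.quote(resource_id, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def check_resource(resource: dict, resource_type: str, resource_id: str) -> dict:
    """Raise ValueError unless the JSON body is the resource that was asked for."""
    if resource.get("resourceType") != resource_type:
        raise ValueError(f"expected {resource_type}, got {resource.get('resourceType')!r}")
    if resource.get("id") != resource_id:
        raise ValueError(f"expected id {resource_id!r}, got {resource.get('id')!r}")
    return resource


def read_resource(base_url: str, token: str, resource_type: str, resource_id: str) -> dict:
    """Fetch a resource, e.g. read_resource(url, token, "Patient", PATIENT_ID)."""
    req = build_read_request(base_url, token, resource_type, resource_id)
    with urllib.request.urlopen(req) as resp:
        return check_resource(json.load(resp), resource_type, resource_id)
```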
4. Testing the OCR Flow Locally
For testing the OCR to FHIR flow without the API, use the provided test script:
python test_ocr_flow.py --image sample_data/your_image.jpg --output test_results
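To make the OCR-to-FHIR step more concrete, here is a hypothetical mapping from OCR text to a minimal FHIR Patient resource. It is not the logic in `fhir_module/`: it assumes the OCR output contains simple `Name:`, `DOB:` and `Member ID:` lines, which real insurance cards often will not.

```python
import re
from typing import Any, Dict

# Hypothetical "Label: value" lines; real OCR output is rarely this clean.
FIELD_PATTERNS = {
    "name": re.compile(r"^[ \t]*Name[ \t]*:[ \t]*(\S.*?)[ \t]*$", re.IGNORECASE | re.MULTILINE),
    "dob": re.compile(r"^[ \t]*DOB[ \t]*:[ \t]*(\d{4}-\d{2}-\d{2})[ \t]*$", re.IGNORECASE | re.MULTILINE),
    "member_id": re.compile(r"^[ \t]*Member[ \t]+ID[ \t]*:[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE),
}


def text_to_patient(text: str) -> Dict[str, Any]:
    """Map OCR text to a minimal FHIR R4 Patient resource (as a plain dict)."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[field] = match.group(1)

    patient: Dict[str, Any] = {"resourceType": "Patient"}
    if "name" in found:
        # Naive split: last token is the family name, the rest are given names.
        *given, family = found["name"].split()
        patient["name"] = [{"family": family, "given": given}]
    if "dob" in found:
        # FHIR "date" values use the YYYY-MM-DD form.
        patient["birthDate"] = found["dob"]
    if "member_id" in found:
        patient["identifier"] = [{"value": found["member_id"]}]
    return patient
```

Fields that cannot be found are simply left out, so the result is still a valid (if sparse) Patient resource.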
Keycloak Setup
The Docker Compose configuration includes a Keycloak server for authentication. For production use, you would need to:
- Access the Keycloak admin console at http://localhost:8181
- Log in with username admin and password admin
- Create a new realm (e.g., fhir-ocr)
- Create a new client (e.g., fhir-ocr-client)
- Configure client access type as "confidential"
- Add redirect URIs for your application
- Create roles (e.g., user, admin)
- Create users and assign roles
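Once the realm and client exist, a token can be requested directly from Keycloak's OpenID Connect token endpoint. This sketch builds a password-grant request using the example realm and client names above. It assumes Keycloak 17 or later (older releases prefix the path with `/auth`) and that "Direct access grants" is enabled on the client. The password grant is convenient for local testing; a browser-based application would normally use the authorization code flow with the redirect URIs configured above.

```python
import json
import urllib.parse
import urllib.request


def keycloak_token_request(keycloak_url: str, realm: str, client_id: str,
                           client_secret: str, username: str,
                           password: str) -> urllib.request.Request:
    """Build an OIDC password-grant request against Keycloak's token endpoint."""
    # Keycloak 17+ serves realms at /realms/<realm> (no /auth prefix).
    token_url = (f"{keycloak_url}/realms/{urllib.parse.quote(realm)}"
                 "/protocol/openid-connect/token")
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,  # required because the client is confidential
        "username": username,
        "password": password,
    }).encode()
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def keycloak_access_token(*args: str) -> str:
    """Send the request and return the access_token field of the response."""
    with urllib.request.urlopen(keycloak_token_request(*args)) as resp:
        return json.load(resp)["access_token"]
```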
Security and Compliance
The application includes several security and compliance features:
- Authentication: OAuth2/OpenID Connect with Keycloak
- Authorization: Role-based access control
- Audit Logging: All API calls and data access are logged
- Privacy Filtering: Sensitive data can be masked or redacted
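The authorization, audit and masking ideas above can be illustrated in a few lines. This is a sketch, not the code in `security_module/` or `compliance_module/`: the permission names in `ROLE_PERMISSIONS` are hypothetical, the roles reuse the `user`/`admin` examples from the Keycloak steps, and masking keeps only the last four characters.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Optional

# Hypothetical role -> permission mapping using the example Keycloak roles.
ROLE_PERMISSIONS = {
    "user": {"fhir:read"},
    "admin": {"fhir:read", "fhir:write", "ocr:process"},
}


def is_allowed(roles: Iterable[str], permission: str) -> bool:
    """Role-based access control: allow if any of the caller's roles grants it."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Mask all but the last `visible` characters, e.g. for member IDs."""
    if len(value) <= visible:
        return char * len(value)
    return char * (len(value) - visible) + value[-visible:]


def audit_record(user: str, action: str, resource: str, allowed: bool,
                 now: Optional[datetime] = None) -> str:
    """One JSON line per access attempt, suitable for an append-only log."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "outcome": "allowed" if allowed else "denied",
    }, sort_keys=True)
```

Logging denied attempts as well as allowed ones is what makes such a log useful for compliance review.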
Docker Environment Variables
The following environment variables can be set in the docker-compose.yml file:
- ENVIRONMENT: development or production
- JWT_SECRET_KEY: Secret key for JWT token signing
- JWT_ALGORITHM: Algorithm for JWT token signing
- ACCESS_TOKEN_EXPIRE_MINUTES: Token expiration time in minutes
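To show what `JWT_SECRET_KEY`, `JWT_ALGORITHM` and `ACCESS_TOKEN_EXPIRE_MINUTES` control, here is a standard-library sketch of HS256 signing and verification. It assumes `JWT_ALGORITHM` is `HS256` (the algorithm in the sample token's header) and is illustrative only; an application would normally rely on a maintained JWT library such as PyJWT rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import os
import time
from typing import Mapping, Optional, Tuple


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def settings_from_env(environ: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Read the JWT settings listed above; this sketch only handles HS256."""
    if environ["JWT_ALGORITHM"] != "HS256":
        raise ValueError(f"only HS256 is implemented, got {environ['JWT_ALGORITHM']}")
    return environ["JWT_SECRET_KEY"], int(environ["ACCESS_TOKEN_EXPIRE_MINUTES"])


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_access_token(subject: str, secret: str, expire_minutes: int,
                        now: Optional[float] = None) -> str:
    """Return header.payload.signature, with exp = iat + expire_minutes."""
    iat = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": iat, "exp": iat + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_access_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Check the signature (constant-time) and expiry; return the payload."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(_b64url_decode(signature), _sign(signing_input, secret)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if (time.time() if now is None else now) >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Anyone who knows `JWT_SECRET_KEY` can mint valid tokens, so it should be a long random value set per environment rather than a default committed to docker-compose.yml.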
API Documentation
API documentation is available at http://localhost:8000/docs when the application is running.
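The interactive docs are generated from an OpenAPI schema, which FastAPI serves at `/openapi.json` by default. A small sketch for listing the operations in that schema, for example to check which endpoints a running build exposes; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

# Keys of an OpenAPI path item that denote operations (others: summary, parameters, ...).
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI document."""
    return sorted(
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS
    )


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema that backs the /docs page."""
    with urllib.request.urlopen(f"{base_url}/openapi.json") as resp:
        return json.load(resp)
```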
License
This project uses open-source components:
- Tesseract OCR (Apache License 2.0)
- HAPI FHIR (Apache License 2.0)
- Keycloak (Apache License 2.0)