# FHIR-Based Healthcare Document Processing POC

This proof-of-concept application demonstrates a secure, FHIR-compliant system for processing healthcare documents with OCR and storing the extracted data via a FHIR API.

## Project Structure

- `ocr_module/`: OCR functionality using Tesseract for document text extraction
- `fhir_module/`: FHIR API implementation using HAPI FHIR for data storage and retrieval
- `security_module/`: Authentication and authorization using OAuth2/OpenID Connect
- `compliance_module/`: Audit logging and compliance features
- `api/`: Main application API that integrates all modules
- `docker/`: Docker and docker-compose configuration files

## Features

- Document processing with OCR to extract healthcare information
- FHIR-compliant data storage and retrieval
- OAuth2/OpenID Connect authentication
- Comprehensive audit logging
- Role-based access control
- Local deployment with Docker

## Requirements

- Docker and Docker Compose
- Python 3.9+
- Tesseract OCR engine (automatically installed in Docker)

## Getting Started

### Using Docker Compose (Recommended)

The easiest way to run the application is using Docker Compose:

1. Clone the repository
2. Navigate to the project root directory
3. Run Docker Compose:

   ```bash
   docker-compose up
   ```

This will start the following services:

- FHIR OCR application at http://localhost:8000
- Keycloak authentication server at http://localhost:8181
- HAPI FHIR server at http://localhost:8090 (included but not integrated in the POC)

### Manual Setup

If you prefer to run the application without Docker:

1. Install Tesseract OCR engine on your system:
   - **Ubuntu/Debian**: `sudo apt-get install tesseract-ocr`
   - **macOS**: `brew install tesseract`
   - **Windows**: Download installer from [Tesseract GitHub page](https://github.com/UB-Mannheim/tesseract/wiki)

2. Create and activate a Python virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Run the application:

   ```bash
   uvicorn api.app:app --host 0.0.0.0 --port 8000
   ```

## Usage

### 1. Authentication

To access the API, you need to obtain an authentication token:

```bash
curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "password"}'
```

This will return a JSON response with an access token:

```json
{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer"
}
```

### 2. Processing a Document

Use the token to upload and process a document:

```bash
curl -X POST http://localhost:8000/ocr/process \
  -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  -F "file=@/path/to/document.jpg" \
  -F "process_as=insurance_card"
```

The API will return the OCR results and the IDs of the created FHIR resources.

### 3. Retrieving FHIR Resources

Get a patient resource:

```bash
curl -X GET http://localhost:8000/fhir/Patient/PATIENT_ID \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"
```

Get an observation resource:

```bash
curl -X GET http://localhost:8000/fhir/Observation/OBSERVATION_ID \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"
```

### 4. Testing the OCR Flow Locally

For testing the OCR to FHIR flow without the API, use the provided test script:

```bash
python test_ocr_flow.py --image sample_data/your_image.jpg --output test_results
```

## Keycloak Setup

The Docker Compose configuration includes a Keycloak server for authentication. For production use, you would need to:

1. Access the Keycloak admin console at http://localhost:8181
2. Log in with username `admin` and password `admin`
3. Create a new realm (e.g., `fhir-ocr`)
4. Create a new client (e.g., `fhir-ocr-client`)
5. Configure client access type as "confidential"
6. Add redirect URIs for your application
7. Create roles (e.g., `user`, `admin`)
8. Create users and assign roles

## Security and Compliance

The application includes several security and compliance features:

- **Authentication**: OAuth2/OpenID Connect with Keycloak
- **Authorization**: Role-based access control
- **Audit Logging**: All API calls and data access are logged
- **Privacy Filtering**: Sensitive data can be masked or redacted

## Docker Environment Variables

The following environment variables can be set in the docker-compose.yml file:

- `ENVIRONMENT`: `development` or `production`
- `JWT_SECRET_KEY`: Secret key for JWT token signing
- `JWT_ALGORITHM`: Algorithm for JWT token signing
- `ACCESS_TOKEN_EXPIRE_MINUTES`: Token expiration time in minutes

## API Documentation

API documentation is available at http://localhost:8000/docs when the application is running.

## License

This project uses open-source components:

- Tesseract OCR (Apache License 2.0)
- HAPI FHIR (Apache License 2.0)
- Keycloak (Apache License 2.0)
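
## Example: Python Client Sketch

The authentication and resource-retrieval steps from the Usage section can also be scripted. The following is a minimal sketch using only the Python standard library, not code shipped with this project. It assumes the endpoints, JSON body, and `access_token` response field behave as shown in the curl examples. The helper names (`build_token_request`, `build_resource_request`, `send_json`, `fetch_patient`) are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Application URL from the Docker Compose setup described in this README.
BASE_URL = "http://localhost:8000"


def build_token_request(username: str, password: str,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /auth/token request from the Authentication step."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_resource_request(resource_type: str, resource_id: str, token: str,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /fhir/<type>/<id> request with a bearer token."""
    return urllib.request.Request(
        f"{base_url}/fhir/{resource_type}/{resource_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def send_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_patient(username: str, password: str, patient_id: str) -> dict:
    """Obtain a token, then retrieve a single Patient resource.

    Assumes the token response contains an "access_token" field,
    as in the sample JSON response in the Usage section.
    """
    token = send_json(build_token_request(username, password))["access_token"]
    return send_json(build_resource_request("Patient", patient_id, token))
```

With the services running, a call such as `fetch_patient("admin", "password", "PATIENT_ID")` should return the same JSON as the corresponding curl commands. Building each request separately from sending it lets you inspect the URLs and headers without a live server.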