Deployment and Operations
Effective deployment and operations ensure that your Documentation as Code implementation runs reliably in production. This section covers deployment strategies, monitoring, maintenance, and operational best practices for DocFX sites hosted on Azure App Service.
Deployment Overview
Architecture Components
graph TB
A[Azure DevOps Repository] --> B[Azure Pipeline]
B --> C[Build Agent]
C --> D[DocFX Build]
D --> E[Static Files]
E --> F[Azure App Service]
F --> G[Users]
H[Monitoring] --> F
I[Analytics] --> F
J[CDN] --> F
K[Custom Domain] --> F
subgraph "Production Environment"
F
H
I
J
K
end
subgraph "Staging Environment"
L[Staging App Service]
M[Staging Monitoring]
end
E --> L
Deployment Environments
Development Environment:
- Purpose: Local development and testing
- Location: Developer workstations
- Characteristics: Fast iteration, debugging enabled
- Access: Development team only
Staging Environment:
- Purpose: Pre-production validation and testing
- Location: Azure App Service (staging slot)
- Characteristics: Production-like configuration
- Access: Development team and stakeholders
Production Environment:
- Purpose: Live documentation for end users
- Location: Azure App Service (production slot)
- Characteristics: High availability, monitoring, CDN
- Access: Public or authenticated users
Deployment Strategies
Blue-Green Deployment
Implementation with Azure App Service Slots:
# azure-pipelines/deploy-production.yml
trigger:
branches:
include:
- main
variables:
azureSubscription: 'Production-Subscription'
resourceGroup: 'docs-production-rg'
appServiceName: 'company-docs-prod'
stages:
- stage: BuildAndTest
displayName: 'Build and Test'
jobs:
- job: Build
displayName: 'Build Documentation'
pool:
vmImage: 'ubuntu-latest'
steps:
- task: DocFxTask@0
inputs:
solution: 'docfx.json'
command: 'build'
displayName: 'Build DocFX site'
- task: PublishPipelineArtifact@1
inputs:
targetPath: '_site'
artifact: 'documentation-site'
displayName: 'Publish build artifacts'
- stage: DeployStaging
displayName: 'Deploy to Staging'
dependsOn: BuildAndTest
jobs:
- deployment: DeployToStaging
displayName: 'Deploy to Staging Slot'
environment: 'staging'
strategy:
runOnce:
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: $(azureSubscription)
appType: 'webApp'
appName: $(appServiceName)
slotName: 'staging'
package: $(Pipeline.Workspace)/documentation-site
displayName: 'Deploy to staging slot'
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: 'Start Azure App Service'
webAppName: $(appServiceName)
specifySlotOrASE: true
slot: 'staging'
displayName: 'Start staging slot'
- stage: ProductionValidation
displayName: 'Production Validation'
dependsOn: DeployStaging
jobs:
- job: SmokeTests
displayName: 'Smoke Tests'
steps:
- script: |
# Run smoke tests against staging
python scripts/smoke-tests.py https://$(appServiceName)-staging.azurewebsites.net
displayName: 'Execute smoke tests'
- script: |
# Performance validation
python scripts/performance-check.py https://$(appServiceName)-staging.azurewebsites.net
displayName: 'Performance validation'
- stage: ProductionDeployment
displayName: 'Production Deployment'
dependsOn: ProductionValidation
condition: succeeded()
jobs:
- deployment: SwapToProduction
displayName: 'Swap to Production'
environment: 'production'
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: 'Swap Slots'
webAppName: $(appServiceName)
sourceSlot: 'staging'
targetSlot: 'production'
displayName: 'Swap staging to production'
- script: |
# Post-deployment validation
python scripts/post-deploy-check.py https://$(appServiceName).azurewebsites.net
displayName: 'Post-deployment validation'
Rolling Deployment
For multi-instance deployments:
# Rolling deployment strategy
strategy:
rolling:
maxParallel: 50%
preDeploy:
steps:
- script: echo "Pre-deployment health check"
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: $(azureSubscription)
appName: $(appServiceName)
package: $(Pipeline.Workspace)/documentation-site
postDeploy:
steps:
- script: |
# Health check after each instance
python scripts/health-check.py $(appServiceName)
Canary Deployment
Gradual traffic shifting:
# Canary deployment with traffic management
- stage: CanaryDeployment
jobs:
- deployment: CanaryRelease
environment: 'production-canary'
strategy:
canary:
increments: [10, 25, 50, 100]
preDeploy:
steps:
- script: echo "Pre-canary deployment"
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: $(azureSubscription)
appName: $(appServiceName)
package: $(Pipeline.Workspace)/documentation-site
postDeploy:
steps:
- script: |
# Monitor metrics during canary
python scripts/canary-metrics.py --percentage $(strategy.increments)
Infrastructure as Code
Azure Resource Manager Templates
App Service Infrastructure:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"siteName": {
"type": "string",
"metadata": {
"description": "Name of the documentation site"
}
},
"environment": {
"type": "string",
"allowedValues": ["dev", "staging", "prod"],
"defaultValue": "dev"
}
},
"variables": {
"appServicePlanName": "[concat(parameters('siteName'), '-plan-', parameters('environment'))]",
"appServiceName": "[concat(parameters('siteName'), '-', parameters('environment'))]"
},
"resources": [
{
"type": "Microsoft.Web/serverfarms",
"apiVersion": "2021-02-01",
"name": "[variables('appServicePlanName')]",
"location": "[resourceGroup().location]",
"sku": {
"name": "[if(equals(parameters('environment'), 'prod'), 'S1', 'B1')]",
"tier": "[if(equals(parameters('environment'), 'prod'), 'Standard', 'Basic')]"
},
"properties": {
"reserved": false
}
},
{
"type": "Microsoft.Web/sites",
"apiVersion": "2021-02-01",
"name": "[variables('appServiceName')]",
"location": "[resourceGroup().location]",
"dependsOn": [
"[resourceId('Microsoft.Web/serverfarms', variables('appServicePlanName'))]"
],
"properties": {
"serverFarmId": "[resourceId('Microsoft.Web/serverfarms', variables('appServicePlanName'))]",
"siteConfig": {
"defaultDocuments": ["index.html"],
"httpLoggingEnabled": true,
"requestTracingEnabled": true,
"detailedErrorLoggingEnabled": true
}
}
}
],
"outputs": {
"websiteUrl": {
"type": "string",
"value": "[concat('https://', reference(variables('appServiceName')).defaultHostName)]"
}
}
}
Bicep Templates
Modern ARM template alternative:
// main.bicep
@description('Name of the documentation site')
param siteName string
@description('Environment name')
@allowed(['dev', 'staging', 'prod'])
param environment string = 'dev'
@description('Location for all resources')
param location string = resourceGroup().location
var appServicePlanName = '${siteName}-plan-${environment}'
var appServiceName = '${siteName}-${environment}'
var sku = environment == 'prod' ? 'S1' : 'B1'
var tier = environment == 'prod' ? 'Standard' : 'Basic'
resource appServicePlan 'Microsoft.Web/serverfarms@2021-02-01' = {
name: appServicePlanName
location: location
sku: {
name: sku
tier: tier
}
properties: {
reserved: false
}
}
resource appService 'Microsoft.Web/sites@2021-02-01' = {
name: appServiceName
location: location
properties: {
serverFarmId: appServicePlan.id
siteConfig: {
defaultDocuments: ['index.html']
httpLoggingEnabled: true
requestTracingEnabled: true
detailedErrorLoggingEnabled: true
minTlsVersion: '1.2'
ftpsState: 'Disabled'
}
}
}
// Production-specific resources
resource stagingSlot 'Microsoft.Web/sites/slots@2021-02-01' = if (environment == 'prod') {
name: 'staging'
parent: appService
location: location
properties: {
serverFarmId: appServicePlan.id
siteConfig: appService.properties.siteConfig
}
}
output websiteUrl string = 'https://${appService.properties.defaultHostName}'
output stagingUrl string = environment == 'prod' ? 'https://${stagingSlot.properties.defaultHostName}' : ''
Configuration Management
Application Settings
Environment-specific configuration:
# azure-pipelines/configure-app-settings.yml
parameters:
- name: environment
type: string
- name: appServiceName
type: string
steps:
- task: AzureAppServiceSettings@1
inputs:
azureSubscription: $(azureSubscription)
appName: ${{ parameters.appServiceName }}
resourceGroupName: $(resourceGroup)
appSettings: |
[
{
"name": "WEBSITE_NODE_DEFAULT_VERSION",
"value": "18.x"
},
{
"name": "DOCS_ENVIRONMENT",
"value": "${{ parameters.environment }}"
},
{
"name": "DOCS_VERSION",
"value": "$(Build.BuildNumber)"
},
{
"name": "ENABLE_ORYX_BUILD",
"value": "false"
},
{
"name": "SCM_DO_BUILD_DURING_DEPLOYMENT",
"value": "false"
}
]
Connection Strings
Secure configuration for external services:
connectionStrings: |
[
{
"name": "ApplicationInsights",
"connectionString": "$(ApplicationInsights.ConnectionString)",
"type": "Custom"
},
{
"name": "Storage",
"connectionString": "$(Storage.ConnectionString)",
"type": "Custom"
}
]
Deployment Validation
Smoke Tests
Post-deployment validation:
#!/usr/bin/env python3
# scripts/smoke-tests.py
import requests
import sys
import time
from urllib.parse import urljoin
class SmokeTests:
def __init__(self, base_url):
self.base_url = base_url.rstrip('/')
self.session = requests.Session()
self.session.timeout = 30
def test_homepage(self):
"""Test homepage loads successfully."""
response = self.session.get(self.base_url)
assert response.status_code == 200, f"Homepage failed: {response.status_code}"
assert 'Documentation' in response.text, "Homepage missing expected content"
print("✅ Homepage test passed")
def test_search_functionality(self):
"""Test search endpoint."""
search_url = urljoin(self.base_url, '/search')
response = self.session.get(search_url)
assert response.status_code in [200, 404], f"Search test failed: {response.status_code}"
print("✅ Search functionality test passed")
def test_api_documentation(self):
"""Test API documentation pages."""
api_url = urljoin(self.base_url, '/api/')
response = self.session.get(api_url)
assert response.status_code == 200, f"API docs failed: {response.status_code}"
print("✅ API documentation test passed")
def test_critical_pages(self):
"""Test critical documentation pages."""
critical_pages = [
'/docs/getting-started/',
'/docs/tutorials/',
'/docs/reference/'
]
for page in critical_pages:
url = urljoin(self.base_url, page)
response = self.session.get(url)
assert response.status_code == 200, f"Critical page {page} failed: {response.status_code}"
print("✅ Critical pages test passed")
def test_performance(self):
"""Basic performance validation."""
start_time = time.time()
response = self.session.get(self.base_url)
load_time = time.time() - start_time
assert response.status_code == 200, "Homepage not accessible"
assert load_time < 5.0, f"Homepage too slow: {load_time:.2f}s"
print(f"✅ Performance test passed (load time: {load_time:.2f}s)")
def run_all_tests(self):
"""Run all smoke tests."""
tests = [
self.test_homepage,
self.test_search_functionality,
self.test_api_documentation,
self.test_critical_pages,
self.test_performance
]
print(f"🧪 Running smoke tests against {self.base_url}")
for test in tests:
try:
test()
except Exception as e:
print(f"❌ Test failed: {test.__name__} - {e}")
sys.exit(1)
print("🎉 All smoke tests passed!")
def main():
if len(sys.argv) != 2:
print("Usage: python smoke-tests.py <base_url>")
sys.exit(1)
base_url = sys.argv[1]
tester = SmokeTests(base_url)
tester.run_all_tests()
if __name__ == "__main__":
main()
Performance Validation
Load time and performance checks:
#!/usr/bin/env python3
# scripts/performance-check.py
import requests
import time
import statistics
from concurrent.futures import ThreadPoolExecutor
import sys
def measure_page_load(url, iterations=5):
"""Measure page load time over multiple iterations."""
times = []
for _ in range(iterations):
start = time.time()
response = requests.get(url, timeout=30)
end = time.time()
if response.status_code == 200:
times.append(end - start)
else:
print(f"❌ Request failed: {response.status_code}")
return None
return {
'url': url,
'mean': statistics.mean(times),
'median': statistics.median(times),
'min': min(times),
'max': max(times),
'iterations': iterations
}
def concurrent_load_test(url, concurrent_users=10, duration_seconds=30):
"""Test concurrent load handling."""
results = []
start_time = time.time()
def make_request():
try:
start = time.time()
response = requests.get(url, timeout=10)
end = time.time()
return {
'status_code': response.status_code,
'response_time': end - start,
'success': response.status_code == 200
}
except Exception as e:
return {
'status_code': 0,
'response_time': 0,
'success': False,
'error': str(e)
}
with ThreadPoolExecutor(max_workers=concurrent_users) as executor:
futures = []
while time.time() - start_time < duration_seconds:
future = executor.submit(make_request)
futures.append(future)
time.sleep(0.1) # Stagger requests
for future in futures:
results.append(future.result())
success_count = sum(1 for r in results if r['success'])
total_requests = len(results)
success_rate = success_count / total_requests if total_requests > 0 else 0
successful_times = [r['response_time'] for r in results if r['success']]
avg_response_time = statistics.mean(successful_times) if successful_times else 0
return {
'total_requests': total_requests,
'successful_requests': success_count,
'success_rate': success_rate,
'average_response_time': avg_response_time,
'duration': duration_seconds,
'concurrent_users': concurrent_users
}
def main():
if len(sys.argv) != 2:
print("Usage: python performance-check.py <base_url>")
sys.exit(1)
base_url = sys.argv[1].rstrip('/')
print(f"🚀 Performance testing {base_url}")
# Single page load test
print("\n📊 Page Load Performance:")
load_result = measure_page_load(base_url)
if load_result:
print(f" Mean load time: {load_result['mean']:.2f}s")
print(f" Median load time: {load_result['median']:.2f}s")
print(f" Min/Max: {load_result['min']:.2f}s / {load_result['max']:.2f}s")
# Performance assertions
if load_result['mean'] > 3.0:
print("⚠️ Warning: Average load time exceeds 3 seconds")
if load_result['max'] > 5.0:
print("❌ Error: Maximum load time exceeds 5 seconds")
sys.exit(1)
# Concurrent load test
print("\n🔄 Concurrent Load Test:")
concurrent_result = concurrent_load_test(base_url)
print(f" Total requests: {concurrent_result['total_requests']}")
print(f" Success rate: {concurrent_result['success_rate']:.2%}")
print(f" Average response time: {concurrent_result['average_response_time']:.2f}s")
# Load test assertions
if concurrent_result['success_rate'] < 0.95:
print("❌ Error: Success rate below 95%")
sys.exit(1)
if concurrent_result['average_response_time'] > 2.0:
print("⚠️ Warning: Average response time under load exceeds 2 seconds")
print("✅ Performance tests completed successfully")
if __name__ == "__main__":
main()
Rollback Procedures
Automated Rollback
Pipeline-based rollback:
# azure-pipelines/rollback.yml
parameters:
- name: targetSlot
displayName: 'Rollback Target'
type: string
default: 'previous'
values:
- previous
- staging
- specific-version
- name: specificVersion
displayName: 'Specific Version (if selected above)'
type: string
default: ''
trigger: none
variables:
azureSubscription: 'Production-Subscription'
resourceGroup: 'docs-production-rg'
appServiceName: 'company-docs-prod'
stages:
- stage: RollbackValidation
displayName: 'Validate Rollback'
jobs:
- job: ValidateRollback
steps:
- script: |
echo "Validating rollback target: ${{ parameters.targetSlot }}"
# Add validation logic here
displayName: 'Validate rollback parameters'
- stage: ExecuteRollback
displayName: 'Execute Rollback'
dependsOn: RollbackValidation
jobs:
- deployment: Rollback
displayName: 'Rollback Deployment'
environment: 'production-rollback'
strategy:
runOnce:
deploy:
steps:
- ${{ if eq(parameters.targetSlot, 'previous') }}:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: 'Swap Slots'
webAppName: $(appServiceName)
sourceSlot: 'production'
targetSlot: 'staging'
displayName: 'Swap back to previous version'
- script: |
# Post-rollback validation
python scripts/post-deploy-check.py https://$(appServiceName).azurewebsites.net
displayName: 'Validate rollback'
- script: |
# Notify team of rollback
python scripts/notify-rollback.py --version "${{ parameters.targetSlot }}"
displayName: 'Notify rollback completion'
Manual Rollback
Emergency rollback procedures:
#!/bin/bash
# scripts/emergency-rollback.sh
set -e
RESOURCE_GROUP="docs-production-rg"
APP_SERVICE_NAME="company-docs-prod"
SUBSCRIPTION_ID="your-subscription-id"
echo "🚨 Emergency Rollback Procedure"
echo "================================"
# Login check
if ! az account show &>/dev/null; then
echo "Please login to Azure CLI first: az login"
exit 1
fi
# Set subscription
az account set --subscription "$SUBSCRIPTION_ID"
# Get current slot configuration
echo "📋 Current slot configuration:"
az webapp deployment slot list \
--resource-group "$RESOURCE_GROUP" \
--name "$APP_SERVICE_NAME" \
--output table
# Confirm rollback
read -p "🔄 Proceed with slot swap rollback? (y/N): " confirm
if [[ $confirm != [yY] ]]; then
echo "Rollback cancelled"
exit 0
fi
# Execute rollback
echo "⏪ Executing rollback..."
az webapp deployment slot swap \
--resource-group "$RESOURCE_GROUP" \
--name "$APP_SERVICE_NAME" \
--slot "staging" \
--target-slot "production"
# Validate rollback
echo "✅ Rollback completed. Validating..."
sleep 30 # Allow time for the swap
# Basic health check
SITE_URL="https://${APP_SERVICE_NAME}.azurewebsites.net"
if curl -f -s "$SITE_URL" > /dev/null; then
echo "✅ Site is responding at $SITE_URL"
else
echo "❌ Site health check failed"
exit 1
fi
echo "🎉 Emergency rollback completed successfully"
echo "📋 Next steps:"
echo " 1. Investigate the root cause"
echo " 2. Fix the issue in the development branch"
echo " 3. Test thoroughly before next deployment"
echo " 4. Update incident documentation"
This deployment and operations guide provides the foundation for reliable, scalable documentation deployment. Customize these practices based on your organization's specific requirements and constraints.