Content Migration

Migrating existing documentation to a Documentation as Code approach requires careful planning and systematic execution. This guide provides strategies and tools for successfully transitioning from traditional documentation formats to DocFX-based Markdown documentation.

Migration Assessment

Document Inventory

Before beginning migration, create a comprehensive inventory of existing documentation:

# migration-inventory.yml
documentation_sources:
  - type: "Word Documents"
    location: "SharePoint/Teams"
    count: 45
    estimated_pages: 230
    priority: "high"
  
  - type: "Wiki Pages"
    location: "Azure DevOps Wiki"
    count: 120
    estimated_pages: 180
    priority: "medium"
  
  - type: "PDF Files"
    location: "File Shares"
    count: 15
    estimated_pages: 95
    priority: "low"

Content Assessment Matrix

Content Type	Volume	Complexity	Priority	Migration Method
API Documentation	High	Complex	Critical	Automated
User Guides	Medium	Moderate	High	Semi-automated
Release Notes	High	Simple	High	Automated
Procedures	Medium	Moderate	Medium	Manual
Legacy Archives	Low	Variable	Low	Selective

Migration Strategies

1. Big Bang Migration

Complete migration of all documentation at once.

Pros:

Single cutover event
Immediate consistency
No dual maintenance

Cons:

High risk
Significant downtime
Resource intensive

Best for: Small documentation sets, greenfield projects

2. Phased Migration

Gradual migration of documentation sections over time.

Pros:

Lower risk
Manageable workload
Learning opportunities

Cons:

Dual maintenance period
Potential inconsistencies
Longer timeline

Best for: Large documentation sets, active development

3. On-Demand Migration

Migrate documentation when it requires updates.

Pros:

Natural prioritization
Minimal upfront effort
Continuous improvement

Cons:

Indefinite timeline
Potential orphaned content
Inconsistent experience

Best for: Legacy documentation, resource-constrained teams

Automated Migration Tools

Pandoc Conversion

Pandoc provides excellent conversion capabilities for various formats:

# Convert Word documents to Markdown
pandoc -f docx -t markdown_strict --wrap=preserve \
  --extract-media=./images \
  input.docx -o output.md

# Convert HTML to Markdown
pandoc -f html -t markdown_strict \
  --wrap=preserve \
  input.html -o output.md

# Batch conversion script
for file in *.docx; do
  pandoc -f docx -t markdown_strict \
    --wrap=preserve \
    --extract-media=./images \
    "$file" -o "${file%.docx}.md"
done

Custom Migration Scripts

PowerShell script for processing multiple files:

# migration-script.ps1
param(
    [string]$SourcePath,
    [string]$OutputPath,
    [string]$Format = "docx"
)

# Create output directory
if (!(Test-Path $OutputPath)) {
    New-Item -ItemType Directory -Path $OutputPath -Force
}

# Process files
Get-ChildItem -Path $SourcePath -Filter "*.$Format" | ForEach-Object {
    $inputFile = $_.FullName
    $outputFile = Join-Path $OutputPath ($_.BaseName + ".md")
    
    Write-Host "Converting: $($_.Name)"
    
    # Run pandoc conversion
    & pandoc -f $Format -t markdown_strict `
        --wrap=preserve `
        --extract-media="$OutputPath/images" `
        $inputFile -o $outputFile
    
    # Post-process the file
    $content = Get-Content $outputFile -Raw
    $content = $content -replace '\\', '/'  # Fix image paths
    $content = $content -replace '\r\n', "`n"  # Normalize line endings
    Set-Content $outputFile $content -NoNewline
}

Content Transformation

Front Matter Addition

Add DocFX front matter to migrated files:

---
title: "Migrated Document Title"
description: "Brief description of the document content"
tags: ["tag1", "tag2", "category"]
category: "documentation"
difficulty: "beginner"
last_updated: "2025-07-06"
author: "Original Author"
migrated_from: "SharePoint/OriginalDocument.docx"
migration_date: "2025-07-06"
---

# Document Content

Link Conversion

Convert internal links to Markdown format:

# link-converter.py
import re
import os

def convert_links(content, base_path):
    # Convert Word cross-references
    content = re.sub(
        r'See section "([^"]+)"',
        r'See [section \1](#\1)',
        content
    )
    
    # Convert file references
    content = re.sub(
        r'Reference: ([^\.]+\.docx)',
        r'Reference: [\1](\1.md)',
        content
    )
    
    # Convert image references
    content = re.sub(
        r'!\[([^\]]*)\]\(([^)]+)\)',
        lambda m: f'![{m.group(1)}](images/{os.path.basename(m.group(2))})',
        content
    )
    
    return content

Table Conversion

Ensure proper Markdown table formatting:

// table-formatter.js
function formatMarkdownTable(tableText) {
    const rows = tableText.split('\n').filter(row => row.trim());
    const formattedRows = rows.map(row => {
        const cells = row.split('|').map(cell => cell.trim());
        return '| ' + cells.join(' | ') + ' |';
    });
    
    // Add header separator
    if (formattedRows.length > 1) {
        const headerSeparator = '| ' + 
            formattedRows[0].split('|').slice(1, -1)
                .map(() => '---').join(' | ') + ' |';
        formattedRows.splice(1, 0, headerSeparator);
    }
    
    return formattedRows.join('\n');
}

Quality Assurance

Validation Checklist

Create a comprehensive validation process:

# validation-checklist.yml
content_validation:
  structure:
    - front_matter_present
    - headings_properly_formatted
    - toc_updated
    - file_naming_conventions
  
  content:
    - links_functional
    - images_accessible
    - tables_formatted
    - code_blocks_syntax_highlighted
  
  metadata:
    - tags_appropriate
    - category_assigned
    - author_attributed
    - dates_updated

Automated Testing

PowerShell script for validation:

# validate-migration.ps1
function Test-MarkdownFile {
    param([string]$FilePath)
    
    $content = Get-Content $FilePath -Raw
    $issues = @()
    
    # Check front matter
    if ($content -notmatch '^---\s*\n.*?\n---\s*\n') {
        $issues += "Missing front matter"
    }
    
    # Check for broken links
    $links = [regex]::Matches($content, '\[([^\]]+)\]\(([^)]+)\)')
    foreach ($link in $links) {
        $target = $link.Groups[2].Value
        if ($target -match '^[^#].*\.md$') {
            $targetPath = Join-Path (Split-Path $FilePath) $target
            if (!(Test-Path $targetPath)) {
                $issues += "Broken link: $target"
            }
        }
    }
    
    # Check for images
    $images = [regex]::Matches($content, '!\[([^\]]*)\]\(([^)]+)\)')
    foreach ($image in $images) {
        $imagePath = $image.Groups[2].Value
        $fullImagePath = Join-Path (Split-Path $FilePath) $imagePath
        if (!(Test-Path $fullImagePath)) {
            $issues += "Missing image: $imagePath"
        }
    }
    
    return $issues
}

# Validate all migrated files
Get-ChildItem -Path "docs" -Filter "*.md" -Recurse | ForEach-Object {
    $issues = Test-MarkdownFile $_.FullName
    if ($issues) {
        Write-Host "Issues in $($_.Name):" -ForegroundColor Yellow
        $issues | ForEach-Object { Write-Host "  - $_" }
    }
}

Migration Timeline Template

Phase 1: Planning (Week 1-2)

[ ] Complete content inventory
[ ] Define migration strategy
[ ] Set up tooling and scripts
[ ] Create validation procedures
[ ] Establish team responsibilities

Phase 2: Preparation (Week 3-4)

[ ] Convert high-priority content
[ ] Validate automation scripts
[ ] Create style guide updates
[ ] Train team on new processes
[ ] Set up staging environment

Phase 3: Migration (Week 5-8)

[ ] Execute migration plan
[ ] Validate converted content
[ ] Update links and references
[ ] Implement quality checks
[ ] Gather team feedback

Phase 4: Cleanup (Week 9-10)

[ ] Address validation issues
[ ] Optimize content structure
[ ] Update navigation and TOCs
[ ] Finalize style consistency
[ ] Document lessons learned

Common Challenges and Solutions

Challenge: Complex Formatting

Problem: Rich formatting doesn't translate well to Markdown

Solution:

Simplify formatting to essential elements
Use custom CSS for special formatting needs
Document formatting standards for future content

Challenge: Large Media Files

Problem: Images and videos increase repository size

Solution:

Optimize images before migration
Use external storage for large media
Implement Git LFS for binary files

Challenge: Cross-References

Problem: Internal document references break during migration

Solution:

Create a reference mapping document
Update links systematically
Use automated link checking

Challenge: Version History

Problem: Losing document history during migration

Solution:

Archive original documents
Document migration metadata
Preserve key revision information

Post-Migration Tasks

Content Optimization

Structure Review: Reorganize content for better navigation
SEO Enhancement: Add meta descriptions and optimize titles
Cross-Linking: Create connections between related topics
Search Optimization: Implement proper tagging and categories

Process Integration

Workflow Documentation: Update content creation procedures
Training Materials: Create guides for the new system
Quality Standards: Establish review and approval processes
Maintenance Schedule: Plan regular content reviews

Success Metrics

Track migration success with these metrics:

Content Completeness: Percentage of content successfully migrated
Quality Score: Validation test pass rate
User Adoption: Team usage of new documentation system
Time to Publish: Speed of content updates post-migration

This migration guide ensures a systematic approach to transitioning existing documentation to Documentation as Code, minimizing disruption while maximizing the benefits of the new system.

Table of Contents