Sanitize copy/paste text from Word
By: Daniel

In a recent project I had to deal with text copied from a Microsoft Word document and pasted into a textarea. Word automatically replaces certain characters with typographic variants, such as a single-character ellipsis and curly quotes. When I inserted that text into a database, I was getting errors. To solve the problem, I wrote a sanitize function that replaces these characters with plain ASCII equivalents.

<?php
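// Sanitizes Microsoft Word's special characters.
// Note: chr() matches single bytes, so this assumes $Text is in a
// single-byte encoding such as Windows-1252, not UTF-8.
// Good reference: http://www.lookuptables.com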

function SanitizeFromWord($Text = '') {

	$chars = array(
		130=>',',     // baseline single quote
		131=>'NLG',   // florin
		132=>'"', 	  // baseline double quote
		133=>'...',   // ellipsis
		134=>'**',	  // dagger (a second footnote)
		135=>'***',	  // double dagger (a third footnote)
		136=>'^', 	  // circumflex accent
		137=>'o/oo',  // per mille
		138=>'Sh',	  // S Hacek
		139=>'<',	  // left single guillemet
		140=>'OE',	  // OE ligature
		145=>'\'',	  // left single quote
		146=>'\'',	  // right single quote
		147=>'"',	  // left double quote
		148=>'"',	  // right double quote
		149=>'-',	  // bullet
		150=>'-',	  // en dash
		151=>'--',	  // em dash
		152=>'~',	  // tilde accent
		153=>'(TM)',  // trademark ligature
		154=>'sh',	  // s Hacek
		155=>'>',	  // right single guillemet
		156=>'oe',	  // oe ligature
		159=>'Y',	  // Y Dieresis
		169=>'(C)',	  // Copyright
		174=>'(R)'	  // Registered Trademark
	);
	
	foreach ($chars as $chr=>$replace) {
		$Text = str_replace(chr($chr), $replace, $Text);
	}
	return $Text;
}
?>
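
If the form submits UTF-8 rather than Windows-1252, the same characters arrive as multi-byte sequences, and a byte-by-byte replacement would not match them. Below is a minimal sketch of a UTF-8 counterpart. The function name SanitizeFromWordUtf8 is hypothetical, the map covers only the most common characters from the table above, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Hypothetical UTF-8 counterpart to SanitizeFromWord(). The name and the
// subset of characters are assumptions; replacements mirror the table above.
function SanitizeFromWordUtf8($Text = '') {
    $map = array(
        "\u{201A}" => ',',     // baseline single quote
        "\u{201E}" => '"',     // baseline double quote
        "\u{2026}" => '...',   // ellipsis
        "\u{2018}" => '\'',    // left single quote
        "\u{2019}" => '\'',    // right single quote
        "\u{201C}" => '"',     // left double quote
        "\u{201D}" => '"',     // right double quote
        "\u{2022}" => '-',     // bullet
        "\u{2013}" => '-',     // en dash
        "\u{2014}" => '--',    // em dash
        "\u{2122}" => '(TM)',  // trademark
        "\u{00A9}" => '(C)',   // copyright
        "\u{00AE}" => '(R)',   // registered trademark
    );

    // strtr() with an array replaces every key in a single pass and
    // never rescans text it has already replaced.
    return strtr($Text, $map);
}

// Usage: curly quotes, an ellipsis, an em dash and a curly apostrophe.
echo SanitizeFromWordUtf8("\u{201C}Wait\u{2026}\u{201D} \u{2014} it\u{2019}s done"), "\n";
// prints: "Wait..." -- it's done
```

Using strtr() with a map avoids the loop and keeps each replacement independent of the others, which matters once the keys are multi-byte strings.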

Enjoy!


One Response to “Sanitize copy/paste text from Word”

  1. Pablo Martinez Says:

    Hi Daniel

    Thank you for sharing!

    cheers

    pablobr
