|
Apparently Java's Regex flavor counts Umlauts and other special characters as non-"word characters" when I use Regex.
"TESTÜTEST".replaceAll( "\\W", "" )
returns "TESTTEST" for me. What I want is for only all truly non-"word characters" to be removed....
Started by Epaga, 3 posts by 3 people.
Answer Snippets (Read the full thread at stackoverflow):
Use [^\p{L}\p{N}] - this matches every (Unicode) character that is neither a letter nor a number.
result ends up with the desired result...
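For comparison, here is a minimal Python sketch of the same filtering idea (keep Unicode letters and numbers, drop everything else). It is not the Java API from the thread; the function name strip_non_word is made up for illustration, and it uses Unicode general categories instead of a regex.

```python
import unicodedata


def strip_non_word(text: str) -> str:
    """Keep only characters whose Unicode general category is a Letter (L*) or Number (N*)."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] in ("L", "N")
    )


# \u00dc is U-umlaut; it is a letter, so it survives, while "!" and " " are dropped.
print(strip_non_word("TEST\u00dcTEST! 1") == "TEST\u00dcTEST1")
```

Note that the underscore is removed as well, since it is punctuation rather than a letter or number.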
|
|
I'm looking for pseudocode, or sample code, to convert high-bit ASCII characters (like Ü, which is extended ASCII 154) into U (which is ASCII 85).
My initial guess is that since there are only about 25 ASCII characters that are similar to 7-bit ASCII...
Started by Michael Pryor, 13 posts by 13 people.
Answer Snippets (Read the full thread at stackoverflow):
They can take different interpretations (code language and has a special option... .
The upper 128 characters do not have standard meanings.
To replace accented characters with standard ASCII, but it depends on the language, and it often it is.
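One common approach, sketched here in Python, is to decompose each character into a base letter plus combining marks and then drop the marks. This assumes the bytes have already been decoded to Unicode text (the "extended ASCII" value depends on the code page), and strip_accents is a name chosen for illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks that carry the accents."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# \u00dc is U-umlaut, \u00e9 is e-acute.
print(strip_accents("\u00dcber caf\u00e9"))  # prints: Uber cafe
```

Characters without a decomposition (for example ß or ø) pass through unchanged, which is one reason the right mapping depends on the language.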
|
|
I have Finnish characters in my text (for example ä, ö and å) that are unsafe in XML. Is there any library/framework for this purpose?
Started by newbie, 3 posts by 3 people.
Answer Snippets (Read the full thread at stackoverflow):
If you use StringEscapeUtils.escapeXml....
XML supports Unicode, so the only things you really need to escape are the five basic XML entities (gt, lt, quot, amp, apos).
StringEscapeUtils from Commons Lang has the escapeXml method, which will suit your needs.
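To illustrate the point about the five entities, here is a rough Python equivalent using the standard library's xml.sax.saxutils.escape (not the Commons Lang call from the thread). The wrapper name escape_xml is hypothetical. Letters such as ä are left alone; only the markup characters are replaced.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the two quote characters are added here.
_QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_xml(text: str) -> str:
    """Escape the five predefined XML entities and leave all other characters untouched."""
    return escape(text, _QUOTES)


print(escape_xml('a < b & "c"'))  # prints: a &lt; b &amp; &quot;c&quot;
```

The text still has to be written out in the encoding the XML declaration announces (UTF-8 unless stated otherwise).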
|
|
So I have an ASP.Net (vb.net) application. It has a textbox and the user is pasting text from Microsoft Word into it. So things like the long dash (charcode 150) are coming through as input. Other examples would be the smart quotes or accented characters...
Started by Will Rickards, 5 posts by 4 people.
Answer Snippets (Read the full thread at stackoverflow):
Which is not what the Text version to a non-unicode character set, you will lose....
But it converts all the accented characters to question marks.
How big is the range of these input characters? 256? (Each char fits into a single byte.)
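If the goal is to keep the text readable instead of getting question marks, one option is an explicit replacement table for the punctuation Word inserts. The sketch below is Python, not VB.NET, and the table is a hypothetical starting point, not a complete list.

```python
# Hypothetical mapping; extend it with whatever characters actually show up in the input.
WORD_PUNCTUATION = {
    "\u2013": "-",    # en dash (byte 150 in Windows-1252)
    "\u2014": "-",    # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
}
_TABLE = str.maketrans(WORD_PUNCTUATION)


def normalize_word_punctuation(text: str) -> str:
    """Replace typographic punctuation with plain ASCII equivalents."""
    return text.translate(_TABLE)


pasted = bytes([150]).decode("cp1252")  # the "long dash" from the question
print(normalize_word_punctuation("a " + pasted + " b"))  # prints: a - b
```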
|
|
Hi, I have some source code files which came to me as HTML output, so they're pretty unusable.
I have things like this:
%include &quot;macros.mac&quot;
Which should be:
%include "macros.mac"
Is there any script (sh, perl, batch, ...) to convert every...
Started by Silvio González, 3 posts by 3 people.
Answer Snippets (Read the full thread at stackoverflow):
For the &quot;, &lt;, &gt; and &amp; entities, sed(1) could help:
sed 's/&quot;/"/g; s/&lt;/</g; s/&gt;/>/g; s/&amp;/\&/g'
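If Python is available, the standard library can decode all HTML entities in one call, not just the four handled by the sed command. A minimal sketch; the helper name decode_entities is made up for illustration.

```python
import html


def decode_entities(source: str) -> str:
    """Turn HTML character references such as &quot; or &lt; back into plain characters."""
    return html.unescape(source)


print(decode_entities("%include &quot;macros.mac&quot;"))  # prints: %include "macros.mac"
```

To convert a whole file, read it as text, pass the contents through the function, and write the result back out.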
|
|
Say I have a collection of strings "123AB", "456CDEF", "789G", "012-HI". How do I find all the strings that are number(1 or more) followed by alpha(1 or more) with no special characters, where the alpha characters are not AB ?
To clarify, the regex applied...
Started by thorncp, 4 posts by 4 people.
Answer Snippets (Read the full thread at stackoverflow):
^\d+(?:a(?:b[....
But I'll leave the answer visible as an example of what won't work.
You want a lookahead assertion:
^[0-9]+(?!AB$)[A-Z]+$
Without look-ahead:
^[0-9]+([B-Z]|A[AC-Z])[A-Z]*$
Edit:
Withdrawn, because this won't match something like 123A.
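The two patterns quoted above can be checked against the sample strings from the question. This is a quick Python sketch (the thread does not say which regex engine is in use, so Python's re module is an assumption); 123A is added to show where the look-ahead-free version falls short.

```python
import re

LOOKAHEAD = re.compile(r"^[0-9]+(?!AB$)[A-Z]+$")
NO_LOOKAHEAD = re.compile(r"^[0-9]+([B-Z]|A[AC-Z])[A-Z]*$")

samples = ["123AB", "456CDEF", "789G", "012-HI", "123A"]

# Collect the strings each pattern accepts.
with_lookahead = [s for s in samples if LOOKAHEAD.match(s)]
without_lookahead = [s for s in samples if NO_LOOKAHEAD.match(s)]

print(with_lookahead)     # prints: ['456CDEF', '789G', '123A']
print(without_lookahead)  # prints: ['456CDEF', '789G']
```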
|
|
Using C#, when a user types a text in a normal textbox, how can you see the Hebrew equivalent of that text?
I want to use this feature on a data entry form, when the secretary puts in the customer name using English characters to have it converted automatically...
Started by Ovi, 4 posts by 4 people.
Answer Snippets (Read the full thread at stackoverflow):
You'd have to know how the English is pronounced to...
I didn't even know there was such a thing as Hebraization — thanks for asking this question :-)
This functionality is not provided by the .NET Framework, so I'm afraid you'd have to build it yourself.
|
|
I have a script that counts the characters in each of my comments, excluding any Html Tags.
But it doesn't take into account that my comments contain åäöÅÄÖ (Swedish letters). So how do I edit this to "exclude" these from the regexp variable? (If the ...
Started by elundmark, 3 posts by 3 people.
Answer Snippets (Read the full thread at stackoverflow):
This should work:
$("ol li p").each(function() { var count = $(this).text().length; if ( count >= 620 ) { $(this).parent().addClass("too-big-overflow"); } });
This works, but includes any and all white...
There's no need to use a regular expression here.
|
|
I would like to implement a text box with a 140 character limit in Javascript such that when the 141st character is added that character is erased.
I don't want to show any message or alert to user they have exceeded the limit. There is already a counter...
Started by Yasir, 9 posts by 9 people.
Answer Snippets (Read the full thread at stackoverflow):
Assuming you're using a textarea, which doesn't have a built-in maxlength, you can just assign a keyup event:
$('#tweet').keyup(function(){ var s = $(this).val(); if(s.length > 140) $(this).val(s.substring(0,140)); });
Off the top of my head:
$... .
|
|
Hello, I am asking for your help with sed. I need to remove duplicate underscores and underscores from beginning and end of string.
For example:
echo '[Lorem] ~ ipsum *dolor* sit metus !!!' | sed 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz...
Started by Andrew, 3 posts by 3 people.
Answer Snippets (Read the full thread at stackoverflow):
Also, as ladenedge suggested....
Then you can delete the beginning and ending ones.
All you need to do is add a "+" after your bracket expression to eliminate runs of multiple underscores.
Just add ;s/__*/_/g;s/^_//;s/_$// just after g in your sed command.
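The same three steps (collapse runs, trim the start, trim the end) can be sketched in Python. The first substitution is an assumption: the bracket expression in the question is truncated, so this version treats ASCII letters and digits as the characters to keep. The function name is hypothetical.

```python
import re


def to_underscored(text: str) -> str:
    """Replace unwanted characters with underscores, collapse runs, and trim both ends."""
    replaced = re.sub(r"[^A-Za-z0-9]", "_", text)  # assumed character set
    collapsed = re.sub(r"__*", "_", replaced)      # same pattern as s/__*/_/g
    return collapsed.strip("_")                    # same effect as s/^_//;s/_$//


print(to_underscored("[Lorem] ~ ipsum *dolor* sit metus !!!"))
# prints: Lorem_ipsum_dolor_sit_metus
```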
|