Programming Forums

Deleting characters (http://www.programmingforums.org/showthread.php?t=15400)

Vb Programmer Mar 13th, 2008 1:27 PM

Deleting characters
 
Can someone please tell me how to delete characters from a string? Say I have the string 'hello world' and I want all the 'o's deleted so I end up with 'hell wrld'. Also, how do I print the same string in uppercase ('hello world' to 'HELLO WORLD') in sim8086? Please, I need to know how to do it ASAP.

Benoit Mar 13th, 2008 2:10 PM

Re: Deleting characters
 
Point to the character in the string you want to remove and replace it with the next character in the string; then move to the next character and replace that one with the one after it, and so on until you reach the end. If the string is null-terminated, don't forget to copy the null byte as well, so the shortened string is still terminated.

To convert an ASCII character to uppercase, simply subtract 32 (the difference between 'a' and 'A' in the ASCII table).

Vb Programmer Mar 13th, 2008 2:15 PM

Re: Deleting characters
 
How would I go about doing that? I'm fairly new to assembly.

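This is the code to ask the user to enter a string:
Code:

mov AH, 63          ; DOS function 3Fh: read from file or device
mov BX, 0           ; handle 0 = standard input (keyboard)
mov CX, 25          ; read at most 25 bytes
lea DX, UserText    ; DS:DX -> input buffer
int 33              ; int 21h; on return AX = number of bytes actually read


This code prints the string to the screen:
Code:

mov AH, 64          ; DOS function 40h: write to file or device
mov BX, 1           ; handle 1 = standard output (screen)
mov CX, Length      ; number of bytes to write
lea DX, UserText    ; DS:DX -> string to print
int 33              ; int 21h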


Vb Programmer Mar 13th, 2008 2:46 PM

Re: Deleting characters
 
I need some help, please; if anyone can help I would be grateful.

lectricpharaoh Mar 13th, 2008 2:59 PM

Re: Deleting characters
 
Quote:

Originally Posted by Benoit
To convert an ASCII character to uppercase, simply subtract 32 (The difference between a and A on the ascii table).

You're making the assumption that the character is lowercase to begin with; you'll need to test for that. Also, though you qualified it by specifying ASCII, the OP didn't say that (though if he's writing x86 assembly, it's most likely the case).

Assuming that he's using ASCII, one way of doing it faster would be to populate a 256-byte table, point DS:(E)BX at this table, and use XLATB. This eliminates checks for range, so you can quickly map one character to another. If you want strict ASCII with no support for the so-called 'high ASCII', you can cut the table by half, and mask against 07Fh before the lookup; this'll save a whopping 128 bytes. Just loop through the string, doing a lookup for each byte until you hit the end. Presumably, your string is null-terminated, but if it's not, you'll have a length count that you can load into (E)CX before you start.

Vb Programmer Mar 13th, 2008 3:08 PM

Re: Deleting characters
 
Most of the characters may already be capitals, depending on what the user types in, so I just need the non-capitals to be made capital. Also, how do I delete letters from a string? For example, if I have the string 'hello', I want to delete the 'll'.

lectricpharaoh Mar 14th, 2008 4:18 AM

Re: Deleting characters
 
Quote:

Originally Posted by Vb Programmer
Most of the characters may already be capitals, depending on what the user types in, so I just need the non-capitals to be made capital

Benoit answered this one in his first reply in the thread, and I elaborated on it. You have two choices.

First is to subtract from the character values. In ASCII, the uppercase letters are values 65 through 90 (decimal), and the lowercase letters are 97 through 122. The order is purely alphabetical, so 65 is 'A', 66 is 'B', and so on. Thus, if you subtract 32 from the value of a lowercase letter, it becomes an uppercase one.

My elaboration on Benoit's post was simply pointing out that you first need to make sure it's a lowercase letter. If it's anything else, subtracting 32 will mangle it (an uppercase 'A' would become '!', for example), so you can do something like this (NASM-style syntax, but you can adapt it):
Code:

  mov cx, length     ; load the string length (an EQU constant; use [length] if it's a word variable)
  jcxz L3            ; guard against a zero-length string; if so, exit
  mov si, string     ; load the string start address
  mov al, 32         ; mem, reg operations are more efficient than mem, imm8
L1:
  cmp byte [si], 'a'
  jb L2              ; go on to the next iteration if [si] < 'a' (97)
  cmp byte [si], 'z'
  ja L2              ; go on to the next iteration if [si] > 'z' (122)
  sub [si], al       ; it's a lowercase letter: subtract 32 to make it uppercase
L2:
  inc si             ; point at the next character
  loop L1            ; loop to process the remaining characters
L3:                  ; all done now

If you're unsure of any of the asm mnemonics I used, you can look them up here.

The second option is to load a table (either 256 or 128 bytes in size) with the 'convert to' values. Then you do a lookup to convert a value into another value, or more specifically, use value A as an index into the table to fetch value B. In your specific case, you'd have each value equal its own index, except for the lowercase letter indices; these would be equal to their uppercase counterparts. Thus, index 10 would store the value 10, index 65 would hold 65, but index 97 would also hold 65. Get it?

Then you use the XLATB (table lookup translation) instruction, like so:
Code:

  mov cx, length     ; load the string length
  jcxz L2            ; guard against a zero-length string; if so, exit
  mov si, string     ; load the string start address
  mov bx, table      ; DS:BX -> lookup table used by XLATB
L1:
  mov al, [si]       ; fetch the next byte of the string
  and al, 07Fh       ; limit it to the ASCII range (0-127); only needed for a 128-byte table
  xlatb              ; transform it: AL = [DS:BX + AL]
  mov [si], al       ; and write it back
  inc si             ; point at the next character
  loop L1            ; loop to process the remaining characters
L2:                  ; all done now

This one relies on you having the lookup table initialized (you should be able to figure out how to do this, as it's just a block of 128 or 256 bytes, and I explained it above), and is arguably faster than the first method, since there is less conditional branching. Almost all of the remaining branching is the main loop itself, which branch prediction in modern CPUs handles well, unlike the first method (where the branches depend on each character's value).

The AND AL, 07Fh line (shown in red in the original post) can be deleted if your table is 256 bytes in size. But if you want strict conformance with ASCII, and don't care about 'extended ASCII', you can cut the table to 128 bytes; in that case, the AND line prevents you from indexing past the end of your lookup buffer.
Quote:

Originally Posted by Vb Programmer
and how do I delete letters from a string? For example, if I have the string 'hello', I want to delete the 'll'

This I will leave as an exercise for you, since I already wrote you two routines. You can use REP MOVSB to do this for you (bonus points if you optimize it to use MOVSW, and MOVSD on a 386 or later, when appropriate).

Irwin Mar 14th, 2008 10:19 PM

Re: Deleting characters
 
Such a useless argument...

Anyway, just AND the byte (assuming it's ANSI) with 0xDF; that should force it to uppercase, and it's faster than any of the aforementioned suggestions. After that, use REPNE SCASB (again, assuming it's ANSI) to find the character you're looking for, then use REP MOVSB to copy the rest of the string over it.

lectricpharaoh Mar 15th, 2008 9:45 AM

Re: Deleting characters
 
Quote:

Originally Posted by Irwin
Such a useless argument...

It's not an argument, and you might want to check your suggested code before trolling.
Quote:

Originally Posted by Irwin
Anyway, just AND the byte (assuming it's ANSI) with 0xDF; that should force it to uppercase, and it's faster than any of the aforementioned suggestions. After that, use REPNE SCASB (again, assuming it's ANSI) to find the character you're looking for, then use REP MOVSB to copy the rest of the string over it.

This solution won't work. The value needs to be checked before you transform it, unless you're using a lookup table. What if the character happens to be, oh, a space? 0x20 & 0xDF = 0x00, which is not what the OP wants.

If the number of letters in the alphabet were a power of two, a simple AND could work (assuming the values were mapped appropriately). However, since there are 26 letters (NOT a power of two), a mask alone can't pick out just the lowercase letters, so an unconditional AND also hits spaces, digits and punctuation. You need either a table lookup or a range check first; once the range check has confirmed the character is in 'a'-'z', subtracting 32 and ANDing with 0DFh give the same result.


All times are GMT -5.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC