#1 |
|
Newbie
Join Date: Jun 2008
Posts: 12
Rep Power: 0
I am writing a Perl program that should do the following.
For example, if I have an HTML file like <b>this is bold.</b>This is bold too</b>, I have to write a program (without using any HTML parser function) that prints it as <b>this is bold.This is bold too</b>. Basically, it should remove unnecessary tags, and I may only use regular expressions for it. My instructor advised me not to read the HTML file line by line, because that would not handle a tag that opens on one line and closes on a later line (as in the file above). It was suggested that I put the whole HTML file into one scalar variable, and my program now does that. My question is: how do I search for several instances of <b> and </b> tags in the scalar variable? Should I read it character by character? I am very confused about this part. Please advise me. Thanks!
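A character-by-character loop should not be needed here: a regex with the /g modifier can step through every match in the scalar. The following is only a minimal sketch of that idea. The helper names slurp_file and find_bold_tags and the file name page.html are hypothetical, not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one scalar ("slurp" mode).
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                 # no record separator, so <$fh> reads everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Hypothetical helper: list every <b> or </b> tag with its character offset.
sub find_bold_tags {
    my ($html) = @_;
    my @hits;
    # In scalar context, /g resumes after the previous match on each iteration.
    while ($html =~ m{(</?b>)}gi) {
        push @hits, [ $1, $-[1] ];   # $-[1] is the start offset of group 1
    }
    return @hits;
}

# Demo on a literal string; a real run would call slurp_file('page.html').
my $all_html = "<b>this is bold.</b>\n<b>This is bold too</b>";
printf "%-4s at offset %d\n", @$_ for find_bold_tags($all_html);
```

Because the whole document is one string, a tag that opens on one line and closes on another is found the same way as any other.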
|
|
|
|
|
#2 |
|
Newbie
Join Date: Jun 2008
Posts: 12
Rep Power: 0
Hi,
So far I am able to remove the bold tags, e.g. turn <b>abcd</b>efgh<b>ijkl</b> into <b>abcdefghijkl</b>, by using:
$allHtmlDocument =~ s/$endBoldTag(\s*)$startBoldTag//gi;
Now the problem is this: if I have <b>abcd</b><i><b>efgh</i></b> and want to turn it into <b>abcd<i>efgh</i></b>, I still need to remove the bold tags (as there are only tags between them), but I also need to keep the tags between them. How would I capture those tags? I cannot figure out a way, since I am not reading the document line by line. Thanks!
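One possible way to keep the tags in between is to capture them in a group and put the group back in the replacement. This is a sketch under simple assumptions: the bold tags are written exactly as <b> and </b> with no attributes, and merge_bold_runs is a hypothetical name. Unlike the substitution above, it keeps any whitespace between the tags instead of deleting it.

```perl
use strict;
use warnings;

# Hypothetical helper: drop a closing/reopening bold pair when only tags
# (and optional whitespace) sit between them, keeping those tags.
sub merge_bold_runs {
    my ($html) = @_;
    $html =~ s{
        </b>                              # closing bold tag
        (                                 # group 1: everything to keep
            (?: \s* < (?!/?b>) [^>]* > )* # zero or more non-bold tags
            \s*
        )
        <b>                               # reopening bold tag
    }{$1}gix;
    return $html;
}

print merge_bold_runs('<b>abcd</b><i><b>efgh</i></b>'), "\n";
```

If any plain text sits between </b> and <b>, the pattern does not match and the pair is left alone.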
|
|
|
|
|
#3 |
|
Programming Guru
Re: string search
For clarification, did you make a mistake in your first post?
<b>this is bold.</b>This is bold too</b>
Was that supposed to be:
<b>this is bold.</b><b>This is bold too</b>
Or are you saying you want to remove both of these "types" of unnecessary tags?
Looking at your second post, you say you want the second type. So could you clarify?
|
|
|
|
|
#4 |
|
Newbie
Join Date: Jun 2008
Posts: 12
Rep Power: 0
Thanks for your quick reply.
Sorry about the mistake in the first post and for the lack of clarification. You are right: I need to get rid of the tags that 'close and open' needlessly. Please advise me on how to deal with them when there are other tags (but no text) between them. Will the special variables $1, $2, ... play any role? I tried using them, but what if there is more than one other tag between the bold tags? Thanks!
|
|
|
|
|
#5 |
|
Programming Guru
Re: string search
I'd find each pair of "</b>[random junk]<b>", and then call some function that checks the "sanity" of the [random junk]. If the random junk is insane (meaning that only other tags exist within it), then delete the "</b>" and "<b>". If it is sane, then proceed to the next pair.
The way I would check the sanity is by seeing whether any non-space characters lie outside the <> of the HTML tags. You might be able to do that with a regex; I'm not experienced enough with regex to say.
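The pair-by-pair check described above can be sketched with a substitution whose replacement is evaluated as code (the /e modifier). The names has_visible_text and collapse_bold_pairs are hypothetical, and the sketch assumes plain <b> and </b> tags without attributes.

```perl
use strict;
use warnings;

# Hypothetical helper: true when non-space text exists outside any <...> tag.
sub has_visible_text {
    my ($junk) = @_;
    (my $stripped = $junk) =~ s/<[^>]*>//g;   # delete every tag from a copy
    return $stripped =~ /\S/ ? 1 : 0;
}

# Hypothetical helper: visit each "</b>[junk]<b>" pair and decide per pair.
sub collapse_bold_pairs {
    my ($html) = @_;
    # (.*?) is non-greedy, so each </b> pairs with the nearest following <b>;
    # /s lets "." cross newlines; /e runs the replacement as Perl code.
    $html =~ s{(</b>)(.*?)(<b>)}{
        my ($close, $junk, $open) = ($1, $2, $3);
        has_visible_text($junk) ? "$close$junk$open" : $junk;
    }gise;
    return $html;
}

print collapse_bold_pairs('<b>abcd</b><i><b>efgh</i></b>'), "\n";
```

Copying $1, $2 and $3 into lexical variables before calling the helper keeps the code easy to follow, since the helper runs its own regex.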
|
|
|
|
|
#6 | |
|
Newbie
Join Date: Jun 2008
Posts: 12
Rep Power: 0
Re: string search
Quote:
|
|
|
|
|
|
|
#7 |
|
Programming Guru
Re: string search
It might also be important to note that even if the tags are "redundant" in a sense, removing them might make the HTML non-compliant with certain standards.
For example,
<div>
<b>This is the first body of text.</b>
</div>
<div>
<b>This is the second body of text.</b>
</div>
will be processed to:
<div>
<b>This is the first body of text.
</div>
<div>
This is the second body of text.</b>
</div>
And even though that may work in all browsers (can anyone confirm this?), it is still non-compliant with HTML standards, because the <b> element now overlaps the <div> boundaries (something you should not do in a job). Therefore, it's best to make sure that the [random junk] only consists of some set of predictable tags (probably <i>, <u>, <em>, etc.). If you can't predict what tags might be in the [random junk], then there's a bunch more work ahead of you.
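Restricting the merge to a predictable set of tags could look roughly like this. The whitelist (i, u, em) is an assumption taken from the examples named above, merge_across_inline_tags is a hypothetical name, and tags carrying attributes are deliberately not handled.

```perl
use strict;
use warnings;

# Assumed whitelist of inline tags that may sit between </b> and <b>.
my @inline_tags = qw(i u em);
my $inline_re   = join '|', map { quotemeta } @inline_tags;

# Hypothetical helper: merge bold runs only across whitelisted tags.
sub merge_across_inline_tags {
    my ($html) = @_;
    # Group 1 holds the whitelisted opening/closing tags and any whitespace.
    $html =~ s{</b>((?:\s*</?(?:$inline_re)>)*\s*)<b>}{$1}gi;
    return $html;
}

my $divs = "<div>\n<b>first</b>\n</div>\n<div>\n<b>second</b>\n</div>";
print merge_across_inline_tags($divs) eq $divs ? "divs untouched\n" : "divs changed\n";
```

With this restriction the two <div> blocks from the example stay as they are, because </div> and <div> are not in the whitelist.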
|
|
|
|
|
#8 |
|
Newbie
Join Date: Jun 2008
Posts: 12
Rep Power: 0
Re: string search
Thanks.
But it's OK for the project I needed to do. Thanks a lot!