I have some PDF books/stories that weren't formatted well, so that the paragraphs are stacked on top of each other instead of having a line between them. I want to print some of these, and make them as readable as possible.
If I copy from PDF and paste into Word, or convert from PDF to Word, is there a way to separate each paragraph by putting a line beneath it, separating it from the next? An example is below.
This is how it looks now:
It was a dark and stormy night. In her attic bedroom Margaret Murry, wrapped in an old patchwork quilt, sat on the foot of her bed and watched the trees tossing in the frenzied lashing of the wind. Behind the trees clouds scudded frantically across the sky. Every few moments the moon ripped through them, creating wraith- like shadows that raced along the ground. The house shook. Wrapped in her quilt, Meg shook. She wasn't usually afraid of weather. —It's not just the weather, she thought. —It's the weather on top of everything else. On top of me. On top of Meg Murry doing everything wrong. School. School was all wrong. She'd been dropped down to the lowest section in her grade. That morning one of her teachers had said crossly, "Really, Meg, I don't understand how a child with parents as brilliant as yours are supposed to be can be such a poor student. If you don't manage to do a little better you'll have to stay back next year." During lunch she'd rough-housed a little to try to make herself feel better, and one of the girls said scornfully, "After all, Meg, we aren't grammar-school kids any more. Why do you always act like such a baby?" And on the way home from school, walking up the road with her arms full of books, one of the boys had said something about her "dumb baby brother." At this she'd thrown die books on the side of the road and tackled him with every ounce of strength she had, and arrived home with her blouse torn and a big bruise under one eye. Sandy and Dennys, her ten-year-old twin brothers, who got home from school an hour earlier than she did, were disgusted. "Let us do the fighting when it's necessary," they told her. —A delinquent, that's what I am, she thought grimly. — That's what they'll be saying next. Not Mother. But Them. Everybody Else. I wish Father— But it was still not possible to think about her father without the danger of tears.
Only her mother could talk about him in a natural way, saying, "When your father gets back—" Gets back from where? And when? Surely her mother must know what people were saying, must be aware of the smugly vicious gossip. Surely it must hurt her as it did Meg. But if it did she gave no outward sign. Nothing ruffled the serenity other expression. —Why can't I hide it, too? Meg thought. Why do I always have to show everything? The window rattled madly in the wind, and she pulled the quilt dose about her. Curled up on one of her pillows a gray fluff of kitten yawned, showing its pink tongue, tucked its head under again, and went back to sleep.
This is the way I would like it:
It was a dark and stormy night. In her attic bedroom Margaret Murry, wrapped in an old patchwork quilt, sat on the foot of her bed and watched the trees tossing in the frenzied lashing of the wind. Behind the trees clouds scudded frantically across the sky. Every few moments the moon ripped through them, creating wraith- like shadows that raced along the ground. The house shook. Wrapped in her quilt, Meg shook. She wasn't usually afraid of weather. —It's not just the weather, she thought. —It's the weather on top of everything else. On top of me. On top of Meg Murry doing everything wrong. School. School was all wrong. She'd been dropped down to the lowest section in her grade. That morning one of her teachers had said crossly, "Really, Meg, I don't understand how a child with parents as brilliant as yours are supposed to be can be such a poor student. If you don't manage to do a little better you'll have to stay back next year." During lunch she'd rough-housed a little to try to make herself feel better, and one of the girls said scornfully, "After all, Meg, we aren't grammar-school kids any more. Why do you always act like such a baby?" And on the way home from school, walking up the road with her arms full of books, one of the boys had said something about her "dumb baby brother."
At this she'd thrown die books on the side of the road and tackled him with every ounce of strength she had, and arrived home with her blouse torn and a big bruise under one eye. Sandy and Dennys, her ten-year-old twin brothers, who got home from school an hour earlier than she did, were disgusted. "Let us do the fighting when it's necessary," they told her. —A delinquent, that's what I am, she thought grimly. — That's what they'll be saying next. Not Mother. But Them. Everybody Else. I wish Father— But it was still not possible to think about her father without the danger of tears. Only her mother could talk about him in a natural way, saying, "When your father gets back—" Gets back from where? And when? Surely her mother must know what people were saying, must be aware of the smugly vicious gossip. Surely it must hurt her as it did Meg. But if it did she gave no outward sign. Nothing ruffled the serenity other expression. —Why can't I hide it, too? Meg thought. Why do I always have to show everything? The window rattled madly in the wind, and she pulled the quilt dose about her. Curled up on one of her pillows a gray fluff of kitten yawned, showing its pink tongue, tucked its head under again, and went back to sleep.
Sub snb()
    ThisDocument.Content = Replace(ThisDocument.Content, vbCr, vbCr & vbCr)
End Sub
Last edited by snb; 10-08-2011 at 09:15 AM.
SNB,
The code didn't work at first; then I discovered that ThisDocument was misspelled. However, even after I corrected that, running the code had no effect.
I've attached a sample I'm trying with.
Last edited by jomili; 10-07-2011 at 05:03 PM.
It works here in your file.
Maybe replace ThisDocument with ActiveDocument, but this won't give you the result you are looking for. When you convert or import a PDF into Word, Word assumes that every line in the PDF file is a paragraph of its own. So when you run this code, all it does is create the impression of double-spacing. You can modify the code that snb provided to replace the paragraph marks with a space, which merges everything into one paragraph, and then add the correct paragraph marks back manually. I don't know of any way to automate this from a PDF file to a Word document.
Hope this helps.
abousetta
Please consider:
Thanking those who helped you. Click the star icon in the lower left part of the contributor's post and add Reputation.
Cleaning up when you're done. Mark your thread [SOLVED] if you received your answer.
Hmm...
SNB, I found my problem with your code; I had it in a module in NORMAL. When I moved it to a module for my document, it worked. However, it worked the way Abousetta described, creating the impression of double-spacing, not paragraphing as intended.
Would there be a way to do it based on line length? I'm no expert in Word, but isn't there a limit to the number of characters that can fit on a line (determined by font and margins)? So, if a line is less than, say, 95% of the maximum number of characters, we can assume that's a paragraph end and plug in a paragraph mark.
I'm sure I'm not the only person who's run into this problem. If Word can't do it, do you know of a non-Word based approach that might work?
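For what it's worth, the line-length idea can be sketched outside Word. The following is only a rough illustration in Python, assuming the text has been saved as plain text with one PDF line per line; the function name `mark_paragraphs_by_length` and the default `threshold` of 0.95 (taken from the "say, 95%" guess above) are assumptions, not anything Word provides. With proportional fonts the character count per line varies, so the threshold would likely need tuning.

```python
def mark_paragraphs_by_length(lines, threshold=0.95):
    """Join hard-wrapped lines, ending a paragraph after any 'short' line.

    A line counts as short when it has fewer characters than
    threshold * (length of the longest line). Hypothetical helper.
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return []
    max_len = max(len(ln) for ln in lines)
    paragraphs, current = [], []
    for ln in lines:
        current.append(ln)
        if len(ln) < threshold * max_len:
            # Short line: assume the paragraph ends here.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Trailing full-width lines with no short line after them.
        paragraphs.append(" ".join(current))
    return paragraphs


sample = [
    "aaaa aaaa aaaa aaaa",
    "bbbb bbbb bbbb bbbb",
    "cccc",
    "dddd dddd dddd dddd",
    "ee",
]
result = mark_paragraphs_by_length(sample)
```

Joining `result` with blank lines between the entries gives text that can be pasted back into Word. A paragraph whose last line happens to be nearly full width will be missed by this rule.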
Still desiring a solution.
What about checking the first character of each paragraph?
If it's lowercase, then it's part of a paragraph, not a new one.
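As a rough sketch of that rule outside Word (Python, assuming one imported line per list entry; the name `join_lowercase_continuations` is made up for illustration): a line that starts with a lowercase letter is glued onto the previous one, and anything else starts a new paragraph.

```python
def join_lowercase_continuations(lines):
    """Merge a line into the previous one when it starts with a lowercase letter."""
    paragraphs = []
    for ln in lines:
        ln = ln.strip()
        if not ln:
            continue
        if paragraphs and ln[0].islower():
            # Lowercase start: treat as a continuation of the previous line.
            paragraphs[-1] += " " + ln
        else:
            paragraphs.append(ln)
    return paragraphs


sample = [
    "It was a dark and stormy",
    "night. In her attic bedroom",
    "Margaret Murry, wrapped in",
    "an old quilt.",
]
result = join_lowercase_continuations(sample)
```

In this small sample the third line starts with a capital letter, so it is left as the start of a new paragraph even though it continues a sentence. Lines like that still need a second rule or a manual pass.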
That would probably straighten it up a lot, but it wouldn't catch the ones where the line begins with a capital letter.
Going with my earlier thought, could I copy the text into Excel and somehow get each line into a cell of its own, then use LEN to determine the length, plug a symbol (say, a tilde) into the cell to the right of the cells with the fewest characters, assuming those are paragraph endings, then paste back into Word and replace the tildes with a paragraph mark?
I'm definitely in favor of an easier solution if it's out there.
It wouldn't just straighten up 'a lot', but probably 99%.
Reason enough to do this.
Your Excel 'solution' won't bring you closer.
Do not skip to Excel because of your unfamiliarity with Word. Word's VBA is much more capable of handling text than Excel's.
Last edited by snb; 11-11-2011 at 04:17 AM.
I'm all in favor of using Word's VBA, but am completely lost here.
So, how would we test the first character in each paragraph? I'm not even familiar enough with Word's VBA to get started.
Use VBA's IntelliSense.
Start typing ActiveDocument. in the VB Editor.
Search for the period and paragraph marker:
Sub Macro1()
'
' Macro1 Macro
'
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = ".^p"
        .Replacement.Text = ".^p^p"
        .Forward = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
---
Ben Van Johnson
Here is my humble attempt at combining the code from snb and protonLeah:
Sub test()
    ThisDocument.Content = _
        Replace(Replace(Replace(ThisDocument.Content, "." & vbCr, "~"), vbCr, " "), "~", "." & vbCr & vbCr)
End Sub
The end result will look something like:
It was a dark and stormy night. In her attic bedroom Margaret Murry, wrapped in an old patchwork quilt, sat on the foot of her bed and watched the trees tossing in the frenzied lashing of the wind. Behind the trees clouds scudded frantically across the sky. Every few moments the moon ripped through them, creating wraith- like shadows that raced along the ground. The house shook. Wrapped in her quilt, Meg shook. She wasn't usually afraid of weather. —It's not just the weather, she thought. —It's the weather on top of everything else. On top of me. On top of Meg Murry doing everything wrong.
Hope this helps.
abousetta
Last edited by abousetta; 11-15-2011 at 03:43 AM. Reason: Added period to code
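The three nested replacements (protect "period + break" with a tilde, turn every remaining break into a space, then turn each tilde back into a period plus a blank line) can also be tried outside Word. This is just a sketch in Python, assuming plain text with "\n" line endings rather than Word's paragraph marks; the function name `breaks_after_periods` and the guard against an existing marker are additions for illustration.

```python
def breaks_after_periods(text, marker="~"):
    """Keep a break only where a line ends with a period; join all other lines."""
    if marker in text:
        # The placeholder must not already occur in the text, or it would
        # be turned into a period plus blank line by the last step.
        raise ValueError("marker already present in text")
    text = text.replace(".\n", marker)    # 1. protect 'period + break'
    text = text.replace("\n", " ")        # 2. every other break becomes a space
    return text.replace(marker, ".\n\n")  # 3. restore as period + blank line


sample = "The house\nshook.\nWrapped in her quilt,\nMeg shook.\n"
result = breaks_after_periods(sample)
```

As with the macro, a line that merely happens to end with a period in the middle of a paragraph gets a false break, and paragraphs ending in a question mark or closing quote are not detected.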
Hi
I use this code in Word to quickly reformat imported text where lines import as paragraphs. You might want to save your file before you try it! It:
- replaces paragraph marks with spaces where they precede a word that does not begin with a capital letter (note that this also removes breaks intended to precede lower-case text, e.g. dot points; you might want to sort those out separately)
- removes paragraph breaks between commonly capitalised words and phrases
- removes paragraph breaks after words unlikely to precede a paragraph break (such as 'and')
Sub imported_paras_quick_reformat()
    Dim ToRep, RepWith, vv, n, ww, an, pp

    ' Replace a paragraph mark with a space when the next paragraph
    ' does not begin with a capital letter
    For Each pp In ActiveDocument.Paragraphs
        pp.Range.Characters.Last.Select
        Selection.MoveEnd (1)
        On Error Resume Next
        If Selection.Characters(2).Case <> wdUpperCase Then Selection.Characters(1) = " "
    Next

    ' Find commonly capitalised phrases and words unlikely to precede
    ' a paragraph break, and remove the breaks that follow them

    ' Common capitalised phrases (straight quotes are required here)
    vv = Array("Mid West", "South West", "Australian Government", "United States", "United Kingdom")
    ' Words unlikely to precede a paragraph mark
    ww = Array("the", "and", "is", "are", "of", "a", "if", "at", "in", "then")

    For n = LBound(vv) To UBound(vv)
        Selection.HomeKey Unit:=wdStory
        ToRep = Replace(vv(n), " ", "^p")
        RepWith = vv(n)
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = ToRep
            .Replacement.Text = RepWith
            .MatchCase = True
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    Next n

    For an = LBound(ww) To UBound(ww)
        Selection.HomeKey Unit:=wdStory
        ToRep = " " & ww(an) & "^p"
        RepWith = " " & ww(an) & " "
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = ToRep
            .Replacement.Text = RepWith
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    Next an
End Sub
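For anyone who wants to try the same three rules on plain text outside Word, here is a loose sketch in Python. It is not a translation of the macro: the function names are made up, it works on a list of lines instead of Word paragraphs, and it checks a phrase at any split point between its words, which is a small generalisation. The word and phrase lists are the ones from the macro.

```python
STOP_WORDS = {"the", "and", "is", "are", "of", "a", "if", "at", "in", "then"}
PHRASES = ["Mid West", "South West", "Australian Government",
           "United States", "United Kingdom"]


def _spans_phrase(prev, nxt, phrases):
    """True if a known phrase is split across the break between prev and nxt."""
    for phrase in phrases:
        words = phrase.split()
        for i in range(1, len(words)):
            head, tail = " ".join(words[:i]), " ".join(words[i:])
            if (prev == head or prev.endswith(" " + head)) and nxt.startswith(tail):
                return True
    return False


def _should_join(prev, nxt, stop_words, phrases):
    if not nxt[0].isupper():
        return True  # rule 1: next line does not start with a capital letter
    if prev.split()[-1] in stop_words:
        return True  # rule 3: previous line ends with an unlikely last word
    return _spans_phrase(prev, nxt, phrases)  # rule 2: split capitalised phrase


def merge_unlikely_breaks(lines, stop_words=STOP_WORDS, phrases=PHRASES):
    """Join lines whose break is unlikely to be a real paragraph break."""
    merged = []
    for ln in lines:
        ln = ln.strip()
        if not ln:
            continue
        if merged and _should_join(merged[-1], ln, stop_words, phrases):
            merged[-1] += " " + ln
        else:
            merged.append(ln)
    return merged


sample = [
    "She grew up in the Mid",
    "West and later moved to the",
    "United",
    "Kingdom. She liked it.",
    "Then she left.",
]
result = merge_unlikely_breaks(sample)
```

The lists are the part worth adapting: names and phrases that are commonly capitalised in your own documents go in `PHRASES`, and any other words that rarely end a paragraph go in `STOP_WORDS`.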