Is it possible to cross check in detail the contents of word files?

(7 posts)
  • Started 1 year ago by RobinRocket
  • Latest reply from moreeg
  • Topic Viewed 1082 times

RobinRocket
Posts: 2

Hello everyone,

I'm an English as a foreign language teacher, and I'm wondering if there might be a 'geek' solution to my problem.

I'm trying to recycle vocabulary lists students have learnt two weeks previously into my lessons. This is a very laborious and boring process.
So I'm thinking there might be a way to speed it up, or at least to check how effectively the recycling is being done.

Are there any programs or tools that could cross-check a vocabulary list in one Word file against a lesson in another Word file? I mean, tell the user whether each word on the vocabulary list appears in the lesson, or even whether a synonym of a vocabulary word appears in the lesson file?

I'm aware of tools such as 'compare document', 'research' and 'find', but they don't do exactly what I'm trying to achieve. Or I might just be using them wrong.

Any ideas or suggestions? Any comments are appreciated, thanks.

Posted 1 year ago
 
RRRoman
Posts: 53

Hmm, I don't have direct knowledge of any such programs. I searched around the web and found dtSearch. Have a look at it and see if it is what you are looking for; their support page says that dtSearch can search files in a variety of formats for lists of words.

It feels like there should be lots of other programs like that on the web, but perhaps they are difficult to find. At school we even had an assignment to produce a word mapping program that I think could be twisted into something like you describe, so in case dtSearch doesn't cut it, just keep looking.

A program that could find synonyms for the words might prove a bit more difficult to find, because it takes more effort to implement: it would need a dictionary model that stores relations between words, such as synonyms, which is a big project in itself.
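
To give a rough idea of what such a dictionary model involves, here is a minimal Python sketch. It is only an illustration: the SYNONYMS table and the find_word_or_synonym function are made up for this example, and a real tool would need a proper thesaurus rather than a hand-typed table.

```python
import re

# Hypothetical, hand-made synonym table; a real tool would need a full thesaurus.
SYNONYMS = {
    "big": {"large", "huge"},
    "happy": {"glad", "cheerful"},
}

def find_word_or_synonym(word, text):
    """Return the forms of `word` (itself or a listed synonym) that occur in `text`."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    candidates = {word.lower()} | SYNONYMS.get(word.lower(), set())
    return sorted(candidates & tokens)

lesson = "The students were glad to see a huge dog."
print(find_word_or_synonym("big", lesson))    # ['huge']
print(find_word_or_synonym("happy", lesson))  # ['glad']
```

The lookup itself is easy; the hard part is building and maintaining the table of relations.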

Posted 1 year ago
 
Enthusiast
Posts: 566

It seems a Word macro could do what you want. The steps would be:

1. Set up a reference document containing the list of words to search for in the target document.
2. Read the word list from the reference document into an array. Each array entry should hold 2 elements: the reference word and a "Found" indicator.
3. Search through the target document, setting the "Found" element to 1 if the search was successful.
4. Go to the next item in the array and repeat step 3 until there are no more words to search.
5. Generate a report of the array, for example:

.
     "Sarcasm was found"
     "Theoretical was NOT found"
     "Equipment was found"
.

Note: This could also work for synonyms, if they are added to the list of words to search.
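
The steps above can be sketched outside Word in a few lines of Python. This is only a rough illustration of the logic, not a Word macro: the function names are made up for the example, and the word list and lesson are passed in as plain text rather than read from documents.

```python
import re

def check_word_list(word_list, lesson_text):
    """Map each reference word to True/False depending on whether it occurs in the lesson."""
    lesson_words = set(re.findall(r"[a-z']+", lesson_text.lower()))
    return {word: word.lower() in lesson_words for word in word_list}

def report(found):
    """Build one report line per reference word."""
    return [f"{w} was found" if hit else f"{w} was NOT found" for w, hit in found.items()]

word_list = ["Sarcasm", "Theoretical", "Equipment"]
lesson = "His sarcasm was lost on the class. Please return the equipment after the lesson."
for line in report(check_word_list(word_list, lesson)):
    print(line)
```

With this sample lesson the report says that Sarcasm and Equipment were found and Theoretical was not. Only exact words match here, so synonyms or other word forms would have to be added to the list.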

If this sounds like what you are in need of, we may be able to put something together that does this. I do not do much VBA programming for Word, but maybe moreeg and nosparks would be interested. Between us I think we can come up with a solution.

Posted 1 year ago
 
RobinRocket
Posts: 2

Thank you both for your replies!

RRRoman: dtSearch is pretty much perfect, but sadly the licensing costs are a bit too expensive for me. I doubt my university would pay for it. I found an open source alternative, http://lucene.apache.org/, but it seems pretty complicated to use. That being said, I only played with it for about 10 minutes. Unsuccessfully. Thanks for your help, dtSearch is pretty much exactly what I was looking for.

Enthusiast: This sounds like a really great idea, but I have zero experience with programming. I understand the basic concepts you talked about; I just have no idea how I would even begin to go about it. If you have any advice, or a recommendation for a good tutorial site for this kind of stuff, I'd be very grateful.

Posted 1 year ago
 
moreeg
Posts: 842

Hi
Here are examples of macros that do approximately what you want.

Either of the 2 examples given would need to be adapted to read a list of words that you would provide. Like Enthusiast, I am not familiar with Word VBA so haven't been able to come up with a working example.

I did play around with the 2nd example and I have hardcoded specific words to search for and that works okay. Here is the code that I adapted. Word of warning though - I first tried this on a 10,000 word document and at one point thought I saw smoke coming out of my laptop (slight exaggeration but illustrative of the fact that this macro will take time to run).

Sub WordFrequency()
    Const maxwords = 9000          'Maximum unique words allowed
    Dim SingleWord As String       'Raw word pulled from doc
    Dim Words(maxwords) As String  'Array to hold unique words
    Dim Freq(maxwords) As Integer  'Frequency counter for unique words
    Dim WordNum As Integer         'Number of unique words
    Dim ByFreq As Boolean          'Flag for sorting order
    Dim ttlwds As Long             'Total words in the document
    Dim Excludes As String         'Words to be excluded
    Dim Found As Boolean           'Temporary flag
    Dim j As Integer, k As Integer 'Loop counters
    Dim l As Integer, Temp As Integer 'Loop counter and swap variable
    Dim ans As String              'How user wants to sort results
    Dim tword As String            'Temporary word used when swapping

    ' Set up excluded words
    Excludes = "[the][a][of][is][to][for][by][be][and][are]"

    ' Find out how to sort
    ByFreq = True
    ans = InputBox("Sort by WORD or by FREQ?", "Sort order", "WORD")
    If ans = "" Then End
    If UCase(ans) = "WORD" Then
        ByFreq = False
    End If

    Selection.HomeKey Unit:=wdStory
    System.Cursor = wdCursorWait
    WordNum = 0
    ttlwds = ActiveDocument.Words.Count

    ' Control the repeat
    For Each aword In ActiveDocument.Words
        SingleWord = Trim(LCase(aword))
        'Out of range?
        If SingleWord < "a" Or SingleWord > "z" Then
            SingleWord = ""
        End If
        'On exclude list?
        If InStr(Excludes, "[" & SingleWord & "]") Then
            SingleWord = ""
        End If
        If SingleWord = "categorised" Or SingleWord = "originator" Or SingleWord = "functionality" Then
'        If Len(SingleWord) > 5 Then   'disabled: a second block If here has no matching End If
            Found = False
            For j = 1 To WordNum
                If Words(j) = SingleWord Then
                    Freq(j) = Freq(j) + 1
                    Found = True
                    Exit For
                End If
            Next j
            If Not Found Then
                WordNum = WordNum + 1
                Words(WordNum) = SingleWord
                Freq(WordNum) = 1
            End If
            If WordNum > maxwords - 1 Then
                j = MsgBox("Too many words.", vbOKOnly)
                Exit For
            End If
        End If
        ttlwds = ttlwds - 1
        StatusBar = "Remaining: " & ttlwds & ", Unique: " & WordNum
    Next aword

    ' Now sort it into word order
    For j = 1 To WordNum - 1
        k = j
        For l = j + 1 To WordNum
            If (Not ByFreq And Words(l) < Words(k)) _
              Or (ByFreq And Freq(l) > Freq(k)) Then k = l
        Next l
        If k <> j Then
            tword = Words(j)
            Words(j) = Words(k)
            Words(k) = tword
            Temp = Freq(j)
            Freq(j) = Freq(k)
            Freq(k) = Temp
        End If
        StatusBar = "Sorting: " & WordNum - j
    Next j

    ' Now write out the results
    tmpName = ActiveDocument.AttachedTemplate.FullName
    Documents.Add Template:=tmpName, NewTemplate:=False
    Selection.ParagraphFormat.TabStops.ClearAll
    With Selection
        For j = 1 To WordNum
            .TypeText Text:=Trim(Str(Freq(j))) _
              & vbTab & Words(j) & vbCrLf
        Next j
    End With
    System.Cursor = wdCursorNormal
    j = MsgBox("There were " & Trim(Str(WordNum)) & _
      " different words ", vbOKOnly, "Finished")
End Sub

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The line

If SingleWord = "categorised" Or SingleWord = "originator" Or SingleWord = "functionality" Then

is the one I adapted

Posted 1 year ago
 
moreeg
Posts: 842

Okay, here's a rough approximation of what I think you want.

1. Create your word list in an Excel spreadsheet in column A starting in A1
2. Every time you add a word you need to Save the workbook
3. The program only recognises exact matches, so if you have "evaluate" in your list it will not find "evaluated" or "evaluating". You will have to enter these separately.
4. The Word macro should be placed in the Normal/NewMacros module
5. In the Macro screen click on Tools/References and make sure that "Microsoft Excel 12.0 Object Library" is ticked (the number may be different)
6. Paste the following code into the NewMacros module
7. Go to Word Options/Customise and click on the arrow next to the top input area and select "Macros"
8. You should see Normal.NewMacros.WordFrequency .... select it and click on "Add" and then OK - this will allow you to run the macro from the Quick Access toolbar in Word

Open the student document you want to analyse and click on the new icon in the Quick Access Toolbar and it will do its thing. It may take a minute or more for larger documents (over 1000 words). In the end it will create a new Word file with the results.

Chances are that I may have missed a few steps in the above but if you run into any troubles just come back and ask.

Enthusiast/Nosparks et al .... the code is messy and contains stuff not needed. Also it will only present words that have been used and the number of times each has been used. It will not present the words that weren't used. This may actually be better if the list of words gets really long.

Here is the Macro .....

Sub WordFrequency()
    Const maxwords = 9000          'Maximum unique words allowed
    Dim SingleWord As String       'Raw word pulled from doc
    Dim Words(maxwords) As String  'Array to hold unique words
    Dim Freq(maxwords) As Integer  'Frequency counter for unique words
    Dim WordNum As Integer         'Number of unique words
    Dim ByFreq As Boolean          'Flag for sorting order
    Dim ttlwds As Long             'Total words in the document
    Dim Excludes As String         'Words to be excluded
    Dim Found As Boolean           'Temporary flag
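    Dim j As Integer, k As Integer 'Loop counters
    Dim l As Integer, Temp As Integer 'Left over from the sorting code
    Dim ans As String              'How user wants to sort results
    Dim tword As String            'Temporary word (left over from the sorting code)
    Dim workBook As Excel.Workbook 'Excel workbook holding the word list
    Dim DataInExcel As Variant     'Word list read from column A
    Dim NoRows As Long             'Last used row in column A
    Dim i As Long                  'Index into the word list

    'Enter the path and file name of your Excel file
    Set workBook = Workbooks.Open("C:\users\MG\!Mydoqs\HTG\NewWords.xlsx", True, True)

    NoRows = workBook.Worksheets("Sheet1").Range("A1").End(xlDown).Row
    DataInExcel = workBook.Worksheets("Sheet1").Range("A1:A" & NoRows).Value
    workBook.Close SaveChanges:=False 'The list is now held in DataInExcel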

    ' Set up excluded words
    Excludes = "[the][a][of][is][to][for][by][be][and][are]"

    ' Find out how to sort
    ByFreq = True
    ans = InputBox("Sort by WORD or by FREQ?", "Sort order", "WORD")
    If ans = "" Then End
    If UCase(ans) = "WORD" Then
        ByFreq = False
    End If

    Selection.HomeKey Unit:=wdStory
    System.Cursor = wdCursorWait
    WordNum = 0
    ttlwds = ActiveDocument.Words.Count

For i = 1 To UBound(DataInExcel)
    ttlwds = ActiveDocument.Words.Count   'Restart the status-bar countdown for each list word
    ' Control the repeat
    For Each aword In ActiveDocument.Words
        SingleWord = Trim(LCase(aword))
        'Out of range?
        If SingleWord < "a" Or SingleWord > "z" Then
            SingleWord = ""
        End If
        'On exclude list?
        If InStr(Excludes, "[" & SingleWord & "]") Then
            SingleWord = ""
        End If

        'SingleWord is lower case, so compare against the lower-cased list entry
        If SingleWord = LCase(Trim(DataInExcel(i, 1))) Then
'        If Len(SingleWord) > 5 Then
            Found = False
            For j = 1 To WordNum
                If Words(j) = SingleWord Then
                    Freq(j) = Freq(j) + 1
                    Found = True
                    Exit For
                End If
            Next j
            If Not Found Then
                WordNum = WordNum + 1
                Words(WordNum) = SingleWord
                Freq(WordNum) = 1
            End If
            If WordNum > maxwords - 1 Then
                j = MsgBox("Too many words.", vbOKOnly)
                Exit For
            End If
        End If
        ttlwds = ttlwds - 1
        StatusBar = "Remaining: " & ttlwds & ", Unique: " & WordNum
    Next aword
Next

    ' Now write out the results
    tmpName = ActiveDocument.AttachedTemplate.FullName
    Documents.Add Template:=tmpName, NewTemplate:=False
    Selection.ParagraphFormat.TabStops.ClearAll
    With Selection
        For j = 1 To WordNum
            .TypeText Text:=Trim(Str(Freq(j))) _
              & vbTab & Words(j) & vbCrLf
        Next j
    End With
    System.Cursor = wdCursorNormal
    j = MsgBox("There were " & Trim(Str(WordNum)) & _
      " different words used out of a total of  " & NoRows, vbOKOnly, "Finished")
End Sub

========================================================================

Thanks to this site for direction on how to get stuff from Excel from a Word Macro and of course the site mentioned in the previous post that had all the Word codes.
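
Two limits of the macro above are that it matches exact words only and that it does not report list words that were never used. Here is a rough Python sketch of how both could be handled. It runs outside Word on plain text, the function name is made up for the example, and the suffix list is a crude assumption rather than real word-form handling, so it will miss irregular forms and may over-match.

```python
import re
from collections import Counter

def count_list_words(word_list, text, suffixes=("", "s", "d", "ed", "ing")):
    """Count how often each list word occurs in `text`, including zero counts.

    `suffixes` is a crude, assumed stand-in for real word-form handling:
    a token counts if it equals the list word, or the word minus a final "e",
    followed by one of the suffixes.
    """
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    counts = {}
    for word in word_list:
        base = word.lower()
        stems = {base, base[:-1]} if base.endswith("e") else {base}
        forms = {stem + suffix for stem in stems for suffix in suffixes}
        counts[word] = sum(tokens[form] for form in forms)
    return counts

text = "We evaluated the plan before evaluating the results. The equipment was new."
counts = count_list_words(["evaluate", "equipment", "sarcasm"], text)
for word, n in counts.items():
    print(f"{n}\t{word}")
unused = [w for w, n in counts.items() if n == 0]
print("Not used:", ", ".join(unused))
```

For this sample text "evaluate" is counted twice (once for "evaluated", once for "evaluating"), "equipment" once, and "sarcasm" is listed as not used.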

Posted 1 year ago
 
moreeg
Posts: 842

Hi again

Here is another approach to the problem. In this approach the selected words are highlighted in the document; however, it won't tell you explicitly which words were used.

The advantages of this approach are:
      It is much, much, much faster.
      You can easily determine if the selected words were used correctly.
      It lists all the words you are searching for and shows what highlight colour was used.

'=============================================================

Sub HighlightWords()
'
' HighlightWords Macro
'
'

Dim workBook As Excel.Workbook   'Excel workbook holding the word list
Dim DataInExcel As Variant       'Word list read from column A
Dim NoRows As Long               'Last used row in column A

  Application.ScreenUpdating = False      'set to True for debugging/False to improve performance

'Gets rid of any highlighting in the document
    Selection.WholeStory
    Selection.Range.HighlightColorIndex = wdNoHighlight
    Options.DefaultHighlightColorIndex = wdYellow

 '   Colour2 = wdBlue             '2
 '   Colour3 = wdTurquoise        '3
 '   Colour4 = wdBrightGreen      '4
 '   Colour5 = wdPink             '5
 '   Colour6 = wdRed              '6
 '   Colour7 = wdYellow           '7

'Enter the path and file name of your Excel file
Set workBook = Workbooks.Open("C:\users\MG\!Mydoqs\HTG\NewWords.xlsx", True, True)

    NoRows = workBook.Worksheets("Sheet1").Range("A1").End(xlDown).Row
    DataInExcel = workBook.Worksheets("Sheet1").Range("A1:A" & NoRows).Value
    workBook.Close SaveChanges:=False   'the word list is now held in DataInExcel

For Index = 1 To UBound(DataInExcel)
    Selection.GoTo What:=wdGoToPage, Which:=wdGoToNext, Name:="1"
    Selection.TypeParagraph
    Selection.TypeText Text:=DataInExcel(Index, 1)
Next

For i = 1 To UBound(DataInExcel)

 Selection.GoTo What:=wdGoToPage, Which:=wdGoToNext, Name:="1"

 Curword = DataInExcel(i, 1)

'Cycle through highlight colour indexes 2 to 7 for the first 36 words, then use yellow (7)
If i <= 36 Then
    c = ((i - 1) Mod 6) + 2
Else
    c = 7
End If

Tint = c      'highlight colour index for the current word

 Selection.Find.ClearFormatting
    With Selection.Find
        .Text = Curword
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = True
    End With

For n = 1 To 10                       'highlights up to 10 occurrences of the selected word
    If Not Selection.Find.Execute Then Exit For   'stop if the word is not in the document
    Selection.Range.HighlightColorIndex = Tint
Next n

Next

  Selection.GoTo What:=wdGoToPage, Which:=wdGoToNext, Name:="1"   'goes to top of the document
  Application.ScreenUpdating = True      'turn screen updating back on

End Sub

'=============================================================
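
For anyone who wants to see the colour cycling without running the macro, here is a small Python sketch of the same idea. It is an assumption-laden illustration: plain-text tags stand in for Word highlighting, the function names are made up, and it matches exact whole words only, whereas the macro asks Word to match all word forms. A list word that happens to be a colour name would also clash with the tags.

```python
import re

# Colour names for Word highlight indexes 2 to 7, as listed in the macro's comments.
COLOURS = {2: "blue", 3: "turquoise", 4: "bright green", 5: "pink", 6: "red", 7: "yellow"}

def colour_index(i):
    """Colour index for the i-th list word (1-based): 2 to 7 repeating, then 7 past the 36th word."""
    return (i - 1) % 6 + 2 if i <= 36 else 7

def mark_words(word_list, text):
    """Wrap each whole-word occurrence of a list word as [word|colour]."""
    for i, word in enumerate(word_list, start=1):
        tag = COLOURS[colour_index(i)]
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        text = pattern.sub(lambda m, tag=tag: f"[{m.group(0)}|{tag}]", text)
    return text

print(mark_words(["sarcasm", "equipment"], "Sarcasm aside, the equipment works."))
# [Sarcasm|blue] aside, the [equipment|turquoise] works.
```

The first list word gets blue, the second turquoise, and so on, so the seventh word shares blue with the first.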

Here is what it looks like in a sample document

And this is the Excel Worksheet that is the source of the words ....

Posted 1 year ago
 



Topic Closed

This topic has been closed to new replies.
