+ Reply to Thread
Results 1 to 9 of 9

Very special occurence count

  1. #1
    Forum Contributor
    Join Date
    04-04-2013
    Location
    Sydney, Australia
    MS-Off Ver
    Excel 2010
    Posts
    204

    Very special occurence count

    Hi All,
    I'm in the process of analyzing a pretty large data sheet of two columns (A &B) and over 8000 rows of data.

    ColA carries all the data strings in its cells. Data strings vary in length which could reach 32000 chars per cell.

    ColB has very particular words and phrases that pertain specifically to the contiguous cells. So B1 has words and phrases from A1, and B30 has also words and phrases from A30 and so forth.

    BolB words and phrases also vary in length from 6 to 400 chars.

    My task is to get a 100% accurate count of occurence of every cell in ColB from its neighbour in ColA.

    To do that I directly used this formula in ColC:
    Please Login or Register  to view this content.
    yielding the occurence count of ColB strings from ColA.

    The job would've been finished if the data was in English. It is rather in Arabic, with its full diacritical marks in both ColA and ColB.

    I tried normalizing the text by removing all the diacritical marks, and this actually worsened the problem as the text lost its meaning when the diactritical marks were removed. The formula yielded totally wrong count for words that looked the same as normalized, but they are in entirely different words and meanings with their diacritical marks on.

    To solve this problem, I need to run a smart script that uses dictionary to record all the ColA and ColB data "as they are" with their diacritical marks, and run its smart logic to detect the occurence of every cell in ColB in its contiguous cell of ColA, yielding the count in ColC respectively.

    Unfortunately I cannot send a data sample of that file and will be really difficult to try to make up an example, this is why I had to explain my problem in detail.

    Can I get some precious assistance with this problem?

    Always, many thanks in advance.

    T.

  2. #2
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Very special occurence count

    I have no experience with Arabic, but am surprised that Substitute doesn't work rigorously with Unicode strings.
    Entia non sunt multiplicanda sine necessitate

  3. #3
    Forum Contributor
    Join Date
    04-04-2013
    Location
    Sydney, Australia
    MS-Off Ver
    Excel 2010
    Posts
    204

    Re: Very special occurence count

    Substitute worked fine with normalized Arabic text, and the formula did count the occurrence of the text from ColB in ColA, but it wasn't right at all. I tried to use UPPER with original text with diacritical marks, but of course it didn't make any difference.
    Thanks shg.

  4. #4
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Very special occurence count

    "Wasn't right at all" -- what does that mean?

    Were the results not always integer, smaller than expected, larger than expected?

    Are columns A and B consistent as to composition? See https://en.wikipedia.org/wiki/Unicod...ite_characters

    Edit: This in particular:

    For example, é can be represented in Unicode as U+0065 (LATIN SMALL LETTER E) followed by U+0301 (COMBINING ACUTE ACCENT), but it can also be represented as the precomposed character U+00E9 (LATIN SMALL LETTER E WITH ACUTE). Thus, in many cases, users have multiple ways of encoding the same character.
    Those would look the same, but would differ as strings.
    Last edited by shg; 05-04-2019 at 05:03 PM.

  5. #5
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Very special occurence count

    Also, see https://en.wikipedia.org/wiki/Unicode_equivalence

    I think you have a tiger by the tail.

  6. #6
    Forum Contributor
    Join Date
    04-04-2013
    Location
    Sydney, Australia
    MS-Off Ver
    Excel 2010
    Posts
    204

    Re: Very special occurence count

    "Wasn't right at all" -- what does that mean?
    It meant that the occurrence in ColA for the text in ColB was not correct. Once the text is normalized it turns to be basic, and the difference in meaning between two same words with same letters but different diacritical marks becomes eliminated. So, for example, instead of counting 6 for the occurrence of a word with diacritical marks, it would be 10 without diacritical marks due to stripping out the differences between words of the same letters.

    Yes, ColA and ColB are consistent.

    I hope I made it much clearer.

  7. #7
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Very special occurence count

    How about paring one cell in col A to the same (apparent) string in col B. What happens?

  8. #8
    Forum Contributor
    Join Date
    04-04-2013
    Location
    Sydney, Australia
    MS-Off Ver
    Excel 2010
    Posts
    204

    Re: Very special occurence count

    I think we'll be going the wrong way if we looked into the Arabic text as text not as bits. In other words, we should be dealing with it as if it is English, but using scripting dictionary.

  9. #9
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Very special occurence count

    I don't know what that means.

    Substitute is a mindless comparison of characters. If you have two strings that should be the same, but Substitute tells you they're not, it's easy enough to compare their hex strings and see how they differ.
    Last edited by shg; 05-04-2019 at 06:00 PM.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] count each occurence....
    By Lidd in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 10-08-2012, 12:09 AM
  2. Replies: 12
    Last Post: 01-18-2012, 02:51 PM
  3. count no of occurence per item
    By okl in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 05-11-2009, 06:37 AM
  4. Counting the Occurence of Special characters in Single/Multiple cells?
    By all4excel in forum Excel Formulas & Functions
    Replies: 1
    Last Post: 05-23-2008, 03:33 PM
  5. how to count on every occurence
    By 3nigm@ in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 04-02-2008, 03:26 AM
  6. [SOLVED] count occurence and present results
    By Hobbes2006 in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 08-07-2006, 04:45 PM
  7. [SOLVED] Using Sumproduct to count text occurence
    By Dave in forum Excel Formulas & Functions
    Replies: 3
    Last Post: 07-25-2006, 02:28 PM
  8. Count the occurence of more than one condition
    By wutang1105 in forum Excel Formulas & Functions
    Replies: 19
    Last Post: 09-06-2005, 09:05 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1