Hoping someone can help with this project - see attached spreadsheet.
To explain the spreadsheet, its all tennis matches on the ATP Tour 2017.
It shows the two players and under column header 'bpb' it shows the point by point data of the match.
Example
SSRSRRSRRR;RSSSRS;RRRR;SRRSSS;SSRRSRSS;SRRRR;SRSRSS;SSRSS;RRRSSSSRRSRSSRRSRR.RSSRRSSRSRSS;
The 'S' represents a point won by the server, 'R' represents a point won by the receiver, ';' represents the end of a game and '.' represents the end of the set. The server in the first game will be Player 1 and the server in the second game will be Player 2. This will change throughout the match after each game.
So basically what I'm looking to do is create a database including all the matches which would have the ability to query results given a certain scorelines. For example if a certain player is 15-30 down on serve (which would be represented by a combination of 1 'S' and 2 'R's in the first three points of that game), how often did they go on to win that game. Another example would be if a player is 4-2 down in games how often do they come back to win that set.
Could anyone provide any suggestions as to where to start with this? How many levels would it need to be broken down into to be able to carry out queries like the above - points,sets,games?
Obviously I don't expect a step by step guide as I'm guessing this is a long winded process but anyone's input would be greatly appreciated.
Bookmarks