POB's daughter Catherine here, using his log-in:
I’ve had a read-through of what I will call ‘Smedley’s system’ (albeit it was co-authored) on Amazon Web Services (AWS):
The fastest driver in Formula 1
by Rob Smedley, Colby Wise, Delger Enkhbayar, George Price, Ryan Cheng, and Guang Yang | on 20 AUG 2020 | in Amazon SageMaker, Artificial Intelligence
https://aws.amazon.com/blogs/machine-le ... formula-1/
Here are a few things that jumped out at me while reading through it.
[1] They call their system “the first objective and data-driven model to determine who might be the fastest driver ever”. This strikes me as a deliberate lack of acknowledgement of POB’s work, and smacks of plagiarism, given the many points of conceptual similarity between the systems, and the fact that POB’s system has been in the public arena since 2011 (it was first publicised on Peter Windsor’s website):
https://grandprixratings.blogspot.com/2 ... ricks.html.
The tagline of POB’s blog clearly established his work as “the first” and “the most objective”:
Patrick O’Brien’s Grand Prix Ratings
"He was a mine of accurate information and his book is a respected and valued part of my racing library." ~ Stirling Moss, OBE. *** “Patrick O’Brien’s system is the most objective I’ve seen to date.” ~ Peter Windsor. *** "He probably pushed F1 metrics forward further than anybody else has ever done and his contribution will not be forgotten." ~ PF1 forum. *** "His rating system [...] brought some kind of consistent view of F1 throughout the decades and offered me a lot of insight." ~ PF1 forum.
https://grandprixratings.blogspot.com/
The following strike me as evidence that they got a significant leg-up from POB’s rating system, which goes completely unacknowledged, as far as I can see:
[1a] The concept of inter-linkages between drivers who raced together on the same team providing a constant as a basis for comparison.
POB ('Explanatory Factors', 2016, p. 125):
"Today with just two-driver teams, there are usually only about four or five interlinks. Even though there are fewer interlinks today than before 1961, there are more data points today because it is now compulsory for teams to compete in every race and all pre-race times are fully recorded."
POB ('Explanatory Factors', 2016, p. 118):
"STEP 5: I now link this third driver’s (Coulthard’s) performances through Webber’s to Vettel’s. This I do on my working spreadsheets, developed from race-by-race tracking to arrive at season-by-season average time-based driver-ratings. All are measured proportionally against the fastest driver(s) within each season."
Whereas POB writes about his massive spreadsheet of driver-ratings going back to 1894, Smedley writes of “a network of teammate comparisons over the years”, going back to 1983.
“For example, Sebastian Vettel and Max Verstappen have never been on the same team, so we compare them through their respective connections with Daniel Ricciardo at Red Bull.”
This ground-breaking concept of ‘the gap’ between team-mates was a method that POB cracked in 2002, aged 58, after a lifetime’s immersion in GP & F1 literature and race-viewing, and after publishing an analytical book on GP racing in 1994.
Smedley:
“we compare qualifying data for drivers on the same race team (such as Aston Martin Red Bull Racing), where teammates have competed against each other in a minimum of five qualifying sessions. By holding the team constant, we get a direct performance comparison under the same race conditions while controlling for car effects.”
Instead of "team", POB referred to using the "car" as a constant. Both capture the same variables.
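To make the interlink idea concrete, here is a minimal Python sketch of comparing two drivers who were never team-mates through a shared team-mate. It is not POB's spreadsheet method nor the AWS model; the driver labels A, B and C, the gap figures, and the function names are all invented for illustration. It also assumes that small percentage gaps can simply be added along the chain, which is only an approximation.

```python
from collections import deque

# Hypothetical average team-mate gaps, in percent of lap time.
# TEAMMATE_GAPS[(x, y)] = g means y was on average g% slower than x
# while they shared a car. All figures are invented.
TEAMMATE_GAPS = {
    ("A", "B"): 0.15,  # A and B were team-mates
    ("B", "C"): 0.10,  # B and C were team-mates; A and C never were
}

def build_graph(gaps):
    """Turn directed team-mate gaps into a graph that can be walked both ways."""
    graph = {}
    for (faster, slower), gap in gaps.items():
        graph.setdefault(faster, {})[slower] = gap    # slower is +gap behind
        graph.setdefault(slower, {})[faster] = -gap   # reverse link flips the sign
    return graph

def linked_gap(graph, start, target):
    """Return how much slower (in %) target is than start, summed along the
    shortest chain of team-mate links, or None if no chain connects them."""
    queue = deque([(start, 0.0)])
    seen = {start}
    while queue:
        driver, total = queue.popleft()
        if driver == target:
            return total
        for mate, gap in graph.get(driver, {}).items():
            if mate not in seen:
                seen.add(mate)
                queue.append((mate, total + gap))
    return None

graph = build_graph(TEAMMATE_GAPS)
gap_a_to_c = linked_gap(graph, "A", "C")  # A and C linked through B
```

With these invented figures, C comes out about 0.25% slower than A, purely by way of B. A real system would also have to decide what to do when two different chains between the same drivers disagree, which this sketch ignores.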
[1b] POB’s system is mindful to exclude outlier sessions – a nuance Smedley et al. incorporated in their system:
“We identify and remove anomalous lap time outliers” ... e.g., "Vettel being penalized to comply with the 107% rule (which forced him to start from the pit lane)."
POB (Explanatory Chapter, 2016, p. 103):
"But even in these ‘pure’ cases, a driver would very occasionally have trouble with his car or encounter traffic (even during one lap!) and set a time much slower than his norm. These I treated as outliers and excluded the time. Conversely, if race-time data were contaminated by car or driver trouble, causing a package’s time to deviate very obviously from its norm, I omitted the time and reverted to the pre-race times, as being more representative of how the package compared."
[1c] In ‘separating driver performance from car performance’, one of the aims of POB’s system was to identify fast drivers in slow cars and slow drivers in fast cars.
Similarly, Smedley’s model prides itself on recognising ‘unsung heroes’:
“the model has ranked him [Kovalainen] so highly because of his consistent qualifying performances throughout his career. I, for one, am extremely happy to see Kovalainen get the data-driven recognition that he deserves for that raw talent that was always on display during qualifying”
But would this translate into race performance? Or would some drivers excel in qualifying while others excel instead in the actual races? Surely the races require more stamina, so that would be a more accurate measure (less dependent on car machinery) overall? Surely a point for debate.
Smedley:
“…Kovalainen doesn’t have the same number of World Championships as Hamilton, but his qualifying statistics speak for themselves—the model has ranked him high because of his consistent qualifying performance throughout his career.”
To address the question of whether some drivers differ in skill between qualifying and actual races, is there some statistical analysis of differentials between qualifying vs race performance?
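As a rough idea of what such a check might look like, here is a small Python sketch that measures the average team-mate gap separately for qualifying and for race laps and then looks at how far the two differ. The lap times, the variable names and the `percent_gap` helper are all hypothetical; neither POB nor Smedley et al. describe their analysis in this form.

```python
from statistics import mean

def percent_gap(times_a, times_b):
    """Mean gap of driver B to driver A, as a percentage of A's time,
    computed session by session on paired lists of times (seconds)."""
    if not times_a or len(times_a) != len(times_b):
        raise ValueError("need two non-empty lists of equal length")
    return mean((b - a) / a * 100.0 for a, b in zip(times_a, times_b))

# Invented lap times in seconds for two team-mates over three events.
quali_a = [80.0, 90.0, 75.0]
quali_b = [80.4, 90.45, 75.3]
race_a = [84.0, 94.0, 79.0]
race_b = [84.42, 94.47, 79.395]

quali_gap = percent_gap(quali_a, quali_b)  # gap over a single fast lap
race_gap = percent_gap(race_a, race_b)     # gap at race pace
gap_shift = race_gap - quali_gap           # near zero means the gap is stable
```

If `gap_shift` stayed close to zero for most team-mate pairs, qualifying would be a fair proxy for race pace; a consistently large shift for particular drivers would suggest they are relatively stronger in one setting than the other.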
POB appears to have done this analysis (from his 'Explanatory Chapters', 2016, p. 99):
"Although packages are invariably slower in the races than in pre-race times, the gaps between team-mate packages and between different team packages are fairly constant whether pre-race or race-time. This is shown by an example of just two drivers below, although this pattern holds for the whole field throughout history.”
Do Smedley et al. make any effort to compare qualifying and race performance?
[1d] Smedley’s laying out of the problem (problematisation) bears an uncanny similarity to POB’s:
POB’s ‘introduction’ to his system, in the Guidelines section of all his Rating System books:
"The 2013 Formula One season was dominated by the Vettel/ Red Bull-Renault package, which won 13 of the 19 races. Many reckon that Vettel is undoubtedly one of the great drivers. Some however question this, arguing that Vettel was fortunate in having the fastest car, the Red Bull-Renault. Just how good was Vettel compared with his peers? Can his performance be separated from the performance of his car?
"The 2012 Formula One season was a close-fought, year-long battle between three packages: the Hamilton/ McLaren-Mercedes, the Vettel/ Red Bull Renault, and the Alonso/ Ferrari. It ended at the Brazilian Grand Prix finale on an exciting note: Red Bull-Renault driver Sebastian Vettel won the Drivers Championship title narrowly, by just three points from Ferrari’s Fernando Alonso. Vettel won 5 races from 7 poles, Hamilton won 4 races from 8 poles, while Alonso scored 3 wins from just 2 poles.
"However, if we consider that the front-running Hamilton/ McLaren-Mercedes package suffered several tardy pit-stops that curbed potential wins, and that the Alonso/ Ferrari package was clearly slower in both qualifying and the races throughout the season, one has to ask: ‘Who really was the fastest driver?’”
Smedley:
“Formula 1 (F1) racing is the most complex sport in the world. It is the blended perfection of human and machine that create the winning formula. It is this blend that makes F1 racing, or more pertinently, the driver talent, so difficult to understand. How many races or Championships would Michael Schumacher really have won without the power of Benetton and later, Ferrari, and the collective technical genius that were behind those teams? Could we really have seen Lewis Hamilton win six World Championships if his career had taken a different turn and he was confined to back-of-the-grid machinery? Maybe these aren’t the best examples because they are two of the best drivers the world has ever seen. There are many examples, however, of drivers whose real talent has remained fairly well hidden throughout their career. Those that never got that “right place, right time” break into a winning car and, therefore, those that will be forever remembered as a midfield driver.”
[1e] Similarly, the ‘humble disclaimer’ about the accuracy/ objectivity/ validity/veracity of the figures sounds like POB’s:
POB: (from his 'Explanatory Chapters', 2016, p. 38):
‘Although journalist and analyst Peter Windsor wrote in 2010 that my Rating System was “the most objective I’ve seen to date” (either from his 2010 blog and/or from one of his 2010 F1 Racing Japan magazine race reports), my System is not entirely objective nor perfect due to a number of factors which I will discuss below.’
[...]
“Peter Windsor, writing about my Rating System in December 2010, stated:
'… this system, like any other, is by definition imperfect. It is, though, about as near as you can get to the truth. Patrick did his own arithmetic – and guess what: the difference between Nico Rosberg and Michael Schumacher (0.3) is exactly the difference between the two drivers that the Mercedes F1 team themselves established by mid-season with their own methodology. Someone must be doing something right!'
Source: ‘Unique F1 Driver Ratings, 2010’ by Peter Windsor 阿拉蕾, 2010-12-10, Retrieved from https://m.hupu.com/bbs/1754685.html
POB ('Explanatory Factors', 2016, p. 177):
"My own view is that Grand Prix racing is not an exact science and therefore does not lend itself entirely to black-and-white analysis. I suspect however that we will continue to argue for the superiority of one method over another, rather than appreciating that both contribute to the scientific analysis of Grand Prix racing performance and, most importantly, generate discernment and debate."
Smedley:
“These rankings aren’t proposed as definitive, and there will no doubt be disagreement among fans. In fact, we encourage a healthy debate! Fastest Driver presents a scientific approach to driver ranking aimed at objectively assessing a driver’s performance controlling for car difference.”
[2] Where the systems differ:
[2a] In using “qualifying sessions lap times”, Smedley points out that their system does not take into account “racecraft or the ability to win races or drive at 200 mph while still having the bandwidth to understand everything going on around you…”.
In contrast, by using actual-race times as a primary measure and pre-race-times as a secondary measure when required, POB claims to take all this into account:
“During this work, my Rating System was criticised by a prominent Formula One journalist to the effect that it does not take into account a driver’s ‘management abilities’, that is, out-of-car skills and talents, such as Michael Schumacher had in ‘organising’ a team around himself to enhance success. This is not so. My System takes everything into account. Driver and car performance are measured on-track, and therefore include testing, preparation, qualifying, racing, life-path experience, and every capability a driver or anyone else may have brought to bear on performance during design.” ('Explanatory Chapters', 2016, p. 31)
[2b] Continuous updating of the figures:
Smedley:
“the qualifying data consumed by the model is updated with fresh lap times after every race weekend”
In contrast, POB used to do the updating by hand – an incredible mass of figures to manipulate in one’s head.
Conclusion
It is clear that there are more than a few similarities between the two systems: POB's, devised from 2002 and publicised online since 2011, and Smedley's, whose online account is dated 20 August 2020:
https://aws.amazon.com/blogs/machine-le ... formula-1/
It’s a shame that Smedley et al. were unable to acknowledge any indebtedness to POB. The echoes of so many of POB’s analytical sentiments, nuances and conclusions are evident in Smedley’s blog alone. Why not simply acknowledge and critique POB’s contribution and then state how their system differs and how it develops it further, e.g., using “Amazon SageMaker, a fully managed service to build, train, and deploy ML models”?
Even Isaac Newton showed more humility: “If I have seen further,” Newton wrote in a 1675 letter to fellow scientist Robert Hooke, “it is by standing on the shoulders of giants.”
In contrast, POB freely acknowledged the leg-up he got from Laurence Pomeroy’s (1949, 1954) system, and he critiqued it:
“[Pomeroy] devised a clever system of time-comparisons on ‘circuits that were used over more than one season’, in order to quantify and compare car-speeds, and identify progress (or a lack of progress) in car performance. Covering the seasons from 1906 to 1953, Pomeroy reduced the times to a simple numerical formula, the Pomeroy Index or ‘Py’ Index. Car performances were scored, measured and compared directly against his benchmark car, the 13.0-litre/ 793-cubic inch 1906 Renault AK, which won the 1906 French Grand Prix. He chose this car because it won this first French Grand Prix. He allocated it his benchmark figure of 100.0.
“Pomeroy’s was a ground-breaking and accurate system, based on fastest lap-times from every Grand Prix race for each season. Using interval data (fastest lap times), Pomeroy’s system gave cars a rating figure.”
[…]
“Pomeroy’s method inspired my Rating System, prompting me to use 100.0 as my base as well. However, as mentioned above, Pomeroy assumed that he was rating cars whereas he was in fact measuring packages (car-and-driver combined).” ('Explanatory Chapters', 2016, p. 45)
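For readers unfamiliar with a 100.0-base scale, here is a minimal Python sketch of rating packages proportionally against the fastest one in a season. It rests on an assumption: that the fastest package is given 100.0 and slower packages score proportionally higher. The package names, the average times and the `base_100_ratings` function are hypothetical, and this is not presented as Pomeroy's or POB's actual arithmetic.

```python
def base_100_ratings(avg_times):
    """Rate each package against the fastest one, which is set to 100.0.

    avg_times maps a package name to its average time in seconds.
    Assumption: slower packages get proportionally higher figures.
    """
    if not avg_times:
        raise ValueError("need at least one package")
    fastest = min(avg_times.values())
    return {name: round(100.0 * t / fastest, 1) for name, t in avg_times.items()}

# Invented season-average times in seconds for three packages.
EXAMPLE_TIMES = {"package_x": 90.0, "package_y": 90.27, "package_z": 90.9}
example_ratings = base_100_ratings(EXAMPLE_TIMES)
```

On this scale a difference of 0.3 between two ratings means one package was about 0.3% slower than the other, which is what makes figures from different circuits and seasons comparable.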
Standing on the shoulders of giants is a necessary part of creativity, innovation, and development; it doesn't make what you do less valuable!
An academic gave me the following advice:
If they have made no reference at all to the years of work done by POB then their public writing on it should be rejected until the omission has been made good.
I hope this prompts a reaction from the authors by way of an apology to POB plus a statement affirming his priority in this field.
Plagiarism is always a strong charge to level at anyone. But the definition of plagiarism is broad. The OED says it involves taking the work or the idea of someone else and passing it off as one's own. It makes no mention of it having to be verbatim.
From the evidence you present, it certainly looks like plagiarism to me. The fact, too, that it has been done so blatantly suggests that those involved seem to believe that anything published can simply be “lifted” and re-used under their own names, without any acknowledgement or attribution. And what is that if not plagiarism?