Listen "Tom Plasterer: The Origins of FAIR Data Practices – Episode 35"
Episode Synopsis
Tom Plasterer
Shortly after the semantic web was introduced, the demand for discoverable and shareable data arose in both research and industry.
Tom Plasterer was instrumental in the early conception and creation of the FAIR data principle, the idea that data should be findable, accessible, interoperable, and reusable.
From its origins in the semantic web community, scientific research, and the pharmaceutical industry, the FAIR data idea has spread across academia, research, industry, and enterprises of all kinds.
We talked about:
his recent move from a big pharma company to XponentL Data, where he leads the knowledge graph and FAIR data practices
the direct line from the original semantic web concept to FAIR data principles
the scope of the FAIR acronym, not just four concepts, but actually 15
how the accessibility requirement in FAIR distinguishes the standard from open data
the role of knowledge graphs in the implementation of a FAIR data program
the intentional omission of prescribed implementations in the development of FAIR and the ensuing variety of implementation patterns
how the desire for consensus in the biology community smoothed the development of the FAIR standard
the role of knowledge graphs in providing a structure for sharing terminology and other information in a scientific community
how his interest in omics led him to computer science and then to the people skills crucial to knowledge graph work
the origins of the impetus for FAIR in European scientific research and the pharmaceutical industry
the growing adoption of FAIR as enterprises mature their web thinking and vendors offer products to help with implementations
how both open science and the accessibility needs in industry contributed to the development of FAIR
the interesting new space at the intersection of generative AI, FAIR, and knowledge graphs
the crucial foundational role of FAIR in AI systems
Tom's bio
Dr. Tom Plasterer is a leading expert in data strategy and bioinformatics, specializing in the application of knowledge graphs and FAIR data principles within life sciences and healthcare. With over two decades of experience in both industry and academia, he has significantly contributed to bioinformatics, systems biology, biomarker discovery, and data stewardship. His entrepreneurial ventures include co-founding PanGenX, a Personalized Medicine/Pharmacogenetics Knowledge Base start-up, and directing Project Planning and Data Interpretation at BG Medicine. During his extensive tenure at AstraZeneca, he was instrumental in championing Data Centricity, FAIR Data, and Knowledge Graph initiatives across various IT and scientific business units.
Currently, Dr. Plasterer serves as the Managing Director of Knowledge Graph and FAIR Data Capability at XponentL Data, where he defines strategy and implements advanced applications of FAIR data, knowledge graphs, and generative AI for the life science and healthcare industries. He is also a prominent figure in the community, having co-founded the Pistoia Alliance FAIR Data Implementation group and serving on its FAIR data advisory board. Additionally, he co-organizes the Health Care and Life Sciences symposium at the Knowledge Graph Conference and is a member of Elsevier’s Corporate Advisory Board.
Connect with Tom online
LinkedIn
Video
Here’s the video version of our conversation:
https://youtu.be/Lt9Dc0Jvr4c
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 35. With the introduction of semantic web technologies in the early 2000s, the World Wide Web began to look something like a giant database. And with great data, comes great responsibility. In response to the needs of data stewards and consumers across science, industry, and technology, the FAIR data principle - F A I R - was introduced. Tom Plasterer was instrumental in the early efforts to make web data findable, accessible, interoperable, and reusable.
Interview transcript
Larry:
Hi everyone. Welcome to episode number 35 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Tom Plasterer. Tom is the managing director who leads the knowledge graph and FAIR practices at XponentL Data, which is a company in the Boston area, or he's in the Boston area. So welcome Tom, tell the folks a little bit more about what you're up to these days.
Tom:
Thanks, Larry. And great pleasure to be with you and the audience. So I'm now, just last week I hit a year at XponentL Data, after 12 and a half years at big pharma. And so, I came over to XponentL Data to lead the knowledge graph and FAIR data practices, and also to unite with our expertise around artificial intelligence. One of the things that I started to get really excited about with the Knowledge Graph Conference over the last few years was the convergence of these two communities, and really how AI, knowledge graphs, and especially FAIR data, as a way of having curated trusted data for these applications, could be completely synergistic. And so that was really what brought me there. And when I joined, we were around 40 people. As I was leading this practice, we grew to about 240. And we were recently acquired by Genpact.
Tom:
And so, now we're part of a much bigger organization bringing our strength of artificial intelligence, generative AI, knowledge graphs and FAIR data to this larger organization. So that's been really my journey over the last year. And I really wanted to bring these two technologies together. And one of the things that we've really found is how important FAIR data is to both sides of the equation. And so, this is really where trusted data, clean data, data that follows standards, data that's self-describing, all of the things that you want to do for FAIR data, are really important foundationally for what you want to do with knowledge graphs and for how you want to give this trusted data to large language models, generative AI, to get the most out of those technologies. So in a nutshell, that's been my journey over the last year.
Larry:
Yeah. And we didn't talk explicitly about it as we were preparing for this, but AI is the logical and obvious place where all this is going now. And I think everybody's concerned about delivering trustworthy, clean, FAIR data wherever you are. But do you feel like you have been uniquely well-prepared for that with both your company but... And I know your background, that's what we want to talk about today, is the origins of the FAIR data standard, and you've been around it right from the get-go, right?
Tom:
Right from the beginning. And the community leans a lot on earlier trends around the Semantic Web, Semantic Web technology. I think a lot of the founders are very web-centric in their thinking. And there's a direct tie between what Tim Berners-Lee, Ora Lassila, and Jim Hendler wanted to accomplish with the Semantic Web, how the standards evolved there and then grew up and became available within graph databases, eventually knowledge graphs, as a vehicle to prove that FAIR data worked. And so, that's a direct thread between that and wanting to have knowledge injection for generative AI and the value there. The whole thing flows really, really well.
Larry:
Yeah, interesting. And one thing, as you said, is the direct descent from Tim Berners-Lee's and Ora's and Jim's, I guess, paper in Scientific American. One of the things that arose like, I don't know what, five or 10 years after that was Tim Berners-Lee's notion of five star data, like the kind of 1, 2, 3, 4, 5 star rating. And then only, what, five, not five, seven years later, FAIR came along. Can you talk a little bit about these perceptions of good data and the way its practices are codified?
Tom:
Sure. So if we think about five star linked data and kind of what Tim was trying to accomplish there, get your data on the web, have it in an accessible format, follow standards, have it linked together, that's really, really close to the FAIR data principles themselves. And I think a lot of the things within the FAIR data principles were learned directly from that. And I guess first I should take a step back and explain. People have probably come across the FAIR data principles, and they've heard Findable, Accessible, Interoperable, Reusable, and they think there's four of them. There's 15 of them. So this is where it gets to be a little bit more complicated. So FAIR as an acronym was just a very nice way of marketing and putting these things together, but a lot of the ways that they can become really useful is the sub-principles. So I'm just going to talk about them and describe them real briefly without being too technical. People can learn more about it in the 2016 Nature Medicine paper.
Tom:
So the findable is really about URIs. And so it's really about can I identify either an instance of data or a concept, a class, with a URI, later an IRI, and sometimes we're calling them persistent identifiers or GUPRIs, so Global, Unique, Persistent Resource Identifiers, all the same thing. So can you use that to identify a piece of data, and if so, when you resolve it, will it provide useful metadata for both humans and machines? That's really the most important piece that you need to do to get started. Let's put an identifier on our data, on our metadata, so that we can resolve it, find it, put it in an index, so that we can get something useful out of it. So that's about four of the F principles there.
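To make the findable idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any particular FAIR platform works: the `Record` and `Registry` names, the example IRI, and the two media types are all hypothetical. It shows an identifier being attached to a piece of data, registered in an index, and resolved to metadata for either a machine or a human.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Metadata for one identified resource (hypothetical shape)."""
    iri: str                 # globally unique, persistent identifier
    title: str
    keywords: tuple = ()


class Registry:
    """A toy in-memory index; a real one would sit behind a web resolver."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Uniqueness: one identifier may point at only one record.
        if record.iri in self._records:
            raise ValueError(f"identifier already in use: {record.iri}")
        self._records[record.iri] = record

    def resolve(self, iri, media_type="application/json"):
        # Resolving an identifier returns metadata, for machines or for humans.
        record = self._records[iri]
        if media_type == "application/json":
            return json.dumps(
                {"@id": record.iri,
                 "title": record.title,
                 "keywords": list(record.keywords)}
            )
        if media_type == "text/plain":
            return f"{record.title} <{record.iri}>"
        raise ValueError(f"unsupported media type: {media_type}")

    def search(self, keyword):
        # The index makes records findable by their metadata.
        return [r.iri for r in self._records.values() if keyword in r.keywords]


registry = Registry()
registry.register(
    Record("https://example.org/dataset/42", "Assay results", ("omics", "assay"))
)
machine_view = json.loads(registry.resolve("https://example.org/dataset/42"))
human_view = registry.resolve("https://example.org/dataset/42", "text/plain")
hits = registry.search("omics")
```

The same identifier serves both audiences: `machine_view` is a dictionary a program can consume, while `human_view` is a short readable line.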
Tom:
Accessible is really about interoperability and it's following common protocols. So HTTP, HTTPS, we're not reinventing protocols, we're following standards. And then authentication on top of that in some sort of a certified manner. Usually it ends up being LDAP with single sign-on or something like that. Some way of authenticating your data.
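As a rough sketch of the accessible idea, the Python below builds a request for metadata over a standard web protocol and optionally attaches credentials. Several things here are assumptions made for illustration: the `build_metadata_request` helper is hypothetical, the bearer token stands in for whatever authentication an organization actually uses (Tom mentions LDAP with single sign-on), and the JSON-LD media type is just one plausible choice. The request is only constructed, never sent.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Standard, open protocols only; nothing bespoke.
ALLOWED_SCHEMES = {"http", "https"}


def build_metadata_request(iri, token=None):
    """Build (but do not send) a GET request for the metadata behind an identifier."""
    scheme = urlsplit(iri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported protocol: {scheme!r}")

    # Ask for a machine-readable representation (assumed media type).
    headers = {"Accept": "application/ld+json"}
    if token is not None:
        # Authentication layered on top of the standard protocol.
        headers["Authorization"] = f"Bearer {token}"
    return Request(iri, headers=headers, method="GET")


authorized = build_metadata_request("https://example.org/dataset/42", token="abc")
anonymous = build_metadata_request("https://example.org/dataset/42")
```

The point of the design is that access control lives in the credentials, not in the protocol: the same identifier and the same HTTP request work for everyone, and only the authorization step decides who gets the data.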