How much metadata?
That’s a series of screenshots from the ePub editor Sigil. Those designations are just for Author.
An example of possible user-contributed metadata linking:
I once had a wise and honest teacher; I asked him one day to explain a certain matter; and he replied: “I haven’t time now. If I knew it thoroughly I could tell you in a few minutes, but as I know it only imperfectly, it would take me an hour.”
— Footnotes to Life by Dr. Frank Crane, published 1920
connects/relates loosely to:
I have made this [letter] longer, because I have not had the time to make it shorter. –Blaise Pascal
— quoted in The Art of the Start by Guy Kawasaki, published 2004
I think these observations are fundamental.
The sheer volume of what you’re talking about makes it clear to me that (contrary to what might be assumed from your screenshot) user-generated metadata can realistically be only a small part of what is needed, unless of course we consider the generation of metadata to be a collective activity, like a Wikipedia for smart book metadata. Still, I think it argues for machine understanding to create a large fraction of the necessary metadata.
I am aware of an equivalent problem today in the description of audio/video content. Even without the kind of extensive linking you imply, identifying relationships between items in large media libraries is not manageable today without extensive computer assistance.
The example of user-contributed metadata linking (which I suspect is not randomly chosen) also raises several important questions: What distinction, if any, is made between metadata and annotations? My guess is that a good fraction of metadata might even be generated after the work’s publication. Should we make a distinction between metadata furnished by the publisher and metadata added afterwards? What about the authority of the metadata? And how can I make sure that I don’t get distracted by metadata I don’t care about? How do I refer to a particular passage to attach the metadata there, and not to the whole work, for example?
The list of questions is long. I’m sure librarians have already answered many of them, and I know that some of the basic technologies and protocols for semantic web applications are applicable here.
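To make a few of those questions concrete, here is a minimal sketch in Python of what a passage-level link record might look like. It is only an illustration, not a proposal for a real format: the names `PassageRef`, `Link` and `links_at`, the character-offset addressing scheme, and the offsets themselves are all assumptions invented for the example. It shows a link attached to a passage rather than to a whole work, a record of who contributed it and whether it came after publication, and a filter so a reader sees only the contributors they trust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassageRef:
    """Points at a span inside a work, not at the whole work."""
    work: str   # hypothetical identifier for the work (here just its title)
    start: int  # hypothetical character offset where the passage begins
    end: int    # hypothetical character offset where the passage ends (exclusive)


@dataclass(frozen=True)
class Link:
    source: PassageRef
    target: PassageRef
    relation: str           # e.g. "relates-loosely-to"
    contributor: str        # "publisher" or "user": a crude stand-in for authority
    post_publication: bool  # True if the link was added after the work was published


def links_at(links, work, offset, trusted=("publisher",)):
    """Return links whose source passage covers `offset`, from trusted contributors only."""
    return [
        link for link in links
        if link.source.work == work
        and link.source.start <= offset < link.source.end
        and link.contributor in trusted
    ]


# The offsets below are made up purely for illustration.
crane = PassageRef("Footnotes to Life", 1200, 1450)
pascal = PassageRef("The Art of the Start", 300, 380)
links = [Link(crane, pascal, "relates-loosely-to", "user", True)]

# Default view: publisher-supplied links only, so the user link is hidden.
print(links_at(links, "Footnotes to Life", 1300))
# Opting in to user-contributed links makes it visible.
print(links_at(links, "Footnotes to Life", 1300, trusted=("user",)))
```

Character offsets are the weakest assumption here, since they break as soon as the text is revised; a real system would need a more robust way to address a passage.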
What I find striking is the gap between the communities that are thinking about these aspects and the businesses that are trying to capture a short-term beachhead without giving them a thought.
All excellent questions for which there might or might not be answers. I never said it would be simple, only that it'd be better than the scam being perpetrated today.