Tables design
The big idea of this site
The idea is to combine into one wiki lots of semi-standardized information about the development of aeronautics and aviation from about 1800 to about 1916. In this period we can watch, through these data streams, aviation evolve from a dubious dream into a science and technology, and then an industry. The data in every table is incomplete and needs lots of work, but a wiki will be somewhat better than spreadsheets for improving it. Every table needs lots of unstructured wiki-text space, which might be written longhand; wikilinks will be valuable there. Every table will have some dates or years, and some kind of language or country identifier. I want the kinds of features in WikiPapers and DiscourseDB, plus another one I can show.
Patent table
One table has information on each of many patents (>13,000) related to this subject. This is the one I can send soon. It has about 40 columns, basically all text strings or numbers. Only a few of them are semantically significant in the sense that other pages need to query them. There are dates, but not all are known or consistently formatted yet, so I am inclined to treat them as text in the near term.
- For updates on what was actually implemented, see Template:Patent and the structured data on patents linked from the Main page.
- PageTitle = GlobalPatentId: country-grantyear-patentnumber, *
- FYear -- date, for year filed -- just a number
- GYear -- date, for year-granted
- Office -- Page -- a country, whose patent offices will be described on that page
- PatNum -- string
- Inventors -- list of person pages, separated by semicolons
- InventorCountry -- list of country pages
- ApplicantPerson -- list of person pages
- ApplicantFirm -- list of organization pages
- ApplicantType -- string
- ApplicantIsInventor -- Boolean
- OriginalTitle -- string99
- EnglishTitle -- string99
- Techfields -- list of TechType pages
- FilingDate -- date
- FullSpecificationFiledDate -- date
- Appnum -- string
- GrantDate -- date
- Granted -- boolean
- PublicationDate -- date
- Supplementary_to_patent -- list of Patent pages separated by semicolons
- RelatedToAircraft -- float (0, 0.5, or 1)
- SerialNum -- string -- a US-only item, I think
- PatentAgent -- a firm page
- AssignedTo -- a person or firm page. (if ambiguity impossible, can make this a URL or string)
- NationalTechCats -- list of Techtype pages
- IPCs -- list of Techtype pages
- CPCs -- list of Techtype pages
- FamilyYear -- integer
- FirstFiling -- boolean
- Citations -- integer
- CitedBy -- list of patent pages separated by semicolons
- ApplnId -- string40
- InpadocFamilyId -- string40
- TextPages -- integer
- DiagramPages -- integer
- FiguresCount -- integer
- ClaimsCount -- integer
- includes: URL sources (Espacenet, Google Patents, or others), CitedBy references
- possible improvements to apply as of Jan 2023:
- eliminate "Applicant type" field
- switch from 'list of pages' to 'list of strings'
- switch cited-since-1930 to cited-since-1948
- add a "Date-inventor-signed" field for the date the inventor signed, which is sometimes distinct from, and earlier than, the filing dates
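The PageTitle convention above (country-grantyear-patentnumber) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `PatentId` tuple, and the exact treatment of whitespace are hypothetical and are not taken from Template:Patent or from the implemented wiki.

```python
from typing import NamedTuple


class PatentId(NamedTuple):
    """Hypothetical container for the three parts of a GlobalPatentId."""
    office: str       # country code of the patent office, e.g. "US"
    grant_year: int   # year granted
    pat_num: str      # patent number, kept as a string


def make_global_patent_id(office: str, grant_year: int, pat_num: str) -> str:
    """Build a page title of the form country-grantyear-patentnumber."""
    return f"{office.strip()}-{int(grant_year)}-{pat_num.strip()}"


def parse_global_patent_id(page_title: str) -> PatentId:
    """Split a page title back into its parts.

    Only the first two hyphens are treated as separators (an assumption),
    so a patent number that itself contains a hyphen stays intact.
    """
    office, year, number = page_title.split("-", 2)
    return PatentId(office, int(year), number)
```

For example, `make_global_patent_id("US", 1906, "821393")` gives `"US-1906-821393"`, and parsing that title returns the same three parts. Keeping PatNum as a string matches the field list above and avoids losing leading zeros or letter suffixes.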
Publications table
One table will have information on each of >30,000 publications from this early period related to aeronautics and aviation, computerized mainly from bibliographies published at that time. There will be many fields here, probably 25 or more. This table is not ready to upload, but it will be the biggest.
- Page title = Title of the publication, plus year and author if needed for disambiguation
- Notes and sources -- page section
- many more fields, later, once these other tables are working. This data is not ready.
Organizations
A table will list aeronautics-related clubs, firms, government labs, military organizations and other organizations. There are something like 760 clubs, xxx firms, and only a few organizations that are not of one of those types.
- Page title = Best-known or longest-used name of the organization, standardized in English
- Organization_names = string with many names, notably its name in the native language
- Entity_type -- string (club, firm, military, university lab, government lab, research group/network, other)
- Country -- page
- City -- string
- Affiliated_with -- list of page names ; other Organizations, to be used for organizational members of international federations or multinational conglomerates
- Scope -- string ; a de facto geographical domain, e.g. national club versus sub-national region ; a national Wright Company ; blank if unlimited
- Started_aero -- date ; date founded, or date of first investment in aircraft or related products and services
- Ended_aero -- date ; date closed, merged into another org, or left the aircraft business
- Key_people -- List of person pages separated by semicolons (founders, designers -- details will be in Notes and sources section)
- Notes_and_sources -- page section
- This will include scope of description, membership, employment size, addresses, phone, cable address, and someday can break out geo coords for locations
Exhibitions and conferences
A table of events, perhaps 100-200 entries.
- Page_title = Short unambiguous English name of the event, with year if necessary for disambiguation
- Event_names -- string, with various names of the event, including in the native language
- Event_type -- string (with keywords, not all determined yet)
- Country -- page
- Location -- string
- Start_date -- date
- Number_of_days -- integer
- Tech_focus -- Page
- Notes_and_sources -- page section
Letters
A table of known written communications between the inventors in this area. This is not complicated and can be uploaded soon.
- Page title -- has from, to, and date for now. Will be renamed for brevity or clarity
- Sender -- Person-page
- Recipient -- Person page
- Date_sent -- date
- From_location -- string
- To_location -- string
- Communication_type -- string (letter, telegram, cable, other)
- Language -- string (a short one, as on Wikimedia en, fr, de, it, ru, hu, with an "other" type when unknown)
- Refers_to_flight? -- boolean (0 if the text does not refer to flight technologies; such a letter might be included in this wiki because of the sender or recipient)
- Length -- integer (estimated number of characters in the communication)
- Notes_and_sources -- page section, with footnotes -- will sometimes include text of letter
People
These are individual inventors, authors, or others referred to in the other tables (Less central examples: persons in a firm or club, witnesses to a patent, editors of a journal).
The people in this table appear in the other records, and there should be automatic queries from this table to show, for example, a list of publications and patents by each of these individuals.
- Fields
- Page name: name of person with last name in caps for the moment
- First names: string
- Last name: string
- Country1: string
- Location1: string
- Country2: string
- Location2: string
- Birthdate: date
- Affiliated with: list of Page names (Persons or Organizations)
- Notes and sources: page section -- will include addresses and locations
- Queries for patents, publications, clubs, firms that refer to this person.
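The kind of lookup meant by "queries for patents ... that refer to this person" can be sketched in Python. This is an illustration of the logic only, not how the wiki's own query mechanism works. The function names and the sample rows are made up, and it assumes the Inventors field holds person page names separated by semicolons, as in the Patent table.

```python
def split_pages(value: str, delimiter: str = ";") -> list[str]:
    """Split a delimited list-of-pages field into trimmed, non-empty names."""
    return [item.strip() for item in value.split(delimiter) if item.strip()]


def patents_by_inventor(patents: list[dict], person_page: str) -> list[str]:
    """Return the page titles of patents whose Inventors field lists person_page."""
    return [
        row["PageTitle"]
        for row in patents
        if person_page in split_pages(row.get("Inventors", ""))
    ]


# Made-up sample rows, for illustration only (not real patents or people).
sample_patents = [
    {"PageTitle": "XX-1909-0001", "Inventors": "DOE Jane; ROE Richard"},
    {"PageTitle": "XX-1910-0002", "Inventors": "ROE Richard"},
    {"PageTitle": "XX-1911-0003", "Inventors": ""},
]
```

Matching on whole page names after splitting, rather than on substrings, keeps a person from being confused with another whose name merely contains theirs.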
Periodical
Journal -- or Publication series. Some wiki pages will describe periodicals of the time (before 1916), but I don't think these have to be queried. There will be wiki pages for specific articles published in those periodicals, which can just wikilink to the page about the periodical.
- Page title -- Best known title
- Start date -- date
- Notes_and_sources -- page section
Source -- Some wiki pages will describe a source of information used in the other tables. Together these make up a bibliography, plus extra information about where the sources were physically consulted. Page numbers will mostly appear not here but in notes and footnotes in the entries of other tables. I don't think there are any semantics needed here -- these pages do not need to be queried by other pages, just hyperlinked to.
Techtype
Technological categories and topics that publications, patents, and exhibitions can focus on.
- Page title = Term categorizing flight technologies, or category of patents from the CPC, IPC, or national patent category systems
- Enclosing_categories -- list of pages from this same Techtype table
- Affiliated_concepts -- list of pages, mostly in this same Techtype table
- Notes and sources -- page section -- explains the term/category, identifies the overall classification system it's in (CPC, IPC, US, etc) and links to sources
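Because Enclosing_categories points back into the same Techtype table, the categories form a graph that a page may want to walk upward. A minimal Python sketch of that walk follows. The function name and the sample mapping are hypothetical, and the nesting of "dirigible" under "LTA" under "aircraft" is assumed for illustration rather than taken from the data.

```python
def all_enclosing(term: str, enclosing: dict[str, list[str]]) -> list[str]:
    """Collect every category that directly or indirectly encloses term.

    enclosing maps a Techtype page name to its Enclosing_categories list.
    Categories already visited are skipped, so an accidental cycle in the
    data cannot cause an endless loop.
    """
    found: list[str] = []
    stack = list(enclosing.get(term, []))
    while stack:
        current = stack.pop()
        if current == term or current in found:
            continue
        found.append(current)
        stack.extend(enclosing.get(current, []))
    return found


# Hypothetical sample hierarchy, for illustration only.
sample_enclosing = {
    "dirigible": ["LTA"],
    "LTA": ["aircraft"],
    "aircraft": [],
}
```

With the sample mapping, `all_enclosing("dirigible", sample_enclosing)` returns `["LTA", "aircraft"]`, so a query for "aircraft" items could also pick up pages tagged only as "dirigible".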
Queryable concepts
Wiki pages can query other pages about these elements:
- Year filed or written (patent/publication concepts)
- Year granted or published (a slightly later event)
- Year born or established (for clubs, firms, and persons)
- Year ended (for clubs, firms, and persons)
- A single generic concept of "year" would be okay for the first round
- Date (generally in format 1906-03-30 but perhaps should be a string so it can include ranges, and ? for uncertainty). Someday I will want to do date math (e.g. "date granted minus date filed") but that is some ways off
- Language, most importantly en, fr, de. Also significant: it, sv, ru, hu, and several others. The abbreviations can be the same as on Wikimedia.
- Country: Most importantly: US, GB, FR, DE. Also IT, RU. There is some uncertainty about how to classify others in this period. I think I want to have AH for Austria-Hungary, but also AT and HU and other categories when more precise information is known. I would be willing to start with a known set of categories for countries now, and add richness for the countries of the historical period as time permits. Most wiki pages on any topic will refer to a country. More than one country can appear on a wiki page.
- Technology type. These overlap a lot, and a particular item in the wiki may have several techtype labels. But technologies include: LTA (=lighter-than-air), dirigible, airfoil, infrastructure, instruments, control, navigation, propulsion, helicopter, rocket, other ... and a dozen more. Technology type can appear on a patent or publication, or (less often) on a letter, club, or firm.
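The date handling described above (dates kept as strings that may hold ranges or a "?", a generic "year" for the first round, and date math later) can be sketched in Python. This is a sketch under assumptions: the function names are hypothetical, the year pattern only covers 1800 to 1999, and a range is reduced to its first year, which may not be the rule the wiki ends up using.

```python
import re
from datetime import date
from typing import Optional


def parse_exact_date(text: str) -> Optional[date]:
    """Return a date only when text is a complete YYYY-MM-DD value."""
    text = text.strip()
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return None  # ranges, "?" markers, and partial dates are not exact
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None  # right shape but not a real day, e.g. 1906-02-30


def year_of(text: str) -> Optional[int]:
    """Pull the first four-digit year (1800-1999) out of a free-form date string."""
    match = re.search(r"\b(1[89]\d{2})\b", text)
    return int(match.group(1)) if match else None


def days_between(filed: str, granted: str) -> Optional[int]:
    """Date math such as 'date granted minus date filed', in days.

    Returns None unless both values are exact dates.
    """
    start, end = parse_exact_date(filed), parse_exact_date(granted)
    if start is None or end is None:
        return None
    return (end - start).days
```

With this approach the stored value stays a string, so uncertain entries such as "1906?" still yield a year for queries, while date math simply returns nothing until both dates are exact.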
Tasks for Peter
- See and imitate Sample queries
- make sure all firms are moved out of the inventor field and into the applicant field. In the inventor field there will be a list of person-pages ; in the applicant field there will be a list of person-pages and organization-pages
- names of long wiki text: section -- one or two for sources and notes
- data types: integer float string section
- things that are links to pages will be of type page
- list of page, with delimiter specified – ; is fine
- tech type can be a list-of-page type
- Categories – don't set them. The item's type takes care of that.
- what is the standard name of a patent? or a letter? etc
- one table column must be called Page Title
- he'll use the Data Transfer extension to upload the data
- Cargo field <==> can be retrieved by queries
- those queries can search by page title, not sure what else.
- there will be one form per concept. it is not realistic to have several different forms for entering patent data
- for record linkage paper
- part of what it means to link the records together is not just to link them directly but to identify useful categories in which to combine them. Thus the effort to link patent classifications across systems and time is important. Discuss the patent classifications.