Tables design

From Inventing aviation
Revision as of 12:37, 3 November 2016 by Econterms (talk | contribs) (more page titles)

The big idea of this site

The idea is to combine into one wiki a large amount of semi-standardized information about the development of aeronautics and aviation from about 1800 to about 1916. Through these data streams we can watch aviation evolve from a dubious dream into a science and technology and then an industry. The data in every table is incomplete and needs a lot of work, but a wiki will be somewhat better than spreadsheets for improving it. Every table needs plenty of unstructured wiki-text space, which may be longhand; wikilinks will be valuable there. Every table will have some dates or years, and some kind of language or country identifier. I want the kinds of features in WikiPapers and DiscourseDB, plus another one I can show.

Patent table

One table has information on each of many patents (>13,000) related to this subject. This is the one I can send soon. It has about 40 columns, basically all text strings or numbers. Only a few of them are semantically significant in the sense that other pages need to query them. Not all of the dates are known or consistently formatted yet, so I am inclined to treat them as text in the near term.

  • Page title = GlobalPatentId: country-filingyear-patentnumber, *
  • FYear -- date, for year-filed -- just number
  • GYear -- date, for year-granted
  • Office -- list of country pages
  • PatNum -- string
  • Inventors -- list of person pages
  • InventorCountry -- list of country pages
  • ApplicantPerson -- list of person pages
  • ApplicantFirms -- list of organization pages
  • Applicant_type -- string12
  • Appt_is_invt -- Boolean
  • OriginalTitle -- string99
  • EnglishTitle -- string99
  • Techfields -- list of TechType pages
  • FilingDate -- date
  • FullSpecificationFiledDate -- date
  • Appnum -- string
  • GrantDate -- date
  • Granted -- boolean
  • PubDate -- date
  • Supplementary_to_patent -- list of patent pages separated by semicolons
  • RelatedToAircraft -- float 0/.5/1 (blank means don't know)
  • SerialNum -- string19
  • PatentAgent — a firm page
  • AssignedTo — a person or firm page. (if ambiguity impossible, can make this a URL or string)
  • NationalTechCat — list of Techtype pages
  • IPCs — list of Techtype pages
  • CPCs — list of Techtype pages
  • FamilyYear — integer
  • FirstFiling — boolean
  • Citations — integer
  • ApplnId -- string40
  • Inpadoc_family_id — string40
  • TextPgs -- integer
  • DiagramPgs -- integer
  • FiguresCount -- integer
  • ClaimsCount -- integer
  • Notes and sources -- page section
  • includes: URL sources (espacenet, google patents, whatever), Citedby references
  • InventorAddr -- move to person info
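
The page-title convention and the semicolon-separated list fields above could be handled at import time roughly as follows. This is only a sketch in Python, not part of the site's tooling: the function names `global_patent_id` and `split_pages` are hypothetical, the sample values are invented, and upper-casing the office code is an assumption.

```python
def global_patent_id(office: str, filing_year: int, patent_number: str) -> str:
    """Build a page title of the form country-filingyear-patentnumber."""
    # Assumption: office codes are stored upper-case (US, GB, FR, DE).
    return f"{office.strip().upper()}-{filing_year}-{patent_number.strip()}"


def split_pages(cell: str, delimiter: str = ";") -> list[str]:
    """Split a list-of-pages cell into page names, dropping empty entries."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


# Invented sample values, for illustration only.
title = global_patent_id(" gb ", 1909, "12345")
inventors = split_pages("Person A; Person B;")
```

Keeping the title builder in one place means every table that refers to a patent page can produce the same title from the same three fields.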

Publications table

One table will have information on each of >30,000 publications in this early period related to aeronautics and aviation, computerized mainly from published bibliographies at that time. There will be many fields here, probably 25 or more. This is not ready to upload but it will be the biggest table.

  • Page title = Title of the publication, plus year and author if needed for disambiguation
  • Notes and sources -- page section
  • many more fields, later, once these other tables are working. This data is not ready.

Organizations

A table will have lists of aeronautics-related clubs, firms, government labs, military organizations and other organizations. There are something like 760 clubs, and xxx firms, and only a few that are not of one of those types.

  • Page title = Best known or longest term name of the club, standardized in English
  • Organization_names = string with many names, notably its name in the native language
  • Entity_type -- string (club, firm, military, university lab, government lab, other)
  • Country -- Page
  • Affiliated_with -- list of page names of other Organizations (notably for organizational members of an international federation)
  • City -- string
  • Start_year -- date (of founding)
  • End_year -- date (ended)
  • Started_investing -- date ; first date of investment into aircraft products (blank for nonprofit clubs that just have members)
  • Ended_investing -- date ; date ceased in aircraft business
  • Key_persons -- List of person pages (founders, designers)
  • Notes and sources -- page section
  • This will include scope of description, membership, employment size, addresses, phone, cable address, and someday can break out geo coords for locations

Exhibitions and conferences

A table of events, perhaps 100-200 entries.

  • Page title = Short unambiguous English name of the event, with year if necessary for disambiguation
  • Event_names -- string, with various names of the event, notably in the native language
  • Country -- page
  • Start date -- date
  • Number of days -- integer
  • Notes and sources -- page section

Letters

A table of known written communications between the inventors in this area. This is not complicated and can be uploaded soon.

  • Page title = ... to be decided
  • Sender -- Person-page
  • Recipient -- Person page
  • Date_sent -- date
  • From_location -- string
  • To_location -- string
  • Length -- integer (estimated number of characters in the communication)
  • Refers_to_flight? -- boolean
  • Communication_type -- string (letter, telegram, cable, other)
  • Language -- string (a short one, as on Wikimedia en, fr, de, it, ru, hu, with an "other" type when unknown)
  • Notes and sources -- page section, with footnotes -- will sometimes include text of letter

Persons

These are individual inventors, authors, or others referred to in the other tables (Less central examples: persons in a firm or club, witnesses to a patent, editors of a journal).

This one is not mainly to be uploaded but to be built by human curation, as we relate and link a list of people who appear in the other records and discuss their history with a bibliography. There should also be automatic queries from the inventor table to show, for example, a list of publications and patents by each of these individuals.

Fields
  • Page name: name of person with last name in caps for the moment
  • First names: string
  • Last name: string
  • Birth year: integer
  • Notes and sources: page section
  • Queries for patents, publications, clubs, firms that refer to this person.
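
On the wiki itself, the per-person lists would come from stored queries. The logic they need to reproduce can be sketched in Python as an inverted index from person names to patent page titles. The function name `patents_by_person` and the sample rows are hypothetical; the column names `Page title` and `Inventors` follow the patent table above, with inventors assumed to be separated by semicolons.

```python
from collections import defaultdict


def patents_by_person(patents: list[dict]) -> dict[str, list[str]]:
    """Map each inventor name to the page titles of their patents."""
    index: dict[str, list[str]] = defaultdict(list)
    for row in patents:
        # Inventors is assumed to be a semicolon-separated list of person pages.
        for name in row.get("Inventors", "").split(";"):
            name = name.strip()
            if name:
                index[name].append(row["Page title"])
    return dict(index)


# Invented sample rows, for illustration only.
sample = [
    {"Page title": "GB-1909-00001", "Inventors": "Person A; Person B"},
    {"Page title": "FR-1910-00002", "Inventors": "Person A"},
]
index = patents_by_person(sample)
```

The same approach would apply to publications, clubs, and firms, given a person-list field on each of those tables.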

Periodical

Journal or publication series -- Some wiki pages will describe periodicals of the time (before 1916) but I don't think these have to be queried. There will be wiki pages for specific articles published in those periodicals, which can just wikilink to the page about the periodical.

  • Page title = Last known title
  • Notes and sources -- page section

Source -- Some wiki pages will describe a source of information used in the other tables. Together these make up a bibliography, plus extra information about where the sources were physically consulted. Page numbers will mostly appear not here but in the notes/footnotes of entries in other tables. I don't think any semantics are needed here -- these pages do not need to be queried by other pages, just hyperlinked to.

Queryable concepts

Wiki pages can query other pages about these elements:

  • Year filed or written (patent/publication concepts)
  • Year granted or published (a slightly later event)
  • Year born or established (for clubs, firms, and persons)
  • Year ended (for clubs, firms, and persons)
  • A single generic concept of "year" would be okay for the first round
  • Date (generally in format 1906-03-30 but perhaps should be a string so it can include ranges, and ? for uncertainty). Someday I will want to do date math (e.g. "date granted minus date filed") but that is some ways off
  • Language, most importantly en, fr, de. Also significant: it, sv, ru, hu, and several others. The abbreviations can be the same as on Wikimedia.
  • Country: Most importantly: US, GB, FR, DE. Also IT, RU. There is some uncertainty about how to classify others in this period. I think I want to have AH for Austria-Hungary, but also AT and HU and other categories when more precise information is known. I would be willing to start with a known set of categories for countries now, and add richness for the countries of the historical period as time permits. Most wiki pages on any topic will refer to a country. More than one country can appear on a wiki page.
  • Technology type. These overlap a lot, and a particular item in the wiki may have several techtype labels. But technologies include: LTA (=lighter-than-air), dirigible, airfoil, infrastructure, instruments, control, navigation, propulsion, helicopter, rocket, other ... and a dozen more. Technology type can appear on a patent or publication, or (less often) on a letter, club, or firm.
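
One way to reconcile "dates stored as strings" with later date math is to keep the raw string and parse it only when it is an exact date. The Python sketch below assumes the 1906-03-30 format mentioned above; the function names `parse_strict_date` and `days_between` are hypothetical, and ranges or values marked with ? are simply treated as not computable.

```python
import re
from datetime import date
from typing import Optional

# Exact YYYY-MM-DD only; ranges and "?" markers will not match.
_EXACT_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_strict_date(text: str) -> Optional[date]:
    """Return a date for an exact YYYY-MM-DD string, otherwise None."""
    match = _EXACT_DATE.fullmatch(text.strip())
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or February 30
        return None


def days_between(filed: str, granted: str) -> Optional[int]:
    """Date granted minus date filed, in days, when both are exact dates."""
    start, end = parse_strict_date(filed), parse_strict_date(granted)
    if start is None or end is None:
        return None
    return (end - start).days
```

With this approach the uncertain strings stay in the table untouched, and date math just skips the rows it cannot interpret.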

Tasks for Peter

  • make sure all firms are moved out of the inventor field into the applicants field. In the inventor field there will be a list of person-pages; in the applicant field there will be a list of person-pages and organization-pages
  • names of long wiki text: section -- one or two for sources and notes
  • data types: integer float string section
  • things that are links to pages will be of type page
  • list of page, with delimiter specified – ; is fine
  • tech type can be a list-of-page type
  • Categories – don't set them. The item's type takes care of that.
  • what is the standard name of a patent? or a letter? etc
  • one table column must be called Page Title
  • he'll use Data Transfer extension to upload the data
  • cargo field <==> can be retrieved by queries
  • those queries can search by page title, not sure what else.
  • there will be one form per concept. it is not realistic to have several different forms for entering patent data
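
The data types listed above (integer, float, string, page, list of page) suggest a small per-column conversion step before upload. The following Python sketch is illustrative only and says nothing about how the Data Transfer extension or Cargo work internally. The `SCHEMA` mapping, the helper names, and the choice of columns are assumptions; blank cells are kept as `None` so that "don't know" stays distinct from 0.

```python
from typing import Any, Callable, Optional


def to_optional_int(cell: str) -> Optional[int]:
    """Convert a cell to int, keeping blank cells as None (unknown)."""
    cell = cell.strip()
    return int(cell) if cell else None


def to_optional_float(cell: str) -> Optional[float]:
    """Convert a cell to float, keeping blank cells as None (unknown)."""
    cell = cell.strip()
    return float(cell) if cell else None


def to_page_list(cell: str) -> list[str]:
    """Split a semicolon-delimited cell into a list of page names."""
    return [part.strip() for part in cell.split(";") if part.strip()]


# Hypothetical subset of the patent columns, one converter per column.
SCHEMA: dict[str, Callable[[str], Any]] = {
    "Page Title": str.strip,
    "FYear": to_optional_int,
    "RelatedToAircraft": to_optional_float,
    "Inventors": to_page_list,
}


def coerce_row(row: dict[str, str]) -> dict[str, Any]:
    """Apply the schema to one spreadsheet row; missing columns count as blank."""
    return {name: convert(row.get(name, "")) for name, convert in SCHEMA.items()}
```

A row such as `{"Page Title": "GB-1909-12345", "FYear": "1909", "RelatedToAircraft": "", "Inventors": "Person A; Person B"}` (invented values) would come out with an integer year, `None` for the unknown aircraft flag, and a two-element inventor list.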