Data model
GIS is a software system for processing spatial data, so an adequate model of spatial phenomena is the most important thing for a GIS.
It should provide a way to represent spatial phenomena in computer memory, allow the desired operations to be performed on this representation, and let the user see the results in a familiar form. Ideally, a GIS should hide the complications of internal data storage from the user, just as a text processor hides font rendering and kerning, or an SQL database hides the actual file layout and search techniques and provides simple but powerful relational operations instead.
Many modern GIS packages, especially vector-based ones such as ARC/Info, try to represent a map of a spatial phenomenon rather than the phenomenon itself. This overcomplicates storage formats and processing algorithms, and makes the user worry about technical matters such as polygon topology, which are as irrelevant to his problem (say, geology or soil science) as font rendering hints and kerning are to the content of an article typeset in some particular font. Maps are a tool for analysing spatial data; a widely used one, but no more than a tool. A GIS has to deal with them, because it must use existing data, which are represented on maps, and present results to the user in the understandable form of maps. But while processing data we should take into account the properties of the actual phenomena rather than the properties of a cartographic representation such as polygons.
Functional model
In f(GIS) we use the term layer to denote the computer representation of a spatial phenomenon. We define a layer as a function that maps geographical coordinates to the value of some property. The closest analogue of our layer is the spatial variable in geostatistics.
Layer values can be either real numbers or elements of some finite set. If you want to study a more complicated spatial phenomenon, it is better to describe it as a set of layers rather than as a single layer with a structured value. Obviously, you will not need the values of all the attributes in question for every calculation, and separating them makes your actions clearer.
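The idea of a layer as a function can be illustrated with a minimal Python sketch. It is not f(GIS) code: the names `elevation`, `soil_class` and `combine` are hypothetical, and the layers are synthetic formulas chosen only to show one numeric layer, one classification layer, and a derived layer built from the two.

```python
from typing import Callable

# A layer maps geographic coordinates (x, y) to a value.
Layer = Callable[[float, float], float]


def elevation(x: float, y: float) -> float:
    """Hypothetical numeric layer: a synthetic slope rising eastward."""
    return 100.0 + 0.01 * x


def soil_class(x: float, y: float) -> int:
    """Hypothetical classification layer with values from the finite set {1, 2}."""
    return 1 if y < 5000.0 else 2


def combine(a, b, op):
    """Build a new layer whose value at each point is op(a(x, y), b(x, y))."""
    return lambda x, y: op(a(x, y), b(x, y))


# Derived layer: elevation where the soil class is 1, otherwise 0.
elevation_on_class_1 = combine(
    elevation, soil_class, lambda e, s: e if s == 1 else 0.0
)
```

Keeping elevation and soil class as two separate layers, rather than one layer with a structured value, means a calculation that needs only one of them never touches the other.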
Layer classification
Layers can be classified by their area of definition and by their set of values. By area of definition we can distinguish between:
Implementation of data model
Spatial phenomena can seldom be expressed by a mathematical equation. Even when they can, finding this equation is usually the aim of the analysis, not its starting point. So we need to store the values of a layer at every point where it is defined. A raster is the natural way to store data for two-dimensional layers.
(A raster is just a big matrix of numeric values, stored in a special format to reduce storage space. If a raster is used in GIS processing, it must be known how to find row and column numbers from real-world coordinates and vice versa.)
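The conversion between real-world coordinates and row and column numbers can be sketched as follows. This is an illustration under stated assumptions, not the EPPL7 format itself: the `RasterGeometry` class and its field names are hypothetical, and the raster is assumed to be north-up with square cells and row 0 at the top.

```python
from dataclasses import dataclass


@dataclass
class RasterGeometry:
    x_min: float      # real-world X of the left edge of column 0
    y_max: float      # real-world Y of the top edge of row 0
    cell_size: float  # side of a square cell, in the same units as X and Y
    rows: int
    cols: int

    def to_cell(self, x, y):
        """Return (row, col) of the cell that contains the point (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise ValueError("point lies outside the raster")
        return row, col

    def to_coords(self, row, col):
        """Return the real-world coordinates of the centre of a cell."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return x, y
```

For a 2x2 raster with 500 m cells whose top-left corner is at (0, 1000), `to_cell(600.0, 900.0)` gives `(0, 1)` and `to_coords(0, 1)` gives `(750.0, 750.0)`.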
f(GIS) uses the raster data format developed for the EPPL7 GIS. This format has several advantages: it is compressed yet allows random access, and it can handle very fine resolution. For example, a landscape map of the ex-USSR with a spatial resolution (raster cell size) of 500 m and more than 3000 distinct kinds of landscape occupies about 9 MB of disk space. Thanks to these properties, it is advisable to work with a raster cell size significantly smaller than the known accuracy of the data. The resolution of maps can be comparable to the resolution of your scanner and printer; modern processors are powerful enough to bear it, so using a raster does not mean loss of precision.
This data format can hold values in the range 0..65535. This is always sufficient for classification layers, but it may seem that real numbers would be better for numeric layers. However, data always have finite accuracy, which is usually coarser than 1/65535 of the total range, and even if we can take measurements with greater precision, we should take into account spatial variability within one raster cell.
For example, if we have a relief map of Russia with a 500 meter cell, we need to represent the range from -28 (Caspian coast) to 5642 (Elbrus) meters above sea level. Dividing this range of 5670 meters into 65535 steps gives a smallest usable unit of roughly 10 cm. The altitude of some points may be measured more accurately (triangulation points, for example), but each raster cell represents a 500x500 meter square, which will always have more than 10 cm of variability. Even if the value of our layer needs more precision in some part of its range, we can use a non-linear (for instance logarithmic) mapping of raster cell values to layer values.
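The arithmetic above, and the difference between a linear and a logarithmic mapping, can be checked with a short Python sketch. The function names are hypothetical and the formulas are one straightforward way to do such a mapping, not necessarily the one f(GIS) uses; the logarithmic variant assumes a strictly positive range (0 < lo < hi), so a layer with negative values such as elevation would need an offset first.

```python
import math

MAX_CELL = 65535  # largest value a raster cell can hold


def linear_step(lo, hi):
    """Size of one cell unit when [lo, hi] is spread over 0..MAX_CELL."""
    return (hi - lo) / MAX_CELL


def encode_linear(value, lo, hi):
    return round((value - lo) / (hi - lo) * MAX_CELL)


def decode_linear(cell, lo, hi):
    return lo + cell * (hi - lo) / MAX_CELL


def encode_log(value, lo, hi):
    """Logarithmic mapping: finer steps near lo, coarser steps near hi."""
    return round(math.log(value / lo) / math.log(hi / lo) * MAX_CELL)


def decode_log(cell, lo, hi):
    return lo * (hi / lo) ** (cell / MAX_CELL)


# Relief of Russia: -28 m to 5642 m above sea level.
step = linear_step(-28.0, 5642.0)  # metres per cell unit, a little under 0.09
```

With the linear mapping every value is stored to within half a step, so the round trip through `encode_linear` and `decode_linear` changes an elevation by less than 5 cm.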
But even with compression, raster files occupy significant storage space, so we should avoid duplicating them where possible. For this we introduce the concept of reclass tables. A reclass table maps raster cell values to another set of integers in arbitrary order. Do not confuse a reclass table with the mapping function used to convert raster cell values to the real units of a numeric layer. For example, if we have statistical data on population by county and want to present them as a map, we can use a reclass table over the county map. Several counties with different names, which have distinct values in the county map raster, can be mapped to the same class in the population density map if their population density is the same.
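The county example can be sketched in Python as follows. The raster, the county ids and the density figures are invented for illustration, and `build_reclass` and `reclass_view` are hypothetical helpers, not f(GIS) commands. The point is that the population density map is only a small table applied to the county raster when it is read.

```python
# County raster: each cell holds a county id (hypothetical 3x4 example).
county_raster = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]

# Hypothetical statistics: population density by county id.
density_by_county = {1: 12.0, 2: 85.0, 3: 12.0, 4: 40.0}


def build_reclass(stats):
    """Assign one integer class per distinct value, in ascending order."""
    classes = {v: i + 1 for i, v in enumerate(sorted(set(stats.values())))}
    return {county: classes[v] for county, v in stats.items()}


def reclass_view(raster, table):
    """Yield reclassified rows one by one; the stored raster is left unchanged."""
    for row in raster:
        yield [table[v] for v in row]


reclass = build_reclass(density_by_county)
density_classes = list(reclass_view(county_raster, reclass))
```

Counties 1 and 3 have the same density, so they fall into the same class even though their values in the county raster differ.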
A point layer is just a list of triplets <X, Y, Value>. Typically a point layer contains no more than a few thousand points, so there is no need to optimize performance or storage space.
The natural storage form for a one-dimensional layer is a vector format. This is the most questionable area in the current fGIS design. The EPPL7 vector format has many advantages (compactness, speed of processing), but it has one drawback that outweighs them all: it can associate only one value with a whole vector object (polyline). But if we are talking about a function defined on a set of lines, we should be prepared for this function (stream depth, for instance) to vary from one end of a line to the other.
It is also an open question how intersections and junctions of lines should be stored and interpreted, because the most interesting network analysis algorithms require the ability to pass through junctions and intersections.
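To make the single-value limitation concrete, here is a Python sketch of one possible alternative: a value stored at every vertex of a polyline and interpolated linearly in between. This is purely an assumption for illustration. The text above leaves the vector design open, and `value_along` is a hypothetical helper, not part of f(GIS) or EPPL7.

```python
import math


def value_along(vertices, values, distance):
    """Linearly interpolate a per-vertex value at a distance from the line start."""
    if len(vertices) != len(values) or len(vertices) < 2:
        raise ValueError("need one value per vertex and at least two vertices")
    distance = max(0.0, distance)
    travelled = 0.0
    segments = zip(vertices, vertices[1:], values, values[1:])
    for (x0, y0), (x1, y1), v0, v1 in segments:
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and distance <= travelled + seg:
            t = (distance - travelled) / seg
            return v0 + t * (v1 - v0)
        travelled += seg
    return values[-1]  # past the end of the line: use the last vertex value


# Hypothetical stream: depth grows from 0.5 m at the source to 2.0 m at the mouth.
stream = [(0.0, 0.0), (300.0, 400.0), (300.0, 1000.0)]
depth = [0.5, 1.5, 2.0]
```

Here `value_along(stream, depth, 250.0)` falls halfway along the first 500 m segment and returns 1.0, which a format with one value per polyline cannot express.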
Regions and cartographic projection
A study area usually has a hierarchical structure. For example, Russia can be subdivided into administrative regions, which consist of districts. The United States consists of states, which are divided into counties. Often a study is concerned with only one level of such a hierarchy, but there are counterexamples.
Each hierarchy level has its typical data accuracy (a rough GIS counterpart of map scale, because GIS maps can be scaled arbitrarily, but only a certain range of scales makes sense for a particular data accuracy) and its cartographic projection (especially significant for large areas such as a whole country or continent). On thematic maps such as soil or vegetation maps, different classifications may be used at different scales.
So f(GIS) uses the concept of regions. A region is a set of layers that cover almost the same territory, have exactly the same projection and a similar spatial resolution. Regions can be nested: the region of Russia can have subregions for administrative regions, which in turn have subregions for districts, and so on. In this case there should be a base layer that has subregion names as its values. When copying data between regions, f(GIS) automatically performs the necessary projection and resolution conversion, using the base layer as a reference. Classification conversion, if necessary, must be performed by the user, because it requires knowledge of the problem domain.
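The nesting of regions can be pictured with a small Python data structure. Everything in it is hypothetical: the `Region` class, its fields, the projection names and the cell sizes are assumptions made for illustration, and `conversions_needed` only lists which automatic steps a copy between two regions would involve, without performing them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Region:
    name: str
    projection: str   # shared by every layer in the region
    cell_size: float  # typical spatial resolution, in metres
    parent: Optional["Region"] = field(default=None, repr=False, compare=False)
    subregions: Dict[str, "Region"] = field(default_factory=dict)

    def add_subregion(self, name, projection, cell_size):
        child = Region(name, projection, cell_size, parent=self)
        self.subregions[name] = child
        return child

    def path(self):
        """Return the names from the top-level region down to this one."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


def conversions_needed(src, dst):
    """List the automatic conversions a copy from src to dst would require."""
    steps = []
    if src.projection != dst.projection:
        steps.append("reproject")
    if src.cell_size != dst.cell_size:
        steps.append("resample")
    return steps


russia = Region("Russia", "conic", 500.0)
oblast = russia.add_subregion("Moscow region", "transverse-mercator", 100.0)
```

Classification conversion is deliberately absent from `conversions_needed`, since the text leaves it to the user.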
Program design
f(GIS) is designed as a set of extensions to the Tcl programming language plus a set of independent utilities that perform the most time-consuming raster and vector processing tasks. Thus long operations can be launched in the background as separate processes while the user continues to view and analyze data in the main program.
From the user's point of view, fGIS is a Tcl application that lets him operate on a set of layers from the GUI as well as from the Tcl command line. It is an essential design constraint that there should be no operation that can be performed from the GUI but not from a Tcl script. There should be a way to automate everything. The converse is ensured by the very nature of Tcl: nothing prevents a user who has direct access to the Tcl interpreter from creating a new button or menu item and binding any Tcl command to it.
From the programmer's point of view, fGIS consists of several abstraction levels, all open to extension and modification. And I think that every fGIS user can eventually become a programmer, if he discovers the need to implement some newly invented data analysis algorithm or to customize the graphical user interface to his needs. The relationship between the fGIS abstraction levels is shown in this figure.
Layer as Tcl object
Layers in fGIS behave like objects in an object-oriented programming language. Once created with the layer command, they become Tcl commands themselves (i.e. the name of a layer can be used as a Tcl command), just like Tk widgets. Options of the layer command allow you to manipulate the properties of a layer and to store the layer definition in a file. This file is just a Tcl script that creates the necessary subobjects and invokes the appropriate command to create the layer.
A layer has the following properties
Planchet - object for displaying maps
Another type of object that is essential for the fGIS user is the planchet. It is a Tk widget similar to a canvas (and actually derived from canvas) that has a cartographic projection and real-world coordinates. It is used for displaying layers and picking points on them. Because it has real-world coordinates and a physical size on the screen, it always knows its scale. When the scale changes (through a zoom or window resize operation), all layers currently displayed on the planchet are redrawn accordingly.
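How a widget can derive its scale from its real-world extent and its physical size can be shown with a little arithmetic. The sketch below is an assumption about the calculation, not planchet code: `scale_denominator` and `zoom` are hypothetical names, and the screen resolution in dots per inch is taken as a known parameter.

```python
METRES_PER_INCH = 0.0254


def scale_denominator(ground_width_m, width_px, dpi):
    """Return N for a map scale of 1:N."""
    screen_width_m = width_px / dpi * METRES_PER_INCH
    return ground_width_m / screen_width_m


def zoom(x_min, x_max, factor):
    """Shrink (factor > 1) or grow (factor < 1) an extent around its centre."""
    centre = (x_min + x_max) / 2.0
    half = (x_max - x_min) / 2.0 / factor
    return centre - half, centre + half
```

A window 1000 pixels wide at 100 dpi is 0.254 m across, so showing 254 km of ground in it gives a scale of 1:1,000,000. Zooming in by a factor of 2 halves the visible extent and the scale becomes 1:500,000.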
The planchet also has a look feature. If the right mouse button is pressed at some point on the planchet, it displays the values of several layers at this point in a pop-up window.
There can also be "friend widgets", such as a status line that displays the current coordinates while the mouse is over the planchet, or zoom/unzoom buttons that change their state depending on the current state of the planchet.
Drawing modes for raster layer
f(GIS) supports three drawing modes for raster layers - color, pattern and symbol mode.
Low level objects
There are additional objects such as rasters, palettes and pattern sets, but the user seldom needs to operate on them directly. They are primarily for developers of new layer types.
GIS operation
GIS operations such as calculating buffer zones or computing a new layer from several existing ones are performed by separate utilities running in the background. For the user's convenience there are Tcl procedures that take one or more layer names as arguments and call the appropriate utility.
An example of such a procedure is the interregion copy command, which takes a layer name and the name of a target region, determines the projections and calls the projection conversion program.
Utilities
The GIS processing utilities are more general than fGIS. They use only data files and user-supplied arguments, so they can be used separately from fGIS, for example by users of the EPPL7 GIS. The utilities are designed for a batch environment, so they use exit codes to report status and stdin/stdout to receive and return values that do not fit on the command line. An important principle of these utilities is that the user should not have to worry about raster cell size. All utilities that operate on several raster files can deal with files of different cell sizes, as long as the files have a non-empty intersection in real-world coordinates.
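The batch conventions described above (exit code for status, stdin/stdout for values) can be sketched from the caller's side in Python. No real f(GIS) utility is invoked: `FAKE_UTILITY` is a stand-in that runs a one-line Python program, and `start` and `finish` are hypothetical helper names.

```python
import subprocess
import sys

# Stand-in for a hypothetical batch utility: it reads numbers from stdin,
# writes their sum to stdout, and exits with status 2 on malformed input.
FAKE_UTILITY = [
    sys.executable, "-c",
    "import sys\n"
    "try:\n"
    "    print(sum(float(t) for t in sys.stdin.read().split()))\n"
    "except ValueError:\n"
    "    sys.exit(2)\n",
]


def start(argv):
    """Launch the utility as a separate process and return immediately."""
    return subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )


def finish(proc, stdin_text):
    """Send input, wait for the process to exit, return (exit code, output)."""
    out, _ = proc.communicate(stdin_text)
    return proc.returncode, out.strip()


job = start(FAKE_UTILITY)
# The caller is free to do other work here while the utility runs.
status, result = finish(job, "1.5 2.5 4\n")
```

A zero `status` means success and `result` holds whatever the utility wrote to stdout; a non-zero status tells the caller to ignore the output.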
Data access library
Both the low-level Tcl objects (rasters, vectors) and the utilities use a common C library to access data files. This library provides a suitably high-level framework for those who want to implement their own data analysis algorithms. For example, it includes iterator routines that receive a user-written function, open a raster file and apply this function to every cell of that file. While the library operates primarily in terms of raster cells (which can be important for cellular automata algorithms, which need to distinguish between "this cell" and "neighbouring cell"), it provides ways to process files with different cell sizes simultaneously.
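The iterator idea, and the way two rasters with different cell sizes can be processed together, can be sketched in Python. This is not the C library's interface: the `Raster` class, `for_each_cell` and the callback signature are assumptions, the rasters are held in memory rather than read from files, and the coarser raster is simply sampled at the centre of each finer cell.

```python
class Raster:
    """In-memory stand-in for a raster file (north-up, square cells)."""

    def __init__(self, x_min, y_max, cell_size, cells):
        self.x_min = x_min
        self.y_max = y_max
        self.cell_size = cell_size
        self.cells = cells

    def centre(self, row, col):
        """Real-world coordinates of the centre of a cell."""
        return (self.x_min + (col + 0.5) * self.cell_size,
                self.y_max - (row + 0.5) * self.cell_size)

    def value_at(self, x, y):
        """Value of the cell containing (x, y), or None outside the raster."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if 0 <= row < len(self.cells) and 0 <= col < len(self.cells[0]):
            return self.cells[row][col]
        return None


def for_each_cell(raster, func):
    """Call func(row, col, value) for every cell of the raster."""
    for row, line in enumerate(raster.cells):
        for col, value in enumerate(line):
            func(row, col, value)


# Two rasters over the same 200 x 200 area with 50 m and 100 m cells.
fine = Raster(0.0, 200.0, 50.0,
              [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
coarse = Raster(0.0, 200.0, 100.0, [[10, 20], [30, 40]])

sums = []


def add_coarse(row, col, value):
    """User-written function: add the coarse value found at the same place."""
    other = coarse.value_at(*fine.centre(row, col))
    if other is not None:
        sums.append(value + other)


for_each_cell(fine, add_coarse)
```

The user-written function works in terms of cells of the finer raster, while the lookup into the coarser one goes through real-world coordinates, so neither file has to be resampled beforehand.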