linux versions:
The software similarity tester SIM version 2.12 was used for source code comparison.
Since minix is distributed as a full OS, whereas linux is only a kernel, comparisons were restricted to the kernel portion of minix; however, source code line counts are provided for both the minix kernel and the entire minix distribution.
| OS | lines of C | lines of C and assembly | lines of C and assembly, including blank lines |
|---|---|---|---|
| linux-0.01 | 7574 | 8933 | 9877 |
| linux-0.11 | 10232 | 11453 | 12666 |
| linux-0.12 | 14059 | 15420 | 16914 |
| linux-0.96c | 29418 | 29719 | 32943 |
| minix-1.1 (kernel) | 11386 | 12780 | 14886 |
| minix-1.1 (whole) | 26960 | 29171 | 33670 |
| minix-1.2 (kernel) | 10800 | 36007 | 37894 |
| minix-1.2 (whole) | 19052 | 59431 | 62520 |
line counts of the raw comparison data:

| vs. | minix-1.1 | minix-1.2 |
|---|---|---|
| linux-0.01 | 1503 | 1496 |
| linux-0.11 | 1673 | 1666 |
| linux-0.12 | 1888 | 1881 |
| linux-0.96c | 6318 | 6310 |

| vs. | linux-0.01 | linux-0.11 | linux-0.12 | linux-0.96c |
|---|---|---|---|---|
| minix-1.1 | 1804 | 1804 | 1870 | 2500 |
| minix-1.2 | 2014 | 2014 | 2073 | 2810 |
comparison analysis:
The raw comparison files are very large, but consist mostly of false
positives. These arise from the way SIM handles lists of constants and
from SIM's inability to distinguish between function calls and certain
elements of syntax.
Only 4 actual similarities were found. Each is excerpted in full,
with reference to the respective source files, and discussed. Since the
similar code sections change little across all the versions of minix and
linux compared, the excerpts are taken from linux-0.96c and minix-1.2.
linux-0.96c:

```c
#define _U  0x01 /* upper */
#define _L  0x02 /* lower */
#define _D  0x04 /* digit */
#define _C  0x08 /* cntrl */
#define _P  0x10 /* punct */
#define _S  0x20 /* white space (space/lf/tab) */
#define _X  0x40 /* hex digit */
#define _SP 0x80 /* hard space (0x20) */

#define isalnum(c)  ((_ctype+1)[c]&(_U|_L|_D))
#define isalpha(c)  ((_ctype+1)[c]&(_U|_L))
#define iscntrl(c)  ((_ctype+1)[c]&(_C))
#define isdigit(c)  ((_ctype+1)[c]&(_D))
#define isgraph(c)  ((_ctype+1)[c]&(_P|_U|_L|_D))
#define islower(c)  ((_ctype+1)[c]&(_L))
#define isprint(c)  ((_ctype+1)[c]&(_P|_U|_L|_D|_SP))
#define ispunct(c)  ((_ctype+1)[c]&(_P))
#define isspace(c)  ((_ctype+1)[c]&(_S))
#define isupper(c)  ((_ctype+1)[c]&(_U))
#define isxdigit(c) ((_ctype+1)[c]&(_D|_X))
```
minix-1.2:

```c
#define _U 0001
#define _L 0002
#define _N 0004
#define _S 0010
#define _P 0020
#define _C 0040
#define _X 0100

#define isalpha(c)  ((_ctype_+1)[c]&(_U|_L))
#define isupper(c)  ((_ctype_+1)[c]&_U)
#define islower(c)  ((_ctype_+1)[c]&_L)
#define isdigit(c)  ((_ctype_+1)[c]&_N)
#define isxdigit(c) ((_ctype_+1)[c]&(_N|_X))
#define isspace(c)  ((_ctype_+1)[c]&_S)
#define ispunct(c)  ((_ctype_+1)[c]&_P)
#define isalnum(c)  ((_ctype_+1)[c]&(_U|_L|_N))
#define isprint(c)  ((_ctype_+1)[c]&(_P|_U|_L|_N))
#define iscntrl(c)  ((_ctype_+1)[c]&_C)
#define isascii(c)  ((unsigned)(c)<=0177)
```
These are the character-type macros. They predate both minix and linux and are part of most C libraries; implementing them as a lookup in a table of class bits is a common technique. The functions they provide are specified in the ANSI C standard (ANSI X3.159-1989) and are described in most C textbooks (e.g. "C++ How to Program", H. M. Deitel, P. J. Deitel, 2nd ed., ISBN 0-13-528910-6).
linux-0.96c:

```c
#define S_IFMT   00170000
#define S_IFSOCK 0140000
#define S_IFLNK  0120000
#define S_IFREG  0100000
#define S_IFBLK  0060000
#define S_IFDIR  0040000
#define S_IFCHR  0020000
#define S_IFIFO  0010000
#define S_ISUID  0004000
#define S_ISGID  0002000
#define S_ISVTX  0001000
```
minix-1.2:

```c
#define S_IFMT   0170000 /* type of file */
#define S_IFDIR  0040000 /* directory */
#define S_IFCHR  0020000 /* character special */
#define S_IFBLK  0060000 /* block special */
#define S_IFREG  0100000 /* regular */
#define S_ISUID  04000   /* set user id on execution */
#define S_ISGID  02000   /* set group id on execution */
#define S_ISVTX  01000   /* save swapped text even after use */
#define S_IREAD  00400   /* read permission, owner */
#define S_IWRITE 00200   /* write permission, owner */
#define S_IEXEC  00100   /* execute/search permission, owner */
```
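The names of these constants are specified by the POSIX standard, and their values are the traditional UNIX values, in use long before either system was written. Moreover, the mode word is stored on disk in each minix filesystem inode, so linux, which used the minix filesystem, had to use the same values to interpret it.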
linux-0.96c:

```c
	switch (origin) {
		case 0:
			tmp = offset;
			break;
		case 1:
			tmp = file->f_pos + offset;
			break;
		case 2:
			if (!file->f_inode)
				return -EINVAL;
			tmp = file->f_inode->i_size + offset;
			break;
	}
	if (tmp < 0)
		return -EINVAL;
	file->f_pos = tmp;
```
minix-1.2:

```c
  switch(whence) {
	case 0: pos = offset; break;
	case 1: pos = rfilp->filp_pos + offset; break;
	case 2: pos = rfilp->filp_ino->i_size + offset; break;
	default: return(EINVAL);
  }
  if (pos < (file_pos) 0) return(EINVAL);
  rfilp->filp_ino->i_seek = ISEEK;	/* inhibit read ahead */
  rfilp->filp_pos = pos;
```
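The behavior of the lseek system call is specified by POSIX: the new position is the given offset measured from the start of the file, from the current position, or from the end of the file (whence 0, 1 and 2 in the excerpts, conventionally named SEEK_SET, SEEK_CUR and SEEK_END), and a resulting negative position is an error (EINVAL). Since the call is so simple, practically any implementation will be highly similar.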
linux-0.96c:

```c
	s->s_imap[0]->b_data[0] |= 1;
	s->s_zmap[0]->b_data[0] |= 1;
```
minix-1.2:

```c
  sp->s_imap[0]->b_int[0] |= 3;	/* inodes 0, 1 busy */
  sp->s_zmap[0]->b_int[0] |= 1;	/* zone 0 busy */
```
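This operation is required in order to mount the minix filesystem correctly: it marks bit 0 of the inode bitmap and of the zone bitmap as busy, so that inode 0 and zone 0 are never allocated (minix also marks inode 1, the root directory, as busy). Linux sets the bit through the first byte of the buffer and minix through the first 16-bit word; on the little-endian x86 these address the same bit. All implementations would need this or equivalent code.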
Since only 4 small segments out of thousands of lines of code were found to be similar, and since in each case the similarity was dictated by external factors (the C standard, the POSIX standard, the minix filesystem format), it is highly unlikely that any source code was copied from minix to linux or vice versa.