Diff for ObjectSpaces' guide

Tue, 10/26/2010 - 19:41 by Gwenael Casaccio | Mon, 12/20/2010 - 10:25 by Gwenael Casaccio
Changes to Body
Line 16 Line 16
spaces as well as access to reflective features. spaces as well as access to reflective features.
-== Security issues in dynamic and reflective languages == +== ObjectSpaces ==
-With dynamic and reflective languages like Smalltalk, Python, or Ruby, it is possible to +- \OSes are groups of objects. \OSes are organized in a n-tree thus an \OS has one parent and zero or more children, only the root \OS has no parent.
-inspect any objects in the memory, and modify them on the fly +- An object belongs to one and only one \OS.
-\cite{Foot89a,Riva96b,Duca99a}. In this section, we present the security issues raised +- Intra \OSes messages are normal messages sent, while messages between objects residing in different spaces are intercepted and controlled.
-by reflective features. +- Each \OS can have its own kernel of classes and reflective features use are restricted at the granularity of the \OS.
-Using reflection allows very powerful features such as on the fly debugging, simpler +Since each object belongs to an ObjectSpace we've added an hidden ivar, this is not the most
-tools building, new language semantics \cite{Riva96b}, proxies \cite{Benn87a}. However +efficient way to store the OS information. A nice way is to have an ObjectMemory for each ObjectSpaces.
-this power comes with a price: we can get all the objects in the memory, +Process class is extended with a new instance variable : objectSpace.
-access their instance variables and modify them. However, reflection was not designed in a +
-moment where security was a major concern and as such reflection introduces some severe security +
-challenges. Mirrors \cite{Brac04a} separates the base from the meta level and tries to propose an +
-architecture where reflective objects called mirrors implement the behavior of reflective features, +
-our implementation uses mirrors as explained in +
-Rivard \cite{Riva96b} gives a deep analysis of reflective mechanisms available in +Each newly created object will belong to the process OS. We don't share objects (now we share the classes
-Smalltalk. Here we present some of the key problems raised, and stress some requirements +but we shouldn't: a method change or class change shouldn't impact the other objectSpaces).
-towards a more reflective but secure system. +The reflection is restricted to the space of the process, we cannot break the object encapsulations or enumerate the
 +full memory.
-== Object manipulation == +Communications between two ObjectSpaces is done with sharedQueue in an async mode or can be sync. All the arguments
- +and results are deep copied and never shared (lazy copy is a possible optimization).
-When an attacker gets access to an object, he can manipulate the object: send message to +
-it using \ct{perform:} or inspect object structure but as well modify its instance +
-variable contents using \ct{instVarAt:} and \ct{instVarAt:put:} and breaks the +
-encapsulation. +
- +
-For example: +
-<pre> +
- Object instVarAt: 1 put: Object +
-</pre> +
- +
-If we create a new instance of \ct{Object} and send a message that doesn't exist that +
-will do an infinite lookup in the virtual machine or can crash the \vm. The same will +
-happens if you replace the method dictionary, or a compiled method. +
- +
-\emph{Object manipulation is a powerful mechanism but shouldn't be applied to all the +
-objects. Some objects should only be manipulated by trusted objects} +
- +
-== Reference manipulation == +
- +
-References can be swapped using the primitives \ct{become:} or \ct{becomeForward:}. The +
-expression \ct{o1 become: o2} swaps the references to \ct{o1} to point to \ct{o2}. +
- +
-Again we clearly see the main problem comes from the lack of boundaries between objects. +
-An object shouldn't have the possibility to manipulate other objects that are not in the +
-same cluster. +
- +
-For example: +
-<pre> +
- smallArray become: biggerArray +
-</pre> +
- +
-<pre> +
- thisContext become: nil. +
-</pre> +
- +
-In the first example we 're doing a migration of an array with a new sized array. Which +
-show us that become is really powerful for objects migration. But in the second example we +
-can see how dangerous is the become primitive. +
- +
-\emph{Reference manipulation should only be allowed on some rare circumstances and used +
-by trusted objects.} +
- +
-== Memory iteration == +
- +
-As mentioned earlier, reflection gives the ability to a user to inspect the +
-whole memory and grab all the objects. (In Smalltalk messages such +
-\ct{nextObject} return the next object directly accessible in the memory. With that +
-message it is possible to implement the method \ct{allInstances} which returns all the +
-instances of a class. This is a current restriction in dynamic languages, objects have no +
-boundaries. We don't know who is the owner of the object and thus anybody has access to +
-any object. The \ct{allOwners} message gives all the owners of the receiver object. +
- +
-We should restrict memory access to the same cluster of objects. +
-Accessing all living objects is a trap and a mistake. This implies that an object +
-shouldn't be able to get all the instances of a class outside its cluster. Clustering +
-objects allow us to restrict the communication between these clusters: a cluster can talk +
-or not to another cluster. For instance in Smalltalk we don't have the idea of processes, +
-that means we could use the reflection to change the objects from another process. +
-Isolate a process against to other processes is needed. +
- +
-<pre> +
- BankAccount allInstances +
-</pre> +
- +
-With that small example, we have access to all of the accounts of the bank, and by using +
-the reflection messages seen in the previous section. We can change any of them. Here we +
-clearly see the lacks of isolation in dynamic languages. +
- +
-\emph{A runtime should provide ways to isolate cluster to objects. An object should not +
-be allowed to access all the objects in memory.} +
- +
-== Changing object behavior == +
- +
-A class can be changed and thus an attacker can spy all the message sends \cite{Duca99a}. +
-Using various means it is possible to trap and change all the methods of a class. +
- +
-\emph{Controlling the definition of new method in a class is a key points} +
- +
- +
-== Classes == +
- +
-A class is an object which format is well known by the virtual machine. For instance, in +
-smalltalk, a class stores its superclass the method dictionary, the instance +
-specification. An attacker can change the superclass by any object. When the \vm will do +
-the look up it will crash. Classes couldn't only be changed by a safe class builder. +
- +
-Other problems can come from the metaclasses, they have access to the instance variables +
-of the superclasses (since in Smalltalk the inheritance for the instance variable is +
-protected) and so the method dictionary instance, the superclass, or the instance +
-variables array. +
- +
-The only way to protect the system against bad well formed classes, is to provide a safe +
-class builder. That tool will be the only one to change classes, it works with safe +
-classes and will produce safe classes. Metaclasses should be protected to the +
-instance variable inheritance behavior should be changed to private, a metaclass couldn't +
-access to the method dictionary or the instance specification. A safe class builder +
-should also protect the system from bad changes, because some objects format are known by +
-the \vm, such as compiled method, block closure, class. Changing their format would break +
-the \vm. +
- +
-<pre> +
- Foo class >> new +
- +
- methodDictionary := 123. +
- superClass := Bar. +
- ^ super new +
-</pre> +
- +
-In that small example we see the metaclass is really doing bad stuffs with the method +
-dictionary instance variable. +
- +
-\emph{Controlling classes and metaclasses manipulation is important.} +
- +
-== Method dictionary and compiled code == +
- +
-Method dictionary and compiled code (method or blocks) like classes are special objects +
-because their format is known by the \vm. Changing a compiled code and inserting an +
-object in a method dictionary that is not a compiled code will break the virtual machine. +
-A compiled code should only be produced by a safe compiler, only it could changes the +
-method dictionary to insert a compiled code. A safe compiler is needed because a bad +
-compiled code could have unbalanced stack, and produce a stack overflow, or also have bad +
-bytecode. +
- +
-<pre> +
- (String>>#,) at: 1 put: 255 +
-</pre> +
- +
-This code will replace the first bytecode by an undefined bytecode. +
- +
-\emph{Controlling compiled methods manipulation is important.} +
- +
-== Global state == +
- +
-All the global state should be removed \gcnote{add a ref why gs sucks probably from E}, +
-in smalltalk when a method access to a class, the compiler do a lookup in the environment +
-and creates a class binding. +
- +
-There are some others security issues with globals, it is possible to change the global +
-for example Object := nil is possible in Smalltalk, unfortunately it will break the +
-\vm. Another problem is the classes are hard linked to the code, if we want to remove +
-the access to a class we cannot do that. That's why we need a dynamic lookup for the +
-classes. +
- +
-The shared pools are kind of globals but for the classes and the metaclasses, and the +
-class variables are global instance variables for the metaclasses and sub metaclasses. +
-This is really unsecure if a metaclass A define a class variable \#foo all the sub +
-metaclasses have access to the content and can change the class variable. +
- +
-<pre> +
- Object := nil. +
-</pre> +
- +
-<pre> +
- Bar >> removeAccounts +
- Accounts remove: #Java +
-</pre> +
- +
-In those two snippets we see the danger of the global state, we have no control on the +
-accessors. Global state should be suppress that could be done automatically with a +
-rewrite tool. +
- +
-\emph{We have no control over global state and should remove it.} +
- +
-== Stack execution == +
- +
-In Smalltalk or Ruby it is possible to retrieve the current execution context. Such +
-stack reification is really powerful since it is used to implement the complete exception +
-handling mechanism from within the language. All the debugger runtime (i.e. displaying +
-current method name, restarting on the fly modified methods now called hot debugging), is +
-also based on the interpretation of such information. By using context we can change +
-temporary variables, message receiver, stack pointer. +
-In addition it supports continuation manipulation and is the basis for advanced dynamic +
-web application framework \cite{IEEESeasidePaper}] +
- +
-Context manipulation should be done in a careful way, because it is possible to get the +
-stack of the execution and in dynamic languages we don't have a separation between the +
-kernel area and the user area. When a user of the context iterate over it, we should +
-check if it has the right to access some part of the context - that can be kind of system +
-calls. +
- +
-\emph{Only privileged objects should have the right to get access to contexts and +
-manipulate them.} +
- +
-== Virtual machine primitives == +
- +
-Primitives are functions that cannot be expressed in the language paradigm or are +
-optimized version of frequently used methods. For example, accessing the element of an +
-object is a primitive which from the language calls the C underlying implementation. Point +
-creation is a primitive for speed reason even though they can be expressed at the OOP +
-level. Primitives are used to call virtual machine function: they allow the creation of +
-new instances, implementation of reflection, saving or manipulating the content of the +
-memory. But they also provide ways to communicate with the operating system: +
-manipulating files, loading dynamic libraries and calling their functions. +
- +
-While they offer a lot of interesting possibilities, they are also potentially very +
-dangerous. Anybody can load dynamically any shared libraries accessible in the system of +
-the user. And thus can call any of their functions. There is also another risk +
-with the virtual machine primitives; when we can access the memory, launch the garbage +
-collector - the risk here is to use a lot of the virtual machine CPU resources. +
- +
-The primitives should only be called by a \vm primitives class like that we could +
-create a capability and restrict the primitives. +
- +
-\emph{Primitive calls should be restricted. For example a plugin shouldn't have the +
-ability to launch a garbage collector run or to load dynamically new shared libraries.} +
- +
-== Garbage collection == +
- +
-It exists multiples primitives that allow to interact with the garbage collection, some +
-primitives allow to execute a garbage collection, or to change the garbage collector +
-parameters. +
- +
-But others primitives allow to change the generation of the object, or to make it fixed +
-it won't move across the garbage collections. Finally, when an object is garbage it will +
-call the finalize method on it, this is also trend to security issues if the object inject +
-it self inside another object. This lead to an undefined behavior for the \vm. +
- +
-<pre> +
- Foo >> finalize +
- +
- Smalltalk at: #DeadObject put: self +
-</pre> +
- +
-This code will add a reference to the dead object inside the Smalltalk namespace. If +
-someone access to it the \vm will access to a wrong and bad formed object. +
- +
-== Resources management == +
- +
-An application allocates memory, opens files, uses sockets, and CPU. An attacker can +
-allocate too much memory and trying to do a denial of services. By doing this attack the +
-virtual machine won't run as efficiently as it should be, or in the worst case the virtual +
-machine will crash due to the lack of memory for example. A way to prevent from this kind +
-of attack is to limit the resource usage. The memory footprint of a program will be +
-restricted to an amount of memory, if the program tries to overpass that limitation it +
-will be automatically killed. In the ObjectSpaces we have implemented a few resources +
-management but we won't present them in this paper. +
== Modules == == Modules ==

Revision of Mon, 12/20/2010 - 10:25:

ObjectSpaces' guide

Introduction

ObjectSpaces are groups of objects with a global environment and their own memory area.
ObjectSpaces are organized in an n-ary tree: each ObjectSpace has one parent (except the root) and zero or more children.

An object belongs to one and only one ObjectSpace. Reflection is restricted to the current ObjectSpace: an object cannot inject an object into an instance variable of an external object, nor swap the references of two objects that do not belong to the current ObjectSpace.

Security

Applications use more and more external plugins to add new functionality. A
plugin is loaded inside an application and can access all of the application's objects. This is
a severe security problem. However, in a dynamically typed language and in presence of
reflective features, such approaches cannot deliver their promises. In this paper we
present ObjectSpaces. An ObjectSpace is a space containing objects with restricted reflection.
ObjectSpaces provide a natural way to scope objects and control communication between object
spaces as well as access to reflective features.

ObjectSpaces

- ObjectSpaces are groups of objects. They are organized in an n-ary tree: each ObjectSpace has one parent and zero or more children; only the root ObjectSpace has no parent.
- An object belongs to one and only one ObjectSpace.
- Messages within an ObjectSpace are normal message sends, while messages between objects residing in different spaces are intercepted and controlled.
- Each ObjectSpace can have its own kernel of classes, and the use of reflective features is restricted at the granularity of the ObjectSpace.
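
The four rules above can be modeled in a few lines. This is only a sketch in Python, not the Smalltalk implementation: ObjectSpace.adopt, send and the allow policy are hypothetical names, and the hidden instance variable is stood in for by a _object_space attribute.

```python
class ObjectSpace:
    """Hypothetical model of an ObjectSpace as a node in a tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent            # None only for the root space
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def adopt(self, obj):
        """Make obj a member of this space; an object has exactly one space."""
        if getattr(obj, "_object_space", None) is not None:
            raise ValueError("object already belongs to an ObjectSpace")
        obj._object_space = self        # stands in for the hidden instance variable
        return obj


def send(sender, receiver, selector, *args, allow=lambda src, dst, sel: True):
    """Deliver a message; sends that cross a space boundary go through `allow`."""
    src, dst = sender._object_space, receiver._object_space
    if src is not dst and not allow(src, dst, selector):
        raise PermissionError(f"{src.name} may not send {selector} to {dst.name}")
    return getattr(receiver, selector)(*args)


class Account:
    """Toy receiver used only for the demonstration."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance
```

A send between two objects of the same space never consults the policy; only a send that crosses a space boundary can be refused.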

Since each object belongs to an ObjectSpace, we have added a hidden instance variable to every object. This is not the most
efficient way to store the ObjectSpace information; a better way would be to have one ObjectMemory per ObjectSpace.
The Process class is extended with a new instance variable: objectSpace.

Each newly created object belongs to the ObjectSpace of the process that creates it. Objects are not shared between spaces (classes are currently shared,
but they should not be: a method change or class change should not affect the other ObjectSpaces).
Reflection is restricted to the space of the process: we cannot break the encapsulation of objects or enumerate the
full memory.
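
As an illustration of these two rules, here is a small Python model (not the Smalltalk implementation). Space.current stands in for the objectSpace instance variable of the running Process; Space, SpaceObject and all_instances are hypothetical names.

```python
class Space:
    """Hypothetical ObjectSpace that records the objects created inside it."""

    current = None      # stands in for the objectSpace variable of the running Process

    def __init__(self, name):
        self.name = name
        self.objects = []

    def __enter__(self):
        # Entering the block models a process of this space being scheduled.
        self._previous, Space.current = Space.current, self
        return self

    def __exit__(self, *exc_info):
        Space.current = self._previous
        return False

    def all_instances(self, cls):
        """Scoped counterpart of allInstances: sees this space's objects only."""
        return [obj for obj in self.objects if isinstance(obj, cls)]


class SpaceObject:
    """Base class whose instances are registered in the current space."""

    def __new__(cls, *args, **kwargs):
        if Space.current is None:
            raise RuntimeError("no current ObjectSpace")
        instance = super().__new__(cls)
        instance._object_space = Space.current
        Space.current.objects.append(instance)
        return instance


class BankAccount(SpaceObject):
    pass
```

Code running in a plugin space that asks for all BankAccount instances only gets the accounts created in that space, never those of the bank.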

Communication between two ObjectSpaces goes through a SharedQueue, either asynchronously or synchronously. All arguments
and results are deep copied and never shared (lazy copying is a possible optimization).
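
A minimal sketch of this protocol, written in Python with queue.Queue playing the role of the SharedQueue. The Channel class, its method names and the append_total handler are assumptions made for the example; only the deep copying of arguments and results comes from the description above.

```python
import copy
import queue
import threading


class Channel:
    """Hypothetical link between two spaces; payloads are always deep copied."""

    def __init__(self):
        self._requests = queue.Queue()      # plays the role of the SharedQueue

    def call_async(self, name, *args):
        """Queue a request and return a one-slot queue that will hold the reply."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((name, copy.deepcopy(args), reply))
        return reply

    def call_sync(self, name, *args):
        """Same request, but block until the other side has answered."""
        return self.call_async(name, *args).get()

    def serve_one(self, handlers):
        """Receiving side: handle one request and copy the result back."""
        name, args, reply = self._requests.get()
        reply.put(copy.deepcopy(handlers[name](*args)))

    def serve_in_background(self, handlers, count=1):
        """Serve `count` requests on another thread, as a separate process would."""
        def loop():
            for _ in range(count):
                self.serve_one(handlers)
        worker = threading.Thread(target=loop)
        worker.start()
        return worker


def append_total(items):
    """Handler that mutates its argument; the caller's list must stay untouched."""
    items.append(sum(items))
    return items
```

Because both directions are copied, the handler can mutate what it receives without the sender ever observing the change.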

Modules

Abstracts of the module papers

Modules: Encapsulating Behavior in Smalltalk

This article proposes a new view of modules, implemented in Smalltalk. Modules are a way to control the visibility of shared names. Modules also provide a way to hide the detailed collaborations among a group of Smalltalk classes. Modules can also be used to safely extend existing baseline classes.

Smalltalk classes encapsulate the state of their instances, but they do not encapsulate their behavior. Classes are globals in the Smalltalk system dictionary, so they are all visible to all other classes.

Definition of Modules

Allen Wirfs-Brock and Brian Wilkerson in Modular Smalltalk describe features of modules:

Modules are program units that manage the visibility and accessibility of names...

A module typically groups a set of class definitions and objects to implement some service or abstraction. A module will frequently be the unit of division of responsibility within a programming team...

A module provides an independent naming environment that is separate from other modules within the program...

Modules support team engineering by providing isolated name...

In Modular Smalltalk modules are not first-class objects. It uses modules only for organizational purposes.

Modules for Smalltalk

The way to create classes in Smalltalk:

Object subclass: #File
  instanceVariableNames: 'directory fileId name'
  classVariableNames: 'PageSize'
  poolDictionaries: 'CharacterConstants'

The paper creates a new module with:

Object moduleSubclass: #InventoryManager
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

A private class is added to the module's domain with:

Object subclass: #InventoryItem in: #InventoryManager
  instanceVariableNames: 'partNumber partName quantity'
  classVariableNames: ''
  poolDictionaries: ''

Encapsulating Private Behavior

Modules provide three ways of encapsulating private behavior:

  • Class groups
  • Baseline Class Extensions
  • Private Methods

Extending Baseline Smalltalk Classes

Modules provide a safe way to extend and package changes to the baseline classes in the Smalltalk system domain.

Object moduleSubclass: #ModuleA
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

Object subclass: #SubclassB in: #ModuleA
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

String subclass: #String in: #ModuleA
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

The private String class extensions are visible to methods in both ModuleA and SubclassB, but not to classes outside of ModuleA in the Smalltalk system domain.

One drawback of this example is that the compiler still creates literals using the baseline classes: SmallInteger, String, Float, Symbol and Array.

Encapsulating Private Methods

Modules can be used to hide the private methods of a class. To do this, a public module is created with the public interface, and the private methods are hidden inside a private class within that module's domain.

Object moduleSubclass: #ClassA
  instanceVariableNames: 'privateSelf'
  classVariableNames: ''
  poolDictionaries: ''

Object subclass: #ClassA in: #ClassA
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

The ClassA module has a single instance variable: privateSelf. The module acts as a simple proxy that hides the private behavior of the inner class: all messages are delegated to privateSelf.
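
The delegation pattern can be sketched as follows. This is a Python analogue rather than the paper's Smalltalk: Counter, _PrivateCounter and the PUBLIC tuple are hypothetical, and Python hides _private_self by convention only, whereas the module domain in the paper really hides the inner class.

```python
class _PrivateCounter:
    """Inner private class: holds the state and the private helper methods."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value = self._clamp(self.value + 1)
        return self.value

    def _clamp(self, number):
        # Private helper that clients of Counter must not be able to call.
        return min(number, 3)


class Counter:
    """Public module class: a proxy that exposes the public interface only."""

    PUBLIC = ("increment",)

    def __init__(self):
        self._private_self = _PrivateCounter()   # plays the role of privateSelf

    def __getattr__(self, name):
        # Reached only for names Counter does not define itself.
        if name in self.PUBLIC:
            return getattr(self._private_self, name)
        raise AttributeError(f"{name} is not part of the public interface")
```

Every public message is forwarded to the private instance, while anything outside the PUBLIC list fails at the proxy.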

Adding Modules to Smalltalk

A module class uses a ModuleDictionary for its domain; it is similar to the SystemDictionary class. Each class contained in a module domain is associated with an EncapsulatedMetaClass rather than a MetaClass; the EncapsulatedMetaClass stores a reference to the module.

Resolving Shared Names

The visibility of names depends on where they are located in the system. Shared names can be found in class variable pools, global pool dictionaries, and the Smalltalk system dictionary. During method compilation, references to shared names are resolved by searching dictionaries in the following order:

  1. The class variable pools of the class and its superclasses.
  2. The pool dictionaries to which the class subscribes, in the module domains enclosing the class up to the Smalltalk system domain.
  3. The module domains enclosing the class, up through the Smalltalk system domain.
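
The search order can be sketched as a lookup over three groups of dictionaries. This Python function is a simplified model under stated assumptions: each group is passed in already ordered (class first, innermost module first, system domain last), and the second step is reduced to a flat list of the pool dictionaries the class subscribes to. The function name and the sample dictionaries are hypothetical.

```python
def resolve(name, class_pools, pool_dictionaries, module_domains):
    """Return (where, value) for the first dictionary that defines `name`.

    class_pools        class variable pools: the class, then its superclasses
    pool_dictionaries  pool dictionaries the class subscribes to
    module_domains     enclosing domains, innermost first, system domain last
    """
    stages = (
        ("class variable pool", class_pools),
        ("pool dictionary", pool_dictionaries),
        ("module domain", module_domains),
    )
    for where, dictionaries in stages:
        for dictionary in dictionaries:
            if name in dictionary:
                return where, dictionary[name]
    raise NameError(f"undeclared shared name: {name}")


# Sample data modeled on the ModuleA example: ModuleA holds a private String.
system_domain = {"String": "baseline String", "Object": "baseline Object"}
module_a = {"String": "ModuleA String", "SubclassB": "ModuleA SubclassB"}
file_class_pool = {"PageSize": 512}
```

For a class defined inside ModuleA, String resolves to the private extension because the module domain is searched before the system domain; a class outside ModuleA searches only the system domain and sees the baseline String.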

Breaking and Enforcing Module Encapsulation

Modules enclose and encapsulate their private classes, but the programming tools need a way to break this encapsulation, for example to create a new class inside the module.

A module implements #doesNotUnderstand: aMessage to check whether the message selector is the name of a private class inside the module. This service breaks the encapsulation, but it is needed by the compiler and the development tools.

To enforce the encapsulation of a module, the module can be closed:

ModuleA closeModule
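
A rough Python analogue of this mechanism uses __getattr__, which is called only when normal lookup fails, much like #doesNotUnderstand:. The Module class and its method names are hypothetical, and the sketch assumes that closing a module switches off the name-based access to its private classes.

```python
class Module:
    """Hypothetical module whose private classes are reachable until closed."""

    def __init__(self, name):
        self.name = name
        self._domain = {}       # stands in for the ModuleDictionary
        self._closed = False

    def define(self, class_name):
        """Create a private class inside the module domain and return it."""
        private_class = type(class_name, (), {})
        self._domain[class_name] = private_class
        return private_class

    def close_module(self):
        """Counterpart of closeModule: stop answering private class names."""
        self._closed = True

    def __getattr__(self, class_name):
        # Invoked only for names the module itself does not define.
        state = self.__dict__
        if not state.get("_closed", True) and class_name in state.get("_domain", {}):
            return state["_domain"][class_name]
        raise AttributeError(f"{class_name} is not visible outside {state.get('name')}")
```

While the module is open, tools can reach a private class by name; once close_module has been called the same access fails.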

Module Interfaces

A module can provide access to a group of private classes by offering an accessing message as part of its public interface.

Class Naming and Privacy in Smalltalk

Abstract

Smalltalk lacks mechanisms for defining private classes and private methods. Without private classes, class naming conflicts can occur. Without private methods, encapsulation suffers. While global name spaces can help resolve class naming conflicts, first-class subsystems with private classes can resolve both problems.

The Name Space problem

Classes are globals in Smalltalk: they are all visible to all other classes, which is excessive.

SomeSuperclass subsystem: #SomeSubsystem
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

AnotherSuperclass subsystem: #SomePrivateClass in: #SomeSubsystem
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

SomeSubsystem @ #SomePrivateClass subclass: #SomePrivateSubclass
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

SomeSubsystem @ #SomePrivateClass subsystem: #SomePrivateSubsystem
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''

The binary symbol @ is used as a scope resolution operator.
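
To illustrate how scoped names avoid conflicts, here is a small Python model in which the @ operator (__matmul__) plays the same role. Subsystem, add_class and add_subsystem are hypothetical names chosen for the sketch, not the paper's API.

```python
class Subsystem:
    """Hypothetical first-class subsystem with its own private name space."""

    def __init__(self, name):
        self.name = name
        self._private = {}

    def add_class(self, class_name):
        """Define a private class that is known inside this subsystem only."""
        private_class = type(class_name, (), {})
        self._private[class_name] = private_class
        return private_class

    def add_subsystem(self, name):
        """Define a nested private subsystem."""
        nested = Subsystem(name)
        self._private[name] = nested
        return nested

    def __matmul__(self, name):
        """subsystem @ "Name" resolves Name inside this subsystem."""
        return self._private[name]
```

Two subsystems can each define a private class called Node without any conflict, and nested lookups chain from left to right, as in parsing @ "Lexing" @ "Token".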

Private Methods and Client Contracts

In Smalltalk all the methods of a class are effectively public. Smalltalk developers indicate that a method is private only by placing it in a category named private.

  Access    Implied Client Specification
  Private   Only the implementing class
  Protected The implementing class and all derived classes
  Promised  Some specific collaborating class
  Public    Any class

The classes that collaborate closely with a subsystem often exhibit promised behavior. Thus, it would be advantageous to object system designers if object languages incorporated and enforced access mechanisms based on client specification to establish such formal contracts. Private, protected and public access can be conceived as specific kinds of promised contracts in the following manner.

  Access                 Equivalent Contract
  ServerClass private    ServerClass promisedTo: ServerClass only
  ServerClass protected  ServerClass promisedTo: ServerClass any
  ServerClass public     ServerClass promisedTo: nil

nil is used because all the root classes are derived from nil.
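
The equivalence in the table can be checked with a small model. In this Python sketch, Contract, admits and the three helper functions are hypothetical, and Python's object class plays the role of nil as the root from which every class derives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """A method promised to one class, or to that class and its subclasses."""

    promised_to: type
    only: bool = False      # True models `promisedTo: X only`

    def admits(self, client):
        """Tell whether methods of class `client` may call the promised method."""
        if self.only:
            return client is self.promised_to
        return issubclass(client, self.promised_to)


def private(server):
    return Contract(server, only=True)      # ServerClass promisedTo: ServerClass only


def protected(server):
    return Contract(server)                 # ServerClass promisedTo: ServerClass any


def public():
    return Contract(object)                 # promisedTo: nil, the root of all classes
```

A promised contract in the general sense is then Contract(SomeCollaborator), which admits that collaborating class and its subclasses.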

Private Methods and Client Contracts

This is mostly the same approach as in the paper Modules: Encapsulating Behavior in Smalltalk: a public module with an instance variable privateSelf and an inner private class. The module provides the public interface and acts as a proxy to the real instance.

Bibliography

  1. Class Naming and Privacy in Smalltalk, www.educery.com/papers/subsys/subsys.htm
  2. Modules: Encapsulating Behavior in Smalltalk, http://www.educery.com/papers/modules/
